Re: [PERFORM] Tsearch2 performance on big database
Oleg Bartunov wrote:
> On Tue, 22 Mar 2005, Rick Jansen wrote:
>
> Hmm, the default configuration is too eager: you index every lexeme
> using the simple dictionary! That's probably too much. Here is what I
> have for my russian configuration in the dictionary database:
>
>  default_russian | lword        | {en_ispell,en_stem}
>  default_russian | lpart_hword  | {en_ispell,en_stem}
>  default_russian | lhword       | {en_ispell,en_stem}
>  default_russian | nlword       | {ru_ispell,ru_stem}
>  default_russian | nlpart_hword | {ru_ispell,ru_stem}
>  default_russian | nlhword      | {ru_ispell,ru_stem}
>
> Notice that I index only russian and english words -- no numbers, URLs,
> etc. You may simply delete the unwanted rows in pg_ts_cfgmap for your
> configuration, but I'd recommend updating them instead, setting
> dict_name to NULL. For example, to stop indexing integers:
>
>  update pg_ts_cfgmap set dict_name=NULL
>   where ts_name='default_russian' and tok_alias='int';
>
>  voc=# select token,dict_name,tok_type,tsvector
>        from ts_debug('Do you have +7 bucks');
>   token |      dict_name      | tok_type | tsvector
>  -------+---------------------+----------+----------
>   Do    | {en_ispell,en_stem} | lword    |
>   you   | {en_ispell,en_stem} | lword    |
>   have  | {en_ispell,en_stem} | lword    |
>   +7    |                     | int      |
>   bucks | {en_ispell,en_stem} | lword    | 'buck'
>
> Only 'bucks' gets indexed :) Hmm, probably I should add this to the
> documentation. What about word statistics (the number of unique words,
> for example)?

I'm now following the guide to add the ispell dictionary, and I've
updated most of the rows, setting dict_name to NULL:

 ts_name | tok_alias    | dict_name
---------+--------------+-----------
 default | lword        | {en_stem}
 default | nlword       | {simple}
 default | word         | {simple}
 default | part_hword   | {simple}
 default | nlpart_hword | {simple}
 default | lpart_hword  | {en_stem}
 default | hword        | {simple}
 default | lhword       | {en_stem}
 default | nlhword      | {simple}

These rows are left, but I have no idea what an 'hword', 'nlhword', or
any of the other token types are. Anyway, how do I find out the number
of unique words or other word statistics?
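Re the word-statistics question: tsearch2 ships a stat() function that
walks a tsvector column and reports per-lexeme counts. A sketch (the
table and column names here are hypothetical -- substitute your own):

```sql
-- Number of distinct lexemes actually indexed:
SELECT count(*) FROM stat('SELECT fti_column FROM my_table');

-- The 10 most frequent words, with document and entry counts:
SELECT word, ndoc, nentry
  FROM stat('SELECT fti_column FROM my_table')
 ORDER BY ndoc DESC
 LIMIT 10;
```

Note that stat() scans every row returned by the inner query, so on a
large table it can take a while.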
Rick

--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [PERFORM] best practices with index on varchar column
> I have had a bad experience using LIKE on a VARCHAR column: the select
> statement suffered a lot, so I decided to go back to CHAR.
> Note: my database has about 50 million records and a b-tree index.
>
> Can I use an index on a varchar column to optimize SELECT queries that
> use column LIKE 'header%'?

Yes.

> If yes, what is the best tree algorithm to use?

Btree.

Note that if you want case-insensitive matching, you need to make an
index on lower(column) and SELECT ... WHERE lower(column) LIKE
'header%'. Locales may bite you.
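A minimal sketch of the lower() approach (the table and column names
are hypothetical). In a non-C locale a plain b-tree index cannot be
used for LIKE at all, so the pattern operator class is also shown:

```sql
-- Hypothetical table "articles" with a varchar "title" column.
-- Expression index for case-insensitive prefix searches:
CREATE INDEX articles_title_lower_idx ON articles (lower(title));

-- This query can use the index (left-anchored patterns only):
SELECT * FROM articles WHERE lower(title) LIKE 'header%';

-- In a non-C locale, declare the pattern operator class so the planner
-- can use the index for LIKE (lower() yields text, hence text_pattern_ops):
CREATE INDEX articles_title_pat_idx
    ON articles (lower(title) text_pattern_ops);
```

Remember that only patterns anchored at the left ('header%') can use an
index; '%header' cannot.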
Re: [PERFORM] best practices with index on varchar column
On Wed, 23 Mar 2005 12:11:56 +0800, Michael Ryan S. Puncia
[EMAIL PROTECTED] wrote:
> I have had a bad experience using LIKE on a VARCHAR column: the select
> statement suffered a lot, so I decided to go back to CHAR.
> Note: my database has about 50 million records and a b-tree index.

Strange... According to PostgreSQL's documentation:

  Tip: There are no performance differences between these three types,
  apart from the increased storage size when using the blank-padded
  type. While character(n) has performance advantages in some other
  database systems, it has no such advantages in PostgreSQL. In most
  situations text or character varying should be used instead.

To the best of my knowledge, char and varchar are stored in the same
way (a 4-byte length plus the textual value), so using char should make
your tables bigger in this case. Then again, having every row be
exactly the same size makes it easier to delete a row and later insert
a new one in the same spot. Am I thinking correctly? Is this a case
where using char(n) helps the table avoid, hmm, fragmentation of some
sort?

Regards,
Dawid
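A quick way to see the blank padding for yourself: octet_length()
reports the stored bytes, and char(n) is padded out to n while varchar
keeps only what you supplied (values here are arbitrary):

```sql
-- char(10) stores 'abc' plus seven trailing blanks; varchar(10) stores
-- just the three characters
SELECT octet_length('abc'::char(10))    AS char_bytes,     -- 10
       octet_length('abc'::varchar(10)) AS varchar_bytes;  -- 3
```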
[PERFORM] SQL function inlining (was: View vs function)
I observed slowdowns when I declared a SQL function as STRICT. There
were no slowdowns when I implemented the same function in plpgsql; in
fact the plpgsql version got faster with STRICT when the parameters
were NULL. Could this be a side effect of SQL function inlining? Is a
CASE added around the function so that it is not evaluated when one of
the parameters is NULL?

The functions:

create or replace function keskmine_omahind(kogus, raha) returns raha
language sql immutable strict as '
  SELECT CASE WHEN $1 > 0 THEN $2 / $1 ELSE NULL END::raha;
';

create or replace function keskmine_omahind2(kogus, raha) returns raha
language plpgsql immutable strict as '
BEGIN
  RETURN CASE WHEN $1 > 0 THEN $2 / $1 ELSE NULL END::raha;
END;
';

With strict:

epos=# select count(keskmine_omahind(laokogus, laosumma)) from kaubad;
 count
-------
  9866
(1 row)
Time: 860,495 ms

epos=# select count(keskmine_omahind2(laokogus, laosumma)) from kaubad;
 count
-------
  9866
(1 row)
Time: 178,922 ms

Without strict:

epos=# select count(keskmine_omahind(laokogus, laosumma)) from kaubad;
 count
-------
  9866
(1 row)
Time: 88,151 ms

epos=# select count(keskmine_omahind2(laokogus, laosumma)) from kaubad;
 count
-------
  9866
(1 row)
Time: 178,383 ms

epos=# select version();
 PostgreSQL 7.4.5 on i386-pc-linux-gnu, compiled by GCC i386-linux-gcc
 (GCC) 3.3.4 (Debian 1:3.3.4-9)

  Tambet

-----Original Message-----
From: Neil Conway [mailto:[EMAIL PROTECTED]]
Sent: Monday, March 21, 2005 7:13 AM
To: Bruno Wolff III
Cc: Keith Worthington; pgsql-performance@postgresql.org
Subject: Re: View vs function

Bruno Wolff III wrote:
> Functions are just black boxes to the planner.

... unless the function is a SQL function that is trivial enough for
the planner to inline it into the plan of the invoking query.
Currently, we won't inline set-returning SQL functions that are used in
the query's rangetable, though. This would be worth doing, I think --
I'm not sure how much work it would be, though.
-Neil
Re: [PERFORM] best practices with index on varchar column
Dawid Kuroczko wrote:
> On Wed, 23 Mar 2005 12:11:56 +0800, Michael Ryan S. Puncia
> [EMAIL PROTECTED] wrote:
>> I have had a bad experience using LIKE on a VARCHAR column: the
>> select statement suffered a lot, so I decided to go back to CHAR.
>
> According to PostgreSQL's documentation:
>
>   Tip: There are no performance differences between these three types,
>   apart from the increased storage size when using the blank-padded
>   type. While character(n) has performance advantages in some other
>   database systems, it has no such advantages in PostgreSQL. In most
>   situations text or character varying should be used instead.
>
> To the best of my knowledge, char and varchar are stored in the same
> way (a 4-byte length plus the textual value), so using char should
> make your tables bigger in this case. Then again, having every row be
> exactly the same size makes it easier to delete a row and later insert
> a new one in the same spot. Am I thinking correctly? Is this a case
> where using char(n) helps the table avoid, hmm, fragmentation of some
> sort?

There aren't any noticeable differences between char and varchar. MVCC
doesn't overwrite rows anyway, so a static row size is irrelevant. In
any case, PG's TOAST setup splits out large text fields and compresses
them -- so it's not that simple.

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] SQL function inlining (was: View vs function)
Tambet Matiisen [EMAIL PROTECTED] writes:
> I observed slowdowns when I declared a SQL function as STRICT. There
> were no slowdowns when I implemented the same function in plpgsql; in
> fact it got faster with STRICT when the parameters were NULL. Could
> this be a side effect of SQL function inlining? Is a CASE added around
> the function so that it is not evaluated when one of the parameters is
> NULL?

IIRC we will not inline a STRICT SQL function if the resulting
expression would not behave strictly. This is clearly a necessary rule,
because inlining would change the behavior otherwise. But the test for
it is pretty simplistic: CASEs are considered not strict, period. So I
think you are measuring the difference between inlined and not-inlined.
I'd suggest just leaving off the STRICT if you are writing a SQL
function you hope to have inlined.

regards, tom lane
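To spell that suggestion out, here is a sketch of the inlinable variant
of the function from the earlier message (assuming the original kogus
and raha types): drop STRICT and let the CASE handle NULL and zero
divisors itself:

```sql
-- Without STRICT the planner can inline this into the calling query.
-- A NULL or non-positive $1 still yields NULL via the ELSE arm, so the
-- observable behavior is unchanged.
create or replace function keskmine_omahind(kogus, raha) returns raha
language sql immutable as '
  SELECT CASE WHEN $1 > 0 THEN $2 / $1 ELSE NULL END::raha;
';
```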
Re: [PERFORM] PostgreSQL on Solaris 8 and ufs
On Tue, Mar 22, 2005 at 03:23:18PM -0600, Brandon Metcalf wrote:
> > What are you using to measure performance?
>
> Nothing too scientific, other than the fact that since we moved the
> DB we consistently see a large number of postmaster processes (close
> to 100) where before we did not.

What did you move from? The Solaris ps (not the one in ucb, which is
the BSD-style ps) shows the parent process name, so everything shows up
as postmaster rather than postgres. There's always one back end per
connection.

If you are in fact using more connections, by the way, I can tell you
that Solaris 8, in my experience, is _very bad_ at managing context
switches. So you may not be merely I/O bound (although your other
reports seem to indicate that you are).

A

--
Andrew Sullivan | [EMAIL PROTECTED]
The whole tendency of modern prose is away from concreteness.
    --George Orwell
Re: [PERFORM] PostgreSQL on Solaris 8 and ufs
a == [EMAIL PROTECTED] writes:

a> What did you move from? The Solaris ps (not the one in ucb, which is
a> the BSD-style ps) shows the parent process name, so everything shows
a> up as postmaster rather than postgres. There's always one back end
a> per connection.
a>
a> If you are in fact using more connections, by the way, I can tell
a> you that Solaris 8, in my experience, is _very bad_ at managing
a> context switches. So you may not be merely I/O bound (although your
a> other reports seem to indicate that you are).

We moved from an HP-UX 10.20 box where the pgsql installation and data
were on a vxfs filesystem. And we're definitely seeing more connections
at a time, which indicates that each process is taking longer to
complete.

--
Brandon
Re: [PERFORM] PostgreSQL on Solaris 8 and ufs
On the context-switching issue, we've found that this setting in
/etc/system helps:

  set rechoose_interval=30

This sets the minimum time that a process is eligible to be switched to
another CPU (the default is 3). You can monitor context switching with
the cs column in vmstat. We've found that high context switching seems
to be more a symptom than a cause of problems -- for example, we had an
issue with column statistics and some really bad queries, and the CPUs
started context switching like crazy (20,000 - 50,000 or more in a
5-second period, against a normal 5,000 per 5-second period under heavy
load).

Brandon Metcalf wrote:
> We moved from an HP-UX 10.20 box where the pgsql installation and
> data were on a vxfs filesystem. And we're definitely seeing more
> connections at a time, which indicates that each process is taking
> longer to complete.
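A small sketch for watching that cs column without eyeballing vmstat
(the 20,000 threshold comes from the numbers above; the filter locates
the cs field from vmstat's header line rather than hard-coding its
position, since the column layout varies with the number of disks):

```shell
# Reads vmstat output on stdin and prints any sample whose context-switch
# count exceeds the threshold given as $1 (default 20000).
cs_watch() {
  awk -v limit="${1:-20000}" '
    # Header line: find which field is labelled "cs", then skip it
    $0 ~ /(^| )cs( |$)/ { for (i = 1; i <= NF; i++) if ($i == "cs") col = i; next }
    # Data lines: report samples over the limit
    col && $col + 0 > limit { print "high cs:", $col }
  '
}
```

Typical usage on the box itself would be `vmstat 5 | cs_watch 20000`.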
Re: [PERFORM] PostgreSQL on Solaris 8 and ufs
On Wed, Mar 23, 2005 at 11:16:29AM -0600, Brandon Metcalf wrote:
> We moved from an HP-UX 10.20 box where the pgsql installation and
> data were on a vxfs filesystem.

My best guess, then, is that ufs tuning really is your issue. We always
used vxfs for our Sun database servers (which was a nightmare all on
its own, BTW, so I don't actually recommend this), so I don't have any
real ufs tuning advice. The Packer Solaris database book (Packer, Allan
N., _Configuring & Tuning Databases on the Solaris Platform_. Palo
Alto: Sun Microsystems P, 2002. ISBN 0-13-083417-3) does suggest
mounting the filesystems with forcedirectio; I dimly recall using this
for the WAL partition on one test box, and STR that it helped.

Also, you want to make sure you use the right fsync method; if it's
still set to fsync in the config file, you'll want to change that. I
remember finding that fsync was something like 3 times slower than
everything else. I don't have any more Solaris boxes to check, but I
believe we were using open_datasync as our method. You'll want to run
some tests.

You also should enable priority paging, but expect that this will give
you really strange po numbers from vmstat and friends. Priority paging,
I found, makes things look like you're swapping when you aren't.
Procmem is useful, but if you really want the goods on what's going on,
you need the SE toolkit. Just be careful using it as root -- in some
cases it'll modify kernel parameters behind the scenes. In my case I
didn't have superuser access, so there wasn't a danger; but I've heard
sysadmins complain about this.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
This work was visionary and imaginative, and goes to show that
visionary and imaginative work need not end up well.
    --Dennis Ritchie
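For concreteness, a sketch of the two settings mentioned above (device
names and the mount point are hypothetical; benchmark both changes
before relying on them):

```
# /etc/vfstab -- mount the WAL partition with forcedirectio
# fields: device, raw device, mount point, fstype, fsck pass,
#         mount-at-boot, options
/dev/dsk/c0t1d0s0  /dev/rdsk/c0t1d0s0  /pgdata/pg_xlog  ufs  2  yes  forcedirectio

# postgresql.conf -- try open_datasync instead of plain fsync
wal_sync_method = open_datasync
```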
Re: [PERFORM] PostgreSQL on Solaris 8 and ufs
Andrew,

> The Packer Solaris database book does suggest mounting the
> filesystems with forcedirectio; I dimly recall using this for the wal
> partition on one test box, and STR that it helped.

This is a good idea for a WAL partition, but is NOT a good idea for the
database. You may want to look into setting segmap_percent to 70% or
something. On Solaris 10, at least, the OS by default only uses 10% of
memory for disk buffers.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
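For reference, that tunable also goes in /etc/system (a sketch; check
whether your Solaris release supports it before applying, and note that
a reboot is required):

```
* /etc/system -- let the OS use up to 70% of memory for the file system
* page cache instead of the 10% default (comment lines here start with
* an asterisk)
set segmap_percent=70
```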