Re: [PERFORM] Tsearch2 performance on big database

2005-03-23 Thread Rick Jansen
Oleg Bartunov wrote:
On Tue, 22 Mar 2005, Rick Jansen wrote:
Hmm, the default configuration is too eager: you index every lexeme using
the simple dictionary! That's probably too much. Here is what I have for my
Russian configuration in the dictionary database:

 default_russian | lword| {en_ispell,en_stem}
 default_russian | lpart_hword  | {en_ispell,en_stem}
 default_russian | lhword   | {en_ispell,en_stem}
 default_russian | nlword   | {ru_ispell,ru_stem}
 default_russian | nlpart_hword | {ru_ispell,ru_stem}
 default_russian | nlhword  | {ru_ispell,ru_stem}
Notice that I index only Russian and English words, no numbers, URLs, etc.
You may just delete the unwanted rows in pg_ts_cfgmap for your configuration,
but I'd recommend updating them instead, setting dict_name to NULL.
For example, to stop indexing integers:
update pg_ts_cfgmap set dict_name=NULL where ts_name='default_russian' 
and tok_alias='int';
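
Or, as a rough sketch, disable several of the non-word token types in one
go (substitute your own configuration name for 'default_russian', and check
the tok_alias values against your own pg_ts_cfgmap first -- the names below
are just the usual tsearch2 ones):

update pg_ts_cfgmap set dict_name=NULL
where ts_name='default_russian'
and tok_alias in ('int', 'float', 'url', 'host', 'email', 'version');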

voc=# select token,dict_name,tok_type,tsvector from ts_debug('Do you have +7 bucks');
 token |      dict_name      | tok_type | tsvector
-------+---------------------+----------+----------
 Do    | {en_ispell,en_stem} | lword    |
 you   | {en_ispell,en_stem} | lword    |
 have  | {en_ispell,en_stem} | lword    |
 +7    |                     | int      |
 bucks | {en_ispell,en_stem} | lword    | 'buck'

Only 'bucks' gets indexed :)
Hmm, probably I should add this to the documentation.
What about word statistics (# of unique words, for example)?
I'm now following the guide to add the ispell dictionary and I've 
updated most of the rows setting dict_name to NULL:

 ts_name |  tok_alias   | dict_name
---------+--------------+-----------
 default | lword        | {en_stem}
 default | nlword       | {simple}
 default | word         | {simple}
 default | part_hword   | {simple}
 default | nlpart_hword | {simple}
 default | lpart_hword  | {en_stem}
 default | hword        | {simple}
 default | lhword       | {en_stem}
 default | nlhword      | {simple}
These are left, but I have no idea what 'hword', 'nlhword', or any of the
other token types are.

Anyway, how do I find out the number of unique words or other word 
statistics?
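
I did come across tsearch2's stat() function, which looks like it might be
meant for this sort of thing -- roughly the query below, where idxfti and
books are just placeholders for my tsvector column and table:

select word, ndoc, nentry
from stat('select idxfti from books')
order by ndoc desc;

but I'm not sure whether that's the right approach on a table this size.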

Rick
--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] best practices with index on varchar column

2005-03-23 Thread Michael Ryan S. Puncia

In my experience, using LIKE on a VARCHAR column made SELECT statements
suffer a lot, so I decided to go back to CHAR.

Note: my database has about 50 million records and a B-tree index.



 Can I use an index on a varchar column to optimize the SELECT queries
 that use column LIKE 'header%' ?

Yes

 If yes, what is the best tree algorithm to use?

Btree

Note that if you want case-insensitive matching you need to make an index
on lower(column) and SELECT WHERE lower(column) LIKE 'header%'.

Locales may bite you.
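
For example (made-up table and column names; with a non-C locale you will
probably also want the text_pattern_ops operator class so the LIKE prefix
match can actually use the index):

create index books_lower_title_idx on books (lower(title) text_pattern_ops);
select * from books where lower(title) like 'header%';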

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] best practices with index on varchar column

2005-03-23 Thread Dawid Kuroczko
On Wed, 23 Mar 2005 12:11:56 +0800, Michael Ryan S. Puncia
[EMAIL PROTECTED] wrote:
 
 In my experience, using LIKE on a VARCHAR column made SELECT statements
 suffer a lot, so I decided to go back to CHAR.
 
 Note: my database has about 50 million records and a B-tree index.

Strange...

According to the PostgreSQL documentation:

 Tip: There are no performance differences between these three types,
apart from the increased storage size when using the blank-padded type.
While character(n) has performance advantages in some other database
systems, it has no such advantages in PostgreSQL. In most situations text
or character varying should be used instead.


To the best of my knowledge, char and varchar are stored the same way
(4-byte length plus textual value), so using char should make your tables
bigger in this case.  Then again, having each row exactly the same
size makes it easier to delete a row and later insert a new one in
the same spot.  Am I thinking correctly?  Is this a case where using char(n)
helps the table avoid some sort of fragmentation?

   Regards,
  Dawid

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[PERFORM] SQL function inlining (was: View vs function)

2005-03-23 Thread Tambet Matiisen
I observed slowdowns when I declared a SQL function as strict. There were
no slowdowns when I implemented the same function in plpgsql; in fact it
got faster with strict if the parameters were NULL. Could this be a
side effect of SQL function inlining? Is a CASE added around the
function so that it is not evaluated when one of the parameters is NULL?

The functions:

create or replace function keskmine_omahind(kogus, raha) returns raha
language sql
immutable
strict
as '
SELECT CASE WHEN $1 <> 0 THEN $2 / $1 ELSE NULL END::raha;
';

create or replace function keskmine_omahind2(kogus, raha) returns raha
language plpgsql
immutable
strict
as '
BEGIN
RETURN CASE WHEN $1 <> 0 THEN $2 / $1 ELSE NULL END::raha;
END;
';

With strict:

epos=# select count(keskmine_omahind(laokogus, laosumma)) from kaubad;
 count
---
  9866
(1 row)

Time: 860,495 ms

epos=# select count(keskmine_omahind2(laokogus, laosumma)) from kaubad;
 count
---
  9866
(1 row)

Time: 178,922 ms

Without strict:

epos=# select count(keskmine_omahind(laokogus, laosumma)) from kaubad;
 count
---
  9866
(1 row)

Time: 88,151 ms

epos=# select count(keskmine_omahind2(laokogus, laosumma)) from kaubad;
 count
---
  9866
(1 row)

Time: 178,383 ms

epos=# select version();
                                   version
---------------------------------------------------------------------------
 PostgreSQL 7.4.5 on i386-pc-linux-gnu, compiled by GCC i386-linux-gcc
(GCC) 3.3.4 (Debian 1:3.3.4-9)

  Tambet

 -Original Message-
 From: Neil Conway [mailto:[EMAIL PROTECTED] 
 Sent: Monday, March 21, 2005 7:13 AM
 To: Bruno Wolff III
 Cc: Keith Worthington; pgsql-performance@postgresql.org
 Subject: Re: View vs function
 
 
 Bruno Wolff III wrote:
  Functions are just black boxes to the planner.
 
 ... unless the function is a SQL function that is trivial enough for the
 planner to inline it into the plan of the invoking query. Currently, we
 won't inline set-returning SQL functions that are used in the query's
 rangetable, though. This would be worth doing, I think -- I'm not sure
 how much work it would be, though.
 
 -Neil
 

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] best practices with index on varchar column

2005-03-23 Thread Richard Huxton
Dawid Kuroczko wrote:
On Wed, 23 Mar 2005 12:11:56 +0800, Michael Ryan S. Puncia
[EMAIL PROTECTED] wrote:
In my experience, using LIKE on a VARCHAR column made SELECT statements
suffer a lot, so I decided to go back to CHAR.

According to the PostgreSQL documentation:
 Tip: There are no performance differences between these three types,
apart from the increased storage size when using the blank-padded type.
While character(n) has performance advantages in some other database
systems, it has no such advantages in PostgreSQL. In most situations text
or character varying should be used instead.
To the best of my knowledge, char and varchar are stored the same way
(4-byte length plus textual value), so using char should make your tables
bigger in this case.  Then again, having each row exactly the same
size makes it easier to delete a row and later insert a new one in
the same spot.  Am I thinking correctly?  Is this a case where using char(n)
helps the table avoid some sort of fragmentation?
There aren't any noticeable differences between char and varchar. MVCC 
doesn't overwrite rows anyway, so static size is irrelevant. In any 
case, PG's toast setup splits out large text fields and compresses them 
- so it's not that simple.

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] SQL function inlining (was: View vs function)

2005-03-23 Thread Tom Lane
Tambet Matiisen [EMAIL PROTECTED] writes:
 I observed slowdowns when I declared a SQL function as strict. There were
 no slowdowns when I implemented the same function in plpgsql; in fact it
 got faster with strict if the parameters were NULL. Could this be a
 side effect of SQL function inlining? Is a CASE added around the
 function so that it is not evaluated when one of the parameters is NULL?

IIRC we will not inline a STRICT SQL function if the resulting
expression would not behave strict-ly.  This is clearly a necessary rule
because inlining would change the behavior otherwise.  But the test for
it is pretty simplistic: CASEs are considered not strict, period.  So I
think you are measuring the difference between inlined and not-inlined.

I'd suggest just leaving off the STRICT if you are writing a SQL
function you hope to have inlined.
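
For instance, the first function from your post, minus STRICT, should be
inlinable; the CASE already yields NULL whenever either argument is NULL,
so the behaviour shouldn't change (untested sketch):

create or replace function keskmine_omahind(kogus, raha) returns raha
language sql
immutable
as '
SELECT CASE WHEN $1 <> 0 THEN $2 / $1 ELSE NULL END::raha;
';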

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL on Solaris 8 and ufs

2005-03-23 Thread Andrew Sullivan
On Tue, Mar 22, 2005 at 03:23:18PM -0600, Brandon Metcalf wrote:
  s What are you using to measure
  s performance?
 
 Nothing too scientific other than the fact that since we have moved
 the DB, we consistently see a large number of postmaster processes
 (close to 100) where before we did not.

What did you move from?  The Solaris ps (not in ucb, which is the
BSD-style ps) shows the parent process name, so everything shows up
as postmaster rather than postgres.  There's always one back end
per connection.

If you are in fact using more connections, by the way, I can tell you
that Solaris 8, in my experience, is _very bad_ at managing context
switches.  So you may not be merely I/O bound (although your other
reports seem to indicate that you are).

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The whole tendency of modern prose is away from concreteness.
--George Orwell

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL on Solaris 8 and ufs

2005-03-23 Thread Brandon Metcalf
a == [EMAIL PROTECTED] writes:

 a On Tue, Mar 22, 2005 at 03:23:18PM -0600, Brandon Metcalf wrote:
 a   s What are you using to measure
 a   s performance?
 a 
 a  Nothing too scientific other than the fact that since we have moved
 a  the DB, we consistently see a large number of postmaster processes
 a  (close to 100) where before we did not.

 a What did you move from?  The Solaris ps (not in ucb, which is the
 a BSD-style ps) shows the parent process name, so everything shows up
 a as postmaster rather than postgres.  There's always one back end
 a per connection.

 a If you are in fact using more connections, by the way, I can tell you
 a that Solaris 8, in my experience, is _very bad_ at managing context
 a switches.  So you may not be merely I/O bound (although your other
 a reports seem to indicate that you are).


We moved from an HP-UX 10.20 box where the pgsql installation and data
were on a vxfs filesystem.

And we're definitely seeing more connections at a time which indicates
that each process is taking longer to complete.


-- 
Brandon

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL on Solaris 8 and ufs

2005-03-23 Thread Tom Arthurs
On the context switching issue, we've found that this setting in
/etc/system helps:

set rechoose_interval=30

This sets the minimum time that a process is eligible to be switched to
another cpu (the default is 3).
You can monitor context switching with the cs column in vmstat.  We've
found that high context switching seems to be more a symptom than a cause
of problems -- for example, we had an issue with column statistics and
some really bad queries, and the CPUs start context switching like crazy
(20,000 - 50,000 or more in a 5 second period, versus normally < 5000 per
5 second period under heavy load).
Brandon Metcalf wrote:
a == [EMAIL PROTECTED] writes:
 a On Tue, Mar 22, 2005 at 03:23:18PM -0600, Brandon Metcalf wrote:
 a   s What are you using to measure
 a   s performance?
 a 
 a  Nothing too scientific other than the fact that since we have moved
 a  the DB, we consistently see a large number of postmaster processes
 a  (close to 100) where before we did not.
 a What did you move from?  The Solaris ps (not in ucb, which is the
 a BSD-style ps) shows the parent process name, so everything shows up
 a as postmaster rather than postgres.  There's always one back end
 a per connection.
 a If you are in fact using more connections, by the way, I can tell you
 a that Solaris 8, in my experience, is _very bad_ at managing context
 a switches.  So you may not be merely I/O bound (although your other
 a reports seem to indicate that you are).
We moved from an HP-UX 10.20 box where the pgsql installation and data
were on a vxfs filesystem.
And we're definitely seeing more connections at a time which indicates
that each process is taking longer to complete.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL on Solaris 8 and ufs

2005-03-23 Thread Andrew Sullivan
On Wed, Mar 23, 2005 at 11:16:29AM -0600, Brandon Metcalf wrote:
 
 We moved from an HP-UX 10.20 box where the pgsql installation and data
 were on a vxfs filesystem.

My best guess, then, is that ufs tuning really is your issue.  We
always used vxfs for our Sun database servers (which was a nightmare
all on its own, BTW, so I don't actually recommend this), so I don't
have any real ufs tuning advice.  

The Packer Solaris database book (Packer, Allan N., _Configuring &
Tuning Databases on the Solaris Platform_.  Palo Alto: Sun
Microsystems P, 2002.  ISBN 0-13-083417-3) does suggest mounting the
filesystems with forcedirectio; I dimly recall using this for the wal
partition on one test box, and STR that it helped.  Also, you want to
make sure you use the right fsync method; if it's still set to
fsync in the config file, you'll want to change that.  I remember
finding that fsync was something like 3 times slower than everything
else.  I don't have any more Solaris boxes to check, but I believe we
were using open_datasync as our method.  You'll want to run some
tests.
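
In postgresql.conf terms that would be something along the lines of the
following (assuming the wal_sync_method setting on your version; do
benchmark the alternatives on your own hardware):

wal_sync_method = open_datasync    # instead of fsync; fdatasync and open_sync are also worth testing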

You also should enable priority paging, but expect that this will
give you really strange po numbers from vmstat and friends.  Priority
paging, I found, makes things look like you're swapping when you
aren't.  Procmem is useful, but if you really want the goods on
what's going on, you need the SE toolkit.  Just be careful using it
as root -- in some cases it'll modify kernel parameters behind the
scenes.  In my case, I didn't have superuser access, so there wasn't
a danger; but I've heard sysadmins complain about this.   

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well. 
--Dennis Ritchie

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL on Solaris 8 and ufs

2005-03-23 Thread Josh Berkus
Andrew,

 The Packer Solaris database book (Packer, Allan N., _Configuring &
 Tuning Databases on the Solaris Platform_.  Palo Alto: Sun
 Microsystems P, 2002.  ISBN 0-13-083417-3) does suggest mounting the
 filesystems with forcedirectio; I dimly recall using this for the wal
 partition on one test box, and STR that it helped.

This is a good idea for a WAL partition, but is NOT a good idea for the 
database.

You may want to look into setting segmap_percent to 70% or something.   On 
Solaris 10 at least, the OS by default only uses 10% of memory for disk 
buffers.   
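
That is, something like this in /etc/system (double-check the tunable name
on your particular Solaris release before relying on it):

set segmap_percent=70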

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]