Re: [PERFORM] increasing database connections

2007-03-01 Thread Magnus Hagander
On Thu, Mar 01, 2007 at 12:49:14AM -0500, Jonah H. Harris wrote:
> On 3/1/07, Shiva Sarna <[EMAIL PROTECTED]> wrote:
> >I am sorry if it is a repeat question but I want to know if database
> >performance will decrease if I increase the max-connections to 2000. At
> >present it is 100.
> 
> Most certainly.  Adding connections over 200 will degrade performance
> dramatically.  You should look into pgpool or connection pooling from
> the application.

Are you sure? I've heard of at least one installation which runs with
5000+ connections, and it works fine. (you know who you are - I don't
know if it's public info, so I can't put out the details - but feel free
to fill in :P)

That said, there's certainly some overhead, and using pgpool if possible
is good advice (depending on workload). I'm just wondering about
the "dramatically" part.

//Magnus

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org
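[Editorial note: the "connection pooling from the application" that Jonah recommends amounts to opening a fixed set of connections once and handing them out on demand. A toy sketch of the idea — the class and factory names below are invented for illustration and are not pgpool's or any driver's real API:]

```python
import queue

class SimplePool:
    """A minimal connection pool: open N connections once, then reuse them."""
    def __init__(self, connect, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(connect())  # pay the connection-setup cost once, up front

    def acquire(self):
        return self._q.get()        # blocks if all connections are checked out

    def release(self, conn):
        self._q.put(conn)

# Stand-in "connection" factory; a real application would pass its
# database driver's connect function here instead.
counter = [0]
def fake_connect():
    counter[0] += 1
    return {"id": counter[0]}

pool = SimplePool(fake_connect, size=3)
c = pool.acquire()
pool.release(c)
# However many times clients acquire/release, only 3 connections ever exist.
```

A real pool would also need to detect broken connections and reset transaction state on release; that is exactly the housekeeping pgpool and similar tools do for you.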


Re: [PERFORM] increasing database connections

2007-03-01 Thread Florian Weimer
* Mark Kirkwood:

> Yeah - I thought that somewhere closer to 10000 connections is where
> you get hit with socket management related performance issues.

Huh?  These sockets aren't handled in a single process, are they?
Nowadays, this number of sockets does not pose any problem for most
systems, especially if you don't do I/O multiplexing.  Of course, if
you've got 10,000 connections which are active in parallel, most users
won't be content with 1/10,000th of your database performance. 8-/ (I
don't see why idle connections should be a problem from a socket
management POV, though.)

-- 
Florian Weimer<[EMAIL PROTECTED]>
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] increasing database connections

2007-03-01 Thread Ron

At 01:18 AM 3/1/2007, Joshua D. Drake wrote:

Jonah H. Harris wrote:
> On 3/1/07, Shiva Sarna <[EMAIL PROTECTED]> wrote:
>> I am sorry if it is a repeat question but I want to know if database
>> performance will decrease if I increase the max-connections to 2000. At
>> present it is 100.
>
> Most certainly.  Adding connections over 200 will degrade performance
> dramatically.  You should look into pgpool or connection pooling from
> the application.

huh? That is certainly not my experience. I have systems that show no
appreciable performance hit on even 1000+ connections. To be fair to the
discussion, these are on systems with 4+ cores, usually 8+, and
significant RAM, 16/32 GB.

Sincerely,

Joshua D. Drake


Some caveats.

Keeping a DB connection around is relatively inexpensive.
OTOH, building and tearing down a DB connection =can be= expensive.
Expensive or not, connection build and tear down are pure overhead 
activities.  Any overhead you remove from the system is extra 
capacity that the system can use in actually answering DB queries 
(...at least until the physical IO system is running flat out...)


So having 1000+ DB connections open should not be a problem in and of 
itself (but you probably do not want 1000+ queries worth of 
simultaneous HD IO!...).


OTOH, you probably do !not! want to be constantly creating and 
destroying 1000+ DB connections.
Better to open 1000+ DB connections once at system start up time and 
use them as a connection pool.


The potential =really= big performance hit of having lots of 
connections around is lots of connections doing simultaneous heavy 
(especially seek-heavy) HD IO.


Once you have enough open connections that your physical IO subsystem 
tends to be maxed out performance wise on the typical workload being 
handled, it is counter productive to allow any more concurrent DB connections.


So the issue is not "how high a max-connections is too high?".   It's 
"how high a max-connections is too high for =my= HW running =my= query mix?"


The traditional advice is to be conservative and start with a 
relatively small number of connections and increase that number only 
as long as doing so results in increased system performance on your 
job mix.  Once you hit the performance plateau, stop increasing 
max-connections and let connection caching and pooling handle things.
If that does not result in enough performance, it's time to initiate 
the traditional optimization hunt.
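[Editorial note: Ron's start-small-and-grow procedure can be sketched as a loop that raises the connection count only while measured throughput keeps improving. `throughput()` is a stand-in for benchmarking your own job mix; the toy model below merely imitates an I/O subsystem that saturates.]

```python
def tune_max_connections(throughput, start=50, step=50, limit=2000, tol=0.02):
    """Grow the connection count while measured throughput keeps improving.

    throughput(n) is a stand-in for running your real workload at n
    connections and measuring queries/sec; tol is the minimum relative
    gain we still count as an improvement.
    """
    best_n, best_tp = start, throughput(start)
    n = start + step
    while n <= limit:
        tp = throughput(n)
        if tp <= best_tp * (1 + tol):   # no meaningful gain: plateau reached
            break
        best_n, best_tp = n, tp
        n += step
    return best_n

# Toy model: throughput rises with concurrency, then flattens once the
# physical I/O subsystem is running flat out (here, at ~400 connections).
def model(n):
    return min(n, 400) / (1 + 0.0005 * n)
```

Running `tune_max_connections(model)` stops growing at the plateau rather than at some arbitrary max-connections ceiling, which is the point of Ron's advice.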


Also, note Josh's deployed HW for systems that can handle 1000+ 
connections.  ...and you can bet the IO subsystems on those boxes are 
similarly "beefy".  Don't expect miracles out of modest HW.
Ron 



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Performance Query

2007-03-01 Thread Abu Mushayeed
1. The function:
   
  SELECT a.birth_date FROM (
      SELECT indiv_fkey, birth_dt AS birth_date,
             intern_last_update::date AS last_update, 'fed' AS source
      FROM cdm.cdm_fedcustomer
      WHERE birth_dt IS NOT NULL
        AND indiv_fkey = $1
      UNION
      SELECT indiv_fkey, birthdate AS birth_date,
             last_update::date AS last_update, 'reg' AS source
      FROM cdm.cdm_reg_customer
      WHERE birthdate IS NOT NULL
        AND indiv_fkey = $1
      ORDER BY source ASC, last_update DESC LIMIT 1
  ) a
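[Editorial note: the function's selection rule — prefer 'fed' over 'reg' (ORDER BY source ASC) and, within a source, take the most recent last_update — can be mimicked in plain Python. Field names follow the SQL above; the sample rows are invented.]

```python
def getbest_bday(rows):
    """rows: (source, last_update, birth_date) tuples gathered from both tables.

    Mirrors ORDER BY source ASC, last_update DESC LIMIT 1:
    'fed' sorts before 'reg', and a newer last_update wins within a source.
    """
    candidates = [r for r in rows if r[2] is not None]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: (r[0], -r[1]))
    return best[2]

# Invented sample data; last_update encoded as a YYYYMMDD integer.
rows = [
    ("reg", 20070115, "1969-04-01"),
    ("fed", 20060301, "1969-04-02"),
    ("fed", 20061201, "1969-04-03"),
]
```

Here `getbest_bday(rows)` picks "1969-04-03": both 'fed' rows beat the 'reg' row, and the newer of the two wins.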
   
  2. The query:
   
  INSERT INTO indiv_mast.staging_birthdate
  SELECT * FROM (
      SELECT im.indiv_key,
             indiv_mast.getbest_bday(im.indiv_key::integer) AS birth_date
      FROM indiv_mast.indiv_mast im
      WHERE im.indiv_key >= 200 AND im.indiv_key < 400
  ) b
  WHERE b.birth_date IS NOT NULL;
   
  3. The query plan:
   
  Bitmap Heap Scan on indiv_mast im  (cost=28700.91..2098919.14 rows=1937250 width=8)
    Recheck Cond: ((indiv_key >= 200) AND (indiv_key < 400))
    Filter: (indiv_mast.getbest_bday((indiv_key)::integer) IS NOT NULL)
    ->  Bitmap Index Scan on indiv_mast_pkey_idx  (cost=0.00..28700.91 rows=1946985 width=0)
          Index Cond: ((indiv_key >= 200) AND (indiv_key < 400))

  4. Number of records in the tables:
   
  indiv_mast.indiv_mast : 15Million
  cdm.cdm_fedcustomer: 18Million
  cdm.cdm_reg_customer: 9 Million
   
  The query (2) runs for hours. It started at 2:00 AM last night and it is
still running (6:00 AM).
   
  Some of the postgresql.conf file parameters are below:
   
  shared_buffers = 2 #6
work_mem = 65536 #131072 #65536
maintenance_work_mem = 524288 #131072
max_fsm_pages = 800 
max_fsm_relations = 32768
wal_buffers = 128
checkpoint_segments = 256   # in logfile segments, min 1, 16MB each
checkpoint_timeout = 3600
checkpoint_warning = 300
effective_cache_size = 2
random_page_cost = 2# (same)
   
  I really do not know how to find out what the query is waiting on, unlike
Oracle, which provides some of this information through its dynamic
performance views.
   
  Please help me understand how I can find out what the system is waiting
for, or why the query is taking so long.
   
  I will really appreciate some help.
   
  Thanks
  Abu

 

[PERFORM] Identical Queries

2007-03-01 Thread Rob Schall
Question for anyone...

I tried posting to the bugs list, and they said this is a better question for here.
I have two queries. One runs in about 2 seconds; the other takes upwards
of 2 minutes. I have a temp table that is created with 2 columns. This
table is joined with the larger table of call detail records.
However, these 2 queries are handled very differently.

The queries:
First

calldetail=> EXPLAIN SELECT current.* FROM current JOIN anitmp ON
current.destnum=anitmp.ani AND istf=true;
  QUERY PLAN
--
 Nested Loop  (cost=0.00..2026113.09 rows=500908 width=108)
   ->  Seq Scan on anitmp  (cost=0.00..33.62 rows=945 width=8)
         Filter: (istf = true)
   ->  Index Scan using i_destnum on current  (cost=0.00..2137.36 rows=531 width=108)
         Index Cond: (current.destnum = "outer".ani)
(5 rows)

Second
calldetail=> EXPLAIN SELECT current.* FROM current JOIN anitmp ON
current.orignum=anitmp.ani AND istf=false;
QUERY PLAN
---
 Hash Join  (cost=35.99..3402035.53 rows=5381529 width=108)
   Hash Cond: ("outer".orignum = "inner".ani)
   ->  Seq Scan on current  (cost=0.00..907191.05 rows=10170805 width=108)
   ->  Hash  (cost=33.62..33.62 rows=945 width=8)
 ->  Seq Scan on anitmp  (cost=0.00..33.62 rows=945 width=8)
   Filter: (istf = false)
(6 rows)


The tables:
   Table "public.current"
  Column  |Type | Modifiers
--+-+---
 datetime | timestamp without time zone |
 orignum  | bigint  |
 destnum  | bigint  |
 billto   | bigint  |
 cost | numeric(6,4)|
 duration | numeric(8,1)|
 origcity | character(12)   |
 destcity | character(12)   |
 file | character varying(30)   |
 linenum  | integer |
 carrier  | character(1)|
Indexes:
"i_destnum" btree (destnum)
"i_orignum" btree (orignum)


Table "public.anitmp"
 Column |  Type   | Modifiers
+-+---
 ani| bigint  |
 istf   | boolean |


I was also asked to post the EXPLAIN ANALYZE for both:

calldetail=> EXPLAIN ANALYZE SELECT current.* FROM anitmp JOIN current ON 
istf=false AND current.orignum=anitmp.ani;
QUERY PLAN
---
 Hash Join  (cost=35.99..3427123.39 rows=5421215 width=108) (actual time=1994.164..157443.544 rows=157 loops=1)
   Hash Cond: ("outer".orignum = "inner".ani)
   ->  Seq Scan on current  (cost=0.00..913881.09 rows=10245809 width=108) (actual time=710.986..137963.320 rows=10893541 loops=1)
   ->  Hash  (cost=33.62..33.62 rows=945 width=8) (actual time=10.948..10.948 rows=0 loops=1)
         ->  Seq Scan on anitmp  (cost=0.00..33.62 rows=945 width=8) (actual time=10.934..10.939 rows=2 loops=1)
               Filter: (istf = false)
 Total runtime: 157443.900 ms
(7 rows)

calldetail=> EXPLAIN ANALYZE SELECT current.* FROM current JOIN anitmp ON 
current.destnum=anitmp.ani AND istf=true;
 QUERY PLAN
-
 Nested Loop  (cost=0.00..2037526.69 rows=504602 width=108) (actual time=88.752..1050.295 rows=1445 loops=1)
   ->  Seq Scan on anitmp  (cost=0.00..33.62 rows=945 width=8) (actual time=8.189..8.202 rows=2 loops=1)
         Filter: (istf = true)
   ->  Index Scan using i_destnum on current  (cost=0.00..2149.40 rows=534 width=108) (actual time=62.365..517.454 rows=722 loops=2)
         Index Cond: (current.destnum = "outer".ani)
 Total runtime: 1052.862 ms
(6 rows)


Anyone have any ideas for me? I have indexes on each of the necessary
columns.

Rob



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] increasing database connections

2007-03-01 Thread Joshua D. Drake
Magnus Hagander wrote:
> On Thu, Mar 01, 2007 at 12:49:14AM -0500, Jonah H. Harris wrote:
>> On 3/1/07, Shiva Sarna <[EMAIL PROTECTED]> wrote:
>>> I am sorry if it is a repeat question but I want to know if database
>>> performance will decrease if I increase the max-connections to 2000. At
>>> present it is 100.
>> Most certainly.  Adding connections over 200 will degrade performance
>> dramatically.  You should look into pgpool or connection pooling from
>> the application.
> 
> Are you sure? I've heard of at least one installation which runs with
> 5000+ connections, and it works fine.

We have one that high as well and it does fine. Although I wouldn't
suggest it on less than 8.1 ;). 8.2 handles it even better since 8.2
handles >8 cores better than 8.1.

Joshua D. Drake


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Identical Queries

2007-03-01 Thread Stephan Szabo
On Thu, 1 Mar 2007, Rob Schall wrote:

> Question for anyone...
>
> I tried posting to the bugs, and they said this is a better question for here.
> I have two queries. One runs in about 2 seconds. The other takes upwards
> of 2 minutes. I have a temp table that is created with 2 columns. This
> table is joined with the larger database of call detail records.
> However, these 2 queries are handled very differently.

How many rows are there in anitmp and how many rows in anitmp have
istf=true and how many have istf=false? If you don't currently analyze the
temp table after adding the rows, you might find that doing an analyze
helps, or at least makes the row estimates better.
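[Editorial note: to see why the estimate matters, compare rough join costs using the numbers from the EXPLAIN output above. This is a toy calculation, not PostgreSQL's actual cost model.]

```python
# Rough costs lifted from the EXPLAIN output above (arbitrary planner units).
index_probe = 2149.0     # cost of one index scan into "current" (i_orignum/i_destnum)
seq_scan    = 907191.0   # cost of scanning all of "current" for the hash join

def nested_loop_cost(outer_rows):
    # one index probe into the big table per row of anitmp
    return outer_rows * index_probe

# Planner's stale estimate: 945 matching rows in anitmp,
# so the nested loop looks more expensive than one big seq scan.
stale_estimate_cost = nested_loop_cost(945)

# Reality after ANALYZE: only 2 matching rows,
# so the nested loop is vastly cheaper than scanning the whole table.
true_cost = nested_loop_cost(2)
```

This is why analyzing the temp table flips the plan: with accurate row counts the nested loop wins by orders of magnitude.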

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] stats collector process high CPU utilization

2007-03-01 Thread Tom Lane
Benjamin Minshall <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> It's sounding like what you had was just transient bloat, in which case
>> it might be useful to inquire whether anything out-of-the-ordinary had
>> been done to the database right before the excessive-CPU-usage problem
>> started.

> I don't believe that there was any unusual activity on the server, but I 
> have set up some more detailed logging to hopefully identify a pattern 
> if the problem resurfaces.

A further report led us to realize that 8.2.x in fact has a nasty bug
here: the stats collector is supposed to dump its stats to a file at
most every 500 milliseconds, but the code was actually waiting only
500 microseconds :-(.  The larger the stats file, the more obvious
this problem gets.

If you want to patch this before 8.2.4, try this...

Index: pgstat.c
===
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.140.2.2
diff -c -r1.140.2.2 pgstat.c
*** pgstat.c26 Jan 2007 20:07:01 -  1.140.2.2
--- pgstat.c1 Mar 2007 20:04:50 -
***
*** 1689,1695 
/* Preset the delay between status file writes */
MemSet(&write_timeout, 0, sizeof(struct itimerval));
write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
!   write_timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;
  
/*
 * Read in an existing statistics stats file or initialize the stats to
--- 1689,1695 
/* Preset the delay between status file writes */
MemSet(&write_timeout, 0, sizeof(struct itimerval));
write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
!   write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;
  
/*
 * Read in an existing statistics stats file or initialize the stats to
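[Editorial note: the bug is a unit slip in the itimerval setup. PGSTAT_STAT_INTERVAL is in milliseconds (500 in the 8.2 source), but itimerval's tv_usec field takes microseconds, so the arithmetic can be checked directly:]

```python
PGSTAT_STAT_INTERVAL = 500   # milliseconds, as in the 8.2 source

tv_sec = PGSTAT_STAT_INTERVAL // 1000                 # whole seconds: 0

# Buggy line: treats the leftover milliseconds as microseconds.
buggy_tv_usec = PGSTAT_STAT_INTERVAL % 1000           # 500 us = 0.5 ms

# Fixed line: converts the leftover milliseconds to microseconds.
fixed_tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000  # 500000 us = 500 ms

# The buggy timer fires 1000x too often, hence the busy stats collector.
ratio = fixed_tv_usec / buggy_tv_usec
```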


regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] stats collector process high CPU utilization

2007-03-01 Thread Benjamin Minshall

Tom Lane wrote:

A further report led us to realize that 8.2.x in fact has a nasty bug
here: the stats collector is supposed to dump its stats to a file at
most every 500 milliseconds, but the code was actually waiting only
500 microseconds :-(.  The larger the stats file, the more obvious
this problem gets.

If you want to patch this before 8.2.4, try this...



Thanks for the follow-up on this issue, Tom.  I was able to link the 
original huge stats file problem to some long(ish) running transactions 
which blocked VACUUM, but this patch will really help.  Thanks.


-Ben




[PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Alex Deucher

Hello,

I have noticed a strange performance regression and I'm at a loss as
to what's happening.  We have a fairly large database (~16 GB).  The
original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
of ram running Solaris on local scsi discs.  The new server is a sun
Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
(AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
it was created from scratch rather than copying over the old one,
however the table structure is almost identical (UTF8 on the new one
vs. C on the old). The problem is queries are ~10x slower on the new
hardware.  I read several places that the SAN might be to blame, but
testing with bonnie and dd indicates that the SAN is actually almost
twice as fast as the scsi discs in the old sun server.  I've tried
adjusting just about every option in the postgres config file, but
performance remains the same.  Any ideas?

Thanks,

Alex

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Joshua D. Drake
Alex Deucher wrote:
> Hello,
> 
> I have noticed a strange performance regression and I'm at a loss as
> to what's happening.  We have a fairly large database (~16 GB).  The
> original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
> of ram running Solaris on local scsi discs.  The new server is a sun
> Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
> (AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
> it was created from scratch rather than copying over the old one,
> however the table structure is almost identical (UTF8 on the new one
> vs. C on the old). The problem is queries are ~10x slower on the new
> hardware.  I read several places that the SAN might be to blame, but
> testing with bonnie and dd indicates that the SAN is actually almost
> twice as fast as the scsi discs in the old sun server.  I've tried
> adjusting just about every option in the postgres config file, but
> performance remains the same.  Any ideas?

Vacuum? Analyze? default_statistics_target? How many shared_buffers?
effective_cache_size? work_mem?

Sincerely,

Joshua D. Drake


> 
> Thanks,
> 
> Alex
> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>   http://www.postgresql.org/docs/faq
> 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Identical Queries

2007-03-01 Thread Rob Schall
There are 4 entries (wanted to make the playing field level for this
test). There are 2 with true for istf and 2 with false.

Rob


Stephan Szabo wrote:
> On Thu, 1 Mar 2007, Rob Schall wrote:
>
>   
>> Question for anyone...
>>
>> I tried posting to the bugs, and they said this is a better question for 
>> here.
>> I have two queries. One runs in about 2 seconds. The other takes upwards
>> of 2 minutes. I have a temp table that is created with 2 columns. This
>> table is joined with the larger database of call detail records.
>> However, these 2 queries are handled very differently.
>> 
>
> How many rows are there in anitmp and how many rows in anitmp have
> istf=true and how many have istf=false? If you don't currently analyze the
> temp table after adding the rows, you might find that doing an analyze
> helps, or at least makes the row estimates better.
>   


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Jeff Frost

On Thu, 1 Mar 2007, Joshua D. Drake wrote:


Alex Deucher wrote:

Hello,

I have noticed a strange performance regression and I'm at a loss as
to what's happening.  We have a fairly large database (~16 GB).  The
original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
of ram running Solaris on local scsi discs.  The new server is a sun
Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
(AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
it was created from scratch rather than copying over the old one,
however the table structure is almost identical (UTF8 on the new one
vs. C on the old). The problem is queries are ~10x slower on the new
hardware.  I read several places that the SAN might be to blame, but
testing with bonnie and dd indicates that the SAN is actually almost
twice as fast as the scsi discs in the old sun server.  I've tried
adjusting just about every option in the postgres config file, but
performance remains the same.  Any ideas?


Vacuum? Analyze? default_statistics_target? How many shared_buffers?
effective_cache_size? work_mem?


Also, an explain analyze from both the 7.4 and 8.1 systems with one of the 
10x slower queries would probably be handy.


What do you mean by "created from scratch rather than copying over the old 
one"?  How did you put the data in?  Did you run analyze after loading it? 
Is autovacuum enabled and if so, what are the thresholds?


--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Alex Deucher

On 3/1/07, Joshua D. Drake <[EMAIL PROTECTED]> wrote:

Alex Deucher wrote:
> Hello,
>
> I have noticed a strange performance regression and I'm at a loss as
> to what's happening.  We have a fairly large database (~16 GB).  The
> original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
> of ram running Solaris on local scsi discs.  The new server is a sun
> Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
> (AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
> it was created from scratch rather than copying over the old one,
> however the table structure is almost identical (UTF8 on the new one
> vs. C on the old). The problem is queries are ~10x slower on the new
> hardware.  I read several places that the SAN might be to blame, but
> testing with bonnie and dd indicates that the SAN is actually almost
> twice as fast as the scsi discs in the old sun server.  I've tried
> adjusting just about every option in the postgres config file, but
> performance remains the same.  Any ideas?

Vacuum? Analyze? default_statistics_target? How many shared_buffers?
effective_cache_size? work_mem?



I'm running the autovacuum process on the 8.1 server.  vacuuming on
the old server was done manually.

default_statistics_target and effective_cache_size are set to the
defaults on both.

postgres 7.4 server:
# - Memory -
shared_buffers = 82000 # 1000min 16, at least
max_connections*2, 8KB each
sort_mem = 8000# 1024min 64, size in KB
vacuum_mem = 32000 # 8192min 1024, size in KB
# - Free Space Map -
#max_fsm_pages = 2  # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000   # min 100, ~50 bytes each
# - Kernel Resource Usage -
#max_files_per_process = 1000   # min 25

postgres 8.1 server:
# - Memory -
shared_buffers = 10 # min 16 or max_connections*2, 8KB each
temp_buffers = 2000 #1000   # min 100, 8KB each
max_prepared_transactions = 100 #5  # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 1#1024   # min 64, size in KB
maintenance_work_mem = 524288 #16384# min 1024, size in KB
#max_stack_depth = 2048 # min 100, size in KB

I've also tried using the same settings from the old server on the new
one; same performance issues.

Thanks,

Alex


Sincerely,

Joshua D. Drake


>
> Thanks,
>
> Alex
>
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
>
>   http://www.postgresql.org/docs/faq
>


--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Identical Queries

2007-03-01 Thread Chad Wagner

On 3/1/07, Rob Schall <[EMAIL PROTECTED]> wrote:


There are 4 entries (wanted to make the playing field level for this
test). There are 2 with true for istf and 2 with false.



Then the difference here has to do with using orignum vs destnum as the join
criteria.  There must be more matches for orignum than destnum, or else your
statistics are far out of whack.  The planner estimates 5M vs 500K rows for
the two result sets, and naturally it chose a different plan for each.


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Alex Deucher

On 3/1/07, Jeff Frost <[EMAIL PROTECTED]> wrote:

On Thu, 1 Mar 2007, Joshua D. Drake wrote:

> Alex Deucher wrote:
>> Hello,
>>
>> I have noticed a strange performance regression and I'm at a loss as
>> to what's happening.  We have a fairly large database (~16 GB).  The
>> original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
>> of ram running Solaris on local scsi discs.  The new server is a sun
>> Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
>> (AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
>> it was created from scratch rather than copying over the old one,
>> however the table structure is almost identical (UTF8 on the new one
>> vs. C on the old). The problem is queries are ~10x slower on the new
>> hardware.  I read several places that the SAN might be to blame, but
>> testing with bonnie and dd indicates that the SAN is actually almost
>> twice as fast as the scsi discs in the old sun server.  I've tried
>> adjusting just about every option in the postgres config file, but
>> performance remains the same.  Any ideas?
>
> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
> effective_cache_size? work_mem?

Also, an explain analyze from both the 7.4 and 8.1 systems with one of the
10x slower queries would probably be handy.



I'll run some and get back to you.


What do you mean by "created from scratch rather than copying over the old
one"?  How did you put the data in?  Did you run analyze after loading it?
Is autovacuum enabled and if so, what are the thresholds?


Both the databases were originally created from xml files.  We just
re-created the new one from the xml rather than copying the old
database over.  I didn't manually run analyze on it, but we are
running the autovacuum process:

autovacuum = on #off# enable autovacuum subprocess?
autovacuum_naptime = 360 #60# time between autovacuum runs, in secs
autovacuum_vacuum_threshold = 1 #1000   # min # of tuple updates before
   # vacuum
autovacuum_analyze_threshold = 5000 #500# min # of tuple updates before
   # analyze

Thanks,

Alex



--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Jeff Frost

On Thu, 1 Mar 2007, Alex Deucher wrote:


On 3/1/07, Jeff Frost <[EMAIL PROTECTED]> wrote:

On Thu, 1 Mar 2007, Joshua D. Drake wrote:

> Alex Deucher wrote:
>> Hello,
>>
>> I have noticed a strange performance regression and I'm at a loss as
>> to what's happening.  We have a fairly large database (~16 GB).  The
>> original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
>> of ram running Solaris on local scsi discs.  The new server is a sun
>> Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
>> (AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
>> it was created from scratch rather than copying over the old one,
>> however the table structure is almost identical (UTF8 on the new one
>> vs. C on the old). The problem is queries are ~10x slower on the new
>> hardware.  I read several places that the SAN might be to blame, but
>> testing with bonnie and dd indicates that the SAN is actually almost
>> twice as fast as the scsi discs in the old sun server.  I've tried
>> adjusting just about every option in the postgres config file, but
>> performance remains the same.  Any ideas?
>
> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
> effective_cache_size? work_mem?

Also, an explain analyze from both the 7.4 and 8.1 systems with one of the
10x slower queries would probably be handy.



I'll run some and get back to you.


What do you mean by "created from scratch rather than copying over the old
one"?  How did you put the data in?  Did you run analyze after loading it?
Is autovacuum enabled and if so, what are the thresholds?


Both the databases were originally created from xml files.  We just
re-created the new one from the xml rather than copying the old
database over.  I didn't manually run analyze on it, but we are
running the autovacuum process:


You should probably manually run analyze and see if that resolves your 
problem.




autovacuum = on #off# enable autovacuum subprocess?
autovacuum_naptime = 360 #60# time between autovacuum runs, in 
secs
autovacuum_vacuum_threshold = 1 #1000   # min # of tuple updates 
before

  # vacuum
autovacuum_analyze_threshold = 5000 #500# min # of tuple updates 
before


Most people make autovacuum more aggressive and not less aggressive.  In fact, 
the new defaults in 8.2 are:


#autovacuum_vacuum_threshold = 500  # min # of tuple updates before
# vacuum
#autovacuum_analyze_threshold = 250 # min # of tuple updates before
# analyze
#autovacuum_vacuum_scale_factor = 0.2   # fraction of rel size before
# vacuum
#autovacuum_analyze_scale_factor = 0.1  # fraction of rel size before
# analyze
I'd recommend trying those, otherwise you might not vacuum enough.

It'll be interesting to see the explain analyze output after you've run 
analyze by hand.
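[Editorial note: with scale factors in play, the trigger point is autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples, per the 8.2 documentation. A quick calculation with invented table sizes shows why the scale factor, not the base threshold, dominates on big tables:]

```python
def autovacuum_vacuum_trigger(reltuples, threshold=500, scale_factor=0.2):
    """Dead tuples needed before autovacuum vacuums a table (8.2-style formula,
    using the 8.2 default threshold and scale factor)."""
    return threshold + scale_factor * reltuples

# Invented sizes: a small table triggers after a few thousand dead tuples,
# while a 10M-row table needs ~2M dead tuples before a vacuum fires.
small = autovacuum_vacuum_trigger(10_000)       # 2500.0
big   = autovacuum_vacuum_trigger(10_000_000)   # 2000500.0
```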


--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Jeff Frost

On Thu, 1 Mar 2007, Alex Deucher wrote:


Vacuum? Analyze? default_statistics_target? How many shared_buffers?
effective_cache_size? work_mem?



I'm running the autovacuum process on the 8.1 server.  vacuuming on
the old server was done manually.

default_statistics_target and effective_cache_size are set to the
defaults on both.

postgres 7.4 server:
# - Memory -
shared_buffers = 82000 # 1000min 16, at least
max_connections*2, 8KB each
sort_mem = 8000# 1024min 64, size in KB
vacuum_mem = 32000 # 8192min 1024, size in KB
# - Free Space Map -
#max_fsm_pages = 2  # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000   # min 100, ~50 bytes each
# - Kernel Resource Usage -
#max_files_per_process = 1000   # min 25

postgres 8.1 server:
# - Memory -
shared_buffers = 10 # min 16 or max_connections*2, 8KB 
each

temp_buffers = 2000 #1000   # min 100, 8KB each
max_prepared_transactions = 100 #5  # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared 
memory

# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 1#1024   # min 64, size in KB
maintenance_work_mem = 524288 #16384# min 1024, size in KB
#max_stack_depth = 2048 # min 100, size in KB

I've also tried using the same settings from the old server on the new
one; same performance issues.



If this is a Linux system, could you give us the output of the 'free' command? 
PostgreSQL might be choosing a bad plan because your effective_cache_size is 
way off (it's at the default now, right?).  Also, what was the block 
read/write speed of the SAN from your bonnie tests?  You'll probably want to 
tune random_page_cost as well if it's also at the default.



--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Alex Deucher

On 3/1/07, Jeff Frost <[EMAIL PROTECTED]> wrote:

On Thu, 1 Mar 2007, Alex Deucher wrote:

>> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
>> effective_cache_size? work_mem?
>>
>
> I'm running the autovacuum process on the 8.1 server.  vacuuming on
> the old server was done manually.
>
> default_statistics_target and effective_cache_size are set to the
> defaults on both.
>
> postgres 7.4 server:
> # - Memory -
> shared_buffers = 82000 # 1000min 16, at least
> max_connections*2, 8KB each
> sort_mem = 8000# 1024min 64, size in KB
> vacuum_mem = 32000 # 8192min 1024, size in KB
> # - Free Space Map -
> #max_fsm_pages = 2  # min max_fsm_relations*16, 6 bytes each
> #max_fsm_relations = 1000   # min 100, ~50 bytes each
> # - Kernel Resource Usage -
> #max_files_per_process = 1000   # min 25
>
> postgres 8.1 server:
> # - Memory -
> shared_buffers = 10 # min 16 or max_connections*2, 8KB
> each
> temp_buffers = 2000 #1000   # min 100, 8KB each
> max_prepared_transactions = 100 #5  # can be 0 or more
> # note: increasing max_prepared_transactions costs ~600 bytes of shared
> memory
> # per transaction slot, plus lock space (see max_locks_per_transaction).
> work_mem = 1#1024   # min 64, size in KB
> maintenance_work_mem = 524288 #16384# min 1024, size in KB
> #max_stack_depth = 2048 # min 100, size in KB
>
> I've also tried using the same settings from the old server on the new
> one; same performance issues.
>

If this is a linux system, could you give us the output of the 'free' command?


             total       used       free     shared    buffers     cached
Mem:       8059852    8042868      16984          0        228    7888648
-/+ buffers/cache:     153992    7905860
Swap:     15631224       2164   15629060



Postgresql might be choosing a bad plan because your effective_cache_size is
way off (it's the default now right?).  Also, what was the block read/write


yes it's set to the default.


speed of the SAN from your bonnie tests?  Probably want to tune
random_page_cost as well if it's also at the default.



                    --Sequential Output-- --Sequential Input- --Random-
             -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13  397.7   0

effective_cache_size is the default.

Alex

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] stats collector process high CPU utilization

2007-03-01 Thread Merlin Moncure

On 3/1/07, Tom Lane <[EMAIL PROTECTED]> wrote:

Benjamin Minshall <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> It's sounding like what you had was just transient bloat, in which case
>> it might be useful to inquire whether anything out-of-the-ordinary had
>> been done to the database right before the excessive-CPU-usage problem
>> started.

> I don't believe that there was any unusual activity on the server, but I
> have set up some more detailed logging to hopefully identify a pattern
> if the problem resurfaces.

A further report led us to realize that 8.2.x in fact has a nasty bug
here: the stats collector is supposed to dump its stats to a file at
most every 500 milliseconds, but the code was actually waiting only
500 microseconds :-(.  The larger the stats file, the more obvious
this problem gets.


I think this explains the trigger that was blowing up my FC4 box.

merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Jeff Frost

On Thu, 1 Mar 2007, Alex Deucher wrote:


On 3/1/07, Jeff Frost <[EMAIL PROTECTED]> wrote:

On Thu, 1 Mar 2007, Alex Deucher wrote:

>> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
>> effective_cache_size? work_mem?
>>
>
> I'm running the autovacuum process on the 8.1 server.  vacuuming on
> the old server was done manually.
>
> default_statistics_target and effective_cache_size are set to the
> defaults on both.
>
> postgres 7.4 server:
> # - Memory -
> shared_buffers = 82000 # 1000min 16, at least
> max_connections*2, 8KB each
> sort_mem = 8000# 1024min 64, size in KB
> vacuum_mem = 32000 # 8192min 1024, size in KB
> # - Free Space Map -
> #max_fsm_pages = 2  # min max_fsm_relations*16, 6 bytes each
> #max_fsm_relations = 1000   # min 100, ~50 bytes each
> # - Kernel Resource Usage -
> #max_files_per_process = 1000   # min 25
>
> postgres 8.1 server:
> # - Memory -
> shared_buffers = 10 # min 16 or max_connections*2, 8KB
> each
> temp_buffers = 2000 #1000   # min 100, 8KB each
> max_prepared_transactions = 100 #5  # can be 0 or more
> # note: increasing max_prepared_transactions costs ~600 bytes of shared
> memory
> # per transaction slot, plus lock space (see max_locks_per_transaction).
> work_mem = 1#1024   # min 64, size in KB
> maintenance_work_mem = 524288 #16384# min 1024, size in KB
> #max_stack_depth = 2048 # min 100, size in KB
>
> I've also tried using the same settings from the old server on the new
> one; same performance issues.
>

If this is a linux system, could you give us the output of the 'free' 
command?


             total       used       free     shared    buffers     cached
Mem:       8059852    8042868      16984          0        228    7888648
-/+ buffers/cache:     153992    7905860
Swap:     15631224       2164   15629060


So, I would set effective_cache_size = 988232 (7905860/8).
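(The arithmetic, for anyone following along: in 8.1 effective_cache_size is an integer number of 8 kB pages, so the figure comes straight from the free output — a sketch:)

```
# "-/+ buffers/cache" free column from free(1): 7905860 kB reclaimable for caching
# effective_cache_size is counted in 8 kB pages:  7905860 / 8 = 988232
effective_cache_size = 988232
```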



Postgresql might be choosing a bad plan because your effective_cache_size is
way off (it's the default now right?).  Also, what was the block read/write


yes it's set to the default.


speed of the SAN from your bonnie tests?  Probably want to tune
random_page_cost as well if it's also at the default.



                    --Sequential Output-- --Sequential Input- --Random-
             -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13  397.7   0




So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that write 
speed is about the same as my single SATA drive write speed on my workstation, 
so not that great.  The read speed is decent, though and with that sort of 
read performance, you might want to lower random_page_cost to something like 
2.5 or 2 so the planner will tend to prefer index scans.
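One low-risk way to experiment with that, sketched here with the table and column names from earlier in the thread, is to override the setting per session before committing it to postgresql.conf:

```
-- try a cheaper random-I/O estimate for this session only
SET random_page_cost = 2.5;
EXPLAIN ANALYZE SELECT c1, c2 FROM t1 WHERE c1 = '6258261';
-- if plans and timings improve, persist it in postgresql.conf:
--   random_page_cost = 2.5        # default is 4.0
```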


--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Alex Deucher

On 3/1/07, Jeff Frost <[EMAIL PROTECTED]> wrote:

On Thu, 1 Mar 2007, Alex Deucher wrote:

> On 3/1/07, Jeff Frost <[EMAIL PROTECTED]> wrote:
>> On Thu, 1 Mar 2007, Alex Deucher wrote:
>>
>> >> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
>> >> effective_cache_size? work_mem?
>> >>
>> >
>> > I'm running the autovacuum process on the 8.1 server.  vacuuming on
>> > the old server was done manually.
>> >
>> > default_statistics_target and effective_cache_size are set to the
>> > defaults on both.
>> >
>> > postgres 7.4 server:
>> > # - Memory -
>> > shared_buffers = 82000 # 1000min 16, at least
>> > max_connections*2, 8KB each
>> > sort_mem = 8000# 1024min 64, size in KB
>> > vacuum_mem = 32000 # 8192min 1024, size in KB
>> > # - Free Space Map -
>> > #max_fsm_pages = 2  # min max_fsm_relations*16, 6 bytes each
>> > #max_fsm_relations = 1000   # min 100, ~50 bytes each
>> > # - Kernel Resource Usage -
>> > #max_files_per_process = 1000   # min 25
>> >
>> > postgres 8.1 server:
>> > # - Memory -
>> > shared_buffers = 10 # min 16 or max_connections*2,
>> 8KB
>> > each
>> > temp_buffers = 2000 #1000   # min 100, 8KB each
>> > max_prepared_transactions = 100 #5  # can be 0 or more
>> > # note: increasing max_prepared_transactions costs ~600 bytes of shared
>> > memory
>> > # per transaction slot, plus lock space (see max_locks_per_transaction).
>> > work_mem = 1#1024   # min 64, size in KB
>> > maintenance_work_mem = 524288 #16384# min 1024, size in KB
>> > #max_stack_depth = 2048 # min 100, size in KB
>> >
>> > I've also tried using the same settings from the old server on the new
>> > one; same performance issues.
>> >
>>
>> If this is a linux system, could you give us the output of the 'free'
>> command?
>
>              total       used       free     shared    buffers     cached
> Mem:       8059852    8042868      16984          0        228    7888648
> -/+ buffers/cache:     153992    7905860
> Swap:     15631224       2164   15629060

So, I would set effective_cache_size = 988232 (7905860/8).

>
>> Postgresql might be choosing a bad plan because your effective_cache_size
>> is
>> way off (it's the default now right?).  Also, what was the block read/write
>
> yes it's set to the default.
>
>> speed of the SAN from your bonnie tests?  Probably want to tune
>> random_page_cost as well if it's also at the default.
>>
>
>                     --Sequential Output-- --Sequential Input- --Random-
>              -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13  397.7   0
>

So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that write
speed is about the same as my single SATA drive write speed on my workstation,
so not that great.  The read speed is decent, though and with that sort of
read performance, you might want to lower random_page_cost to something like
2.5 or 2 so the planner will tend to prefer index scans.



Right, but the old box was getting ~45MBps on both reads and writes,
so it's an improvement for me :)  Thanks for the advice, I'll let you
know how it goes.

Alex

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Identical Queries

2007-03-01 Thread Stephan Szabo
On Thu, 1 Mar 2007, Rob Schall wrote:

> There are 4 entries (wanted to make the playing field level for this
> test). There are 2 with true for istf and 2 with false.

Then analyzing might help, because I think it's estimating many more rows
for both cases, and with 2 rows estimated to be returned the nested loop
should seem a lot more attractive than at 900+.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Identical Queries

2007-03-01 Thread Craig A. James

Stephan Szabo wrote:

I tried posting to the bugs, and they said this is a better question for here.
I have two queries. One runs in about 2 seconds. The other takes upwards
of 2 minutes. I have a temp table that is created with 2 columns. This
table is joined with the larger database of call detail records.
However, these 2 queries are handled very differently.


Even for a temporary table, you should run ANALYZE on it after you fill it but 
before you query or join to it.  I found out (the hard way) that a temporary 
table of just 100 rows will generate dramatically different plans before and 
after ANALYZE.

Craig


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] stats collector process high CPU utilization

2007-03-01 Thread Tom Lane
"Merlin Moncure" <[EMAIL PROTECTED]> writes:
> On 3/1/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>> A further report led us to realize that 8.2.x in fact has a nasty bug
>> here: the stats collector is supposed to dump its stats to a file at
>> most every 500 milliseconds, but the code was actually waiting only
>> 500 microseconds :-(.  The larger the stats file, the more obvious
>> this problem gets.

> I think this explains the trigger that was blowing up my FC4 box.

I dug in the archives a bit and couldn't find the report you're
referring to?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Alex Deucher

On 3/1/07, Jeff Frost <[EMAIL PROTECTED]> wrote:

On Thu, 1 Mar 2007, Joshua D. Drake wrote:

> Alex Deucher wrote:
>> Hello,
>>
>> I have noticed a strange performance regression and I'm at a loss as
>> to what's happening.  We have a fairly large database (~16 GB).  The
>> original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
>> of ram running Solaris on local scsi discs.  The new server is a sun
>> Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
>> (AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
>> it was created from scratch rather than copying over the old one,
>> however the table structure is almost identical (UTF8 on the new one
>> vs. C on the old). The problem is queries are ~10x slower on the new
>> hardware.  I read several places that the SAN might be to blame, but
>> testing with bonnie and dd indicates that the SAN is actually almost
>> twice as fast as the scsi discs in the old sun server.  I've tried
>> adjusting just about every option in the postgres config file, but
>> performance remains the same.  Any ideas?
>
> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
> effective_cache_size? work_mem?

Also, an explain analyze from both the 7.4 and 8.1 systems with one of the
10x slower queries would probably be handy.


here are some examples.  Analyze is still running on the new db, I'll
post results when that is done.  Mostly what our apps do is prepared
row selects from different tables:
select c1,c2,c3,c4,c5 from t1 where c1='XXX';

old server:
db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
   QUERY PLAN
---
Index Scan using t1_c2_index on t1  (cost=0.00..166.89 rows=42
width=26) (actual time=5.722..5.809 rows=2 loops=1)
  Index Cond: ((c2)::text = '6258261'::text)
Total runtime: 5.912 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
QUERY PLAN

Index Scan using t1_c1_key on t1  (cost=0.00..286.08 rows=72
width=26) (actual time=12.423..12.475 rows=12 loops=1)
  Index Cond: ((c1)::text = '6258261'::text)
Total runtime: 12.538 ms
(3 rows)


new server:
db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
QUERY PLAN

Index Scan using t1_c2_index on t1  (cost=0.00..37.63 rows=11
width=26) (actual time=33.461..51.377 rows=2 loops=1)
  Index Cond: ((c2)::text = '6258261'::text)
Total runtime: 51.419 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
  QUERY PLAN

Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907
width=26) (actual time=45.733..46.271 rows=12 loops=1)
  Index Cond: ((c1)::text = '6258261'::text)
Total runtime: 46.325 ms
(3 rows)


Alex

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Jeff Frost

On Thu, 1 Mar 2007, Alex Deucher wrote:


here are some examples.  Analyze is still running on the new db, I'll
post results when that is done.  Mostly what our apps do is prepared
row selects from different tables:
select c1,c2,c3,c4,c5 from t1 where c1='XXX';

old server:
db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
  QUERY PLAN
---
Index Scan using t1_c2_index on t1  (cost=0.00..166.89 rows=42
width=26) (actual time=5.722..5.809 rows=2 loops=1)
 Index Cond: ((c2)::text = '6258261'::text)
Total runtime: 5.912 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
   QUERY PLAN

Index Scan using t1_c1_key on t1  (cost=0.00..286.08 rows=72
width=26) (actual time=12.423..12.475 rows=12 loops=1)
 Index Cond: ((c1)::text = '6258261'::text)
Total runtime: 12.538 ms
(3 rows)


new server:
db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
   QUERY PLAN

Index Scan using t1_c2_index on t1  (cost=0.00..37.63 rows=11
width=26) (actual time=33.461..51.377 rows=2 loops=1)
 Index Cond: ((c2)::text = '6258261'::text)
Total runtime: 51.419 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
 QUERY PLAN

Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907
width=26) (actual time=45.733..46.271 rows=12 loops=1)
 Index Cond: ((c1)::text = '6258261'::text)
Total runtime: 46.325 ms
(3 rows)


Notice the huge disparity here between the expected number of rows (2907) and 
the actual rows?  That's indicative of needing to run analyze.  The time is 
only about 4x the 7.4 runtime and that's with the analyze running merrily 
along in the background.  It's probably not as bad off as you think.  At least 
this query isn't 10x. :-)


Run these again for us after analyze is complete.
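i.e. once the background ANALYZE has finished, something along these lines (t1/c1 as in the plans above):

```
ANALYZE t1;   -- or simply wait for the background analyze to complete
EXPLAIN ANALYZE SELECT c1, c2 FROM t1 WHERE c1 = '6258261';
-- the estimated row count should now land much closer to the actual 12
```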

--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Jeff Frost

On Thu, 1 Mar 2007, Alex Deucher wrote:

>> Postgresql might be choosing a bad plan because your effective_cache_size
>> is way off (it's the default now right?).  Also, what was the block read/write
>
> yes it's set to the default.
>
>> speed of the SAN from your bonnie tests?  Probably want to tune
>> random_page_cost as well if it's also at the default.
>>
>
>                     --Sequential Output-- --Sequential Input- --Random-
>              -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13  397.7   0
>

So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that write
speed is about the same as my single SATA drive write speed on my workstation,
so not that great.  The read speed is decent, though and with that sort of
read performance, you might want to lower random_page_cost to something like
2.5 or 2 so the planner will tend to prefer index scans.



Right, but the old box was getting ~45MBps on both reads and writes,
so it's an improvement for me :)  Thanks for the advice, I'll let you
know how it goes.


Do you think that is because you have a different interface between you and 
the SAN?  ~45MBps is pretty slow - your average 7200RPM ATA133 drive can do 
that and costs quite a bit less than a SAN.


Is the SAN being shared between the database servers and other servers?  Maybe 
it was just random timing that gave you the poor write performance on the old 
server which might be also yielding occasional poor performance on the new 
one.


--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Ron

At 07:36 PM 3/1/2007, Jeff Frost wrote:

On Thu, 1 Mar 2007, Alex Deucher wrote:

>> Postgresql might be choosing a bad plan because your effective_cache_size
>> is way off (it's the default now right?).  Also, what was the block read/write
>
> yes it's set to the default.
>
>> speed of the SAN from your bonnie tests?  Probably want to tune
>> random_page_cost as well if it's also at the default.
>>
>
>                     --Sequential Output-- --Sequential Input- --Random-
>              -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13  397.7   0
>
So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that write
speed is about the same as my single SATA drive write speed on my workstation,
so not that great.  The read speed is decent, though and with that sort of
read performance, you might want to lower random_page_cost to something like
2.5 or 2 so the planner will tend to prefer index scans.


Right, but the old box was getting ~45MBps on both reads and writes,
so it's an improvement for me :)  Thanks for the advice, I'll let you
know how it goes.


Do you think that is because you have a different interface between 
you and the SAN?  ~45MBps is pretty slow - your average 7200RPM 
ATA133 drive can do that and costs quite a bit less than a SAN.


Is the SAN being shared between the database servers and other 
servers?  Maybe it was just random timing that gave you the poor 
write performance on the old server which might be also yielding 
occasional poor performance on the new one.


Remember that pg, even pg 8.2.3, has a known history of very poor 
insert speed (see comments on this point by Josh Berkus,  Luke Lonergan, etc)


For some reason, the code changes that have resulted in dramatic 
improvements in pg's read speed have not had nearly the same efficacy 
for writes.


Bottom line: pg presently has a fairly low and fairly harsh upper 
bound on write performance.   What exactly that bound is has been the 
subject of some discussion, but IIUC the fact of its existence is 
well established.


Various proposals for improving the situation exist, I've even made 
some of them, but AFAIK this is currently considered one of the 
"tough pg problems".


Cheers,
Ron Peacetree 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] stats collector process high CPU utilization

2007-03-01 Thread Merlin Moncure

On 3/2/07, Tom Lane <[EMAIL PROTECTED]> wrote:

"Merlin Moncure" <[EMAIL PROTECTED]> writes:
> On 3/1/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>> A further report led us to realize that 8.2.x in fact has a nasty bug
>> here: the stats collector is supposed to dump its stats to a file at
>> most every 500 milliseconds, but the code was actually waiting only
>> 500 microseconds :-(.  The larger the stats file, the more obvious
>> this problem gets.

> I think this explains the trigger that was blowing up my FC4 box.

I dug in the archives a bit and couldn't find the report you're
referring to?


I was referring to this:
http://archives.postgresql.org/pgsql-hackers/2007-02/msg01418.php

Even though the fundamental reason was obvious (and btw, I inherited
this server less than two months ago), I was still curious what was
making 8.2 blow up a box that was handling a million tps/hour for over
a year. :-)

merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] stats collector process high CPU utilization

2007-03-01 Thread Tom Lane
"Merlin Moncure" <[EMAIL PROTECTED]> writes:
> On 3/2/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>> "Merlin Moncure" <[EMAIL PROTECTED]> writes:
>>> I think this explains the trigger that was blowing up my FC4 box.
>> 
>> I dug in the archives a bit and couldn't find the report you're
>> referring to?

> I was referring to this:
> http://archives.postgresql.org/pgsql-hackers/2007-02/msg01418.php

Oh, the kernel-panic thing.  Hm, I wouldn't have thought that replacing
a file at a huge rate would induce a kernel panic ... but who knows?
Do you want to try installing the one-liner patch and see if the panic
goes away?

Actually I was wondering a bit if that strange Windows error discussed
earlier today could be triggered by this behavior:
http://archives.postgresql.org/pgsql-general/2007-03/msg0.php

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Alex Deucher

On 3/1/07, Jeff Frost <[EMAIL PROTECTED]> wrote:

On Thu, 1 Mar 2007, Alex Deucher wrote:

>> >> Postgresql might be choosing a bad plan because your effective_cache_size
>> >> is way off (it's the default now right?).  Also, what was the block read/write
>> >
>> > yes it's set to the default.
>> >
>> >> speed of the SAN from your bonnie tests?  Probably want to tune
>> >> random_page_cost as well if it's also at the default.
>> >>
>> >
>> >                     --Sequential Output-- --Sequential Input- --Random-
>> >              -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> > Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>> > luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13  397.7   0
>> >
>>
>> So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that write
>> speed is about the same as my single SATA drive write speed on my workstation,
>> so not that great.  The read speed is decent, though and with that sort of
>> read performance, you might want to lower random_page_cost to something like
>> 2.5 or 2 so the planner will tend to prefer index scans.
>>
>
> Right, but the old box was getting ~45MBps on both reads and writes,
> so it's an improvement for me :)  Thanks for the advice, I'll let you
> know how it goes.

Do you think that is because you have a different interface between you and
the SAN?  ~45MBps is pretty slow - your average 7200RPM ATA133 drive can do
that and costs quite a bit less than a SAN.

Is the SAN being shared between the database servers and other servers?  Maybe
it was just random timing that gave you the poor write performance on the old
server which might be also yielding occasional poor performance on the new
one.



The direct attached scsi discs on the old database server were getting
45MBps not the SAN.  The SAN got 62/145Mbps, which is not as bad.  We
have 4 servers on the SAN each with it's own 4 GBps FC link via an FC
switch.  I'll try and re-run the numbers when the servers are idle
this weekend.

Alex

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Joshua D. Drake
\
>> Is the SAN being shared between the database servers and other
>> servers?  Maybe
>> it was just random timing that gave you the poor write performance on
>> the old
>> server which might be also yielding occasional poor performance on
>> the new
>> one.
>>
> 
> The direct attached scsi discs on the old database server were getting
> 45MBps not the SAN.  The SAN got 62/145Mbps, which is not as bad. 

How many spindles you got in that SAN?

 We
> have 4 servers on the SAN each with it's own 4 GBps FC link via an FC
> switch.  I'll try and re-run the numbers when the servers are idle
> this weekend.
> 
> Alex
> 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Alex Deucher

On 3/1/07, Joshua D. Drake <[EMAIL PROTECTED]> wrote:

\
>> Is the SAN being shared between the database servers and other
>> servers?  Maybe
>> it was just random timing that gave you the poor write performance on
>> the old
>> server which might be also yielding occasional poor performance on
>> the new
>> one.
>>
>
> The direct attached scsi discs on the old database server were getting
> 45MBps not the SAN.  The SAN got 62/145Mbps, which is not as bad.

How many spindles you got in that SAN?


105 IIRC.

Alex

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Jeff Frost

On Thu, 1 Mar 2007, Alex Deucher wrote:


On 3/1/07, Jeff Frost <[EMAIL PROTECTED]> wrote:

On Thu, 1 Mar 2007, Alex Deucher wrote:

>> >> Postgresql might be choosing a bad plan because your effective_cache_size
>> >> is way off (it's the default now right?).  Also, what was the block read/write
>> >
>> > yes it's set to the default.
>> >
>> >> speed of the SAN from your bonnie tests?  Probably want to tune
>> >> random_page_cost as well if it's also at the default.
>> >>
>> >
>> >                     --Sequential Output-- --Sequential Input- --Random-
>> >              -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> > Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>> > luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13  397.7   0
>> >
>>
>> So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that write
>> speed is about the same as my single SATA drive write speed on my workstation,
>> so not that great.  The read speed is decent, though and with that sort of
>> read performance, you might want to lower random_page_cost to something like
>> 2.5 or 2 so the planner will tend to prefer index scans.
>>
>
> Right, but the old box was getting ~45MBps on both reads and writes,
> so it's an improvement for me :)  Thanks for the advice, I'll let you
> know how it goes.

Do you think that is because you have a different interface between you and
the SAN?  ~45MBps is pretty slow - your average 7200RPM ATA133 drive can do
that and costs quite a bit less than a SAN.

Is the SAN being shared between the database servers and other servers?  Maybe
it was just random timing that gave you the poor write performance on the old
server which might be also yielding occasional poor performance on the new
one.



The direct attached scsi discs on the old database server were getting
45MBps not the SAN.  The SAN got 62/145Mbps, which is not as bad.  We
have 4 servers on the SAN each with it's own 4 GBps FC link via an FC
switch.  I'll try and re-run the numbers when the servers are idle
this weekend.


Sorry, I thought the old server was also attached to the SAN.  My fault for 
not hanging onto the entire email thread.


I think you're mixing and matching your capital and lower case Bs in your 
sentence above though. :-)


I suspect what you really mean is "The SAN got 62/145MBps" (megabytes/sec) and 
the FC link is 4Gbps (gigabits/sec) or 500MBps.  Is that correct?  If so, and 
seeing that you think there are 105 spindles on the SAN, I'd say you're either 
maxing out the switch fabric of the SAN with your servers or you have a 
really poorly performing SAN in general, or you just misunderstood the .


As a comparison, with 8 WD Raptors configured in a RAID10 with normal ext3 I 
get about 160MB/s write and 305MB/s read performance.  Hopefully the SAN has 
lots of other super nifty features that make up for the poor performance. :-(


--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Alex Deucher

On 3/1/07, Jeff Frost <[EMAIL PROTECTED]> wrote:

On Thu, 1 Mar 2007, Alex Deucher wrote:

> On 3/1/07, Jeff Frost <[EMAIL PROTECTED]> wrote:
>> On Thu, 1 Mar 2007, Alex Deucher wrote:
>>
>> >> >> Postgresql might be choosing a bad plan because your
>> >> effective_cache_size
>> >> >> is
>> >> >> way off (it's the default now right?).  Also, what was the block
>> >> read/write
>> >> >
>> >> > yes it's set to the default.
>> >> >
>> >> >> speed of the SAN from your bonnie tests?  Probably want to tune
>> >> >> random_page_cost as well if it's also at the default.
>> >> >>
>> >> >
>> >> >   --Sequential Output-- --Sequential Input-
>> >> > --Random-
>> >> >   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>> >> > --Seeks--
>> >> > MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>> >> /sec
>> >> > %CP
>> >> > luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13
>> >> 397.7
>> >> > 0
>> >> >
>> >>
>> >> So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that
>> write
>> >> speed is about the same as my single SATA drive write speed on my
>> >> workstation,
>> >> so not that great.  The read speed is decent, though and with that sort
>> of
>> >> read performance, you might want to lower random_page_cost to something
>> >> like
>> >> 2.5 or 2 so the planner will tend to prefer index scans.
>> >>
>> >
>> > Right, but the old box was getting ~45MBps on both reads and writes,
>> > so it's an improvement for me :)  Thanks for the advice, I'll let you
>> > know how it goes.
>>
>> Do you think that is because you have a different interface between you and
>> the SAN?  ~45MBps is pretty slow - your average 7200RPM ATA133 drive can do
>> that and costs quite a bit less than a SAN.
>>
>> Is the SAN being shared between the database servers and other servers?
>> Maybe
>> it was just random timing that gave you the poor write performance on the
>> old
>> server which might also be yielding occasional poor performance on the new
>> one.
>>
>
> The direct attached SCSI discs on the old database server were getting
> 45MBps, not the SAN.  The SAN got 62/145Mbps, which is not as bad.  We
> have 4 servers on the SAN, each with its own 4 GBps FC link via an FC
> switch.  I'll try and re-run the numbers when the servers are idle
> this weekend.

Sorry, I thought the old server was also attached to the SAN.  My fault for
not hanging onto the entire email thread.

I think you're mixing and matching your capital and lower case Bs in your
sentence above though. :-)


whoops :)



I suspect what you really mean is the SAN got 62/145MBps (megabytes/sec) and
the FC link is 4Gbps (gigabits/sec) or 500MBps.  Is that correct?  If so, and
seeing that you think there are 105 spindles on the SAN, I'd say you're either
maxing out the switch fabric of the SAN with your servers or you have a
really poorly performing SAN in general, or you just misunderstood.

As a comparison, with 8 WD Raptors configured in a RAID10 with normal ext3 I
get about 160MB/s write and 305MB/s read performance.  Hopefully the SAN has
lots of other super nifty features that make up for the poor performance. :-(



It's big and reliable (and compared to lots of others, relatively
inexpensive) which is why we bought it.  We bought it mostly as a huge
file store.  The RAID groups on the SAN were set up for maximum
capacity rather than for performance.  Using it for the databases just
came up recently.

Alex

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Joshua D. Drake
Alex Deucher wrote:
> On 3/1/07, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
>> \
>> >> Is the SAN being shared between the database servers and other
>> >> servers?  Maybe
>> >> it was just random timing that gave you the poor write performance on
>> >> the old
>> >> server which might also be yielding occasional poor performance on
>> >> the new
>> >> one.
>> >>
>> >
>> > The direct attached SCSI discs on the old database server were getting
>> > 45MBps not the SAN.  The SAN got 62/145Mbps, which is not as bad.
>>
>> How many spindles you got in that SAN?
> 
> 105 IIRC.

You have 105 spindles and you are only getting 62 megs on writes? That seems
like about half of what you should be getting, at least.

Joshua D. Drake


> 
> Alex
> 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Alex Deucher

On 3/1/07, Joshua D. Drake <[EMAIL PROTECTED]> wrote:

Alex Deucher wrote:
> On 3/1/07, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
>> \
>> >> Is the SAN being shared between the database servers and other
>> >> servers?  Maybe
>> >> it was just random timing that gave you the poor write performance on
>> >> the old
>> >> server which might also be yielding occasional poor performance on
>> >> the new
>> >> one.
>> >>
>> >
>> > The direct attached SCSI discs on the old database server were getting
>> > 45MBps not the SAN.  The SAN got 62/145Mbps, which is not as bad.
>>
>> How many spindles you got in that SAN?
>
> 105 IIRC.

You have 105 spindles and you are only getting 62 megs on writes? That seems
like about half of what you should be getting, at least.



Take the numbers with a grain of salt.  They are by no means a thorough
evaluation.  I just ran bonnie a couple of times to get a rough reference
point.  I can do a more thorough analysis.
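In the meantime, a quick-and-dirty sequential pass with plain dd can sanity-check bonnie's numbers (sketch only: the path is a placeholder, the file should really be at least 2x RAM so the OS page cache doesn't flatter the results, and this measures sequential throughput only, not seeks):

```shell
# Sequential write: conv=fdatasync makes GNU dd flush to disk before
# reporting a rate.  /tmp/ddtest is a placeholder - point it at the SAN
# mount under test, and raise count= well past 2x RAM for real numbers.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=64 conv=fdatasync

# Sequential read back (drop the page cache first on Linux if you can).
dd if=/tmp/ddtest of=/dev/null bs=1M
```

Clean up the test file afterwards, and as with bonnie, run it a few times on an otherwise idle box.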

Alex


Joshua D. Drake


>
> Alex
>



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[PERFORM] Improving query performance

2007-03-01 Thread David Leangen

Hello!

I'm new to performance tuning on postgres. I've read the docs on the
postgres site, as well as:

  http://www.revsys.com/writings/postgresql-performance.html
  http://www.powerpostgresql.com/PerfList

However, my query is really slow, and I'm not sure what the main cause
could be, as there are so many variables. I'm hoping people with more
experience could help out.

My machine has 8GB RAM, 2x CPU (2GHz, I think...)

Table has about 1M rows.

This is my postgres.conf:

listen_addresses = '*'
port = 5432
max_connections = 100
shared_buffers = 256000
effective_cache_size = 100
work_mem = 500
redirect_stderr = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1440
log_rotation_size = 0
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'

This is the result of "explain analyze":

 Aggregate  (cost=384713.17..384713.18 rows=1 width=4) (actual
time=254856.025..254856.025 rows=1 loops=1)
   ->  Seq Scan on medline_articles t0  (cost=0.00..382253.00
rows=984068 width=4) (actual time=511.841..254854.981 rows=788 loops=1)
 Filter: long_ugly_query_here



And this is the actual query:

SELECT COUNT(t0.ID) FROM public.MY_TABLE t0 
WHERE ((POSITION('adenosine cyclic 35-monophosphate' IN LOWER(t0.TITLE))
- 1) >=0 OR 
(POSITION('adenosine cyclic 55-monophosphate' IN LOWER(t0.TEXT)) - 1) >=
0 OR 
(POSITION('cyclic amp, disodium salt' IN LOWER(t0.TITLE)) - 1) >= 0 OR 
(POSITION('cyclic amp, disodium salt' IN LOWER(t0.TEXT)) - 1) >= 0 OR 
(POSITION('cyclic amp, sodium salt' IN LOWER(t0.TEXT)) - 1) >= 0 OR 
(POSITION('cyclic amp, sodium salt' IN LOWER(t0.TITLE)) - 1) >= 0 OR 
(POSITION('cyclic amp' IN LOWER(t0.TEXT)) - 1) >= 0 OR 
(POSITION('cyclic amp' IN LOWER(t0.TITLE)) - 1) >= 0 OR 
(POSITION('cyclic amp, monopotassium salt' IN LOWER(t0.TEXT)) - 1) >= 0
OR 
(POSITION('cyclic amp, monopotassium salt' IN LOWER(t0.TEXT)) - 1) >= 0
OR 
(POSITION('adenosine cyclic-35-monophosphate' IN LOWER(t0.TEXT)) - 1) >=
0 OR 
(POSITION('adenosine cyclic-35-monophosphate' IN LOWER(t0.TITLE)) - 1)
>= 0 OR 
(POSITION('adenosine cyclic monophosphate' IN LOWER(t0.TEXT)) - 1) >= 0
OR 
(POSITION('adenosine cyclic monophosphate' IN LOWER(t0.TITLE)) - 1) >= 0
OR 
(POSITION('cyclic amp, monoammonium salt' IN LOWER(t0.TEXT)) - 1) >= 0
OR 
(POSITION('cyclic amp, monoammonium salt' IN LOWER(t0.TITLE)) - 1) >= 0
OR 
(POSITION('adenosine cyclic 3,5 monophosphate' IN LOWER(t0.TEXT)) - 1)
>= 0 OR 
(POSITION('adenosine cyclic 3,5 monophosphate' IN LOWER(t0.TITLE)) - 1)
>= 0 OR 
(POSITION('cyclic amp, monosodium salt' IN LOWER(t0.TEXT)) - 1) >= 0 OR 
(POSITION('cyclic amp, monosodium salt' IN LOWER(t0.TITLE)) - 1) >= 0
OR 
(POSITION('cyclic amp, (r)-isomer' IN LOWER(t0.TEXT)) - 1) >= 0 OR 
(POSITION('cyclic amp, (r)-isomer' IN LOWER(t0.TEXT)) - 1) >= 0)


Some more info:

pubmed=> SELECT relpages, reltuples FROM pg_class WHERE relname =
'MY_TABLE';
 relpages | reltuples
----------+-----------
   155887 |    984200
(1 row)



Thanks for any suggestions!

Dave




PS - Yes! I did run "vacuum analyze" :-)




---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[PERFORM] Array indexes, GIN?

2007-03-01 Thread Adam L Beberg
I need to cross reference 2 tables. There are O(10M) A's, each has an 
ordered set of 10 of the O(100K) B's associated with it. The dominant 
query will be finding the A's and their count associated with a given 
list of ~1k B's i.e. if 2 of the listed B's are in A's set of 10, it's 
(A,2), and we should get back ~100K rows. The good news is we only need 
to run this brutal query every couple minutes, but the row updates will 
flow fast.


Luckily this is PostgreSQL, so the simple solution seems to be

  CREATE TABLE xref( A bigint, B bigint[10] ); -- A is primary key

which cuts down the table overhead. O(10M) rows w/array.

On the surface, looks like a job for GIN, but GIN seems undocumented, 
specifically mentions it doesn't support the deletes we'll have many of 
since it's designed for word searching apparently, the performance 
implications are undocumented. I searched, I read, and even IRC'd, and 
it seems like GIN is just not used much.


Is GIN right? Will this work at all? Will it run fast enough to function?

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Array indexes, GIN?

2007-03-01 Thread Josh Berkus
Adam,

> On the surface, looks like a job for GIN, but GIN seems undocumented,
> specifically mentions it doesn't support the deletes we'll have many of
> since it's designed for word searching apparently, the performance
> implications are undocumented. I searched, I read, and even IRC'd, and
> it seems like GIN is just not used much.

It's new (as of 8.2).  And the authors, Oleg and Teodor, are notorious for 
skimpy documentation.

I'd start with the code in INTARRAY contrib module (also by Teodor) and bug 
them on pgsql-hackers about helping you implement a GIN index for arrays.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Improving query performance

2007-03-01 Thread Tom Lane
David Leangen <[EMAIL PROTECTED]> writes:
> And this is the actual query:

> SELECT COUNT(t0.ID) FROM public.MY_TABLE t0 
> WHERE ((POSITION('adenosine cyclic 35-monophosphate' IN LOWER(t0.TITLE))
> - 1) >=0 OR 
> (POSITION('adenosine cyclic 55-monophosphate' IN LOWER(t0.TEXT)) - 1) >=
> 0 OR 
> (POSITION('cyclic amp, disodium salt' IN LOWER(t0.TITLE)) - 1) >= 0 OR 
> (POSITION('cyclic amp, disodium salt' IN LOWER(t0.TEXT)) - 1) >= 0 OR 
> (POSITION('cyclic amp, sodium salt' IN LOWER(t0.TEXT)) - 1) >= 0 OR 
> ...etc...

I think you need to look into full-text indexing (see tsearch2).
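Something along these lines (untested sketch; assumes contrib/tsearch2 is installed in the database, and guesses at your real table and column names from the query above):

```sql
-- One-time setup: a tsvector column over title+text, GiST-indexed.
ALTER TABLE my_table ADD COLUMN idxfti tsvector;
UPDATE my_table
   SET idxfti = to_tsvector('default',
                coalesce(title, '') || ' ' || coalesce(text, ''));
CREATE INDEX my_table_fti_idx ON my_table USING gist (idxfti);
VACUUM ANALYZE my_table;

-- Searches then become index-assisted instead of seqscan + POSITION():
SELECT count(id)
  FROM my_table
 WHERE idxfti @@ to_tsquery('default', 'cyclic & amp');
```

You'd also want a trigger to keep idxfti current on insert/update; the
tsearch2 README covers that.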

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Array indexes, GIN?

2007-03-01 Thread Oleg Bartunov

On Thu, 1 Mar 2007, Josh Berkus wrote:


Adam,


On the surface, looks like a job for GIN, but GIN seems undocumented,
specifically mentions it doesn't support the deletes we'll have many of
since it's designed for word searching apparently, the performance
implications are undocumented. I searched, I read, and even IRC'd, and
it seems like GIN is just not used much.


It's new (as of 8.2).  And the authors, Oleg and Teodor, are notorious for
skimpy documentation.


We're getting better, we have 72 pages written about new FTS :)



I'd start with the code in INTARRAY contrib module (also by Teodor) and bug
them on pgsql-hackers about helping you implement a GIN index for arrays.


GIN already has support for one-dimensional arrays, and intarray, in
particular, also has GIN support.
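For example, something like this should work in 8.2 with contrib/intarray
installed (untested sketch; note that intarray only handles int4[], so it
assumes your B identifiers fit in int4):

```sql
CREATE TABLE xref (a bigint PRIMARY KEY, b int4[]);
CREATE INDEX xref_b_gin ON xref USING gin (b gin__int_ops);

-- A's whose set of 10 overlaps the candidate B list, plus the overlap
-- count: && (overlap) can use the GIN index, while icount() and the
-- & (intersection) operator come from intarray.
SELECT a, icount(b & '{101,205,317}'::int4[]) AS hits
  FROM xref
 WHERE b && '{101,205,317}'::int4[];
```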


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Array indexes, GIN?

2007-03-01 Thread Adam L Beberg

Oleg Bartunov wrote on 3/1/2007 10:45 PM:

On Thu, 1 Mar 2007, Josh Berkus wrote:


Adam,


On the surface, looks like a job for GIN, but GIN seems undocumented,
specifically mentions it doesn't support the deletes we'll have many of
since it's designed for word searching apparently, the performance
implications are undocumented. I searched, I read, and even IRC'd, and
it seems like GIN is just not used much.


It's new (as of 8.2).  And the authors, Oleg and Teodor, are notorious for
skimpy documentation.


We're getting better, we have 72 pages written about new FTS :)


I'm guessing FTS is not quite done since you matched 'FTS' to 'GIN' ;)

GIN already has support for one-dimensional arrays, and intarray, in
particular, also has GIN support.


Great, so can GIN handle my situation? I'm a little unsure what to make 
of "Note: There is no delete operation for ET." in particular since I'm 
dealing with large numbers.


--
Adam L. Beberg
http://www.mithral.com/~beberg/

---(end of broadcast)---
TIP 6: explain analyze is your friend