Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Sven Geisler
Hi,

Did someone try '-mfpmath=sse -msse3'?

It would be interesting to know whether -mfpmath=sse boosts performance.

I guess the difference in the generated code between i686 and prescott
isn't that large; the bigger step is i386 to i686. '-mfpmath=sse
-msse3' will also use the SSE unit, which the classic i686 doesn't have.

CFLAGS=-O2 -march=prescott -mfpmath=sse -msse3
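
For example, assuming a source build (untested here, adjust to your tree):

  CFLAGS='-O2 -march=prescott -mfpmath=sse -msse3' ./configure
  make && make install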

Best regards
Sven.

Daniel van Ham Colchete wrote:
 I just made another test with a second Gentoo machine:
 
 Pentium 4 3.0Ghz Prescott
 GCC 4.1.1
 Glibc 2.4
 PostgreSQL 8.1.5
 Kernel 2.6.17
 
 Same postgresql.conf as yesterday's.
 
 First test
 ==
  GLIBC: -O2 -march=i686
  PostgreSQL: -O2 -march=i686
  Results: 974.638731 975.602142 975.882051 969.142503 992.914167
 983.467131 983.231575 994.901330 970.375221 978.377467
  Average (error):  980 tps (13 tps)
 
 Second test
 ===
  GLIBC: -O2 -march=i686
  PostgreSQL: -O2 -march=prescott
  Results: 988.319643 976.152973 1006.482553 992.431322 983.090838
 992.674065 989.216746 990.897615 987.129802 975.907955
  Average (error):  988 tps (15 tps)
 
 Third test
 ==
  GLIBC: -O2 -march=prescott
  PostgreSQL: -O2 -march=i686
  Results: 969.085400 966.187309 994.882325 968.715150 956.766771
 970.151542 960.090571 967.680628 986.568462 991.756520
  Average (error): 973 tps (19 tps)
 
 Fourth test
 ==
  GLIBC: -O2 -march=prescott
  PostgreSQL: -O2 -march=prescott
  Results: 980.888371 978.128269 969.344669 978.021509 979.256603
 993.236457 984.078399 981.654834 976.295925 969.796277
  Average (error):  979 tps (11 tps)
 
 The results showed no significant change. The conclusion of today's
 test would be that there is no improvement in PostgreSQL when using
 -march=prescott.
 
 I only see 3 differences between yesterday's server and today's: the
 kernel version (y: 2.6.18, t: 2.6.17), the hard drive (today IDE,
 yesterday SATA), and the gcc version (y: 3.4.6, t: 4.1.1).
 
 I don't know why we saw an improvement yesterday but not today.
 
 Best
 Daniel
 
 On 12/12/06, Daniel van Ham Colchete [EMAIL PROTECTED] wrote:
 I'm running some other tests here on different hardware (also Gentoo).
 I found out that PostgreSQL stalls for a while if I change the -t
 parameter on pgbench from 600 to 1000, and I get ~150 tps instead of
 ~950 tps.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[PERFORM] Slow update with simple query

2006-12-13 Thread Arnaud Lesauvage

Hi list !

I am running a query to update the boolean field of a table based on
another table's fields.

The query is (changed names for readability):
UPDATE t1
SET booleanfield = (t2.field1 IN ('some', 'other') AND t2.field2 = 'Y')
FROM t2
WHERE t1.uid = t2.uid

t2.uid is the PRIMARY KEY.
t2 only has ~1000 rows, so I think it fits fully in memory.
t1 has ~2,000,000 rows.
There is an index on t1.uid also.

The explain (sorry, no explain analyze available yet) is:

Hash Join  (cost=112.75..307410.10 rows=2019448 width=357)
  Hash Cond: ("outer".uid = "inner".uid)
  ->  Seq Scan on t1  (cost=0.00..261792.01 rows=2033001 width=340)
  ->  Hash  (cost=110.20..110.20 rows=1020 width=53)
        ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53)

My query has been running for more than 1.5 hour now, and it is still running.
Nothing else is running on the server.
There are two multicolumn indexes on this column (both are 3-column indexes).
One of them has a functional column (date_trunc('month', datefield)).


Do you think the problem is with the indexes ?

The hardware is not great, but the database is on a RAID1 array, so it's not bad
either.
I am surprised that it takes more than 3 seconds per row to be updated.

Thanks for your opinion on this !

--
Arnaud

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Slow update with simple query

2006-12-13 Thread Ragnar
On Wed, 2006-12-13 at 11:51 +0100, Arnaud Lesauvage wrote:
 Hi list !
 
 I am running a query to update the boolean field of a table based on
 another table's fields.
 
 The query is (changed names for readability):
 UPDATE t1
 SET booleanfield = (t2.field1 IN ('some', 'other') AND t2.field2 = 'Y')
 FROM t2
 WHERE t1.uid = t2.uid
 
 t2.uid is the PRIMARY KEY.
 t2 only has ~1000 rows, so I think it fits fully in memory.
 t1 has ~2,000,000 rows.
 There is an index on t1.uid also.
 
 The explain (sorry, no explain analyze available yet) is:
 
 Hash Join  (cost=112.75..307410.10 rows=2019448 width=357)
   Hash Cond: ("outer".uid = "inner".uid)
   ->  Seq Scan on t1  (cost=0.00..261792.01 rows=2033001 width=340)
   ->  Hash  (cost=110.20..110.20 rows=1020 width=53)
         ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53)
 
 My query has been running for more than 1.5 hour now, and it is still running.
 Nothing else is running on the server.
 There are two multicolumn indexes on this column (both are 3-column
 indexes). One of them has a functional column
 (date_trunc('month', datefield)).
 
 Do you think the problem is with the indexes ?

I guess so. Are you sure about the index on t1.uid?
What are the column definitions for t1.uid and t2.uid?
Are they the same?
You should be able to get a plan similar to:
Merge Join  (cost=0.00..43.56 rows=1000 width=11)
  Merge Cond: ("outer".uid = "inner".uid)
  ->  Index Scan using t1i on t1  (cost=0.00..38298.39 rows=235 width=10)
  ->  Index Scan using t2i on t2  (cost=0.00..26.73 rows=1000 width=5)

What postgres version are you using?

gnari




 
 The hardware is not great, but the database is on a RAID1 array, so its not 
 bad either.
 I am surprised that it takes more than 3 seconds per row to be updated.
 
 Thanks for your opinion on this !
 
 --
 Arnaud
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org
 


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Slow update with simple query

2006-12-13 Thread Arnaud Lesauvage

Ragnar wrote:

Do you think the problem is with the indexes ?


I guess so. are you sure about the index on t1.uid?
what are the column definitions for t1.uid and t2.uid ?
are they the same ?


Man, no!!!
I just checked and indeed, there is no index on this column. I probably
dropped it recently.

Thanks Ragnar.
(t1.uid and t2.uid were the same, character(32) columns)


you should be able to get a plan similar to:
Merge Join  (cost=0.00..43.56 rows=1000 width=11)
  Merge Cond: ("outer".uid = "inner".uid)
  ->  Index Scan using t1i on t1  (cost=0.00..38298.39 rows=235 width=10)
  ->  Index Scan using t2i on t2  (cost=0.00..26.73 rows=1000 width=5)

what postgres version are you using ?


Oops, forgot that too: 8.1.4

I am creating the index right now, I'll tell you if this fixes the problem.
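
For the archives, a minimal sketch of the fix (the index name is made up),
plus a re-analyze so the planner sees the new index:

  CREATE INDEX t1_uid_idx ON t1 (uid);
  ANALYZE t1;
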
Thanks for your help !

--
Arnaud

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Slow update with simple query

2006-12-13 Thread Jens Schipkowski

Hi,

the problem is a combination of badly formed SQL and maybe missing indexes.
try this:
UPDATE t1
SET booleanfield = foo.bar
FROM (SELECT uid, (field1 IN ('some','other') AND field2 = 'Y') AS bar
      FROM t2) AS foo
WHERE t1.uid = foo.uid;

and index t1.uid, t2.uid, t2.field1, t2.field2
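
A hedged sketch of that indexing advice in SQL (index names are made up;
t2.uid should already be covered by its primary key):

  CREATE INDEX t2_field1_idx ON t2 (field1);
  CREATE INDEX t2_field2_idx ON t2 (field2);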

regards,
Jens Schipkowski

On Wed, 13 Dec 2006 11:51:10 +0100, Arnaud Lesauvage [EMAIL PROTECTED]  
wrote:



Hi list !

I am running a query to update the boolean field of a table based on
another table's fields.

The query is (changed names for readability):
UPDATE t1
SET booleanfield = (t2.field1 IN ('some', 'other') AND t2.field2 = 'Y')
FROM t2
WHERE t1.uid = t2.uid

t2.uid is the PRIMARY KEY.
t2 only has ~1000 rows, so I think it fits fully in memory.
t1 has ~2,000,000 rows.
There is an index on t1.uid also.

The explain (sorry, no explain analyze available yet) is:

Hash Join  (cost=112.75..307410.10 rows=2019448 width=357)
  Hash Cond: ("outer".uid = "inner".uid)
  ->  Seq Scan on t1  (cost=0.00..261792.01 rows=2033001 width=340)
  ->  Hash  (cost=110.20..110.20 rows=1020 width=53)
        ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53)

My query has been running for more than 1.5 hour now, and it is still  
running.

Nothing else is running on the server.
There are two multicolumn indexes on this column (both are 3-column
indexes). One of them has a functional column (date_trunc('month',
datefield)).


Do you think the problem is with the indexes ?

The hardware is not great, but the database is on a RAID1 array, so it's
not bad either.

I am surprised that it takes more than 3 seconds per row to be updated.

Thanks for your opinion on this !

--
Arnaud

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org




--
**
APUS Software GmbH

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Slow update with simple query

2006-12-13 Thread Arnaud Lesauvage

Jens Schipkowski a écrit :

the problem is a combination of badly formed SQL and maybe missing indexes.
try this:
UPDATE t1
SET booleanfield = foo.bar
FROM (SELECT uid, (field1 IN ('some','other') AND field2 = 'Y') AS bar
      FROM t2) AS foo
WHERE t1.uid = foo.uid;



Hi Jens,
Why is this query better than the other one? Because it runs
(field1 IN ('some','other') AND field2 = 'Y') once and then executes
the join with the resulting set?



and index t1.uid, t2.uid, t2.field1, t2.field2


t2.field1 can only take 3 or 4 values (don't remember exactly), and
field2 only 2 ('Y' or 'N'). So these fields have a very low cardinality.

Won't the planner choose to do a table scan in such a case?

Thanks for your advice!

--
Arnaud

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Slow update with simple query

2006-12-13 Thread Jens Schipkowski
On Wed, 13 Dec 2006 13:23:41 +0100, Arnaud Lesauvage [EMAIL PROTECTED]  
wrote:



Jens Schipkowski wrote:
the problem is a combination of badly formed SQL and maybe missing
indexes.

try this:
UPDATE t1
SET booleanfield = foo.bar
FROM (SELECT uid, (field1 IN ('some','other') AND field2 = 'Y') AS bar
      FROM t2) AS foo
WHERE t1.uid=foo.uid;



Hi Jens,
Why is this query better than the other one? Because it runs
(field1 IN ('some','other') AND field2 = 'Y') once and then executes
the join with the resulting set?
True. The subselect in the FROM clause will be executed once and joined
using the condition in the WHERE clause. So your condition on t2 is not
evaluated for each row in t1 (2 million records) but only once for each
row in t2 (1k records). And the boolean value is already computed during
the update.


regards,
Jens




and index t1.uid, t2.uid, t2.field1, t2.field2


t2.field1 can only take 3 or 4 values (don't remember exactly), and
field2 only 2 ('Y' or 'N'). So these fields have a very low cardinality.

Won't the planner choose to do a table scan in such a case?

Thanks for your advice!

--
Arnaud




--
**
APUS Software GmbH

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Slow update with simple query

2006-12-13 Thread Arnaud Lesauvage

Jens Schipkowski wrote:
On Wed, 13 Dec 2006 13:23:41 +0100, Arnaud Lesauvage [EMAIL PROTECTED]
Why is this query better than the other one? Because it runs
(field1 IN ('some','other') AND field2 = 'Y') once and then executes
the join with the resulting set?
True. The subselect in the FROM clause will be executed once and joined
using the condition in the WHERE clause. So your condition on t2 is not
evaluated for each row in t1 (2 million records) but only once for each
row in t2 (1k records). And the boolean value is already computed during
the update.


OK Jens, thanks for clarifying this.
I thought the planner could guess what to do in such cases.

Regards
--
Arnaud

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Slow update with simple query

2006-12-13 Thread Ragnar
On Wed, 2006-12-13 at 14:38 +0100, Arnaud Lesauvage wrote:
 Jens Schipkowski wrote:
  On Wed, 13 Dec 2006 13:23:41 +0100, Arnaud Lesauvage [EMAIL PROTECTED]
  Why is this query better than the other one? Because it runs
  (field1 IN ('some','other') AND field2 = 'Y') once and then executes
  the join with the resulting set?
  True. The subselect in the FROM clause will be executed once and joined
  using the condition in the WHERE clause. So your condition on t2 is not
  evaluated for each row in t1 (2 million records) but only once for each
  row in t2 (1k records). And the boolean value is already computed
  during the update.
 
 OK Jens, thanks for clarifying this.
 I thought the planner could guess what to do in such cases.

don't worry, it will.
this is not your problem

gnari



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Slow update with simple query

2006-12-13 Thread Arnaud Lesauvage

Ragnar wrote:

On Wed, 2006-12-13 at 14:38 +0100, Arnaud Lesauvage wrote:

Jens Schipkowski wrote:
 On Wed, 13 Dec 2006 13:23:41 +0100, Arnaud Lesauvage [EMAIL PROTECTED]
 Why is this query better than the other one? Because it runs
 (field1 IN ('some','other') AND field2 = 'Y') once and then executes
 the join with the resulting set?
 True. The subselect in the FROM clause will be executed once and joined
 using the condition in the WHERE clause. So your condition on t2 is not
 evaluated for each row in t1 (2 million records) but only once for each
 row in t2 (1k records). And the boolean value is already computed during
 the update.


OK Jens, thanks for clarifying this.
I thought the planner could guess what to do in such cases.


don't worry, it will.
this is not your problem


Indeed, the new query does not perform that well:

Hash Join  (cost=112.75..307504.97 rows=2024869 width=355) (actual time=53.995..246443.811 rows=2020061 loops=1)
  Hash Cond: ("outer".uid = "inner".uid)
  ->  Seq Scan on t1  (cost=0.00..261792.01 rows=2033001 width=338) (actual time=19.342..234304.499 rows=2033001 loops=1)
  ->  Hash  (cost=110.20..110.20 rows=1020 width=53) (actual time=4.853..4.853 rows=1020 loops=1)
        ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53) (actual time=0.017..2.586 rows=1020 loops=1)
Total runtime: 2777844.892 ms

I removed all unnecessary indexes on t1 before running the query (I left the
index on uid and the multicolumn index containing the updated field).
I believe the multicolumn-functional-index computation is taking some time
here, isn't it?

Regards
--
Arnaud

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Cosimo Streppone

Michael Stone wrote:


On Tue, Dec 12, 2006 at 01:42:06PM +0100, Cosimo Streppone wrote:

-O0 ~ 957 tps
-O1 -mcpu=pentium4 -mtune=pentium4 ~ 1186 tps
-O2 -mcpu=pentium4 -mtune=pentium4 ~ 1229 tps
-O3 -mcpu=pentium4 -mtune=pentium4 ~ 1257 tps
-O6 -mcpu=pentium4 -mtune=pentium4 ~ 1254 tps

I'm curious now to get the same tests run with
a custom-cflags-compiled glibc.


I'd be curious to see -O2 with and without the arch-specific flags, 
since that's mostly what the discussion is about.


I ran the same tests only for:

1) '-O2'
2) '-O2 -march=pentium4 -mtune=pentium4 -mcpu=pentium4'
   (so no more doubts here, and thanks for gcc hints :-)

and I obtained, respectively, an average of 1238 tps (plain -O2)
vs. 1229 tps over 9 runs.
Disk subsystem is a standard desktop SATA, no more than that.

I also tried recompiling *only* pgbench with various options, but as
I expected (and hoped) nothing changed.

Interesting, eh?

--
Cosimo


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Slow update with simple query

2006-12-13 Thread asif ali
Arnaud,
 Have you run ANALYZE on the table after creating the index?
 Also make sure that effective_cache_size is set properly. A higher
value makes it more likely that index scans will be used.
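
For example, in postgresql.conf (the value is illustrative only; in 8.1
the unit is 8 kB pages, so 50000 is roughly 400 MB):

  effective_cache_size = 50000   # planner's estimate of OS+db cache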
 
 Thanks
 asif ali

Arnaud Lesauvage [EMAIL PROTECTED] wrote: Ragnar wrote:
 On Wed, 2006-12-13 at 14:38 +0100, Arnaud Lesauvage wrote:
 Jens Schipkowski wrote:
  On Wed, 13 Dec 2006 13:23:41 +0100, Arnaud Lesauvage
  Why is this query better than the other one? Because it runs
  (field1 IN ('some','other') AND field2 = 'Y') once and then executes
  the join with the resulting set?
  True. The subselect in the FROM clause will be executed once and joined
  using the condition in the WHERE clause. So your condition on t2 is not
  evaluated for each row in t1 (2 million records) but only once for each
  row in t2 (1k records). And the boolean value is already computed
  during the update.
 
 OK Jens, thanks for clarifying this.
 I thought the planner could guess what to do in such cases.
 
 don't worry, it will.
 this is not your problem

Indeed, the new query does not perform that well :

Hash Join  (cost=112.75..307504.97 rows=2024869 width=355) (actual time=53.995..246443.811 rows=2020061 loops=1)
  Hash Cond: ("outer".uid = "inner".uid)
  ->  Seq Scan on t1  (cost=0.00..261792.01 rows=2033001 width=338) (actual time=19.342..234304.499 rows=2033001 loops=1)
  ->  Hash  (cost=110.20..110.20 rows=1020 width=53) (actual time=4.853..4.853 rows=1020 loops=1)
        ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53) (actual time=0.017..2.586 rows=1020 loops=1)
Total runtime: 2777844.892 ms

I removed all unnecessary indexes on t1 before running the query (I left the
index on uid and the multicolumn index containing the updated field).
I believe the multicolumn-functional-index computation is taking some time
here, isn't it?

Regards
--
Arnaud

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


 

[PERFORM] Insertion to temp table deteriorating over time

2006-12-13 Thread Steven Flatt

Hi,

Our application is using Postgres 7.4 and I'd like to understand the root
cause of this problem:

To speed up overall insert time, our application will write thousands of
rows, one by one, into a temp table (same structure as a permanent table),
then do a bulk insert from the temp table to the permanent table.  After
this bulk insert is done, the temp table is truncated and the process is
repeated.  We do this because Postgres can do many individual inserts to a
temp table much faster than to a permanent table.

The problem we are seeing is that over time, the cost of a single insert to
the temp table seems to grow.  After a restart of postgres, a single insert
to the temp table takes about 3ms.  Over a few days, this grows to about
60ms per insert.  Restarting postgres drops this insert time back to 3ms,
supposedly because the temp table is re-created.  Our workaround right now
is to restart the database every few days, but we don't like this solution
much.

Any idea where the bloat is happening?  I believe that if we were dropping
and re-creating the temp table over and over, that could result in pg_class
bloat (among other catalog tables), but what is going wrong if we use the
same table over and over and truncate it?
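
A minimal sketch of the pattern, in case it helps (table names are
hypothetical; the LIMIT 0 just copies the structure):

  CREATE TEMP TABLE staging AS SELECT * FROM permanent LIMIT 0;
  -- ... thousands of single-row INSERTs into staging ...
  INSERT INTO permanent SELECT * FROM staging;  -- one bulk insert
  TRUNCATE TABLE staging;                       -- reuse for the next batch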

Thanks,
Steve


Re: [PERFORM] Slow update with simple query

2006-12-13 Thread Tom Lane
Arnaud Lesauvage [EMAIL PROTECTED] writes:
 Indeed, the new query does not perform that well :

 Hash Join  (cost=112.75..307504.97 rows=2024869 width=355) (actual 
 time=53.995..246443.811 rows=2020061 loops=1)
 ...
 Total runtime: 2777844.892 ms

 I removed all unnecessary indexes on t1 before running the query (I left the
 index on uid and the multicolumn index containing the updated field).
 I believe the multicolumn-functional-index computation is taking some time
 here, isn't it?

Given that the plan itself only takes 246 sec, there's *something*
associated with row insertion that's eating the other 2500+ seconds.
Either index entry computation or constraint checking ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Slow update with simple query

2006-12-13 Thread Arnaud Lesauvage

asif ali a écrit :

Arnaud,
 Have you run ANALYZE on the table after creating index?


Yes, I have !


 Also make sure that #effective_cache_size is set properly. A higher 
value makes it more likely to use index scans.


It is set to 50000. I thought this would be enough, maybe even too much!

Thanks for your advice !

--
Arnaud

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Insertion to temp table deteriorating over time

2006-12-13 Thread Rajesh Kumar Mallah

On 12/13/06, Steven Flatt [EMAIL PROTECTED] wrote:

Hi,

Our application is using Postgres 7.4 and I'd like to understand the root
cause of this problem:

To speed up overall insert time, our application will write thousands of
rows, one by one, into a temp table


1. How frequently are you committing the transaction?
   If you commit less frequently it will be faster.

2. If you use COPY instead of INSERT it will be faster.
   Using COPY is easy with DBD::Pg (perl). In the 8.x versions
   I think there have been major speed improvements in COPY.
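
A hedged sketch of both suggestions (names and values are made up):

  BEGIN;                                 -- batch many inserts per commit
  INSERT INTO staging VALUES (1, 'a');
  INSERT INTO staging VALUES (2, 'b');
  -- ... thousands more ...
  COMMIT;

  -- or load everything in one statement (psql users: see \copy):
  COPY staging FROM '/path/to/rows.dat';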

I do not know the root cause of slowdown though.

Regds
mallah.







---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Ron

At 11:11 AM 12/13/2006, Cosimo Streppone wrote:


Interesting, eh?

Cosimo


What I find interesting is that so far Guido's C2D Mac laptop has 
gotten the highest values by far in this set of experiments, and no 
one else is even close.
The slowest results, Michael's, are on the system with what appears 
to be the slowest CPU of the bunch; and the ranking of the rest of 
the results seem to similarly depend on relative CPU 
performance.  This is not what one would naively expect when benching
an IO-intensive app like a DBMS.


Given that the typical laptop usually has 1 HD, and a relatively 
modest one at that (the fastest available are SATA 7200rpm or 
Seagate's perpendicular recording 5400rpm) in terms of performance, 
this feels very much like other factors are bottlenecking the 
experiments to the point where Daniel's results regarding compiler 
options are not actually being tested.


Anyone got a 2.33 GHz C2D box with a decent HD IO subsystem more 
representative of a typical DB server hooked up to it?


Again, the best way to confirm/deny Daniel's results is to duplicate 
the environment he obtained those results with as closely as possible 
(preferably exactly) and then have someone else try to duplicate his results.


Also, I think the warnings regarding proper configuration of pgbench 
and which version of pgbench to use are worthy of note.  Do we have 
guidance yet as to what checkpoint_segments should be set 
to?   Should we be considering using something other than pgbench for 
such experiments?


Ron Peacetree



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Slow update with simple query

2006-12-13 Thread Tom Lane
Mark Lewis [EMAIL PROTECTED] writes:
 But he's using 8.1.4 -- in that version, an explain analyze would list
 the time taken to go through triggers, so the fact that we don't see any
 of those lines means that it can't be constraint checking, so wouldn't
 it have to be the index update overhead?

Well, it can't be foreign key checking.  Could have been an expensive
function in a CHECK constraint, maybe...

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Bucky Jordan
 What I find interesting is that so far Guido's C2D Mac laptop has
 gotten the highest values by far in this set of experiments, and no
 one else is even close.
 The slowest results, Michael's, are on the system with what appears
 to be the slowest CPU of the bunch; and the ranking of the rest of
 the results seem to similarly depend on relative CPU
 performance.  This is not what one would naively expect when benching
 a IO intensive app like a DBMS.
 
 Given that the typical laptop usually has 1 HD, and a relatively
 modest one at that (the fastest available are SATA 7200rpm or
 Seagate's perpendicular recording 5400rpm) in terms of performance,
 this feels very much like other factors are bottlenecking the
 experiments to the point where Daniel's results regarding compiler
 options are not actually being tested.
 
 Anyone got a 2.33 GHz C2D box with a decent HD IO subsystem more
 representative of a typical DB server hooked up to it?

I've only seen pg_bench numbers > 2,000 tps on either really large
hardware (none of the above mentioned comes close) or when the results are
in memory due to a small database size (aka measuring update contention).

Just a guess, but these tests (compiler opts.) seem like they sometimes
show a benefit where the database is mostly in RAM (which I'd guess many
people have) since that would cause more work to be put on the
CPU/Memory subsystems. 

Other people on the list hinted at this, but I share their hypothesis
that once you get IO involved as a bottleneck (which is a more typical
DB situation) you won't notice compiler options.

I've got a 2 socket x 2 core woodcrest poweredge 2950 with a BBC 6 disk
RAID I'll run some tests on as soon as I get a chance. 

I'm also thinking for this test, there's no need to tweak the default
config other than maybe checkpoint_segments, since I don't really want
postgres using large amounts of RAM (all that does is require me to
build a larger test DB). Thoughts?

Thanks,

Bucky

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] strange query behavior

2006-12-13 Thread Tim Jones
Hi,

  I have a query that uses an IN clause, and it seems to perform great
when there are more than two values in it, but if there is only one it is
really slow. Also if I change the query to use an = instead of IN in the
case of only one value, it is still slow. Possibly I need to reindex this
particular index?

thanks 

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] strange query behavior

2006-12-13 Thread Ragnar
On Wed, 2006-12-13 at 13:42 -0500, Tim Jones wrote:

   I have a query that uses an IN clause, and it seems to perform great
 when there are more than two values in it, but if there is only one it is
 really slow. Also if I change the query to use an = instead of IN in the
 case of only one value, it is still slow. Possibly I need to reindex this
 particular index?

can you provide us with an EXPLAIN ANALYZE for these 2 cases?

what version pg is this?

does this happen only for a particular single value, or for any values?

I assume you have ANALYZEd the table in question.

gnari



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] strange query behavior

2006-12-13 Thread Tom Lane
Tim Jones [EMAIL PROTECTED] writes:
   I have a query that uses an IN clause, and it seems to perform great
 when there are more than two values in it, but if there is only one it is
 really slow. Also if I change the query to use an = instead of IN in the
 case of only one value, it is still slow.

Please provide EXPLAIN ANALYZE output for both cases.

 Possibly I need to reindex this
 particular index?

More likely you need to ANALYZE the table so that the planner has
up-to-date stats ...

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[PERFORM] Optimizing a query

2006-12-13 Thread James Cloos
I've currently got this table:

,----
| n=# \d nanpa
|  Table public.nanpa
|Column   | Type | Modifiers 
| +--+---
|  state  | character(2) | 
|  npa| character(3) | not null
|  nxx| character(3) | not null
|  ocn| character(4) | 
|  company| text | 
|  ratecenter | text | 
|  switch | text | 
|  effective  | date | 
|  use| character(2) | not null
|  assign | date | 
|  ig | character(1) | 
| Indexes:
| nanpa_pkey PRIMARY KEY, btree (npa, nxx) CLUSTER
`----

and was doing queries of the form:

,----
| select * from nanpa where npa=775 and nxx=413;
`----

which were quite slow.  Explain showed that it was doing sequential
scans even though the primary key contained the two terms I was
selecting on.

Today, looking at it again in prep to this post, I noticed that the
numbers were being converted to ::text, and a quick test showed that
queries of the form:

,----
| select * from nanpa where npa=775::bpchar and nxx=413::bpchar;
`----

used the index.

I specified char(3) when I created the table simply because npa and
nxx are defined as three-character strings.  Tagging the queries is
a pain, especially as I often do queries of that form in psql(1).

(Incidentally, there are multiple similar tables, also keyed on
(npa,nxx), which show the same problem.  The nanpa table above is
just a good example.)

Should I convert the columns to text?  Or create an additional index
that expects ::text args?  (If so, how?)

Or is there some other way to ensure the indices get used w/o having
to tag data in the queries?
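
One option, untested here, might be an expression index matching the
::text comparisons:

,----
| CREATE INDEX nanpa_npa_nxx_text ON nanpa ((npa::text), (nxx::text));
`----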

Thanks,

-JimC
-- 
James Cloos [EMAIL PROTECTED] OpenPGP: 1024D/ED7DAEA6

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Optimizing a query

2006-12-13 Thread Tomeh, Husam

Have you run vacuum/analyze on the table?

--
  Husam 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of James Cloos
Sent: Wednesday, December 13, 2006 10:48 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Optimizing a query

I've currently got this table:

,----
| n=# \d nanpa
|  Table public.nanpa
|Column   | Type | Modifiers 
| +--+---
|  state  | character(2) | 
|  npa| character(3) | not null
|  nxx| character(3) | not null
|  ocn| character(4) | 
|  company| text | 
|  ratecenter | text | 
|  switch | text | 
|  effective  | date | 
|  use| character(2) | not null
|  assign | date | 
|  ig | character(1) | 
| Indexes:
| nanpa_pkey PRIMARY KEY, btree (npa, nxx) CLUSTER
`----

and was doing queries of the form:

,----
| select * from nanpa where npa=775 and nxx=413;
`----

which were quite slow.  Explain showed that it was doing sequential
scans even though the primary key contained the two terms I was
selecting on.

Today, looking at it again in prep to this post, I noticed that the
numbers were being converted to ::text, and a quick test showed that
queries of the form:

,----
| select * from nanpa where npa=775::bpchar and nxx=413::bpchar;
`----

used the index.

I specified char(3) when I created the table simply because npa and
nxx are defined as three-character strings.  Tagging the queries is
a pain, especially as I often do queries of that form in psql(1).

(Incidentally, there are multiple similar tables, also keyed on
(npa,nxx), which show the same problem.  The nanpa table above is
just a good example.)

Should I convert the columns to text?  Or create an additional index
that expects ::text args?  (If so, how?)

Or is there some other way to ensure the indices get used w/o having
to tag data in the queries?

Thanks,

-JimC
-- 
James Cloos [EMAIL PROTECTED] OpenPGP: 1024D/ED7DAEA6

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] strange query behavior

2006-12-13 Thread Tim Jones
The tables for these queries are vacuumed and analyzed regularly. I
just did an analyze to be sure and here are the results:


explain analyze select * from battery join observationresults on
battery.batteryidentifier = observationresults.batteryidentifier left
outer join observationcomment on
observationresults.observationidentifier =
observationcomment.observationidentifier left outer join batterycomment
on battery.batteryidentifier=batterycomment.batteryidentifier  where
battery.batteryidentifier in (1177470, 1177469);
 
 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=5.03..12553.00 rows=13 width=248) (actual time=0.362..1.345 rows=30 loops=1)
   ->  Nested Loop Left Join  (cost=4.01..12424.13 rows=13 width=208) (actual time=0.307..0.927 rows=30 loops=1)
         ->  Nested Loop  (cost=4.01..9410.49 rows=13 width=145) (actual time=0.227..0.416 rows=30 loops=1)
               ->  Bitmap Heap Scan on battery  (cost=4.01..11.64 rows=2 width=69) (actual time=0.135..0.138 rows=2 loops=1)
                     Recheck Cond: ((batteryidentifier = 1177470) OR (batteryidentifier = 1177469))
                     ->  BitmapOr  (cost=4.01..4.01 rows=2 width=0) (actual time=0.106..0.106 rows=0 loops=1)
                           ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.091..0.091 rows=1 loops=1)
                                 Index Cond: (batteryidentifier = 1177470)
                           ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)
                                 Index Cond: (batteryidentifier = 1177469)
               ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4682.40 rows=1362 width=76) (actual time=0.047..0.091 rows=15 loops=2)
                     Index Cond: ("outer".batteryidentifier = observationresults.batteryidentifier)
         ->  Index Scan using ix_obcomment_obid on observationcomment  (cost=0.00..227.73 rows=327 width=63) (actual time=0.013..0.013 rows=0 loops=30)
               Index Cond: ("outer".observationidentifier = observationcomment.observationidentifier)
   ->  Bitmap Heap Scan on batterycomment  (cost=1.02..9.84 rows=6 width=40) (actual time=0.007..0.007 rows=0 loops=30)
         Recheck Cond: ("outer".batteryidentifier = batterycomment.batteryidentifier)
         ->  Bitmap Index Scan on ix_batcomment  (cost=0.00..1.02 rows=6 width=0) (actual time=0.005..0.005 rows=0 loops=30)
               Index Cond: ("outer".batteryidentifier = batterycomment.batteryidentifier)
 Total runtime: 1.585 ms


explain analyze select * from battery join observationresults on
battery.batteryidentifier = observationresults.batteryidentifier left
outer join observationcomment on
observationresults.observationidentifier =
observationcomment.observationidentifier left outer join batterycomment
on battery.batteryidentifier=batterycomment.batteryidentifier  where
battery.batteryidentifier = 1177470;
 
 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=4733.62..269304.43 rows=1348 width=248) (actual time=19275.506..19275.568 rows=9 loops=1)
   Hash Cond: ("outer".batteryidentifier = "inner".batteryidentifier)
   ->  Merge Right Join  (cost=4723.75..269287.81 rows=1348 width=208) (actual time=19275.432..19275.473 rows=9 loops=1)
         Merge Cond: ("outer".observationidentifier = "inner".observationidentifier)
         ->  Index Scan using ix_obcomment_obid on observationcomment  (cost=0.00..245841.14 rows=7484253 width=63) (actual time=0.094..13403.300 rows=4361601 loops=1)
         ->  Sort  (cost=4723.75..4727.12 rows=1348 width=145) (actual time=0.270..0.278 rows=9 loops=1)
               Sort Key: observationresults.observationidentifier
               ->  Nested Loop  (cost=0.00..4653.67 rows=1348 width=145) (actual time=0.166..0.215 rows=9 loops=1)
                     ->  Index Scan using ix_battery_id on battery  (cost=0.00..5.81 rows=1 width=69) (actual time=0.079..0.082 rows=1 loops=1)
                           Index Cond: (batteryidentifier = 1177470)
                     ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4634.38 rows=1348 width=76) (actual time=0.079..0.102 rows=9 loops=1)
                           Index Cond: (1177470 = batteryidentifier)
   ->  Hash  (cost=9.85..9.85 rows=6 width=40) (actual time=0.039..0.039 rows=0 loops=1)
         ->  Bitmap Heap Scan on batterycomment  (cost=1.02..9.85 rows=6 width=40) (actual time=0.037..0.037 rows=0 loops=1)
               Recheck Cond: (batteryidentifier = 1177470)
               ->  Bitmap Index Scan on ix_batcomment  (cost=0.00..1.02 rows=6 width=0) (actual time=0.032..0.032 rows=0 loops=1)
   

Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Ron

At 01:49 PM 12/13/2006, Bucky Jordan wrote:

I've only seen pg_bench numbers  2,000 tps on either really large 
hardware (none of the above mentioned comes close) or the results 
are in memory due to a small database size (aka measuring update contention).

Which makes a laptop achieving such numbers all the more interesting IMHO.


Just a guess, but these tests (compiler opts.) seem like they 
sometimes show a benefit where the database is mostly in RAM (which 
I'd guess many people have) since that would cause more work to be 
put on the CPU/Memory subsystems.
The cases where the working set, or the performance critical part of 
the working set, of the DB is RAM resident are very important ones ITRW.



Other people on the list hinted at this, but I share their 
hypothesis that once you get IO involved as a bottleneck (which is a 
more typical DB situation) you won't notice compiler options.
Certainly makes intuitive sense.  OTOH, this list has seen discussion 
of what should be IO bound operations being CPU bound.  Evidently due 
to the expense of processing pg datastructures.  Only objective 
benches are going to tell us where the various limitations on pg 
performance really are.



I've got a 2 socket x 2 core woodcrest poweredge 2950 with a BBC 6 
disk RAID I'll run some tests on as soon as I get a chance.


I'm also thinking for this test, there's no need to tweak the 
default config other than maybe checkpoint_segments, since I don't 
really want postgres using large amounts of RAM (all that does is 
require me to build a larger test DB).


Daniel's original system had 512MB RAM.  This suggests to me that
tests involving 256MB of pg memory should be plenty big enough.




Thoughts?

Hope they are useful.

Ron Peacetree 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Guido Neitzer

On 13.12.2006, at 19:03, Ron wrote:

What I find interesting is that so far Guido's C2D Mac laptop has  
gotten the highest values by far in this set of experiments, and no  
one else is even close.


This might be the case because I have tested with fsync=off as my  
internal hard drive would be a limiting factor and the results
wouldn't be really helpful. Perhaps it's still the IO system, I don't  
know. I can try to reproduce the tests as close as possible again.  
Perhaps I had different settings on something but I doubt that.


The new Core * CPUs from Intel are extremely fast with PostgreSQL.

Anyone got a 2.33 GHz C2D box with a decent HD IO subsystem more  
representative of a typical DB server hooked up to it?


I now also have an Xserve with two Dual-Core Xeons and two SAS drives
(15k Seagates) in a mirrored RAID here. Will do some testing tomorrow.


Btw: I always compare only to my own results to have something  
comparable - same test, same scripts, same db version, same operating  
system and so on. The rest is just pure interest.


cug

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Optimizing a query

2006-12-13 Thread Tom Lane
James Cloos [EMAIL PROTECTED] writes:
 ... and was doing queries of the form:
 | select * from nanpa where npa=775 and nxx=413;

If those are char(3) columns, shouldn't you be quoting the constants?

select * from nanpa where npa='775' and nxx='413';

Any non-numeric input will fail entirely without the quotes, and I'm
also not too confident that inputs of less than three digits will work
as you expect (the blank-padding might not match what's in the table).
Leading zeroes would be troublesome too.

OTOH, if the keys are and always will be small integers, it's just
stupid not to be storing them as integers --- integer comparison
is far faster than string.

Postgres' data type capabilities are exceptional.  Learn to work with
them, not against them --- that means thinking about what the data
really is and declaring it appropriately.
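
A hedged sketch of that conversion (8.0+ syntax; assumes the columns
really hold only digits):

  ALTER TABLE nanpa ALTER COLUMN npa TYPE integer USING trim(npa)::integer;
  ALTER TABLE nanpa ALTER COLUMN nxx TYPE integer USING trim(nxx)::integer;
  ANALYZE nanpa;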

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] strange query behavior

2006-12-13 Thread Tim Jones
Version 8.1

Here are the planner parameters I believe we changed:
effective_cache_size and random_page_cost.
BTW this is an AIX 5.2 box.

#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------


# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#effective_cache_size = 1   # typically 8KB each
effective_cache_size = 40
random_page_cost = 3.8  # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01  # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025 # (same)

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5# range 1-10
#geqo_pool_size = 0 # selects default based on effort
#geqo_generations = 0   # selects default based on effort
#geqo_selection_bias = 2.0  # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10 # range 1-1000
#constraint_exclusion = off
#from_collapse_limit = 8
#join_collapse_limit = 8    # 1 disables collapsing of explicit JOINs


Thanks

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 13, 2006 4:59 PM
To: Tim Jones
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] strange query behavior 

Tim Jones [EMAIL PROTECTED] writes:
 The tables for these queries are vacuumed and analyzed regularly. I
 just did an analyze to be sure and here are the results ...

There's something pretty wacko about the choice of plan in the slow case
--- I don't see why it'd not have used the same plan structure as for
the IN case.  It's coming up with a cost a lot higher than for the
other, so it certainly knows this isn't a great plan ...

Which PG version is this exactly?  Are you running with any nondefault
planner parameters?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Optimizing a query

2006-12-13 Thread James Cloos
>>>>> "Husam" == Tomeh, Husam [EMAIL PROTECTED] writes:

Husam> Have you run vacuum/analyze on the table?

Yes, back when I first noticed how slow it was.
It did not make any difference.

explain analyze says:

,----
| n=# explain analyse  select * from nanpa where npa=775 and nxx=473;
|                                            QUERY PLAN
| --------------------------------------------------------------------------------------------------------
|  Seq Scan on nanpa  (cost=0.00..5344.60 rows=4 width=105) (actual time=371.718..516.816 rows=1 loops=1)
|    Filter: (((npa)::text = '775'::text) AND ((nxx)::text = '473'::text))
|  Total runtime: 516.909 ms
| (3 rows)
`----

vs:

,----
| n=# explain analyse  select * from nanpa where npa=775::char and nxx=473::char;
|                                                   QUERY PLAN
| ----------------------------------------------------------------------------------------------------------------
|  Index Scan using nanpa_pkey on nanpa  (cost=0.00..4.33 rows=1 width=105) (actual time=64.831..64.831 rows=0 loops=1)
|    Index Cond: ((npa = '7'::bpchar) AND (nxx = '4'::bpchar))
|  Total runtime: 64.927 ms
| (3 rows)
`----

BTW, I forgot to mention I'm at 8.1.4 on that box.

-JimC
-- 
James Cloos [EMAIL PROTECTED] OpenPGP: 1024D/ED7DAEA6

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] strange query behavior

2006-12-13 Thread Tom Lane
Tim Jones [EMAIL PROTECTED] writes:
 Which PG version is this exactly?  Are you running with any nondefault
 planner parameters?

 Version 8.1

8.1.what?

 Here are the planner constraints I believe we changed
 effective_cache_size and random_page_cost

Those look reasonably harmless.

My best bet at the moment is that you've got a pretty early 8.1.x
release and are hitting one of the planner bugs that we fixed earlier
this year.  Not enough info to say for sure though.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] strange query behavior

2006-12-13 Thread Tim Jones
Looks like 8.1.2

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 13, 2006 5:37 PM
To: Tim Jones
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] strange query behavior 

Tim Jones [EMAIL PROTECTED] writes:
 Which PG version is this exactly?  Are you running with any 
 nondefault planner parameters?

 Version 8.1

8.1.what?

 Here are the planner constraints I believe we changed 
 effective_cache_size and random_page_cost

Those look reasonably harmless.

My best bet at the moment is that you've got a pretty early 8.1.x
release and are hitting one of the planner bugs that we fixed earlier
this year.  Not enough info to say for sure though.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] strange query behavior

2006-12-13 Thread Tim Jones
That's what I did and got 8.1.2 ... do you want the gcc version etc?
gcc 3.3.2, powerpc, aix 5.2

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Matthew O'Connor [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 13, 2006 5:51 PM
To: Tim Jones
Subject: Re: [PERFORM] strange query behavior

 From psql perform: select version();
and send us that output.

Tim Jones wrote:
 Looks like 8.1.2
 
 Tim Jones
 Healthcare Project Manager
 Optio Software, Inc.
 (770) 576-3555
 
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, December 13, 2006 5:37 PM
 To: Tim Jones
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] strange query behavior
 
 Tim Jones [EMAIL PROTECTED] writes:
 Which PG version is this exactly?  Are you running with any 
 nondefault planner parameters?
 
 Version 8.1
 
 8.1.what?
 
 Here are the planner constraints I believe we changed 
 effective_cache_size and random_page_cost
 
 Those look reasonably harmless.
 
 My best bet at the moment is that you've got a pretty early 8.1.x 
 release and are hitting one of the planner bugs that we fixed earlier 
 this year.  Not enough info to say for sure though.
 
   regards, tom lane
 
 ---(end of 
 broadcast)---
 TIP 5: don't forget to increase your free space map settings
 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Optimizing a query

2006-12-13 Thread Tomeh, Husam

Your npa and nxx columns have character datatype, so you should use
quotes. Try:

explain analyze  select * from nanpa where npa='775' and
nxx='473'; 

If that does not work, you could try to influence the planner's
execution plan to favor index scans over sequential scans by tweaking a
couple of the postgres parameters, particularly
effective_cache_size. This parameter primarily sets the planner's
estimate of the relative likelihood of a particular table or index
being in memory, and will thus have a significant effect on whether the
planner chooses indexes over seqscans. Tweaking such parameters is
usually done as a last resort.

--
  Husam 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of James Cloos
Sent: Wednesday, December 13, 2006 2:35 PM
To: Tomeh, Husam
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Optimizing a query

>>>>> "Husam" == Tomeh, Husam [EMAIL PROTECTED] writes:

Husam> Have you run vacuum/analyze on the table?

Yes, back when I first noticed how slow it was.
It did not make any difference.

explain analyze says:

,----
| n=# explain analyse  select * from nanpa where npa=775 and nxx=473;
|                                            QUERY PLAN
| --------------------------------------------------------------------------------------------------------
|  Seq Scan on nanpa  (cost=0.00..5344.60 rows=4 width=105) (actual time=371.718..516.816 rows=1 loops=1)
|    Filter: (((npa)::text = '775'::text) AND ((nxx)::text = '473'::text))
|  Total runtime: 516.909 ms
| (3 rows)
`----

vs:

,----
| n=# explain analyse  select * from nanpa where npa=775::char and nxx=473::char;
|                                                   QUERY PLAN
| ----------------------------------------------------------------------------------------------------------------
|  Index Scan using nanpa_pkey on nanpa  (cost=0.00..4.33 rows=1 width=105) (actual time=64.831..64.831 rows=0 loops=1)
|    Index Cond: ((npa = '7'::bpchar) AND (nxx = '4'::bpchar))
|  Total runtime: 64.927 ms
| (3 rows)
`----

BTW, I forgot to mention I'm at 8.1.4 on that box.

-JimC
-- 
James Cloos [EMAIL PROTECTED] OpenPGP: 1024D/ED7DAEA6

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Insertion to temp table deteriorating over time

2006-12-13 Thread Steven Flatt

After running some further standalone tests using temp tables, I'm not
convinced the problem is specific to temp table usage.  In fact it looks
like generic SQL activity degrades over time.

Having said that, what kinds of things should I be looking for that could
deteriorate/bloat over time?  Ordinarily the culprit might be infrequent
vacuuming or analyzing, but that wouldn't be corrected by a restart of
Postgres.  In our case, restarting Postgres gives us a huge performance
improvement (for a short while, anyways).

By the way, we are using PG 7.4.6 on FreeBSD 5.30.0170.  The temp table has
15 columns: a timestamp, a double, and the rest integers.  It has no
indexes.

Thanks,
Steve


On 12/13/06, Tom Lane [EMAIL PROTECTED] wrote:


Steven Flatt [EMAIL PROTECTED] writes:
 Any idea where the bloat is happening?  I believe that if we were
dropping
 and re-creating the temp table over and over, that could result in
pg_class
 bloat (among other catalog tables), but what is going wrong if we use
the
 same table over and over and truncate it?

That seems very strange --- I too would have expected a TRUNCATE to
bring you back to ground zero performance-wise.  I wonder whether the
issue is not directly related to the temp table but is just some generic
resource leakage problem in a very long-running backend.  Have you
checked to see if the backend process bloats memory-wise, or perhaps has
a huge number of files open (I wonder if it could be leaking open file
handles to the successive generations of the temp table)?  Are you sure
that the slowdown is specific to inserts into the temp table, as opposed
to generic SQL activity?

Also, which PG version is this exactly (7.4 is not specific enough)?
On what platform?  Can you show us the full schema definition for the
temp table and any indexes on it?

   regards, tom lane



Re: [PERFORM] strange query behavior

2006-12-13 Thread Tom Lane
Tim Jones [EMAIL PROTECTED] writes:
 [ explain results ]

As best I can see, the problem is with the estimate of the size of the
inner join: for two keys we have

 ->  Nested Loop  (cost=4.01..9410.49 rows=13 width=145) (actual time=0.227..0.416 rows=30 loops=1)
       ->  Bitmap Heap Scan on battery  (cost=4.01..11.64 rows=2 width=69) (actual time=0.135..0.138 rows=2 loops=1)
             Recheck Cond: ((batteryidentifier = 1177470) OR (batteryidentifier = 1177469))
             ->  BitmapOr  (cost=4.01..4.01 rows=2 width=0) (actual time=0.106..0.106 rows=0 loops=1)
                   ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.091..0.091 rows=1 loops=1)
                         Index Cond: (batteryidentifier = 1177470)
                   ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)
                         Index Cond: (batteryidentifier = 1177469)
       ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4682.40 rows=1362 width=76) (actual time=0.047..0.091 rows=15 loops=2)
             Index Cond: ("outer".batteryidentifier = observationresults.batteryidentifier)

but for one key we have

   ->  Nested Loop  (cost=0.00..4653.67 rows=1348 width=145) (actual time=0.166..0.215 rows=9 loops=1)
         ->  Index Scan using ix_battery_id on battery  (cost=0.00..5.81 rows=1 width=69) (actual time=0.079..0.082 rows=1 loops=1)
               Index Cond: (batteryidentifier = 1177470)
         ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4634.38 rows=1348 width=76) (actual time=0.079..0.102 rows=9 loops=1)
               Index Cond: (1177470 = batteryidentifier)

The large rowcount estimate makes it back off to a non-nestloop
plan for the outer joins, and in this situation that's a loser.

I'm actually not sure why they're not both too high --- with the
rowcount estimate of 1362 for the inner scan in the first example, you'd
expect about twice that for the join result.  But the immediate problem
is that in the case where it knows exactly what batteryidentifier is
being probed for, it's still off by more than a factor of 100 on the
rowcount estimate for observationresults.  How many rows in
observationresults, and may we see the pg_stats entry for
observationresults.batteryidentifier?

It's likely that the answer for you will be to raise the statistics target
for observationresults and re-ANALYZE, but I'd like to gather more info
about what's going wrong first.
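
Something along these lines (the target value 100 is only an example):

  SELECT * FROM pg_stats
   WHERE tablename = 'observationresults'
     AND attname = 'batteryidentifier';

  ALTER TABLE observationresults
    ALTER COLUMN batteryidentifier SET STATISTICS 100;
  ANALYZE observationresults;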

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Insertion to temp table deteriorating over time

2006-12-13 Thread Tom Lane
Steven Flatt [EMAIL PROTECTED] writes:
 Having said that, what kinds of things should I be looking for that could
 deteriorate/bloat over time?  Ordinarily the culprit might be infrequent
 vacuuming or analyzing, but that wouldn't be corrected by a restart of
 Postgres.  In our case, restarting Postgres gives us a huge performance
 improvement (for a short while, anyways).

Do you actually need to restart the postmaster, or is just starting a
fresh session (fresh backend) sufficient?  And again, have you monitored
the backend process to see if it's bloating memory-wise or open-file-wise?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Insertion to temp table deteriorating over time

2006-12-13 Thread Tom Lane
Steven Flatt [EMAIL PROTECTED] writes:
 Having said that, what kinds of things should I be looking for that could
 deteriorate/bloat over time?  Ordinarily the culprit might be infrequent
 vacuuming or analyzing, but that wouldn't be corrected by a restart of
 Postgres.  In our case, restarting Postgres gives us a huge performance
 improvement (for a short while, anyways).

 By the way, we are using PG 7.4.6 on FreeBSD 5.30.0170.  The temp table has
 15 columns: a timestamp, a double, and the rest integers.  It has no
 indexes.

Hm, *are* you vacuuming only infrequently?  In particular, what is your
maintenance policy for pg_class?

Some experimentation with TRUNCATE and VACUUM VERBOSE shows that in 7.4,
a TRUNCATE of a temp table with no indexes and no toast table generates
three dead row versions in pg_class.  (I'm surprised that it's as many
as three, but in any case the TRUNCATE would certainly have to do one
update of the table's pg_class entry and thereby generate one dead row
version.)

If you're being sloppy about vacuuming pg_class, then over time the
repeated-truncate pattern would build up a huge number of dead rows
in pg_class, *all with the same OID*.  It's unsurprising that this
would create some slowness in looking up the temp table's pg_class
entry.

If this theory is correct, the reason that starting a fresh backend
makes it fast again is that the new backend creates a whole new temp
table with a new OID assigned, and so the adjacent litter in pg_class
doesn't matter anymore (or not so much anyway).

Solution would be to institute regular vacuuming of the system
catalogs...
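
A minimal sketch of what that means in practice (pg_class is real; the
scheduling is just an example):

    -- one-off: see how much dead space has accumulated, and reclaim it
    VACUUM VERBOSE pg_class;

    -- ongoing: a plain database-wide VACUUM covers the system catalogs
    -- too, so a nightly cron'd "vacuumdb yourdb" (or contrib's
    -- pg_autovacuum on 7.4) would institutionalize this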

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Bruce Momjian
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  While skimming over the pgbench source it has looked to me like it's
  necessary to pass the -s switch (scale factor) to both the
  initialization (-i) and the subsequent (non -i) runs.
 
 No, it's not supposed to be, and I've never found it needed in practice.
 The code seems able to pull the scale out of the database (I forget how
 it figures it out exactly).
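
(If memory serves, it infers the scale by counting the branches table,
which -i loads with exactly one row per scale unit -- effectively:

    SELECT count(*) FROM branches;  -- pgbench reads this as the scale

so the -s value is recoverable as long as pgbench itself initialized
the database.)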

pgbench is designed to be a general benchmark, meaning it exercises
all parts of the system.  I am thinking just re-executing a single SELECT
over and over again would be a better test of the CPU optimizations.
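
With pgbench's custom-script mode, that could look something like the
sketch below (it assumes the standard pgbench tables; the built-in -S
switch does much the same thing):

    -- select_only.sql: one indexed read per transaction, no writes
    \set naccounts 100000 * :scale
    \setrandom aid 1 :naccounts
    SELECT abalance FROM accounts WHERE aid = :aid;

Run with something like pgbench -n -f select_only.sql -c 1 -t 100000,
so the I/O system never enters the picture.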

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Rajesh Kumar Mallah

[offtopic]
Hmm, quite a long thread. Below are the posting stats:
Total Messages: 87    Total Participants: 27
-
19 Daniel van Ham Colchete
12 Michael Stone
 9 Ron
 5 Steinar H. Gunderson
 5 Alexander Staubo
 4 Tom Lane
 4 Greg Smith
 3 Luke Lonergan
 3 Christopher Browne
 2 Merlin Moncure
 2 Guido Neitzer
 2 Dave Cramer
 2 Cosimo Streppone
 2 Bucky Jordan
 1 Tatsuo Ishii
 1 Sven Geisler
 1 Shane Ambler
 1 Michael Glaesemann
 1 Mark Kirkwood
 1 Gene
 1 Florian Weimer
 1 David Boreham
 1 Craig A. James
 1 Chris Browne
 1 Brad Nicholson
 1 Bill Moran
 1 Alvaro Herrera
---

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Josh Berkus
Bruce,

 pgbench is designed to be a general benchmark, meaning it exercises
 all parts of the system.  I am thinking just re-executing a single SELECT
 over and over again would be a better test of the CPU optimizations.

Mostly, though, pgbench just gives the I/O system a workout.  It's not a 
really good general workload.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Joshua D. Drake
On Wed, 2006-12-13 at 18:36 -0800, Josh Berkus wrote:
 Bruce,
 
  pgbench is designed to be a general benchmark, meaning it exercises
  all parts of the system.  I am thinking just re-executing a single SELECT
  over and over again would be a better test of the CPU optimizations.
 
 Mostly, though, pgbench just gives the I/O system a workout.  It's not a 
 really good general workload.

It also will not utilize all CPUs on a many-CPU machine. We recently
found that the only way to *really* test with pgbench was to actually
run 4+ copies of pgbench at the same time.

J


 
-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 On Wed, 2006-12-13 at 18:36 -0800, Josh Berkus wrote:
 Mostly, though, pgbench just gives the I/O system a workout.  It's not a 
 really good general workload.

 It also will not utilize all CPUs on a many-CPU machine. We recently
 found that the only way to *really* test with pgbench was to actually
 run 4+ copies of pgbench at the same time.

The pgbench app itself becomes the bottleneck at high transaction
rates.  Awhile back I rewrote it to improve its ability to issue
commands concurrently, but then desisted from submitting the
changes --- if we change the app like that, future numbers would
be incomparable to past ones, which sort of defeats the purpose of a
benchmark no?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Michael Glaesemann


On Dec 14, 2006, at 14:44 , Tom Lane wrote:


The pgbench app itself becomes the bottleneck at high transaction
rates.  Awhile back I rewrote it to improve its ability to issue
commands concurrently, but then desisted from submitting the
changes --- if we change the app like that, future numbers would
be incomparable to past ones, which sort of defeats the purpose of a
benchmark no?


At the same time, if the current pgbench isn't the tool we want to  
use, is this kind of backward comparison going to hinder any move to  
improve it? It sounds like there's quite a bit of room for  
improvement in pgbench, and in my opinion we should move forward to
make an improved tool, one that measures what we want to measure. And  
while comparison with past results might not be possible, there  
remains the possibility of rerunning the improved pgbench on previous  
systems, I should think.


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] File Systems Compared

2006-12-13 Thread Jim Nasby

On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote:

On Wed, Dec 06, 2006 at 08:55:14 -0800,
  Mark Lewis [EMAIL PROTECTED] wrote:

Anyone run their RAIDs with disk caches enabled, or is this akin to
having fsync off?


Disk write caches are basically always akin to having fsync off.  The
only time a write-cache is (more or less) safe to enable is when it is
backed by a battery or in some other way made non-volatile.

So a RAID controller with a battery-backed write cache can enable its
own write cache, but can't safely enable the write-caches on the disk
drives it manages.


This appears to be changing under Linux. Recent kernels have write  
barriers
implemented using cache flush commands (which some drives ignore,  
so you
need to be careful). In very recent kernels, software raid using  
raid 1
will also handle write barriers. To get this feature, you are  
supposed to
mount ext3 file systems with the barrier=1 option. For other file
systems, the parameter may need to be different.
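
For concreteness, the ext3 option mentioned above goes in /etc/fstab;
the device and mount point below are hypothetical:

    /dev/md0  /var/lib/pgsql  ext3  defaults,barrier=1  1 2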


But would that actually provide a meaningful benefit? When you
COMMIT, the WAL data must hit non-volatile storage of some kind,
which, without a BBU or something similar, means hitting the platter.
So I don't see how enabling the disk cache will help, unless of
course it's ignoring fsync.


Now, I have heard something about drives using their stored  
rotational energy to flush out the cache... but I tend to suspect  
urban legend there...
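
One crude way to check is to time tiny commits: a 7200rpm drive can
complete at most ~120 real cache flushes per second, so sustained
commit rates far above that (with no BBU) mean something is absorbing
the fsync. A sketch reusing pgbench's custom-script mode -- the probe
table is hypothetical:

    -- commit_rate.sql: one tiny write, committed per transaction
    -- (prerequisite: CREATE TABLE fsync_probe (i int);)
    INSERT INTO fsync_probe VALUES (1);

Running pgbench -n -f commit_rate.sql -c 1 -t 1000 and looking at the
reported tps gives the effective fsync rate.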

--
Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Ron

Benchmarks, like any other SW, need modernizing and updating from time to time.

Given that the multi-core CPU is the current fad in CPU architecture's
pursuit of higher performance, we need a benchmark appropriate to it.


If SPEC feels it is appropriate to rev their benchmark suite 
regularly, we probably should as well.


Ron Peacetree

At 12:44 AM 12/14/2006, Tom Lane wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:
 On Wed, 2006-12-13 at 18:36 -0800, Josh Berkus wrote:
 Mostly, though, pgbench just gives the I/O system a workout.  It's not a
 really good general workload.

 It also will not utilize all CPUs on a many-CPU machine. We recently
 found that the only way to *really* test with pgbench was to actually
 run 4+ copies of pgbench at the same time.

The pgbench app itself becomes the bottleneck at high transaction
rates.  Awhile back I rewrote it to improve its ability to issue
commands concurrently, but then desisted from submitting the
changes --- if we change the app like that, future numbers would
be incomparable to past ones, which sort of defeats the purpose of a
benchmark no?



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Gregory S. Williamson
(Re)-Design it to do both, unless there's reason to believe that doing one 
after the other would skew the results.

Then old results are available, new results are also visible and useful for 
future comparisons. And seeing them side by side might be an interesting
exercise as well, at least for a while.

(sorry for top-posting -- web based interface that doesn't do proper quoting)

Greg Williamson
DBA
GlobeXplorer LLC

-Original Message-
From:   [EMAIL PROTECTED] on behalf of Michael Glaesemann
Sent:   Wed 12/13/2006 10:11 PM
To: Tom Lane
Cc: Joshua D. Drake; Josh Berkus; pgsql-performance@postgresql.org; Bruce 
Momjian; Alvaro Herrera; Alexander Staubo; Michael Stone
Subject: Re: [PERFORM] New to PostgreSQL, performance considerations


On Dec 14, 2006, at 14:44 , Tom Lane wrote:

 The pgbench app itself becomes the bottleneck at high transaction
 rates.  Awhile back I rewrote it to improve its ability to issue
 commands concurrently, but then desisted from submitting the
 changes --- if we change the app like that, future numbers would
 be incomparable to past ones, which sort of defeats the purpose of a
 benchmark no?

At the same time, if the current pgbench isn't the tool we want to  
use, is this kind of backward comparison going to hinder any move to  
improve it? It sounds like there's quite a bit of room for  
improvement in pgbench, and in my opinion we should move forward to
make an improved tool, one that measures what we want to measure. And  
while comparison with past results might not be possible, there  
remains the possibility of rerunning the improved pgbench on previous  
systems, I should think.

Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster








---(end of broadcast)---
TIP 6: explain analyze is your friend