Re: [PERFORM] Caching of Queries

2004-09-23 Thread Gaetano Mendola
Neil Conway wrote:
Another idea would be to improve the quality of the plan we generate at PREPARE time:
for instance you could generate 'n' plans for various combinations of
input parameters, and then choose the best query plan at EXECUTE time.
It's a difficult problem to solve, however (consider multiple parameters
to PREPARE, for example).
Do you mean store different plans for each different histogram segment ?
Regards
Gaetano Mendola
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] O_DIRECT setting

2004-09-23 Thread Bruce Momjian

TODO has:

* Consider use of open/fcntl(O_DIRECT) to minimize OS caching

Should the item be removed?

---

Neil Conway wrote:
 On Mon, 2004-09-20 at 17:57, Guy Thornley wrote:
  According to the manpage, O_DIRECT implies O_SYNC:
  
  File I/O is done directly to/from user space buffers.  The I/O is
  synchronous, i.e., at the completion of the read(2) or write(2)
  system call, data is guaranteed to have been transferred.
 
 This seems like it would be a rather large net loss. PostgreSQL already
 structures writes so that the writes we need to hit disk immediately
 (WAL records) are fsync()'ed -- the kernel is given more freedom to
 schedule how other writes are flushed from the cache. Also, my
 recollection is that O_DIRECT also disables readahead -- if that's
 correct, that's not what we want either.
 
 BTW, using O_DIRECT has been discussed a few times in the past. Have you
 checked the list archives? (for both -performance and -hackers)
 
  Would people be interested in a performance benchmark?
 
 Sure -- I'd definitely be curious, although as I said I'm skeptical it's
 a win.
 
  I need some benchmark tips :)
 
 Some people have noted that it can be difficult to use contrib/pgbench
 to get reproducible results -- you might want to look at Jan's TPC-W
 implementation or the OSDL database benchmarks:
 
 http://pgfoundry.org/projects/tpc-w-php/
 http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/
 
  Incidentally, postgres heap files suffer really, really bad fragmentation,
  which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...)
  quite drastically. We have in-house patches that somewhat alleiviate this,
  but they are not release quality.
 
 Can you elaborate on these in-house patches?
 
 -Neil
 
 
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Harald Fuchs
In article [EMAIL PROTECTED],
Scott Kirkwood [EMAIL PROTECTED] writes:

 I couldn't find anything in the docs or in the mailing list on this,
 but it is something that Oracle appears to do as does MySQL.
 The idea, I believe, is to do a quick (hash) string lookup of the
 query and if it's exactly the same as another query that has been done
 recently to re-use the old parse tree.

That's not was MySQL is doing.  MySQL caches not the query plan, but
the result set for the (hashed) query string.  If the same query comes
again, it is not executed at all (unless one of the tables involved
have been changed meanwhile).


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-23 Thread Gaetano Mendola
Tom Lane wrote:
Greg Stark [EMAIL PROTECTED] writes:
No, postgres didn't do things in reverse order. It hashed the empty table and
then went ahead and checked every record of the non-empty table against the
empty hash table.

Reading the code there's no check for this, and it seems like it would be a
useful low-cost little optimization.

Yeah, I was just looking at doing that.
It would also be interesting to prefetch one row from the outer table and fall
out immediately (without building the hash table) if the outer table is
empty.  This seems to require some contortion of the code though :-(

I think postgres normally hashes the table it thinks is smaller,

Right, it will prefer to put the physically smaller table (estimated
width*rows) on the inside.
Do you plan to do a patch for the 7.4, so I'll wait for a 7.4.6 ( that IIRC have 
already
two important patches pending ) or is 8.0 stuff ?

Regards
Gaetano Mendola

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] O_DIRECT setting

2004-09-23 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 TODO has:
   * Consider use of open/fcntl(O_DIRECT) to minimize OS caching
 Should the item be removed?

I think it's fine ;-) ... it says consider it, not do it.  The point
is that we could do with more research in this area, even if O_DIRECT
per se is not useful.  Maybe you could generalize the entry to
investigate ways of fine-tuning OS caching behavior.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Mr Pink
Not knowing anything about the internals of pg, I don't know how this relates, but in 
theory, 
query plan caching is not just about saving time re-planning queries, it's about 
scalability.
Optimizing queries requires shared locks on the database metadata, which, as I 
understand it
causes contention and serialization, which kills scalability. 

I read this thread from last to first, and I'm not sure if I missed something, but if 
pg isnt
caching plans, then I would say plan caching should be a top priority for future 
enhancements. It
needn't be complex either: if the SQL string is the same, and none of the tables 
involved in the
query have changed (in structure), then re-use the cached plan. Basically, DDL and 
updated
statistics would have to invalidate plans for affected tables. 

Preferably, it should work equally for prepared statements and those not pre-prepared. 
If you're
not using prepare (and bind variables) though, your plan caching down the drain 
anyway...

I don't think that re-optimizing based on values of bind variables is needed. It seems 
like it
could actually be counter-productive and difficult to asses it's impact.

That's the way I see it anyway.

:)

--- Scott Kirkwood [EMAIL PROTECTED] wrote:

 I couldn't find anything in the docs or in the mailing list on this,
 but it is something that Oracle appears to do as does MySQL.
 The idea, I believe, is to do a quick (hash) string lookup of the
 query and if it's exactly the same as another query that has been done
 recently to re-use the old parse tree.
 It should save the time of doing the parsing of the SQL and looking up
 the object in the system tables.
 It should probably go through the planner again because values passed
 as parameters may have changed. Although, for extra points it could
 look at the previous query plan as a hint.
 On the surface it looks like an easy enhancement, but what do I know?
 I suppose it would benefit mostly those programs that use a lot of
 PQexecParams() with simple queries where a greater percentage of the
 time is spent parsing the SQL rather than building the execute plan.
 What do you think?
 
 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings
 




___
Do you Yahoo!?
Declare Yourself - Register online to vote today!
http://vote.yahoo.com

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] SAN performance

2004-09-23 Thread Mr Pink
Hi,

I expect you mean RAID 1/0 or 1+0 since the CX300 didn't support RAID 10 last time I 
looked.

Whether you are using a SAN or not, you should consider putting the WAL files (pg_xlog 
folder) on
seperate diskes from the DB. Since the log files are mostly written to, not read from 
you could
just use RAID 1. 

It's a pity pg doesn't have a way to use a cluster of servers to get the most out of 
your
expensive SAN.

I read a comment earlier about setting block sizes to 8k to math pg's block size. 
Seems to make
sense, you should check it out.

Have fun,
Mr Pink

--- Anjan Dave [EMAIL PROTECTED] wrote:

 Hello,
 
  
 
 I'll be moving a DB from internal RAID-10 SCSI storage to an EMC CX300
 FC RAID-10 LUN, bound to the host. I've setup a test host machine and a
 test LUN. The /var/lib/pgsql/data folder is sym-linked to a partition on
 the LUN. 
 
  
 
 Other than the shared_buffers, effective cache size, and sort memory, I
 am not sure if I need to change any other parameters in the
 postgresql.conf file for getting maximum performance from the EMC box.
 
  
 
 Is there a general guideline for setting up postgres database and the
 tunable parameters on a SAN, especially for EMC?
 
  
 
 Appreciate any help,
 
  
 
 Thanks,
 Anjan
 
 





__
Do you Yahoo!?
New and Improved Yahoo! Mail - 100MB free storage!
http://promotions.yahoo.com/new_mail 

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] vacuum full max_fsm_pages question

2004-09-23 Thread Patrick Hatcher

I upgraded to 7.4.3 this morning and
did a vacuum full analyze on the problem table and now the indexes show
the correct number of records


Patrick Hatcher
Macys.Com






Josh Berkus [EMAIL PROTECTED]

Sent by: [EMAIL PROTECTED]
09/21/04 10:49 AM




To
Patrick Hatcher
[EMAIL PROTECTED]


cc
Robert Treat
[EMAIL PROTECTED], [EMAIL PROTECTED]


Subject
Re: [PERFORM] vacuum full
 max_fsm_pages question








Patrick,

 Sorry. I wrote PG 7.4.2 and then I erased it to write something
else and
 then forgot to add it back.

Odd. You shouldn't be having to re-vacuum on 7.4.

 And thanks for the Page info. I was getting frustrated and looked
in the
 wrong place.

 So it's probably best to drop and readd the indexes then?

Well, I have to wonder if you've not run afoul of the known 7.4.2 bug 
regarding indexes.  This system hasn't had an improper database shutdown
or 
power-out in the last few weeks, has it?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
   joining column's datatypes do not match



Re: [PERFORM] Caching of Queries

2004-09-23 Thread Jason Coene
I'm not an expert, but I've been hunting down a killer performance problem
for a while now.  It seems this may be the cause.

At peak load, our database slows to a trickle.  The CPU and disk utilization
are normal - 20-30% used CPU and disk performance good.

All of our postgres processes end up in the semwai state - seemingly
waiting on other queries to complete.  If the system isn't taxed in CPU or
disk, I have a good feeling that this may be the cause.  I didn't know that
planning queries could create such a gridlock, but based on Mr Pink's
explanation, it sounds like a very real possibility.

We're running on SELECT's, and the number of locks on our high traffic
tables grows to the hundreds.  If it's not the SELECT locking (and we don't
get that many INSERT/UPDATE on these tables), could the planner be doing it?

At peak load (~ 1000 queries/sec on highest traffic table, all very
similar), the serialized queries pile up and essentially create a DoS on our
service - requiring a restart of the PG daemon.  Upon stop  start, it's
back to normal.

I've looked at PREPARE, but apparently it only lasts per-session - that's
worthless in our case (web based service, one connection per data-requiring
connection).

Does this sound plausible?  Is there an alternative way to do this that I
don't know about?  Additionally, in our case, I personally don't see any
downside to caching and using the same query plan when the only thing
substituted are variables.  In fact, I'd imagine it would help performance
significantly in high-volume web applications.

Thanks,

Jason

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-performance-
 [EMAIL PROTECTED] On Behalf Of Mr Pink
 Sent: Thursday, September 23, 2004 11:29 AM
 To: Scott Kirkwood; [EMAIL PROTECTED]
 Subject: Re: [PERFORM] Caching of Queries
 
 Not knowing anything about the internals of pg, I don't know how this
 relates, but in theory,
 query plan caching is not just about saving time re-planning queries, it's
 about scalability.
 Optimizing queries requires shared locks on the database metadata, which,
 as I understand it
 causes contention and serialization, which kills scalability.
 
 I read this thread from last to first, and I'm not sure if I missed
 something, but if pg isnt
 caching plans, then I would say plan caching should be a top priority for
 future enhancements. It
 needn't be complex either: if the SQL string is the same, and none of the
 tables involved in the
 query have changed (in structure), then re-use the cached plan. Basically,
 DDL and updated
 statistics would have to invalidate plans for affected tables.
 
 Preferably, it should work equally for prepared statements and those not
 pre-prepared. If you're
 not using prepare (and bind variables) though, your plan caching down the
 drain anyway...
 
 I don't think that re-optimizing based on values of bind variables is
 needed. It seems like it
 could actually be counter-productive and difficult to asses it's impact.
 
 That's the way I see it anyway.
 
 :)
 


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Tom Lane
Jason Coene [EMAIL PROTECTED] writes:
 All of our postgres processes end up in the semwai state - seemingly
 waiting on other queries to complete.  If the system isn't taxed in CPU or
 disk, I have a good feeling that this may be the cause.

Whatever that is, I'll bet lunch that it's got 0 to do with caching
query plans.  Can you get stack tracebacks from some of the stuck
processes?  What do they show in ps?

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Gaetano Mendola
Jason Coene wrote:
I'm not an expert, but I've been hunting down a killer performance problem
for a while now.  It seems this may be the cause.
At peak load, our database slows to a trickle.  The CPU and disk utilization
are normal - 20-30% used CPU and disk performance good.
For a peak load 20-30% used CPU this mean you reached your IO bottleneck.
All of our postgres processes end up in the semwai state - seemingly
waiting on other queries to complete.  If the system isn't taxed in CPU or
disk, I have a good feeling that this may be the cause.  I didn't know that
planning queries could create such a gridlock, but based on Mr Pink's
explanation, it sounds like a very real possibility.
We're running on SELECT's, and the number of locks on our high traffic
tables grows to the hundreds.  If it's not the SELECT locking (and we don't
get that many INSERT/UPDATE on these tables), could the planner be doing it?
At peak load (~ 1000 queries/sec on highest traffic table, all very
similar), the serialized queries pile up and essentially create a DoS on our
service - requiring a restart of the PG daemon.  Upon stop  start, it's
back to normal.
Give us informations on this queries, a explain analyze could be a good start
point.
I've looked at PREPARE, but apparently it only lasts per-session - that's
worthless in our case (web based service, one connection per data-requiring
connection).
Trust me the PREPARE is not doing miracle in shenarios like yours . If you use 
postgres
in a web service environment what you can use is a connection pool ( look for pgpoll 
IIRC ),
if you use a CMS then try to enable the cache in order to avoid to hit the DB for each
request.

Regards
Gaetano Mendola
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Caching of Queries

2004-09-23 Thread jason.servetar
Scott: 

We have seen similar issues when we have had massive load on our web
server. My determination was that simply the act of spawning and
stopping postgres sessions was very heavy on the box, and by
implementing connection pooling (sqlrelay), we got much higher
throughput, and better response on the server then we would get any
other way. 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jason Coene
Sent: Thursday, September 23, 2004 10:53 AM
To: 'Mr Pink'; 'Scott Kirkwood'
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Caching of Queries

I'm not an expert, but I've been hunting down a killer performance
problem
for a while now.  It seems this may be the cause.

At peak load, our database slows to a trickle.  The CPU and disk
utilization
are normal - 20-30% used CPU and disk performance good.

All of our postgres processes end up in the semwai state - seemingly
waiting on other queries to complete.  If the system isn't taxed in CPU
or
disk, I have a good feeling that this may be the cause.  I didn't know
that
planning queries could create such a gridlock, but based on Mr Pink's
explanation, it sounds like a very real possibility.

We're running on SELECT's, and the number of locks on our high traffic
tables grows to the hundreds.  If it's not the SELECT locking (and we
don't
get that many INSERT/UPDATE on these tables), could the planner be doing
it?

At peak load (~ 1000 queries/sec on highest traffic table, all very
similar), the serialized queries pile up and essentially create a DoS on
our
service - requiring a restart of the PG daemon.  Upon stop  start, it's
back to normal.

I've looked at PREPARE, but apparently it only lasts per-session -
that's
worthless in our case (web based service, one connection per
data-requiring
connection).

Does this sound plausible?  Is there an alternative way to do this that
I
don't know about?  Additionally, in our case, I personally don't see any
downside to caching and using the same query plan when the only thing
substituted are variables.  In fact, I'd imagine it would help
performance
significantly in high-volume web applications.

Thanks,

Jason

 -Original Message-
 From: [EMAIL PROTECTED]
[mailto:pgsql-performance-
 [EMAIL PROTECTED] On Behalf Of Mr Pink
 Sent: Thursday, September 23, 2004 11:29 AM
 To: Scott Kirkwood; [EMAIL PROTECTED]
 Subject: Re: [PERFORM] Caching of Queries
 
 Not knowing anything about the internals of pg, I don't know how this
 relates, but in theory,
 query plan caching is not just about saving time re-planning queries,
it's
 about scalability.
 Optimizing queries requires shared locks on the database metadata,
which,
 as I understand it
 causes contention and serialization, which kills scalability.
 
 I read this thread from last to first, and I'm not sure if I missed
 something, but if pg isnt
 caching plans, then I would say plan caching should be a top priority
for
 future enhancements. It
 needn't be complex either: if the SQL string is the same, and none of
the
 tables involved in the
 query have changed (in structure), then re-use the cached plan.
Basically,
 DDL and updated
 statistics would have to invalidate plans for affected tables.
 
 Preferably, it should work equally for prepared statements and those
not
 pre-prepared. If you're
 not using prepare (and bind variables) though, your plan caching down
the
 drain anyway...
 
 I don't think that re-optimizing based on values of bind variables is
 needed. It seems like it
 could actually be counter-productive and difficult to asses it's
impact.
 
 That's the way I see it anyway.
 
 :)
 


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Jason Coene
Hi Tom,

Easily recreated with Apache benchmark, ab -n 3 -c 3000
http://webserver .  This runs 1 query per page, everything else is cached
on webserver. 

The lone query:

SELECT 
id, 
gameid, 
forumid, 
subject 
  FROM threads 
  WHERE nuked = 0 
  ORDER BY nuked DESC, 
lastpost DESC LIMIT 8

Limit  (cost=0.00..1.99 rows=8 width=39) (actual time=27.865..28.027 rows=8
loops=1)
   -  Index Scan Backward using threads_ix_nuked_lastpost on threads
(cost=0.0 0..16824.36 rows=67511 width=39) (actual time=27.856..27.989
rows=8 loops=1)
 Filter: (nuked = 0)
 Total runtime: 28.175 ms

I'm not sure how I go about getting the stack traceback you need.  Any info
on this?  Results of ps below.  System is dual xeon 2.6, 2gb ram, hardware
raid 10 running FreeBSD 5.2.1.

Jason

last pid: 96094;  load averages:  0.22,  0.35,  0.38
up 19+20:50:37  13:10:45
161 processes: 2 running, 151 sleeping, 8 lock
CPU states: 12.2% user,  0.0% nice, 16.9% system,  1.6% interrupt, 69.4%
idle
Mem: 120M Active, 1544M Inact, 194M Wired, 62M Cache, 112M Buf, 2996K Free
Swap: 4096M Total, 4096M Free

  PID USERNAME PRI NICE   SIZERES STATE  C   TIME   WCPUCPU COMMAND
50557 pgsql 980 95276K  4860K select 0  24:00  0.59%  0.59% postgres
95969 pgsql  40 96048K 34272K sbwait 0   0:00  2.10%  0.29% postgres
95977 pgsql -40 96048K 29620K semwai 2   0:00  1.40%  0.20% postgres
96017 pgsql  40 96048K 34280K sbwait 0   0:00  2.05%  0.20% postgres
95976 pgsql -40 96048K 30564K semwai 3   0:00  1.05%  0.15% postgres
95970 pgsql -40 96048K 24404K semwai 1   0:00  1.05%  0.15% postgres
95972 pgsql -40 96048K 21060K semwai 1   0:00  1.05%  0.15% postgres
96053 pgsql -40 96048K 24140K semwai 3   0:00  1.54%  0.15% postgres
96024 pgsql -40 96048K 22192K semwai 3   0:00  1.54%  0.15% postgres
95985 pgsql -40 96048K 15208K semwai 3   0:00  1.54%  0.15% postgres
96033 pgsql 980 95992K  7812K *Giant 2   0:00  1.54%  0.15% postgres
95973 pgsql -40 96048K 30936K semwai 3   0:00  0.70%  0.10% postgres
95966 pgsql  40 96048K 34272K sbwait 0   0:00  0.70%  0.10% postgres
95983 pgsql  40 96048K 34272K sbwait 2   0:00  1.03%  0.10% postgres
95962 pgsql  40 96048K 34268K sbwait 2   0:00  0.70%  0.10% postgres
95968 pgsql -40 96048K 26232K semwai 2   0:00  0.70%  0.10% postgres
95959 pgsql  40 96048K 34268K sbwait 2   0:00  0.70%  0.10% postgres

 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Thursday, September 23, 2004 1:06 PM
 To: Jason Coene
 Cc: 'Mr Pink'; 'Scott Kirkwood'; [EMAIL PROTECTED]
 Subject: Re: [PERFORM] Caching of Queries
 
 Jason Coene [EMAIL PROTECTED] writes:
  All of our postgres processes end up in the semwai state - seemingly
  waiting on other queries to complete.  If the system isn't taxed in CPU
 or
  disk, I have a good feeling that this may be the cause.
 
 Whatever that is, I'll bet lunch that it's got 0 to do with caching
 query plans.  Can you get stack tracebacks from some of the stuck
 processes?  What do they show in ps?
 
   regards, tom lane


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Greg Stark

Jason Coene [EMAIL PROTECTED] writes:

 All of our postgres processes end up in the semwai state - seemingly
 waiting on other queries to complete.  If the system isn't taxed in CPU or
 disk, I have a good feeling that this may be the cause.  

Well, it's possible contention of some sort is an issue but it's not clear
that it's planning related contention.

 We're running on SELECT's, and the number of locks on our high traffic
 tables grows to the hundreds.  

Where are you seeing this? What information do you have about these locks?

 I've looked at PREPARE, but apparently it only lasts per-session - that's
 worthless in our case (web based service, one connection per data-requiring
 connection).

Well the connection time in postgres is pretty quick. But a lot of other
things, including prepared queries but also including other factors are a lot
more effective if you have long-lived sessions.

I would strongly recommend you consider some sort of persistent database
connection for your application. Most web based services run queries from a
single source base where all the queries are written in-house. In that
situation you can ensure that one request never leaves the session in an
unusual state (like setting guc variables strangely, or leaving a transaction
open, or whatever).

That saves you the reconnect time, which as I said is actually small, but
could still be contributing to your problem. I think it also makes the buffer
cache more effective as well. And It also means you can prepare all your
queries and reuse them on subsequent requests.

The nice thing about web based services is that while each page only executes
each query once, you tend to get the same pages over and over thousands of
times. So if they prepare their queries the first time around they can reuse
those prepared queries thousands of times.

Using a text cache of the query string on the server side is just a
work-around for failing to do that on the client side. It's much more
efficient and more flexible to do it on the client-side.

-- 
greg


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Tom Lane
Jason Coene [EMAIL PROTECTED] writes:
 I'm not sure how I go about getting the stack traceback you need.  Any info
 on this?  Results of ps below.  System is dual xeon 2.6, 2gb ram, hardware
 raid 10 running FreeBSD 5.2.1.

Hmm.  Dual Xeon sets off alarm bells ...

I think you are probably looking at the same problem previously reported
by Josh Berkus among others.  Does the rate of context swaps shown by
vmstat go through the roof when this happens?  If you strace or ktrace
one of the backends, do you see lots of semop()s and little else?

Check the archives for this thread among others:
http://archives.postgresql.org/pgsql-performance/2004-04/msg00249.php
The test case you are talking about is a tight indexscan loop, which
is pretty much the same scenario as here:
http://archives.postgresql.org/pgsql-performance/2004-04/msg00280.php

The fundamental problem is heavy contention for access to a shared data
structure.  We're still looking for good solutions, but in the context
of this thread it's worth pointing out that a shared query-plan cache
would itself be subject to heavy contention, and arguably would make
this sort of problem worse not better.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Matt Clark
 I've looked at PREPARE, but apparently it only lasts 
 per-session - that's worthless in our case (web based 
 service, one connection per data-requiring connection).

That's a non-sequitur.  Most 'normal' high volume web apps have persistent
DB connections, one per http server process.  Are you really dropping DB
connections and reconnecting each time a new HTTP request comes in?

M


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Gaetano Mendola
Jason Coene wrote:
Hi Tom,
Easily recreated with Apache benchmark, ab -n 3 -c 3000
http://webserver .  This runs 1 query per page, everything else is cached
on webserver. 
That test require 3 access with 3000 connections that is not a normal
load. Describe us your HW.
3000 connections means a very huge load, may you provide also the result of
vmstat 5 my webserver trash already with -c 120 !
how many connection your postgres can manage ?
You have to consider to use a connection pool with that ammount of connections.
Regards
Gaetano Mendola
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Markus Schaber
Hi, Jason,

On Thu, 23 Sep 2004 12:53:25 -0400
Jason Coene [EMAIL PROTECTED] wrote:

 I've looked at PREPARE, but apparently it only lasts per-session - that's
 worthless in our case (web based service, one connection per data-requiring
 connection).

This sounds like the loads of connection init and close may be the
reason for the slowdown. Can you use connection pooling in your service?

HTH,
Markus


-- 
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] SAN performance

2004-09-23 Thread Anjan Dave
I believe 1/0 or 1+0 is aka RAID-10. CX300 doesn't support 0+1.
 
So far i am aware of two things, the cache page size is 8KB (can be increased or 
decreased), and the stripe element size of 128 sectors default.
 
Thanks,
Anjan

-Original Message- 
From: Mr Pink [mailto:[EMAIL PROTECTED] 
Sent: Thu 9/23/2004 11:39 AM 
To: Anjan Dave; [EMAIL PROTECTED] 
Cc: 
Subject: Re: [PERFORM] SAN performance



Hi, 

I expect you mean RAID 1/0 or 1+0 since the CX300 didn't support RAID 10 last 
time I looked. 

Whether you are using a SAN or not, you should consider putting the WAL files 
(pg_xlog folder) on 
seperate diskes from the DB. Since the log files are mostly written to, not 
read from you could 
just use RAID 1. 

It's a pity pg doesn't have a way to use a cluster of servers to get the most 
out of your 
expensive SAN. 

I read a comment earlier about setting block sizes to 8k to math pg's block 
size. Seems to make 
sense, you should check it out. 

Have fun, 
Mr Pink 

--- Anjan Dave [EMAIL PROTECTED] wrote: 

 Hello, 
 
  
 
 I'll be moving a DB from internal RAID-10 SCSI storage to an EMC CX300 
 FC RAID-10 LUN, bound to the host. I've setup a test host machine and a 
 test LUN. The /var/lib/pgsql/data folder is sym-linked to a partition on 
 the LUN. 
 
  
 
 Other than the shared_buffers, effective cache size, and sort memory, I 
 am not sure if I need to change any other parameters in the 
 postgresql.conf file for getting maximum performance from the EMC box. 
 
  
 
 Is there a general guideline for setting up postgres database and the 
 tunable parameters on a SAN, especially for EMC? 
 
  
 
 Appreciate any help, 
 
  
 
 Thanks, 
 Anjan 
 
 





__ 
Do you Yahoo!? 
New and Improved Yahoo! Mail - 100MB free storage! 
http://promotions.yahoo.com/new_mail 


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Josh Berkus
Tom,

 I think you are probably looking at the same problem previously reported
 by Josh Berkus among others.  Does the rate of context swaps shown by
 vmstat go through the roof when this happens?  If you strace or ktrace
 one of the backends, do you see lots of semop()s and little else?

That would be interesting.   Previously we've only demonstrated the problem on 
long-running queries, but I suppose it could also affect massive concurrent 
query access.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Hyper threading?

2004-09-23 Thread Greg Copeland
On Tue, 2004-09-21 at 03:54, Mariusz Czuada wrote:
 Hi all,
 
 I searched list archives, but did not found anything about HT of Pentium 
 4/Xeon processors. I wonder if hyperthreading can boost or decrease 
 performance. AFAIK for other commercial servers (msssql, oracle) official 
 documents state something like faster, but not always, so probably slower, 
 unless faster. User opinions are generaly more clear: better swhitch off HT.
 
 Do you have any experiance or test results regarding hyperthreading? Or what 
 additional conditions can make HT useful or pointless?
 

I think you'll find that HT is very sensitive to both the OS and the
application.  Generally speaking, most consider HT to actually slow
things down, unless you can prove that your OS/application combination
is faster with HT enabled.  Last I heard, most vendors specifically
disable HT in the BIOS because the defacto is to expect HT to inflict a
negative performance hit.

IIRC, one of critical paths for good HT performance is an OS that
understands how to schedule processes in a HT friendly manner (as in,
doesn't push processes from a virtual CPU to a different physical CPU,
etc).  Secondly, applications which experience a lot of bad branch
predictions tend to do well.  I don't recall what impact SSE
instructions have on the pipeline; but memory seems to recall that
applications which use a lot of SSE may be more HT friendly.  At any
rate, the notion is, if you are HT'ing, and one application/thread
requires the pipeline to be flushed, the other HT'ing thread is free to
run while the new branch is populating cache, etc.  Thusly, you get a
performance gain for the other thread when the CPU makes a bad guess.

Along these lines, I understand that Intel is planning better HT
implementation in the future, but as a general rule, people simply
expect too much from the current HT implementations.  Accordingly, for
most applications, performance generally suffers because they don't tend
to fall into the corner cases where HT helps.

Long story short, the general rule is, slower unless you having proven
it to be faster.


Cheers,

-- 
Greg Copeland, Owner
[EMAIL PROTECTED]
Copeland Computer Consulting
940.206.8004



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 I think you are probably looking at the same problem previously reported
 by Josh Berkus among others.

 That would be interesting.  Previously we've only demonstrated the
 problem on long-running queries, but I suppose it could also affect
 massive concurrent query access.

Well, the test cases we used were designed to get the system into a
tight loop of grabbing and releasing shared buffers --- a long-running
index scan is certainly one of the best ways to do that, but there are
others.

I hadn't focused before on the point that Jason is launching a new
connection for every query.  In that scenario I think the bulk of the
cycles are going to go into loading the per-backend catalog caches with
the system catalog rows that are needed to parse and plan the query.
The catalog fetches to get those rows are effectively mini-queries
with preset indexscan plans, so it's not hard to believe that they'd be
hitting the BufMgrLock nearly as hard as a tight indexscan loop.  Once
all the pages needed are cached in shared buffers, there's no I/O delays
to break the loop, and so you could indeed get into the context swap
storm regime we saw before.

I concur with the thought that using persistent connections might go a
long way towards alleviating his problem.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Jason Coene
Hi All,

It does sound like we should be pooling connections somehow.  I'll be
looking at implementing that shortly.  I'd really like to understand what
the actual problem is, though.

Sorry, I meant 30,000 with 300 connections - not 3,000.  The 300 connections
/ second is realistic, if not underestimated.  As is the nature of our site
(realtime information about online gaming), there's a huge fan base and as a
big upset happens, we'll do 50,000 page views in a span of 3-5 minutes.

I get the same results with:

ab -n 1 -c 150 http://www.gotfrag.com/portal/news/

I've attached results from the above test, showing open locks, top output,
and vmstat 5.

Tom, I've run the test described in:

http://archives.postgresql.org/pgsql-performance/2004-04/msg00280.php

Results attached in mptest.txt.  The box did experience the same problems as
we've seen before.  I ran it under a separate database (test), and it still
caused our other queries to slow significantly from our production database
(gf) - semwait again.

It does look like the cs column under CPU (which I'd assume is Context
Swap) does bump up significantly (10x or better) during both my ab test, and
the test you suggested in that archived message.

Reading the first thread you pointed out (2004-04/msg00249.php), Josh Berkus
was questioning the ServerWorks chipsets.  We're running on the Intel E7501
Chipset (MSI board).  Our CPU's are 2.66 GHz with 533MHz FSB, Hyperthreading
enabled.  Unfortunately, I don't have physical access to the machine to turn
HT off.


Thanks,

Jason



 -Original Message-
 From: Gaetano Mendola [mailto:[EMAIL PROTECTED]
 Sent: Thursday, September 23, 2004 1:41 PM
 To: Jason Coene
 Subject: Re: Caching of Queries
 
 Jason Coene wrote:
  Hi Tom,
 
  Easily recreated with Apache benchmark, ab -n 3 -c 3000
  http://webserver .  This runs 1 query per page, everything else is
 cached
  on webserver.
 
 That test require 3 access with 3000 connections that is not a normal
 load. Describe us your HW.
 
 3000 connections means a very huge load, may you provide also the result
 of
 vmstat 5 my webserver trash already with -c 120 !
 
 how many connection your postgres can manage ?
 
 You have to consider to use a connection pool with that ammount of
 connections.
 
 
 Regards
 Gaetano Mendola

gf=#  SELECT r.relname, l.mode, count(*) AS numlocks
   FROM pg_locks l, pg_class r
  WHERE r.oid = l.relation
  GROUP BY r.relname, l.mode
  ORDER BY count(*) DESC;

 relname  |  mode   | numlocks
--+-+--
 threads  | AccessShareLock |   63
 threads_ix_nuked_lastpost| AccessShareLock |   47
 threads_ix_nuked_gameid_lastpost | AccessShareLock |7
 pg_class | AccessShareLock |5
 pg_opclass_am_name_nsp_index | AccessShareLock |3
 pg_opclass   | AccessShareLock |3
 pg_class_oid_index   | AccessShareLock |3
 pg_type  | AccessShareLock |2
 pg_statistic | AccessShareLock |2
 pg_attribute | AccessShareLock |2
 pg_amop_opc_strategy_index   | AccessShareLock |2
 pg_attrdef   | AccessShareLock |2
 pg_trigger_tgrelid_tgname_index  | AccessShareLock |2
 pg_trigger   | AccessShareLock |2
 users| AccessShareLock |2
 pg_statistic_relid_att_index | AccessShareLock |2
 pg_type_oid_index| AccessShareLock |2
 pg_amop  | AccessShareLock |2
 pg_attribute_relid_attnum_index  | AccessShareLock |2
 comments | AccessShareLock |2
 pg_shadow| AccessShareLock |2
 acls | AccessShareLock |1
 pg_index_indexrelid_index| AccessShareLock |1
 pg_attrdef_adrelid_adnum_index   | AccessShareLock |1
 surveyresults_ix_userid  | AccessShareLock |1
 pg_cast  | AccessShareLock |1
 pg_shadow_usesysid_index | AccessShareLock |1
 pg_index | AccessShareLock |1
 games| AccessShareLock |1
 usersessions | AccessShareLock |1
 surveyoptions| AccessShareLock |1
 countries| AccessShareLock |1
 surveyresults| AccessShareLock |1
 vopenlocks   | AccessShareLock |1
 pg_class_relname_nsp_index   | AccessShareLock |1
 pg_namespace | AccessShareLock |1
 pg_database  | AccessShareLock |1
 surveys  | AccessShareLock |  

Re: [PERFORM] Caching of Queries

2004-09-23 Thread Jason Coene
Update:

I just tried running the same test (ab with 150 concurrent connections)
while connecting to postgres through 35 persistent connections (PHP
library), and had roughly the same type of results.  This should eliminate
the new connection overhead.  I've attached top and vmstat.  I let it run
until it had completed 800 requests.  Unless I'm missing something, there's
more than the new connection IO load here.

Jason

 -Original Message-
 From: Jason Coene [mailto:[EMAIL PROTECTED]
 Sent: Thursday, September 23, 2004 3:08 PM
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: RE: Caching of Queries
 
 Hi All,
 
 It does sound like we should be pooling connections somehow.  I'll be
 looking at implementing that shortly.  I'd really like to understand what
 the actual problem is, though.
 
 Sorry, I meant 30,000 with 300 connections - not 3,000.  The 300
 connections
 / second is realistic, if not underestimated.  As is the nature of our
 site
 (realtime information about online gaming), there's a huge fan base and as
 a
 big upset happens, we'll do 50,000 page views in a span of 3-5 minutes.
 
 I get the same results with:
 
 ab -n 1 -c 150 http://www.gotfrag.com/portal/news/
 
 I've attached results from the above test, showing open locks, top output,
 and vmstat 5.
 
 Tom, I've run the test described in:
 
 http://archives.postgresql.org/pgsql-performance/2004-04/msg00280.php
 
 Results attached in mptest.txt.  The box did experience the same problems
 as
 we've seen before.  I ran it under a separate database (test), and it
 still
 caused our other queries to slow significantly from our production
 database
 (gf) - semwait again.
 
 It does look like the cs column under CPU (which I'd assume is Context
 Swap) does bump up significantly (10x or better) during both my ab test,
 and
 the test you suggested in that archived message.
 
 Reading the first thread you pointed out (2004-04/msg00249.php), Josh
 Berkus
 was questioning the ServerWorks chipsets.  We're running on the Intel
 E7501
 Chipset (MSI board).  Our CPU's are 2.66 GHz with 533MHz FSB,
 Hyperthreading
 enabled.  Unfortunately, I don't have physical access to the machine to
 turn
 HT off.
 
 
 Thanks,
 
 Jason
 
 
 
  -Original Message-
  From: Gaetano Mendola [mailto:[EMAIL PROTECTED]
  Sent: Thursday, September 23, 2004 1:41 PM
  To: Jason Coene
  Subject: Re: Caching of Queries
 
  Jason Coene wrote:
   Hi Tom,
  
   Easily recreated with Apache benchmark, ab -n 3 -c 3000
   http://webserver .  This runs 1 query per page, everything else is
  cached
   on webserver.
 
  That test require 3 access with 3000 connections that is not a
 normal
  load. Describe us your HW.
 
  3000 connections means a very huge load, may you provide also the result
  of
  vmstat 5 my webserver trash already with -c 120 !
 
  how many connection your postgres can manage ?
 
  You have to consider to use a connection pool with that ammount of
  connections.
 
 
  Regards
  Gaetano Mendola

last pid: 48239;  load averages:  5.83,  2.43,  1.50   up 19+22:59:04  15:19:12
127 processes: 16 running, 111 sleeping
CPU states: 17.7% user,  0.0% nice, 20.0% system,  1.0% interrupt, 61.3% idle
Mem: 125M Active, 1456M Inact, 193M Wired, 96M Cache, 112M Buf, 54M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME PRI NICE   SIZERES STATE  C   TIME   WCPUCPU COMMAND
48190 pgsql -40 97408K 86416K semwai 1   0:01  3.35%  1.32% postgres
47761 pgsql -40 96816K 56708K semwai 2   0:01  0.90%  0.88% postgres
47765 pgsql  40 96816K 56708K sbwait 3   0:01  0.90%  0.88% postgres
47754 pgsql -40 96816K 56708K semwai 2   0:01  0.85%  0.83% postgres
47763 pgsql -40 96816K 56708K semwai 0   0:01  0.85%  0.83% postgres
47741 pgsql  40 96816K 56708K sbwait 3   0:01  0.75%  0.73% postgres
47674 pgsql -40 96264K 38992K semwai 1   0:01  0.74%  0.73% postgres
47753 pgsql -40 96816K 56708K semwai 1   0:00  0.65%  0.63% postgres
48204 pgsql -40 96856K 46752K semwai 0   0:00  2.15%  0.63% postgres
47698 pgsql  40 96240K 37792K sbwait 3   0:01  0.59%  0.59% postgres
47757 pgsql -40 96816K 56708K semwai 3   0:01  0.60%  0.59% postgres
47740 pgsql  40 96240K 37768K sbwait 0   0:01  0.55%  0.54% postgres
47759 pgsql  40 96816K 56708K sbwait 0   0:01  0.50%  0.49% postgres
47735 pgsql -40 96240K 37772K semwai 0   0:00  0.50%  0.49% postgres
48223 pgsql -40 96984K 55980K semwai 1   0:00  2.69%  0.49% postgres
48102 pgsql  40 96136K 54956K sbwait 0   0:00  0.69%  0.44% postgres
47718 pgsql  40 96816K 56716K sbwait 1   0:01  0.40%  0.39% postgres
48225 pgsql1230 96272K 57156K RUN0   0:00  2.80%  0.39% postgres
48053 pgsql -40 96136K 55040K semwai 0   0:00  0.48%  0.34% postgres
48041 pgsql  40 96136K 54992K sbwait 1   0:00  0.47%  0.34% postgres
48222 pgsql -4

[PERFORM] Cleaning up indexes

2004-09-23 Thread Martin Foster
My database was converted from MySQL a while back and has maintained all 
of the indexes which were previously used.   Tt the time however, there 
were limitations on the way PostgreSQL handled the indexes compared to 
MySQL.

Meaning that under MySQL, it would make use of a multi-column index even 
if the rows within did not match.When the conversion was made more 
indexes were created overall to correct this and proceed with the 
conversion.

Now the time has come to clean up the used indexes.   Essentially, I 
want to know if there is a way in which to determine which indexes are 
being used and which are not.   This will allow me to drop off the 
unneeded ones and reduce database load as a result.

And have things changed as to allow for mismatched multi-column indexes 
in version 7.4.x or even the upcoming 8.0.x?

Martin Foster
[EMAIL PROTECTED]
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Cleaning up indexes

2004-09-23 Thread Gregory S. Williamson

If you have set up the postgres instance to write stats, the tables 
pg_stat_user_indexes, pg_statio_all_indexes and so (use the \dS option at the psql 
prompt to see these system tables); also check the pg_stat_user_tables table and 
similar beasts for information on total access, etc. Between these you can get a good 
idea of what indexes are not being used, and from the sequentail scan info on tables 
perhaps some idea of what may need some indexes.

HTH,

Greg Williamson
DBA
GlobeXplorer LLC 

-Original Message-
From:   Martin Foster [mailto:[EMAIL PROTECTED]
Sent:   Thu 9/23/2004 3:16 PM
To: [EMAIL PROTECTED]
Cc: 
Subject:[PERFORM] Cleaning up indexes
My database was converted from MySQL a while back and has maintained all 
of the indexes which were previously used.   Tt the time however, there 
were limitations on the way PostgreSQL handled the indexes compared to 
MySQL.

Meaning that under MySQL, it would make use of a multi-column index even 
if the rows within did not match.When the conversion was made more 
indexes were created overall to correct this and proceed with the 
conversion.

Now the time has come to clean up the used indexes.   Essentially, I 
want to know if there is a way in which to determine which indexes are 
being used and which are not.   This will allow me to drop off the 
unneeded ones and reduce database load as a result.

And have things changed as to allow for mismatched multi-column indexes 
in version 7.4.x or even the upcoming 8.0.x?

Martin Foster
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match




---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Large # of rows in query extremely slow, not using

2004-09-23 Thread Kris Jurka


On Tue, 14 Sep 2004, Stephen Crowley wrote:

 Problem solved.. I set the fetchSize to a reasonable value instead of
 the default of unlimited  in the PreparedStatement and now the query
 is . After some searching it seeems this is a common problem, would it
 make sense to change the default value to something other than 0 in
 the JDBC driver?

In the JDBC driver, setting the fetch size to a non-zero value means that 
the query will be run using what the frontend/backend protocol calls a 
named statement.  What this means on the backend is that the planner will 
not be able to use the values from the query parameters to generate the 
optimum query plan and must use generic placeholders and create a generic 
plan.  For this reason we have decided not to default to a non-zero 
fetch size.  This is something whose default value could be set by a URL 
parameter if you think that is something that is really required.

Kris Jurka


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-23 Thread Stephen Crowley
Thanks for the explanation. So what sort of changes need to be made to
the client/server protocol to fix this problem?



On Thu, 23 Sep 2004 18:22:15 -0500 (EST), Kris Jurka [EMAIL PROTECTED] wrote:
 
 
 On Tue, 14 Sep 2004, Stephen Crowley wrote:
 
  Problem solved.. I set the fetchSize to a reasonable value instead of
  the default of unlimited  in the PreparedStatement and now the query
  is . After some searching it seeems this is a common problem, would it
  make sense to change the default value to something other than 0 in
  the JDBC driver?
 
 In the JDBC driver, setting the fetch size to a non-zero value means that
 the query will be run using what the frontend/backend protocol calls a
 named statement.  What this means on the backend is that the planner will
 not be able to use the values from the query parameters to generate the
 optimum query plan and must use generic placeholders and create a generic
 plan.  For this reason we have decided not to default to a non-zero
 fetch size.  This is something whose default value could be set by a URL
 parameter if you think that is something that is really required.
 
 Kris Jurka
 


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Large # of rows in query extremely slow, not using

2004-09-23 Thread Kris Jurka


On Thu, 23 Sep 2004, Stephen Crowley wrote:

 Thanks for the explanation. So what sort of changes need to be made to
 the client/server protocol to fix this problem?

The problem is that there is no way to indicate why you are using a 
particular statement in the extended query protocol.  For the JDBC driver 
there are two potential reasons, streaming a ResultSet and using a server 
prepared statement.  For the streaming as default case you desire there 
needs to be a way to indicate that you don't want to create a generic 
server prepared statement and that this query is really just for one time 
use, so it can generate the best plan possible.

Additionally you can only stream ResultSets that are of type FORWARD_ONLY.  
It would also be nice to be able to specify scrollability and holdability 
when creating a statement and the offset/direction when streaming data 
from a scrollable one.

Kris Jurka

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Josh Berkus
Jason,

 Sorry, I meant 30,000 with 300 connections - not 3,000.  The 300
 connections
 / second is realistic, if not underestimated.  As is the nature of
 our site
 (realtime information about online gaming), there's a huge fan base
 and as a
 big upset happens, we'll do 50,000 page views in a span of 3-5
 minutes.

First, your posts show no evidences of the CS storm bug.

Second, 300 *new* connections a second is a lot.   Each new connection
requires a significant amount of both database and OS overhead.   This
is why all the other web developers use a connection pool.

In fact, I wouldn't be surprised if your lockups are on the OS level,
even; I don't recall that you cited what OS you're using, but I can
imagine locking up Linux 2.4 trying to spawn 300 new processes a
second.

--Josh

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Joshua D. Drake






  
Sorry, I meant 30,000 with 300 connections - not 3,000.  The 300
connections
/ second is realistic, if not underestimated.  As is the nature of
our site
(realtime information about online gaming), there's a huge fan base
and as a
big upset happens, we'll do 50,000 page views in a span of 3-5
minutes.

  
  
First, your posts show no evidences of the CS storm bug.

Second, 300 *new* connections a second is a lot.   Each new connection
requires a significant amount of both database and OS overhead.   This
is why all the other web developers use a connection pool.

  

I would second this. You need to be running a connection pool and
probably multiple web servers in 
front of that. You are talking about a huge amount of connections in
that amount of time.

Josh Drake




  In fact, I wouldn't be surprised if your lockups are on the OS level,
even; I don't recall that you cited what OS you're using, but I can
imagine locking up Linux 2.4 trying to spawn 300 new processes a
second.

--Josh

---(end of broadcast)---
TIP 8: explain analyze is your friend
  



-- 
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL




Re: [PERFORM] Caching of Queries (now with pgpool)

2004-09-23 Thread Jason Coene
Hi Josh,

I just tried using pgpool to pool the connections, and ran:

ab -n 1000 -c 50 http://wstg.int/portal/news/

I ran some previous queries to get pgpool to pre-establish all the
connections, and ab ran for a few minutes (with one query per page, eek!).
It was still exhibiting the same problems as before.  While so many new
connections at once can surely make the problem worse (and pgpool will
surely help there), shouldn't this prove that it's not the only issue?

We're running FreeBSD 5.2.1

I've attached open locks, running queries, query plans, top output and
vmstat 5 output for while ab was running, from start to finish.

Any ideas?

Jason


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-performance-
 [EMAIL PROTECTED] On Behalf Of Josh Berkus
 Sent: Thursday, September 23, 2004 8:06 PM
 To: Jason Coene; [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: [PERFORM] Caching of Queries
 
 Jason,
 
  Sorry, I meant 30,000 with 300 connections - not 3,000.  The 300
  connections
  / second is realistic, if not underestimated.  As is the nature of
  our site
  (realtime information about online gaming), there's a huge fan base
  and as a
  big upset happens, we'll do 50,000 page views in a span of 3-5
  minutes.
 
 First, your posts show no evidences of the CS storm bug.
 
 Second, 300 *new* connections a second is a lot.   Each new connection
 requires a significant amount of both database and OS overhead.   This
 is why all the other web developers use a connection pool.
 
 In fact, I wouldn't be surprised if your lockups are on the OS level,
 even; I don't recall that you cited what OS you're using, but I can
 imagine locking up Linux 2.4 trying to spawn 300 new processes a
 second.
 
 --Josh
 
 ---(end of broadcast)---
 TIP 8: explain analyze is your friend
OPEN LOCKS:


gf=#  SELECT r.relname, l.mode, count(*) AS numlocks
   FROM pg_locks l, pg_class r
  WHERE r.oid = l.relation
  GROUP BY r.relname, l.mode
  ORDER BY count(*) DESC;

 relname  |  mode   | numlocks
--+
 threads  | AccessShareLock |   43
 threads_ix_nuked_lastpost| AccessShareLock |   35
 threads_ix_nuked_gameid_lastpost | AccessShareLock |7
 pg_attribute | AccessShareLock |1
 v_locks  | AccessShareLock |1
 pg_class | AccessShareLock |1
 usersessions | AccessShareLock |1
 countries| AccessShareLock |1
 users| AccessShareLock |1
 userstats_ix_id  | AccessShareLock |1
 pg_statistic_relid_att_index | AccessShareLock |1
 pg_attribute_relid_attnum_index  | AccessShareLock |1
 userstats| AccessShareLock |1
 demos| AccessShareLock |1
 pg_cast_source_target_index  | AccessShareLock |1
 pg_locks | AccessShareLock |1
 users_ix_id  | AccessShareLock |1
 buddies  | AccessShareLock |1
 buddies_ix_userid| AccessShareLock |1
 pg_cast  | AccessShareLock |1
 pg_statistic | AccessShareLock |1
(21 rows)

RUNNING QUERIES (AND HOW LONG FOR):

gf=# SELECT pg_stat_activity.usename, round(date_part('epoch'::text, now() - 
pg_stat_activity.query_start)) AS duration, pg_stat_activity.current_query
   FROM pg_stat_activity
  ORDER BY round(date_part('epoch'::text, now() - pg_stat_activity.query_start)) DESC;

 usename | duration |
   current_query


-+--+---


--
 gf  |4 | SELECT id, gameid, forumid, subject FROM threads WHERE nuked = 0 
AND gameid = 1 ORDER BY nuked DESC, gameid DESC, lastpost DESC LIMIT 8
 gf  |3 | SELECT id, gameid, forumid, subject FROM threads WHERE nuked = 0 
ORDER BY nuked DESC, lastpost DESC LIMIT 8
 gf  |3 | SELECT id, gameid, forumid, subject FROM threads WHERE nuked = 0 
ORDER BY nuked DESC, lastpost DESC LIMIT 8
 gf  |3 | SELECT id, gameid, forumid, subject FROM threads WHERE nuked = 0 
ORDER BY nuked DESC, lastpost DESC LIMIT 8
 gf  |3 | IDLE
 gf  |3 | SELECT id, gameid, forumid, subject FROM threads WHERE nuked = 0 
ORDER BY nuked DESC, lastpost DESC LIMIT 8
 gf  |3 | SELECT id, gameid, forumid, 

Re: [PERFORM] Caching of Queries (now with pgpool)

2004-09-23 Thread Michael Adler
On Thu, Sep 23, 2004 at 09:23:51PM -0400, Jason Coene wrote:
 I ran some previous queries to get pgpool to pre-establish all the
 connections, and ab ran for a few minutes (with one query per page, eek!).
 It was still exhibiting the same problems as before.  While so many new
 connections at once can surely make the problem worse (and pgpool will
 surely help there), shouldn't this prove that it's not the only issue?

 Any ideas?

Now that your connections are persistent, you may benefit from using
PREPAREd queries.

-Mike

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])