Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Mike Benoit
On Fri, 2004-08-06 at 23:18 +, Martin Foster wrote:
 Mike Benoit wrote:
 
  On Wed, 2004-08-04 at 17:25 +0200, Gaetano Mendola wrote:
  
  
 The queries themselves are simple, normally drawing information from one 
 table with few conditions, or in the most complex cases using joins on 
 two tables or subqueries.   These behave very well and always have; the 
 problem is that these queries are issued in rather large numbers due to 
 the dumb nature of the scripts themselves.
 
 Show us the EXPLAIN ANALYZE output for those queries and how many rows the
 tables contain; the table schema could also be useful.
 
  
  
  If the queries themselves are optimized as much as they can be, and as
  you say, it's just the sheer number of similar queries hitting the
  database, you could try using prepared queries for the ones that are
  most often executed to eliminate some of the overhead.

  I've had relatively good success with this in the past, and it doesn't
  take very much code modification.
  
 
 One of the biggest problems is most probably related to the indexes. 
 Since logging the information needed to see which queries are used and 
 which are not carries a slight performance penalty, I cannot really make 
 use of it for now.
 
 However, I am curious how one would go about preparing a query.   Is this 
 similar to the DBI::Prepare statement with placeholders, simply changing 
 the values passed on execute?  Or is this something at the database 
 level, such as a view, et cetera?
 

Yes, always optimize your queries and GUC settings first and foremost.
That's where you are likely to gain the most performance. After that, if
you still want to push things even further, I would try prepared queries.
I'm not familiar with DBI::Prepare at all, but I don't think it's what
you're looking for.

This is what you want:
http://www.postgresql.org/docs/current/static/sql-prepare.html
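
For illustration, here is a minimal sketch of a server-side prepared query
issued through DBI (the table, columns, statement name, and connection
parameters are made up for the example):

    use DBI ();

    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'password',
                           { RaiseError => 1 });

    # Parse and plan the statement once per connection...
    $dbh->do(q{PREPARE get_user (integer) AS
               SELECT username, email FROM users WHERE id = $1});

    # ...then execute it repeatedly, skipping the parse/plan overhead.
    my $row = $dbh->selectrow_arrayref('EXECUTE get_user(42)');

One thing to keep in mind: the prepared statement lives in the server
session, so with pooled or persistent connections you need to make sure
the PREPARE was actually issued on the connection you were handed.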


-- 
Mike Benoit [EMAIL PROTECTED]




Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Jeff
On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
I am currently making use of Apache::DBI, which overrides the 
DBI::disconnect call and keeps a pool of active connections for use 
when need be.   Since it offloads the pooling to the webserver, it 
seems more advantageous than pgpool, which, while able to run on an 
external system, adds another layer of complexity.

Apache::DBI is not the same sort of pool as pgpool.  DB connections 
are not shared among all your apache children (a common misconception), 
so if you have 300 apache kids you can have 300 db connections.  
With pgpool, connections are shared among all of them, so even though 
you have 300 kids you only have, say, 32 db connections.
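
For illustration, this is roughly all Apache::DBI amounts to in the code
(connection parameters are made up; the point is that the cache is
per-child):

    use Apache::DBI ();  # must be loaded before DBI, e.g. in startup.pl
    use DBI ();

    # Returns a cached handle if this child process has already connected
    # with the same DSN/user/password; otherwise opens a new connection.
    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'password',
                           { RaiseError => 1 });

    $dbh->disconnect;  # overridden to a no-op: the connection stays open

Nothing is shared across processes, so 300 children still means up to 300
backends.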

Anyone had any experience with both Apache::DBI and pgpool?   For my 
needs they seem to do essentially the same thing, simply that one is 
invisible to the code while the other requires adding the complexity 
of a proxy.

Both are invisible to the app.  (With pgpool, it thinks it is connecting 
to a regular old PG server.)

And I've been running pgpool in production for months.  It just sits 
there.  Doesn't take much to set it up or configure it.  Works like a 
champ.

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/


Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Jan Wieck
On 8/8/2004 8:10 AM, Jeff wrote:
On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
I am currently making use of Apache::DBI, which overrides the 
DBI::disconnect call and keeps a pool of active connections for use 
when need be.   Since it offloads the pooling to the webserver, it 
seems more advantageous than pgpool, which, while able to run on an 
external system, adds another layer of complexity.

Apache::DBI is not the same sort of pool as pgpool.  DB connections 
are not shared among all your apache children (a common misconception), 
so if you have 300 apache kids you can have 300 db connections.  
With pgpool, connections are shared among all of them, so even though 
you have 300 kids you only have, say, 32 db connections.
And this is exactly where the pgpool advantage lies. Especially with the 
TPC-W, Apache is serving a mix of PHP (or whatever CGI technique is 
used) and static content like images. Since the 200+ Apache kids serve 
any of that content at random, and the emulated browsers very much 
encourage Apache to ramp up to MaxClients children by using up to 4 
concurrent image connections, one ends up with MaxClients DB connections 
that are all used relatively infrequently. In contrast, pgpool gives you 
fewer, more active DB connections, which is better for performance.



Anyone had any experience with both Apache::DBI and pgpool?   For my 
needs they seem to do essentially the same thing, simply that one is 
invisible to the code while the other requires adding the complexity 
of a proxy.

Both are invisible to the app.  (With pgpool it thinks it is connecting 
to a regular old PG server)

And I've been running pgpool in production for months.  It just sits 
there.  Doesn't take much to set it up or configure it.  Works like a 
champ
And it buys you an extra admin feature people like to forget about: 
one can shut down one pool for one web application only. That gives you 
instant single-user access to one database without shutting down the 
whole webserver or tampering with the pg_hba.conf file.

Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#===================================================== [EMAIL PROTECTED] #


Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Matt Clark
 And this is exactly where the pgpool advantage lies. Especially with the 
 TPC-W, Apache is serving a mix of PHP (or whatever CGI technique is 
 used) and static content like images. Since the 200+ Apache kids serve 
 any of that content at random, and the emulated browsers very much 
 encourage Apache to ramp up to MaxClients children by using up to 4 
 concurrent image connections, one ends up with MaxClients DB connections 
 that are all used relatively infrequently. In contrast, pgpool gives you 
 fewer, more active DB connections, which is better for performance.

There are two well-worn and very mature techniques for dealing with the
issue of web apps using one DB connection per apache process, both of which
work extremely well and attack the issue at its source.

1)  Use a front-end caching proxy like Squid as an accelerator.  Static
content will be served by the accelerator 99% of the time.  Additionally,
large pages can be served immediately to the accelerator by Apache, which
can then go on to serve another request without waiting for the end user's
dial-up connection to pull the data down.  Massive speedup, fewer apache
processes needed.

2)  Serve static content off an entirely separate apache server from the
dynamic content, using separate domains (e.g. 'static.foo.com').

Personally I favour number 1.  Our last biggish peak saw 6000 open HTTP and
HTTPS connections and only 200 apache children, all of them nice and busy,
not hanging around on street corners looking bored.  During quiet times
Apache drops back to its configured minimum of 40 kids.  Option 2 has the
advantage that you can use a leaner build for the 'dynamic' apache server,
but with RAM so plentiful these days that's a less useful property.

Basically this puts the 'pooling' back in the stateless HTTP area where it
truly belongs and can be proven not to have any peculiar side effects
(especially when it comes to transaction safety).  Even better, so long as
you use URL parameters for searches and the like, you can have the
accelerator cache those pages for a certain time too so long as slightly
stale results are OK.
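
For the curious, the accelerator setup is only a handful of lines in a
Squid 2.5-era squid.conf; this is a rough sketch (the hosts, ports, and
refresh rule are made up for illustration):

    # Squid answers on port 80, Apache hides behind it on 8080
    http_port 80
    httpd_accel_host 127.0.0.1
    httpd_accel_port 8080
    httpd_accel_single_host on
    httpd_accel_with_proxy off

    # allow slightly stale search results to be served from cache
    # for up to 5 minutes
    refresh_pattern -i ^http://www\.foo\.com/search 0 20% 5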

I'm sure pgpool and the like have their place, but being band-aids for
poorly configured websites probably isn't the best use for them.

M




Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Martin Foster
Jeff wrote:
On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
I am currently making use of Apache::DBI, which overrides the 
DBI::disconnect call and keeps a pool of active connections for use 
when need be.   Since it offloads the pooling to the webserver, it 
seems more advantageous than pgpool, which, while able to run on an 
external system, adds another layer of complexity.

Apache::DBI is not the same sort of pool as pgpool.  DB connections 
are not shared among all your apache children (a common misconception), 
so if you have 300 apache kids you can have 300 db connections.  
With pgpool, connections are shared among all of them, so even though you 
have 300 kids you only have, say, 32 db connections.

Seems that you are right; I never noticed that in the documentation 
before.   I always assumed it had something to do with the 
long-lasting/persistent scripts that would remain in transactions for 
extended periods of time.

Here is an odd question.   While the server runs 7.4.x, the client 
connects with 7.3.x.  Would this in itself make a difference in 
performance, as the protocols are different?   At least based on 
pgpool's documentation.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Arjen van der Meijden
On 8-8-2004 16:29, Matt Clark wrote:
There are two well-worn and very mature techniques for dealing with the
issue of web apps using one DB connection per apache process, both of which
work extremely well and attack the issue at its source.
1)  Use a front-end caching proxy like Squid as an accelerator.  Static
content will be served by the accelerator 99% of the time.  Additionally,
large pages can be served immediately to the accelerator by Apache, which
can then go on to serve another request without waiting for the end user's
dial-up connection to pull the data down.  Massive speedup, fewer apache
processes needed.
Another version of this 1) is to run with a content accelerator; our 
favourite is to run Tux in front of Apache. It takes over the 
connection-handling stuff, has a very low memory profile (compared to 
Apache) and very little overhead. What it does is serve up all 
simple content (although you can have cgi/php/perl and other languages 
processed by it, entirely removing the need for apache in some 
cases) and forward/proxy everything it doesn't understand to an 
Apache/other webserver running on the same machine (which runs on 
another port).

I think there are a few advantages over Squid; since it is partially 
done in kernel space it can be slightly faster in serving up content, 
apart from its simplicity, which will probably matter even more. You'll 
have no caching issues for pages that should not be cached or static 
files that change periodically (like every few seconds). Afaik Tux can 
handle more than 10 times as many ab-generated requests per second as 
a default-compiled Apache on the same machine.
And besides the speedup, you can do any request you were able to do 
before, since Tux will simply forward it to Apache if it didn't 
understand it.

Anyway, apart from all that: reducing the number of apache connections 
is nice, but not really the same as reducing the number of 
pooled connections using a db pool... You may even be able to run with 
1000 http-connections, 40 apache-processes and 10 db-connections. In 
case of the non-pooled setup, you'd still have 40 db-connections.

In a simple test I did, pgpool did seem to have quite some overhead, 
though. So it should be tested well, to find out where the turnover 
point is at which it will be a gain instead of a loss...

Best regards,
Arjen van der Meijden


Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Martin Foster
Arjen van der Meijden wrote:
On 8-8-2004 16:29, Matt Clark wrote:
There are two well-worn and very mature techniques for dealing with the
issue of web apps using one DB connection per apache process, both of
which work extremely well and attack the issue at its source.

1)  Use a front-end caching proxy like Squid as an accelerator.  Static
content will be served by the accelerator 99% of the time.  Additionally,
large pages can be served immediately to the accelerator by Apache, which
can then go on to serve another request without waiting for the end
user's dial-up connection to pull the data down.  Massive speedup, fewer
apache processes needed.

Another version of this 1) is to run with a content accelerator; our 
favourite is to run Tux in front of Apache. It takes over the 
connection-handling stuff, has a very low memory profile (compared to 
Apache) and very little overhead. What it does is serve up all 
simple content (although you can have cgi/php/perl and other languages 
processed by it, entirely removing the need for apache in some 
cases) and forward/proxy everything it doesn't understand to an 
Apache/other webserver running on the same machine (which runs on 
another port).

I think there are a few advantages over Squid; since it is partially 
done in kernel space it can be slightly faster in serving up content, 
apart from its simplicity, which will probably matter even more. You'll 
have no caching issues for pages that should not be cached or static 
files that change periodically (like every few seconds). Afaik Tux can 
handle more than 10 times as many ab-generated requests per second as 
a default-compiled Apache on the same machine.
And besides the speedup, you can do any request you were able to do 
before, since Tux will simply forward it to Apache if it didn't 
understand it.

Anyway, apart from all that: reducing the number of apache connections 
is nice, but not really the same as reducing the number of 
pooled connections using a db pool... You may even be able to run with 
1000 http-connections, 40 apache-processes and 10 db-connections. In 
case of the non-pooled setup, you'd still have 40 db-connections.

In a simple test I did, pgpool did seem to have quite some overhead, 
though. So it should be tested well, to find out where the turnover 
point is at which it will be a gain instead of a loss...

Best regards,
Arjen van der Meijden
Other than images, there are very few static pages being loaded up by 
the user.    Since they make up a very small portion of the traffic, it 
tends to be an optimization we can forgo for now.

I attempted to make use of pgpool.   At the default of 32 pre-forked 
connections, the webserver almost immediately tapped out the pgpool base, 
and content stopped being served because no new processes were being 
forked to make up for it.

So I raised it to a higher value (256) and it immediately segfaulted and 
dumped core.    So I am not sure exactly how to proceed, since I rather 
need the thing to fork additional servers as load hits, and not the other 
way around.

Unless I had it configured oddly, it seems to work differently than an 
Apache server would in handling content.
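
For reference, the knobs involved look something like this in pgpool.conf
(the parameter names are pgpool's; the values are only illustrative):

    # pgpool pre-forks a fixed set of children and does not grow the set
    # on demand, so this has to be sized for peak concurrent clients
    # (e.g. Apache's MaxClients), unlike Apache's own dynamic forking
    num_init_children = 300

    # cached backend connections kept per pgpool child
    max_pool = 4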

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Tatsuo Ishii
 Jeff wrote:
  
  On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
  
  I am currently making use of Apache::DBI, which overrides the 
  DBI::disconnect call and keeps a pool of active connections for use 
  when need be.   Since it offloads the pooling to the webserver, it 
  seems more advantageous than pgpool, which, while able to run on an 
  external system, adds another layer of complexity.
 
  
  Apache::DBI is not the same sort of pool as pgpool.  DB connections 
  are not shared among all your apache children (a common misconception), 
  so if you have 300 apache kids you can have 300 db connections.  
  With pgpool, connections are shared among all of them, so even though you 
  have 300 kids you only have, say, 32 db connections.
  
 
 Seems that you are right; I never noticed that in the documentation 
 before.   I always assumed it had something to do with the 
 long-lasting/persistent scripts that would remain in transactions for 
 extended periods of time.

 Here is an odd question.   While the server runs 7.4.x, the client 
 connects with 7.3.x.  Would this in itself make a difference in 
 performance, as the protocols are different?   At least based on 
 pgpool's documentation.

In this case the server falls back from the V3 protocol (employed in 7.4 or
later) to the V2 protocol (employed from 6.4 to 7.3.x). As far as
pgpool is concerned, the performance difference is significant. Of course
that depends on the implementation, though.

FYI here is the outline of the testing using pgbench.

H/W: Pentium4 2.4GHz x2 / memory 1GB / HDD IDE 80GB (all PCs are the same spec)
S/W: RedHat Linux 9/PostgreSQL 7.3.6/7.4.3

postgresql.conf:
tcpip_socket = true
max_connections = 512
shared_buffers = 2048

host A: pgbench, host B: pgpool, host C: PostgreSQL 7.3.6 or 7.4.3
pgbench parameters: -S -c 10 -t 1000

result (the TPS column is from the 7.4.3 runs; the ratio(7.4.3) column matches it):

                                            TPS          ratio(7.4.3)  ratio(7.3.6)
without pgpool                              4357.625059  100%          100%
with pgpool (connection pool mode)          4330.290294  99.4%         94.1%
with pgpool (replication mode)              4297.614996  98.6%         87.6%
with pgpool (replication with strict mode)  4270.223136  98.0%         81.5%
--
Tatsuo Ishii



Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Tatsuo Ishii
 Arjen van der Meijden wrote:
 
  On 8-8-2004 16:29, Matt Clark wrote:
  
  There are two well-worn and very mature techniques for dealing with the
  issue of web apps using one DB connection per apache process, both of
  which work extremely well and attack the issue at its source.

  1)  Use a front-end caching proxy like Squid as an accelerator.  Static
  content will be served by the accelerator 99% of the time.  Additionally,
  large pages can be served immediately to the accelerator by Apache, which
  can then go on to serve another request without waiting for the end
  user's dial-up connection to pull the data down.  Massive speedup, fewer
  apache processes needed.
  
  
  Another version of this 1) is to run with a content accelerator; our 
  favourite is to run Tux in front of Apache. It takes over the 
  connection-handling stuff, has a very low memory profile (compared to 
  Apache) and very little overhead. What it does is serve up all 
  simple content (although you can have cgi/php/perl and other languages 
  processed by it, entirely removing the need for apache in some 
  cases) and forward/proxy everything it doesn't understand to an 
  Apache/other webserver running on the same machine (which runs on 
  another port).

  I think there are a few advantages over Squid; since it is partially 
  done in kernel space it can be slightly faster in serving up content, 
  apart from its simplicity, which will probably matter even more. You'll 
  have no caching issues for pages that should not be cached or static 
  files that change periodically (like every few seconds). Afaik Tux can 
  handle more than 10 times as many ab-generated requests per second as 
  a default-compiled Apache on the same machine.
  And besides the speedup, you can do any request you were able to do 
  before, since Tux will simply forward it to Apache if it didn't 
  understand it.
  
  Anyway, apart from all that: reducing the number of apache connections 
  is nice, but not really the same as reducing the number of 
  pooled connections using a db pool... You may even be able to run with 
  1000 http-connections, 40 apache-processes and 10 db-connections. In 
  case of the non-pooled setup, you'd still have 40 db-connections.

  In a simple test I did, pgpool did seem to have quite some overhead, 
  though. So it should be tested well, to find out where the turnover 
  point is at which it will be a gain instead of a loss...

I don't know what configuration you were using, but I noticed
that UNIX domain sockets are preferred for the connection between
clients and pgpool. When I tested using pgbench -C (involving
connection establishment for each transaction), the with-pgpool
configuration was 10 times faster than the without-pgpool configuration
when using UNIX domain sockets, while there was only a 3.6 times speedup
with TCP/IP sockets.
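
Concretely, the difference is just whether pgbench is pointed at pgpool's
UNIX domain socket or its TCP port; a sketch (9999 is pgpool's usual
default port, the database name is made up, and the UNIX-socket case
assumes pgpool's socket sits in libpq's default directory):

    # UNIX domain socket (no -h)
    pgbench -C -S -c 10 -t 1000 -p 9999 bench

    # TCP/IP socket
    pgbench -C -S -c 10 -t 1000 -h 127.0.0.1 -p 9999 bench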

  Best regards,
  
  Arjen van der Meijden
  
 
 Other than images, there are very few static pages being loaded up by 
 the user.    Since they make up a very small portion of the traffic, it 
 tends to be an optimization we can forgo for now.

 I attempted to make use of pgpool.   At the default of 32 pre-forked 
 connections, the webserver almost immediately tapped out the pgpool base, 
 and content stopped being served because no new processes were being 
 forked to make up for it.

 So I raised it to a higher value (256) and it immediately segfaulted and 
 dumped core.    So I am not sure exactly how to proceed, since I rather 
 need the thing to fork additional servers as load hits, and not the other 
 way around.

What version of pgpool did you test? I know that a certain version
(actually 2.0.2) had such a problem. Can you try again with the
latest version of pgpool? (It's 2.0.6.)
--
Tatsuo Ishii



[PERFORM] performance with column orders

2004-08-08 Thread Jim Thomason
I couldn't track down recent info in the archives, so I figured I'd ask here.

Does the order of columns still have an impact on table speed? Back in
the olden days, it used to be that fixed width columns (integer,
tinyint, etc.) should be the first (left) columns in the table and
variable width ones should be towards the end (right). This allowed
a database to line up the columns better on disk and give you a speed
boost.
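
For illustration, the old rule of thumb would have you lay out a table
like this (hypothetical schema):

    CREATE TABLE users (
        -- fixed-width columns first...
        id        integer,
        created   timestamp,
        active    boolean,
        -- ...variable-width columns last
        username  varchar(32),
        bio       text
    );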

So, does Postgres still care about it? And, if so, how much? The posts
I found were from 2 years ago, and indicated that there is a minor
increase, but not a lot. Incidentally, could anyone quantify that in
any fashion?

Thanks,

-Jim
