Re: [PERFORM] Performance bottleneck due to array manipulation

2015-08-24 Thread Genc, Ömer
Thanks a lot,

The advice helped me a lot. I used an approach similar to the one
suggested by Igor and Félix, and now the stored procedure runs fast.

Kind regards,

From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Genc, Ömer
Sent: Friday, August 21, 2015 8:49 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Performance bottleneck due to array manipulation

Hey,

I have a very long-running stored procedure, due to array manipulation inside it.
The following procedure takes 13 seconds to finish.

BEGIN
point_ids_older_than_one_hour := '{}';
object_ids_to_be_invalidated := '{}';

select ARRAY(SELECT
point_id
from ONLY
public.ims_point as p
where
p.timestamp < m_before_one_hour
)
into point_ids_older_than_one_hour ; -- this array has a size of 20k

select ARRAY(SELECT
object_id
  from
public.ims_object_header h
  WHERE
h.last_point_id= ANY(point_ids_older_than_one_hour)
 )
into object_ids_to_be_invalidated; -- this array has a size of 100

--current_last_point_ids will have a size of 100k
current_last_point_ids := ARRAY( SELECT
last_point_id
  from
public.ims_object_header h
 );
-- START OF PERFORMANCE BOTTLENECK
IF(array_length(current_last_point_ids, 1) > 0)
THEN
FOR i IN 0 .. array_upper(current_last_point_ids, 1)
LOOP
point_ids_older_than_one_hour = 
array_remove(point_ids_older_than_one_hour, current_last_point_ids[i]::bigint);
END LOOP;
END IF;
-- END OF PERFORMANCE BOTTLENECK
END;

The array manipulation part is the performance bottleneck. I am pretty sure, 
that there is a better way of doing this, however I couldn't find one.
What I have is two tables, let's call them ims_point and ims_object_header.
ims_object_header references some entries of ims_point in the column
last_point_id.
Now I want to delete all entries from ims_point whose timestamp is older
than one hour. The ids currently referenced by ims_object_header should be
excluded from this deletion. Therefore I stored the ids in arrays and iterate
over those arrays to exclude the referenced values from being deleted.

However, I'm not sure if using an array for an operation like this is the best
approach.

Can anyone give me some advice on how this could be enhanced?

Thanks in advance.


I think in this case (as in many other cases) pure SQL does the job much
better than a procedural language:

DELETE FROM public.ims_point AS P
WHERE P.timestamp < m_before_one_hour
  AND NOT EXISTS (SELECT 1 FROM public.ims_object_header OH
                  WHERE OH.last_point_id = P.object_id);

Is that what you are trying to accomplish?

Regards,
Igor Neyman






[PERFORM] Performance bottleneck due to array manipulation

2015-08-21 Thread Genc, Ömer
Hey,

I have a very long-running stored procedure, due to array manipulation inside it.
The following procedure takes 13 seconds to finish.

BEGIN
point_ids_older_than_one_hour := '{}';
object_ids_to_be_invalidated := '{}';

select ARRAY(SELECT
point_id
from ONLY
public.ims_point as p
where
p.timestamp < m_before_one_hour
)
into point_ids_older_than_one_hour ; -- this array has a size of 20k

select ARRAY(SELECT
object_id
  from
public.ims_object_header h
  WHERE
h.last_point_id= ANY(point_ids_older_than_one_hour)
 )
into object_ids_to_be_invalidated; -- this array has a size of 100

--current_last_point_ids will have a size of 100k
current_last_point_ids := ARRAY( SELECT
last_point_id
  from
public.ims_object_header h
 );
-- START OF PERFORMANCE BOTTLENECK
IF(array_length(current_last_point_ids, 1) > 0)
THEN
FOR i IN 0 .. array_upper(current_last_point_ids, 1)
LOOP
point_ids_older_than_one_hour = 
array_remove(point_ids_older_than_one_hour, current_last_point_ids[i]::bigint);
END LOOP;
END IF;
-- END OF PERFORMANCE BOTTLENECK
END;

The array manipulation part is the performance bottleneck. I am pretty sure, 
that there is a better way of doing this, however I couldn't find one.
What I have is two tables, let's call them ims_point and ims_object_header.
ims_object_header references some entries of ims_point in the column
last_point_id.
Now I want to delete all entries from ims_point whose timestamp is older
than one hour. The ids currently referenced by ims_object_header should be
excluded from this deletion. Therefore I stored the ids in arrays and iterate
over those arrays to exclude the referenced values from being deleted.

However, I'm not sure if using an array for an operation like this is the best
approach.

Can anyone give me some advice on how this could be enhanced?

Thanks in advance.



Re: [PERFORM] Performance bottleneck due to array manipulation

2015-08-21 Thread Tom Lane
Genc, Ömer <oemer.g...@iais.fraunhofer.de> writes:
 i have a very long running stored procedure, due to array manipulation in a 
 stored procedure. The following procedure takes 13 seconds to finish.

 BEGIN
 point_ids_older_than_one_hour := '{}';
 object_ids_to_be_invalidated := '{}';

 select ARRAY(SELECT
 point_id
 from ONLY
 public.ims_point as p
 where
 p.timestamp < m_before_one_hour
 )
 into point_ids_older_than_one_hour ; -- this array has a size of 20k

 select ARRAY(SELECT
 object_id
   from
 public.ims_object_header h
   WHERE
 h.last_point_id= ANY(point_ids_older_than_one_hour)
  )
 into object_ids_to_be_invalidated; -- this array has a size of 100

 --current_last_point_ids will have a size of 100k
 current_last_point_ids := ARRAY( SELECT
 last_point_id
   from
 public.ims_object_header h
  );
 -- START OF PERFORMANCE BOTTLENECK
 IF(array_length(current_last_point_ids, 1) > 0)
 THEN
 FOR i IN 0 .. array_upper(current_last_point_ids, 1)
 LOOP
 point_ids_older_than_one_hour = 
 array_remove(point_ids_older_than_one_hour, 
 current_last_point_ids[i]::bigint);
 END LOOP;
 END IF;
 -- END OF PERFORMANCE BOTTLENECK
 END;

Well, in the first place, this is the cardinal sin of working with SQL
databases: doing procedurally that which should be done relationally.
Forget the arrays entirely and use EXCEPT, ie

  SELECT point_id FROM ...
  EXCEPT
  SELECT last_point_id FROM ...

Or maybe you want EXCEPT ALL, depending on whether duplicates are possible
and what you want to do with them if so.
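For illustration, a minimal sketch of what the EXCEPT form could look like
inside the original procedure, reusing the table and column names from the
post above (untested, and assuming duplicates need not be preserved):

    point_ids_older_than_one_hour := ARRAY(
        SELECT p.point_id
          FROM ONLY public.ims_point AS p
         WHERE p.timestamp < m_before_one_hour
        EXCEPT
        SELECT h.last_point_id
          FROM public.ims_object_header AS h
    );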

Having said that, the way you have this is necessarily O(N^2) because
array_remove has to search the entire array on each call, and then
reconstruct the entire array less any removed element(s).  The new
expanded array infrastructure in 9.5 could perhaps reduce some of the
constant factor, but the array search loop remains so it would still be
O(N^2); and anyway array_remove has not been taught about expanded arrays
(which means this example is probably even slower in 9.5 :-().

If you were using integers, you could possibly get decent performance from
contrib/intarray's int[] - int[] operator (which I think does a sort and
merge internally); but I gather that these are bigints, so that won't
work.
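If the arrays have to be kept around for other reasons, a hedged alternative
to the array_remove() loop is to do the whole difference in one set-based
statement over the existing array variables, using plain unnest (note that
EXCEPT also removes duplicates):

    point_ids_older_than_one_hour := ARRAY(
        SELECT unnest(point_ids_older_than_one_hour)
        EXCEPT
        SELECT unnest(current_last_point_ids)
    );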

regards, tom lane
















Re: [PERFORM] Performance bottleneck due to array manipulation

2015-08-21 Thread Félix GERZAGUET
Hello,

On Fri, Aug 21, 2015 at 2:48 PM, Genc, Ömer <oemer.g...@iais.fraunhofer.de>
wrote:

 Now I want to delete all entries from ims_point, where the timestamp is
 older than one hour. The currently being referenced ids of the table
 ims_object_header should be excluded from this deletion.




delete from public.ims_point ip
 where ip.timestamp < current_timestamp - interval '1 hour'
   and not exists ( select 'reference exists'
                      from public.ims_object_header ioh
                     where ioh.last_point_id = ip.point_id )
;

Does this work for you?
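Whichever DELETE variant is used, the NOT EXISTS anti-join can benefit from an
index on the referencing column, and the time filter from an index on the
timestamp column. A sketch, assuming such indexes do not already exist (the
index names are made up):

    CREATE INDEX ims_object_header_last_point_id_idx
        ON public.ims_object_header (last_point_id);
    CREATE INDEX ims_point_timestamp_idx
        ON public.ims_point ("timestamp");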


Re: [PERFORM] Performance bottleneck due to array manipulation

2015-08-21 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Genc, Ömer
Sent: Friday, August 21, 2015 8:49 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Performance bottleneck due to array manipulation

Hey,

I have a very long-running stored procedure, due to array manipulation inside it.
The following procedure takes 13 seconds to finish.

BEGIN
point_ids_older_than_one_hour := '{}';
object_ids_to_be_invalidated := '{}';

select ARRAY(SELECT
point_id
from ONLY
public.ims_point as p
where
p.timestamp < m_before_one_hour
)
into point_ids_older_than_one_hour ; -- this array has a size of 20k

select ARRAY(SELECT
object_id
  from
public.ims_object_header h
  WHERE
h.last_point_id= ANY(point_ids_older_than_one_hour)
 )
into object_ids_to_be_invalidated; -- this array has a size of 100

--current_last_point_ids will have a size of 100k
current_last_point_ids := ARRAY( SELECT
last_point_id
  from
public.ims_object_header h
 );
-- START OF PERFORMANCE BOTTLENECK
IF(array_length(current_last_point_ids, 1) > 0)
THEN
FOR i IN 0 .. array_upper(current_last_point_ids, 1)
LOOP
point_ids_older_than_one_hour = 
array_remove(point_ids_older_than_one_hour, current_last_point_ids[i]::bigint);
END LOOP;
END IF;
-- END OF PERFORMANCE BOTTLENECK
END;

The array manipulation part is the performance bottleneck. I am pretty sure, 
that there is a better way of doing this, however I couldn't find one.
What I have is two tables, let's call them ims_point and ims_object_header.
ims_object_header references some entries of ims_point in the column
last_point_id.
Now I want to delete all entries from ims_point whose timestamp is older
than one hour. The ids currently referenced by ims_object_header should be
excluded from this deletion. Therefore I stored the ids in arrays and iterate
over those arrays to exclude the referenced values from being deleted.

However, I'm not sure if using an array for an operation like this is the best
approach.

Can anyone give me some advice on how this could be enhanced?

Thanks in advance.


I think in this case (as in many other cases) pure SQL does the job much
better than a procedural language:

DELETE FROM public.ims_point AS P
WHERE P.timestamp < m_before_one_hour
  AND NOT EXISTS (SELECT 1 FROM public.ims_object_header OH
                  WHERE OH.last_point_id = P.object_id);

Is that what you are trying to accomplish?

Regards,
Igor Neyman
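If the delete has to stay inside the stored procedure, a minimal sketch of the
set-based version in PL/pgSQL (table, column and variable names as in the
original post; v_deleted is a hypothetical bigint variable used only to
illustrate capturing the row count):

    DELETE FROM ONLY public.ims_point AS p
     WHERE p.timestamp < m_before_one_hour
       AND NOT EXISTS (SELECT 1
                         FROM public.ims_object_header AS h
                        WHERE h.last_point_id = p.point_id);
    -- optionally record how many rows were removed
    GET DIAGNOSTICS v_deleted = ROW_COUNT;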






Re: [PERFORM] Performance Bottleneck

2004-08-10 Thread matt

 Squid also takes away the work of doing SSL (presuming you're running it
 on a different machine). Unfortunately it doesn't support HTTP/1.1 which
 means that most generated pages (those that don't set Content-length) end
 up forcing squid to close and then reopen the connection to the web
 server.

It is true that it doesn't support http/1.1, but 'most generated pages'? 
Unless they are actually emitted progressively they should have a
perfectly good content-length header.

 I've also had some problems when Squid had a large number of connections
 open (several thousand); though that may have been because of my
 half_closed_clients setting. Squid 3 coped a lot better when I tried it
 (quite a few months ago now - and using FreeBSD and the special kqueue
 system call) but crashed under some (admittedly synthetic) conditions.

It runs out of the box with a very conservative setting for max open file
descriptors - this may or may not be the cause of the problems you have
seen.  Certainly I ran squid with 16,000 connections back in 1999...

 You still have periods of time when the web servers are busy using their
 CPUs to generate HTML rather than waiting for database queries. This is
 especially true if you cache a lot of data somewhere on the web servers
 themselves (which, in my experience, reduces the database load a great
 deal). If you REALLY need to reduce the number of connections (because you
 have a large number of web servers doing a lot of computation, say) then
 it might still be useful.

Aha, a postgres related topic in this thread!  What you say is very true,
but then given that the connection overhead is so vanishingly small, why
not simply run without a persistent DB connection in this case?  I would
maintain that if your webservers are holding open idle DB connections for
so long that it's a problem, then simply close the connections!

M



Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Mike Benoit
On Fri, 2004-08-06 at 23:18 +, Martin Foster wrote:
 Mike Benoit wrote:
 
  On Wed, 2004-08-04 at 17:25 +0200, Gaetano Mendola wrote:
  
  
 The queries themselves are simple, normally drawing information from one 
 table with few conditions or in the most complex cases using joins on 
 two table or sub queries.   These behave very well and always have, the 
 problem is that these queries take place in rather large amounts due to 
 the dumb nature of the scripts themselves.
 
 Show us the explain analyze on that queries, how many rows the tables are
 containing, the table schema could be also usefull.
 
  
  
  If the queries themselves are optimized as much as they can be, and as
  you say, its just the sheer amount of similar queries hitting the
  database, you could try using prepared queries for ones that are most
  often executed to eliminate some of the overhead. 
  
  I've had relatively good success with this in the past, and it doesn't
  take very much code modification.
  
 
 One of the biggest problems is most probably related to the indexes. 
 Since the performance penalty of logging the information needed to see 
 which queries are used and which are not is a slight problem, then I 
 cannot really make use of it for now.
 
 However, I am curious how one would go about preparing query?   Is this 
 similar to the DBI::Prepare statement with placeholders and simply 
 changing the values passed on execute?  Or is this something database 
 level such as a view et cetera?
 

Yes, always optimize your queries and GUC settings first and foremost.
That's where you are likely to gain the most performance. After that, if
you still want to push things even further, I would try prepared queries.
I'm not familiar with DBI::Prepare at all, but I don't think it's what
you're looking for.

This is what you want:
http://www.postgresql.org/docs/current/static/sql-prepare.html
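For example, a hedged sketch in plain SQL, borrowing column names from the
Post queries shown elsewhere in this thread (the parameter value is made up,
and the prepared statement only lives for the lifetime of the session):

    PREPARE recent_posts (int) AS
        SELECT PostIDNumber, PuppeteerLogin, PuppetName, PostClass
          FROM Post
         WHERE PostIDNumber > $1
         ORDER BY PostIDNumber
         LIMIT 100;

    EXECUTE recent_posts(12345);
    DEALLOCATE recent_posts;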


-- 
Mike Benoit [EMAIL PROTECTED]




Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Jeff
On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
I am currently making use of Apache::DBI which overrides the 
DBI::disconnect call and keeps a pool of active connections for use 
when need be.   Since it offloads the pooling to the webserver, it 
seems more advantageous then pgpool which while being able to run on a 
external system is not adding another layer of complexity.

Apache::DBI is not the same sort of a pool as pgpool.  DB connections
are not shared among all your apache children (a common misconception).
So if you have 300 apache kids you can have 300 db connections.
With pgpool, connections are shared among all of them, so even though
you have 300 kids you only have, say, 32 db connections.

Anyone had any experience with both Apache::DBI and pgpool?   For my 
needs they seem to do essentially the same thing, simply that one is 
invisible to the code while the other requires adding the complexity 
of a proxy.

Both are invisible to the app.  (With pgpool it thinks it is connecting 
to a regular old PG server)

And I've been running pgpool in production for months.  It just sits 
there.  Doesn't take much to set it up or configure it.  Works like a 
champ

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/


Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Jan Wieck
On 8/8/2004 8:10 AM, Jeff wrote:
On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
I am currently making use of Apache::DBI which overrides the 
DBI::disconnect call and keeps a pool of active connections for use 
when need be.   Since it offloads the pooling to the webserver, it 
seems more advantageous then pgpool which while being able to run on a 
external system is not adding another layer of complexity.

Apache::DBI is not the same sort of a pool as pgpool.  DB connections 
are not shared among all your apache children (A common misconception). 
  So if you have 300 apache kids you can have have 300 db connections.  
With pgpool connections are  shared among all of them so even though 
you have 300 kids you only have say 32 db connections.
And this is exactly where the pgpool advantage lies. Especially with the
TPC-W, Apache is serving a mix of PHP (or whatever CGI technique is
used) and static content like images. Since the 200+ Apache kids serve
any of that content at random, and the emulated browsers very much
encourage it to ramp up MaxClients children by using up to 4 concurrent
image connections, one does end up with MaxClients DB connections that
are all used relatively infrequently. In contrast to that, the real
pgpool results in fewer, more active DB connections, which is better for
performance.



Anyone had any experience with both Apache::DBI and pgpool?   For my 
needs they seem to do essentially the same thing, simply that one is 
invisible to the code while the other requires adding the complexity 
of a proxy.

Both are invisible to the app.  (With pgpool it thinks it is connecting 
to a regular old PG server)

And I've been running pgpool in production for months.  It just sits 
there.  Doesn't take much to set it up or configure it.  Works like a 
champ
And it buys you some extra admin features people like to forget about.
One can shut down one pool for one web application only. That gives you
instant single-user access to one database without shutting down the
whole webserver or tampering with the pg_hba.conf file.

Jan
--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Matt Clark
 And this is exactly where the pgpool advantage lies. Especially with the
 TPC-W, the Apache is serving a mix of PHP (or whatever CGI technique is
 used) and static content like images. Since the 200+ Apache kids serve
 any of that content by random and the emulated browsers very much
 encourage it to ramp up MaxClients children by using up to 4 concurrent
 image connections, one does end up with MaxClients DB connections that
 are all relatively low frequently used. In contrast to that the real
 pgpool causes lesser, more active DB connections, which is better for
 performance.

There are two well-worn and very mature techniques for dealing with the
issue of web apps using one DB connection per apache process, both of which
work extremely well and attack the issue at its source.

1)  Use a front-end caching proxy like Squid as an accelerator.  Static
content will be served by the accelerator 99% of the time.  Additionally,
large pages can be served immediately to the accelerator by Apache, which
can then go on to serve another request without waiting for the end user's
dial-up connection to pull the data down.  Massive speedup, fewer apache
processes needed.

2)  Serve static content off an entirely separate apache server from the one
serving dynamic content, by using separate domains (e.g. 'static.foo.com').

Personally I favour number 1.  Our last biggish peak saw 6000 open HTTP and
HTTPS connections and only 200 apache children, all of them nice and busy,
not hanging around on street corners looking bored.  During quiet times
Apache drops back to its configured minimum of 40 kids.  Option 2 has the
advantage that you can use a leaner build for the 'dynamic' apache server,
but with RAM so plentiful these days that's a less useful property.

Basically this puts the 'pooling' back in the stateless HTTP area where it
truly belongs and can be proven not to have any peculiar side effects
(especially when it comes to transaction safety).  Even better, so long as
you use URL parameters for searches and the like, you can have the
accelerator cache those pages for a certain time too so long as slightly
stale results are OK.

I'm sure pgpool and the like have their place, but being band-aids for
poorly configured websites probably isn't the best use for them.

M




Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Martin Foster
Jeff wrote:
On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
I am currently making use of Apache::DBI which overrides the 
DBI::disconnect call and keeps a pool of active connections for use 
when need be.   Since it offloads the pooling to the webserver, it 
seems more advantageous then pgpool which while being able to run on a 
external system is not adding another layer of complexity.

Apache::DBI is not the same sort of a pool as pgpool.  DB connections 
are not shared among all your apache children (A common misconception). 
 So if you have 300 apache kids you can have have 300 db connections.  
With pgpool connections are  shared among all of them so even though you 
have 300 kids you only have say 32 db connections.

Seems that you are right, never noticed that from the documentation 
before.   I always assumed it had something to do with the long 
lasting/persistent scripts that would remain in transactions for 
extended periods of time.

Here is an odd question.   While the server runs 7.4.x, the client
connects with 7.3.x.  Would this in itself make a difference in
performance, as the protocols are different?   At least based on
pgpool's documentation.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Arjen van der Meijden
On 8-8-2004 16:29, Matt Clark wrote:
There are two well-worn and very mature techniques for dealing with the
issue of web apps using one DB connection per apache process, both of which
work extremely well and attack the issue at its source.
1)  Use a front-end caching proxy like Squid as an accelerator.  Static
content will be served by the accelerator 99% of the time.  Additionally,
large pages can be served immediately to the accelerator by Apache, which
can then go on to serve another request without waiting for the end user's
dial-up connection to pull the data down.  Massive speedup, fewer apache
processes needed.
Another version of this 1) is to run with a content accelerator; our
favourite is to run Tux in front of Apache. It takes over the
connection-handling stuff, has a very low memory profile (compared to
Apache) and very little overhead. What it does is serve up all
simple content (although you can have cgi/php/perl and other languages
being processed by it, entirely removing the need for apache in some
cases) and forward/proxy everything it doesn't understand to an
Apache/other webserver running on the same machine (which runs on
another port).

I think there are a few advantages over Squid; since it is partially
done in kernel-space it can be slightly faster in serving up content,
apart from its simplicity, which will probably matter even more. You'll
have no caching issues for pages that should not be cached or static
files that change periodically (like every few seconds). Afaik Tux can
handle more than 10 times as many ab-generated requests per second as
a default-compiled Apache on the same machine.
And besides the speed-up, you can do any request you were able to do
before, since Tux will simply forward it to Apache if it didn't
understand it.

Anyway, apart from all that. Reducing the amount of apache-connections 
is nice, but not really the same as reducing the amount of 
pooled-connections using a db-pool... You may even be able to run with 
1000 http-connections, 40 apache-processes and 10 db-connections. In 
case of the non-pooled setup, you'd still have 40 db-connections.

In a simple test I did, I did feel pgpool had quite some overhead 
though. So it should be well tested, to find out where the 
turnover-point is where it will be a gain instead of a loss...

Best regards,
Arjen van der Meijden


Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Martin Foster
Arjen van der Meijden wrote:
On 8-8-2004 16:29, Matt Clark wrote:
There are two well-worn and very mature techniques for dealing with the
issue of web apps using one DB connection per apache process, both of 
which
work extremely well and attack the issue at its source.

1)Use a front-end caching proxy like Squid as an accelerator.  Static
content will be served by the accelerator 99% of the time.  Additionally,
large pages can be served immediately to the accelerator by Apache, which
can then go on to serve another request without waiting for the end 
user's
dial-up connection to pull the data down.  Massive speedup, fewer apache
processes needed.

Another version of this 1) is to run with a content accelerator; our 
favourite is to run Tux in front of Apache. It takes over the 
connection-handling stuff, has a very low memoryprofile (compared to 
Apache) and very little overhead. What it does, is to serve up all 
simple content (although you can have cgi/php/perl and other languages 
being processed by it, entirely disabling the need for apache in some 
cases) and forwards/proxies everything it doesn't understand to an 
Apache/other webserver running at the same machine (which runs on 
another port).

I think there are a few advantages over Squid; since it is partially 
done in kernel-space it can be slightly faster in serving up content, 
apart from its simplicity which will probably matter even more. You'll 
have no caching issues for pages that should not be cached or static 
files that change periodically (like every few seconds). Afaik Tux can 
handle more than 10 times as much ab-generated requests per second than 
a default-compiled Apache on the same machine.
And besides the speed-up, you can do any request you where able to do 
before, since Tux will simply forward it to Apache if it didn't 
understand it.

Anyway, apart from all that. Reducing the amount of apache-connections 
is nice, but not really the same as reducing the amount of 
pooled-connections using a db-pool... You may even be able to run with 
1000 http-connections, 40 apache-processes and 10 db-connections. In 
case of the non-pooled setup, you'd still have 40 db-connections.

In a simple test I did, I did feel pgpool had quite some overhead 
though. So it should be well tested, to find out where the 
turnover-point is where it will be a gain instead of a loss...

Best regards,
Arjen van der Meijden
Other than images, there are very few static pages being loaded up by
the user.   Since they make up a very small portion of the traffic, it
tends to be an optimization we can forgo for now.

I attempted to make use of pgpool.   At the default 32 connections 
pre-forked the webserver almost immediately tapped out the pgpool base 
and content stopped being served because no new processes were being 
forked to make up for it.

So I raised it to a higher value (256) and it immediately segfaulted and 
dropped the core.So not sure exactly how to proceed, since I rather 
need the thing to fork additional servers as load hits and not the other 
way around.

Unless I had it configured oddly, it seems to work differently than an
Apache server would in handling content.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Tatsuo Ishii
 Jeff wrote:
  
  On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
  
  I am currently making use of Apache::DBI which overrides the 
  DBI::disconnect call and keeps a pool of active connections for use 
  when need be.   Since it offloads the pooling to the webserver, it 
  seems more advantageous then pgpool which while being able to run on a 
  external system is not adding another layer of complexity.
 
  
  Apache::DBI is not the same sort of a pool as pgpool.  DB connections 
  are not shared among all your apache children (A common misconception). 
   So if you have 300 apache kids you can have have 300 db connections.  
  With pgpool connections are  shared among all of them so even though you 
  have 300 kids you only have say 32 db connections.
  
 
 Seems that you are right, never noticed that from the documentation 
 before.   I always assumed it had something to do with the long 
 lasting/persistent scripts that would remain in transactions for 
 extended periods of time.
 
 Here is an odd question.   While the server run 7.4.x, the client 
 connects with 7.3.x.  Would this in itself make a difference in 
 performance as the protocols are different?   At least based from 
 pgpool's documentation.

In this case the server falls back from the V3 protocol (employed in 7.4 or
later) to the V2 protocol (employed from 6.4 to 7.3.x). As far as
pgpool is concerned, the performance difference is significant. Of course
that depends on the implementation though.

FYI here is the outline of the testing using pgbench.

H/W: Pentium4 2.4GHz x2/memory 1GB/HDD IDE 80GB (all PCs are same spec)
S/W: RedHat Linux 9/PostgreSQL 7.3.6/7.4.3

postgresql.conf:
tcpip_socket = true
max_connections = 512
shared_buffers = 2048

host A: pgbench, host B: pgpool, host C: PostgreSQL 7.3.6 or 7.4.3
pgbench parameters: -S -c 10 -t 1000

result:
                                             TPS          ratio(7.4.3)  ratio(7.3.6)
without pgpool                               4357.625059  100%          100%
with pgpool (connection pool mode)           4330.290294  99.4%         94.1%
with pgpool (replication mode)               4297.614996  98.6%         87.6%
with pgpool (replication with strict mode)   4270.223136  98.0%         81.5%
--
Tatsuo Ishii



Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Tatsuo Ishii
 Arjen van der Meijden wrote:
 
  On 8-8-2004 16:29, Matt Clark wrote:
  
  There are two well-worn and very mature techniques for dealing with the
  issue of web apps using one DB connection per apache process, both of 
  which
  work extremely well and attack the issue at its source.
 
  1)Use a front-end caching proxy like Squid as an accelerator.  Static
  content will be served by the accelerator 99% of the time.  Additionally,
  large pages can be served immediately to the accelerator by Apache, which
  can then go on to serve another request without waiting for the end 
  user's
  dial-up connection to pull the data down.  Massive speedup, fewer apache
  processes needed.
  
  
  Another version of this 1) is to run with a content accelerator; our 
  favourite is to run Tux in front of Apache. It takes over the 
  connection-handling stuff, has a very low memoryprofile (compared to 
  Apache) and very little overhead. What it does, is to serve up all 
  simple content (although you can have cgi/php/perl and other languages 
  being processed by it, entirely disabling the need for apache in some 
  cases) and forwards/proxies everything it doesn't understand to an 
  Apache/other webserver running at the same machine (which runs on 
  another port).
  
  I think there are a few advantages over Squid; since it is partially 
  done in kernel-space it can be slightly faster in serving up content, 
  apart from its simplicity which will probably matter even more. You'll 
  have no caching issues for pages that should not be cached or static 
  files that change periodically (like every few seconds). Afaik Tux can 
  handle more than 10 times as much ab-generated requests per second than 
  a default-compiled Apache on the same machine.
  And besides the speed-up, you can do any request you where able to do 
  before, since Tux will simply forward it to Apache if it didn't 
  understand it.
  
  Anyway, apart from all that. Reducing the amount of apache-connections 
  is nice, but not really the same as reducing the amount of 
  pooled-connections using a db-pool... You may even be able to run with 
  1000 http-connections, 40 apache-processes and 10 db-connections. In 
  case of the non-pooled setup, you'd still have 40 db-connections.
  
  In a simple test I did, I did feel pgpool had quite some overhead 
  though. So it should be well tested, to find out where the 
  turnover-point is where it will be a gain instead of a loss...

I don't know what configuration you were using, but I noticed
that UNIX domain sockets are preferred for the connection between
clients and pgpool. When I tested using pgbench -C (involving
connection establishing for each transaction), the
with-pgpool configuration was 10 times faster than the without-pgpool one
when using UNIX domain sockets, while there was only a 3.6 times speedup with
TCP/IP sockets.

  Best regards,
  
  Arjen van der Meijden
  
 
 Other then images, there are very few static pages being loaded up by 
 the user.Since they make up a very small portion of the traffic, it 
 tends to be an optimization we can forgo for now.
 
 I attempted to make use of pgpool.   At the default 32 connections 
 pre-forked the webserver almost immediately tapped out the pgpool base 
 and content stopped being served because no new processes were being 
 forked to make up for it.
 
 So I raised it to a higher value (256) and it immediately segfaulted and 
 dropped the core.So not sure exactly how to proceed, since I rather 
 need the thing to fork additional servers as load hits and not the other 
 way around.

What version of pgpool did you test? I know that a certain version
(actually 2.0.2) had such a problem. Can you try again with the
latest version of pgpool? (It's 2.0.6.)
--
Tatsuo Ishii



Re: [PERFORM] Performance Bottleneck

2004-08-07 Thread Gaetano Mendola
Scott Marlowe wrote:
On Fri, 2004-08-06 at 22:02, Martin Foster wrote:
Scott Marlowe wrote:

On Fri, 2004-08-06 at 17:24, Gaetano Mendola wrote:

Martin Foster wrote:

Gaetano Mendola wrote:

Let start from your postgres configuration:
shared_buffers = 8192 This is really too small for your 
configuration
sort_mem = 2048

wal_buffers = 128 This is really too small for your 
configuration

effective_cache_size = 16000
change this values in:
shared_buffers = 50000
sort_mem = 16084
wal_buffers = 1500
effective_cache_size = 32000
to bump up the shm usage you have to configure your OS in order to be
allowed to use that ammount of SHM.
This are the numbers that I feel good for your HW, the second step now is
analyze your queries
These changes have yielded some visible improvements, with load averages 
rarely going over the anything noticeable.   However, I do have a 
question on the matter, why do these values seem to be far higher then 
what a frequently pointed to document would indicate as necessary?

http://www.varlena.com/GeneralBits/Tidbits/perf.html
I am simply curious, as this clearly shows that my understanding of 
PostgreSQL is clearly lacking when it comes to tweaking for the hardware.
Unfortunately there is no a wizard tuning for postgres so each one of
us have a own school. The data I gave you are oversized to be sure
to achieve improvements. Now you can start to decrease these values
( starting from the wal_buffers ) in order to find the good compromise
with your HW.

FYI, my school of tuning is to change one thing at a time some
reasonable percentage (shared_buffers from 1000 to 2000) and measure the
change under simulated load.  Make another change, test it, chart the
shape of the change line.  It should look something like this for most
folks:
shared_buffers | q/s (more is better)
100 | 20
200 | 45
400 | 80
1000 | 100
... levels out here...
8000 | 110
10000 | 108
20000 | 40
30000 | 20
Note it going back down as we exceed our memory and start swapping
shared_buffers.  Where that happens on your machine is determined by
many things like your machine's memory, memory bandwidth, type of load,
etc... but it will happen on most machines and when it does, it often
happens at the worst times, under heavy parallel load.
Unless testing shows it's faster, 10000 or 25% of mem (whichever is
less) is usually a pretty good setting for shared_buffers.  Large data
sets may require more than 10000, but going over 25% on machines with
large memory is usually a mistake, especially servers that do anything
other than just PostgreSQL.
You're absolutely right about one thing, there's no automatic wizard for
tuning this stuff.
Which rather points out the crux of the problem.  This is a live system, 
meaning changes made need to be as informed as possible, and that 
changing values for the sake of testing can lead to potential problems 
in service.

But if you make those changes slowly, as I was showing, you should see
the small deleterious effects like I was showing long before they become
catastrophic.  To just jump shared_buffers to 50000 is not a good idea,
especially if the sweet spot is likely lower than that.  
As you can see, 50000 is less than 20% of his total memory, and I strongly
feel that 50000 is not oversized for his hardware (as wal_buffers isn't);
maybe it could be for his database activity, but for sure that value (values)
cannot be the source of problems.
I'd like to have a wizard that could be run even for hours in order to find the
good compromise for all GUC parameters; maybe a genetic algorithm can help.
Regards
Gaetano Mendola
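
When experimenting like this, a small sketch for checking what the server is
actually running with between changes (SHOW works on the versions discussed
here; sort_mem is the pre-8.0 name of the sort memory setting):

    SHOW shared_buffers;
    SHOW sort_mem;
    SHOW wal_buffers;
    SHOW effective_cache_size;
    -- or all at once:
    SELECT name, setting FROM pg_settings
     WHERE name IN ('shared_buffers', 'sort_mem', 'wal_buffers', 'effective_cache_size');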













Re: [PERFORM] Performance Bottleneck

2004-08-07 Thread Gaetano Mendola
Tom Lane wrote:
Martin Foster [EMAIL PROTECTED] writes:
Gaetano Mendola wrote:
change this values in:
shared_buffers = 50000
sort_mem = 16084
wal_buffers = 1500

This value of wal_buffers is simply ridiculous.
I instead think that a wal_buffers of 8 (64KB) by default is ridiculous.
There isn't any reason to set wal_buffers higher than the amount of
WAL log data that will be generated by a single transaction, because
whatever is in the buffers will be flushed at transaction commit.
If you are mainly dealing with heavy concurrency then it's the mean time
between transaction commits that matters, and that's even less than the
average transaction length.
I partially agree with you, but tell me how to decide that value without
even knowing the typical queries, the typical load ... nothing.
I suggested the OP keep the wal_buffers so high in order to eliminate one
degree of freedom in his performance problems. You can see that from the following reply,

Gaetano Mendola wrote:
Unfortunately there is no a wizard tuning for postgres so each one of
us have a own school. The data I gave you are oversized to be sure
to achieve improvements. Now you can start to decrease these values
( starting from the wal_buffers ) in order to find the good compromise
with your HW.

However, wal_buffers = 1500 means ~12 MB, which is not so expensive considering
a server with 2GB of RAM, and I think that is a good compromise if you are not
starving for RAM.
I had a discussion with a client about how to fine-tune a postgres server;
my question was: are you planning to have someone periodically take a
look at your server activities in order to use your hardware at its best?
Easy answer: No, because when the server is overloaded I will buy a bigger
one, which is less expensive than paying someone, considering also that shareholders
prefer increasing capex to paying salaries (if the company closes, the hardware
can be sold :-(  ).
This is the real world out there.

Regards
Gaetano Mendola







Re: [PERFORM] Performance Bottleneck

2004-08-07 Thread Christopher Kings-Lynne
This value of wal_buffers is simply ridiculous.

Instead I think is ridiculous a wal_buffers = 8 ( 64KB ) by default.
There is no point making WAL buffers higher than 8.  I have done much 
testing of this and it makes not the slightest difference to performance 
that I could measure.

Chris


Re: [PERFORM] Performance Bottleneck

2004-08-07 Thread Jan Wieck
On 8/3/2004 2:05 PM, Martin Foster wrote:
I run a Perl/CGI driven website that makes extensive use of PostgreSQL 
(7.4.3) for everything from user information to formatting and display 
of specific sections of the site.   The server itself, is a dual 
processor AMD Opteron 1.4Ghz w/ 2GB Ram and 2 x 120GB hard drives 
mirrored for redundancy running under FreeBSD 5.2.1 (AMD64).

Recently loads on the site have increased during peak hours to the point 
of showing considerable loss in performance.This can be observed 
when connections move from the 120 concurrent connections to PostgreSQL 
to roughly 175 or more. Essentially, the machine seems to struggle 
to keep up with continual requests and slows down respectively as 
resources are tied down.
Have you taken a look at pgpool? I know, it sounds silly to *reduce* the 
number of DB connections through a connection pool, but it can help.

Jan
--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [PERFORM] Performance Bottleneck

2004-08-07 Thread Martin Foster
Jan Wieck wrote:
On 8/3/2004 2:05 PM, Martin Foster wrote:
I run a Perl/CGI driven website that makes extensive use of PostgreSQL 
(7.4.3) for everything from user information to formatting and display 
of specific sections of the site.   The server itself, is a dual 
processor AMD Opteron 1.4Ghz w/ 2GB Ram and 2 x 120GB hard drives 
mirrored for redundancy running under FreeBSD 5.2.1 (AMD64).

Recently loads on the site have increased during peak hours to the 
point of showing considerable loss in performance.This can be 
observed when connections move from the 120 concurrent connections to 
PostgreSQL to roughly 175 or more. Essentially, the machine seems 
to struggle to keep up with continual requests and slows down 
respectively as resources are tied down.

Have you taken a look at pgpool? I know, it sounds silly to *reduce* the 
number of DB connections through a connection pool, but it can help.

Jan
I am currently making use of Apache::DBI which overrides the
DBI::disconnect call and keeps a pool of active connections for use when
need be.   Since it offloads the pooling to the webserver, it seems more
advantageous than pgpool which, while being able to run on an external
system, adds another layer of complexity.

Anyone had any experience with both Apache::DBI and pgpool?   For my 
needs they seem to do essentially the same thing, simply that one is 
invisible to the code while the other requires adding the complexity of 
a proxy.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Performance Bottleneck

2004-08-07 Thread Martin Foster
Christopher Kings-Lynne wrote:
This value of wal_buffers is simply ridiculous.

Instead I think is ridiculous a wal_buffers = 8 ( 64KB ) by default.

There is no point making WAL buffers higher than 8.  I have done much 
testing of this and it makes not the slightest difference to performance 
that I could measure.

Chris
No point?  I had it at 64 if memory serves, and the logs were warning me that
raising this value would be desirable because of excessive IO brought on
by the logs being filled far too often.

It would seem to me that 8 is a bit low in at least a few circumstances.
Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Gaetano Mendola
Martin Foster wrote:
Gaetano Mendola wrote:
Martin Foster wrote:
I run a Perl/CGI driven website that makes extensive use of 
PostgreSQL (7.4.3) for everything from user information to formatting 
and display of specific sections of the site.   The server itself, is 
a dual processor AMD Opteron 1.4Ghz w/ 2GB Ram and 2 x 120GB hard 
drives mirrored for redundancy running under FreeBSD 5.2.1 (AMD64).

Recently loads on the site have increased during peak hours to the 
point of showing considerable loss in performance.This can be 
observed when connections move from the 120 concurrent connections to 
PostgreSQL to roughly 175 or more. Essentially, the machine seems 
to struggle to keep up with continual requests and slows down 
respectively as resources are tied down.

Code changes have been made to the scripts to essentially back off in 
high load working environments which have worked to an extent. 
However, as loads continue to increase the database itself is not 
taking well to the increased traffic taking place.

Having taken a look at 'Tuning PostgreSQL for Performance' 
(http://www.varlena.com/GeneralBits/Tidbits/perf.html) using it as 
best I could in order to set my settings.    However, even with 
statistics disabled and every setting tweaked, things still continue to 
deteriorate.

Is there anything anyone can recommend in order to give the system a 
necessary speed boost?   It would seem to me that a modest dataset of 
roughly a Gig combined with that type of hardware should be able to 
handle substantially more load than it does.  Can anyone provide 
me with clues as to where to pursue this?    Would disabling 'fsync' provide 
more performance if I choose that information may be lost in case of 
a crash?

If anyone needs access to logs, settings et cetera.   Please ask, I 
simply wish to test the waters first on what is needed. Thanks!

Tell us about your typical queries, show us your configuration file.
Are the accesses read-only, or do you have concurrent writers
and readers? During peak hours are your processors tied up at 100%?
What do vmstat and iostat say?
Maybe you are not using indexes somewhere, or maybe you are but the
planner is not using them... In short, we need other information
in order to help you.

Regards
Gaetano Mendola

I included all the files as attachments, which will hopefully cut down 
on any follow-up emails.    As for things like connection pooling, the 
web server makes use of Apache::DBI to pool the connections for the Perl 
scripts being driven on that server.    For the sake of being thorough, 
a quick 'apachectl status' was thrown in when the database was under a 
good load.
Let start from your postgres configuration:
shared_buffers = 8192 This is really too small for your configuration
sort_mem = 2048
wal_buffers = 128 This is really too small for your configuration
effective_cache_size = 16000
change these values to:
shared_buffers = 50000
sort_mem = 16084
wal_buffers = 1500
effective_cache_size = 32000
To bump up the shm usage you have to configure your OS in order to be
allowed to use that amount of SHM.
These are the numbers that feel good to me for your HW; the second step now is
to analyze your queries.
The queries themselves are simple, normally drawing information from one 
table with few conditions or, in the most complex cases, using joins on 
two tables or subqueries.   These behave very well and always have; the 
problem is that these queries take place in rather large amounts due to 
the dumb nature of the scripts themselves.
Show us the explain analyze of those queries, how many rows the tables
contain; the table schema could also be useful.

regards
Gaetano Mendola






Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Mike Benoit
On Wed, 2004-08-04 at 17:25 +0200, Gaetano Mendola wrote:

  The queries themselves are simple, normally drawing information from one 
  table with few conditions or in the most complex cases using joins on 
  two table or sub queries.   These behave very well and always have, the 
  problem is that these queries take place in rather large amounts due to 
  the dumb nature of the scripts themselves.
 
 Show us the explain analyze on that queries, how many rows the tables are
 containing, the table schema could be also usefull.
 

If the queries themselves are optimized as much as they can be, and as
you say, it's just the sheer amount of similar queries hitting the
database, you could try using prepared queries for the ones that are most
often executed to eliminate some of the overhead. 

I've had relatively good success with this in the past, and it doesn't
take very much code modification.

-- 
Mike Benoit [EMAIL PROTECTED]




Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Martin Foster
Gaetano Mendola wrote:

Let start from your postgres configuration:
shared_buffers = 8192 This is really too small for your 
configuration
sort_mem = 2048

wal_buffers = 128 This is really too small for your configuration
effective_cache_size = 16000
change this values in:
shared_buffers = 50000
sort_mem = 16084
wal_buffers = 1500
effective_cache_size = 32000
to bump up the shm usage you have to configure your OS in order to be
allowed to use that ammount of SHM.
This are the numbers that I feel good for your HW, the second step now is
analyze your queries
These changes have yielded some visible improvements, with load averages 
rarely going over anything noticeable.   However, I do have a 
question on the matter: why do these values seem to be far higher than 
what a frequently pointed-to document would indicate as necessary?

http://www.varlena.com/GeneralBits/Tidbits/perf.html
I am simply curious, as this clearly shows that my understanding of 
PostgreSQL is clearly lacking when it comes to tweaking for the hardware.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Martin Foster
Mike Benoit wrote:
On Wed, 2004-08-04 at 17:25 +0200, Gaetano Mendola wrote:

The queries themselves are simple, normally drawing information from one 
table with few conditions or in the most complex cases using joins on 
two table or sub queries.   These behave very well and always have, the 
problem is that these queries take place in rather large amounts due to 
the dumb nature of the scripts themselves.
Show us the explain analyze on that queries, how many rows the tables are
containing, the table schema could be also usefull.

If the queries themselves are optimized as much as they can be, and as
you say, its just the sheer amount of similar queries hitting the
database, you could try using prepared queries for ones that are most
often executed to eliminate some of the overhead. 

I've had relatively good success with this in the past, and it doesn't
take very much code modification.
One of the biggest problems is most probably related to the indexes. 
Since the performance penalty of logging the information needed to see 
which queries are used and which are not is a slight problem, I 
cannot really make use of it for now.

However, I am curious how one would go about preparing a query?   Is this 
similar to the DBI::Prepare statement with placeholders and simply 
changing the values passed on execute?  Or is this something database 
level such as a view et cetera?

SELECT
  Post.PostIDNumber,
  Post.$format,
  Post.PuppeteerLogin,
  Post.PuppetName,
  Post.PostCmd,
  Post.PostClass
FROM Post
WHERE Post.PostIDNumber > ?::INT
  AND (Post.PostTo='all' OR Post.PostTo=?)
  AND (NOT EXISTS (SELECT PuppetIgnore.PuppetLogin
                     FROM PuppetIgnore
                    WHERE PuppetIgnore.PuppetIgnore='global'
                      AND PuppetIgnore.PuppeteerLogin=?
                      AND PuppetIgnore.PuppetLogin=Post.PuppeteerLogin)
       OR Post.PuppeteerLogin IS NULL)
  AND (NOT EXISTS (SELECT PuppetIgnore.PuppetName
                     FROM PuppetIgnore
                    WHERE PuppetIgnore.PuppetIgnore='single'
                      AND PuppetIgnore.PuppeteerLogin=?
                      AND PuppetIgnore.PuppetName=Post.PuppetName)
       OR Post.PuppetName IS NULL)
ORDER BY Post.PostIDNumber LIMIT 100
The range is determined from the previous run or through the query listed 
below.   It was determined that using an INT was far faster than limiting 
by timestamp.

SELECT MIN(PostIDNumber)
FROM Post
WHERE RealmName=?
  AND PostClass IN ('general','play')
  AND PostTo='all'
The above simply provides a starting point, nothing more.   Once posts 
are pulled, the script will throw in the last pulled number so as to start 
from a fresh point.

Under MySQL, time was stored as an INT, which may have helped it handle 
timestamps more efficiently.    It also made use of three or more 
queries, where two were done to generate an IN statement for the query 
actually running at the time.
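
To gather what Gaetano asked for, a hedged sketch (the realm literal is made
up and only stands in for the placeholder; note that EXPLAIN ANALYZE actually
runs the query):

    EXPLAIN ANALYZE
    SELECT MIN(PostIDNumber)
      FROM Post
     WHERE RealmName = 'some_realm'
       AND PostClass IN ('general','play')
       AND PostTo = 'all';

    -- rough row counts for the tables involved
    SELECT count(*) FROM Post;
    SELECT count(*) FROM PuppetIgnore;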

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Gaetano Mendola
Martin Foster wrote:
Gaetano Mendola wrote:

Let start from your postgres configuration:
shared_buffers = 8192 This is really too small for your 
configuration
sort_mem = 2048

wal_buffers = 128 This is really too small for your 
configuration

effective_cache_size = 16000
change this values in:
shared_buffers = 50000
sort_mem = 16084
wal_buffers = 1500
effective_cache_size = 32000
to bump up the shm usage you have to configure your OS in order to be
allowed to use that ammount of SHM.
This are the numbers that I feel good for your HW, the second step now is
analyze your queries
These changes have yielded some visible improvements, with load averages 
rarely going over the anything noticeable.   However, I do have a 
question on the matter, why do these values seem to be far higher then 
what a frequently pointed to document would indicate as necessary?

http://www.varlena.com/GeneralBits/Tidbits/perf.html
I am simply curious, as this clearly shows that my understanding of 
PostgreSQL is clearly lacking when it comes to tweaking for the hardware.
Unfortunately there is no tuning wizard for postgres, so each one of
us has his own school. The values I gave you are oversized, to be sure
to achieve improvements. Now you can start to decrease these values
(starting from the wal_buffers) in order to find the good compromise
with your HW.

Regards
Gaetano Mendola







Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Tom Lane
Martin Foster [EMAIL PROTECTED] writes:
 Gaetano Mendola wrote:
 change this values in:
 shared_buffers = 50000
 sort_mem = 16084
 
 wal_buffers = 1500

This value of wal_buffers is simply ridiculous.

There isn't any reason to set wal_buffers higher than the amount of
WAL log data that will be generated by a single transaction, because
whatever is in the buffers will be flushed at transaction commit.
If you are mainly dealing with heavy concurrency then it's the mean time
between transaction commits that matters, and that's even less than the
average transaction length.

Even if you are mainly interested in the performance of large updating
transactions that are not concurrent with anything else (bulk data load,
perhaps), I'm not sure that I see any value in setting wal_buffers so
high.  The data will have to go to disk before commit in any case, and
buffering so much of it just means that you are going to have a serious
spike in disk traffic right before commit.  It's almost certainly better
to keep wal_buffers conservatively small and let the data trickle out as
the transaction proceeds.  I don't actually think there is anything very
wrong with the default value (8) ... perhaps it is too small, but it's
not two orders of magnitude too small.
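In practical terms that advice amounts to leaving the setting at or near the
default and only nudging it upward if measurements justify it; a sketch of
what that looks like (the numbers here are illustrative, not a recommendation
from this thread):

-- see what the running server is currently using
SHOW wal_buffers;
-- postgresql.conf: the default is 8; 16 or 32 is already generous
-- wal_buffers = 16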

In 8.0, the presence of the background writer may make it useful to run
with wal_buffers somewhat higher than before, but I still doubt that
order-of-a-thousand buffers would be useful.  The RAM would almost
certainly be better spent on general-purpose disk buffers or kernel
cache.

Note though that this is just informed opinion, as I've never done or
seen any benchmarks that examine the results of changing wal_buffers
while holding other things constant.  Has anyone tried it?

regards, tom lane



Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Scott Marlowe
On Fri, 2004-08-06 at 17:24, Gaetano Mendola wrote:
 Martin Foster wrote:
 
  Gaetano Mendola wrote:
  
 
 
  Let start from your postgres configuration:
 
  shared_buffers = 8192 This is really too small for your 
  configuration
  sort_mem = 2048
 
  wal_buffers = 128 This is really too small for your 
  configuration
 
  effective_cache_size = 16000
 
  change this values in:
 
  shared_buffers = 50000
  sort_mem = 16084
 
  wal_buffers = 1500
 
  effective_cache_size = 32000
 
 
  to bump up the shm usage you have to configure your OS in order to be
  allowed to use that ammount of SHM.
 
  This are the numbers that I feel good for your HW, the second step now is
  analyze your queries
 
  
  These changes have yielded some visible improvements, with load averages 
  rarely going over the anything noticeable.   However, I do have a 
  question on the matter, why do these values seem to be far higher then 
  what a frequently pointed to document would indicate as necessary?
  
  http://www.varlena.com/GeneralBits/Tidbits/perf.html
  
  I am simply curious, as this clearly shows that my understanding of 
  PostgreSQL is clearly lacking when it comes to tweaking for the hardware.
 
 Unfortunately there is no a wizard tuning for postgres so each one of
 us have a own school. The data I gave you are oversized to be sure
 to achieve improvements. Now you can start to decrease these values
 ( starting from the wal_buffers ) in order to find the good compromise
 with your HW.

FYI, my school of tuning is to change one thing at a time some
reasonable percentage (shared_buffers from 1000 to 2000) and measure the
change under simulated load.  Make another change, test it, chart the
shape of the change line.  It should look something like this for most
folks:

shared_buffers | q/s (more is better)
100 | 20
200 | 45
400 | 80
1000 | 100
... levels out here...
8000 | 110
10000 | 108
20000 | 40
30000 | 20

Note it going back down as we exceed our memory and start swapping
shared_buffers.  Where that happens on your machine is determined by
many things like your machine's memory, memory bandwidth, type of load,
etc... but it will happen on most machines and when it does, it often
happens at the worst times, under heavy parallel load.

Unless testing shows it's faster, 10000 or 25% of mem (whichever is
less) is usually a pretty good setting for shared_buffers.  Large data
sets may require more than 10000, but going over 25% on machines with
large memory is usually a mistake, especially servers that do anything
other than just PostgreSQL.
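Worked out for the 2 GB machine under discussion (figures are approximate):
10000 buffers x 8 KB is about 80 MB, or roughly 4% of RAM; the 25% ceiling
would be 512 MB, i.e. about 65536 buffers; and the 50000 buffers suggested
earlier come to about 400 MB, roughly 20% of RAM.  So on this box the
10000-buffer figure is the smaller of the two and is where this rule of
thumb would start.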

You're absolutely right about one thing, there's no automatic wizard for
tuning this stuff.




Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Martin Foster
Scott Marlowe wrote:
On Fri, 2004-08-06 at 17:24, Gaetano Mendola wrote:
Martin Foster wrote:

Gaetano Mendola wrote:

Let start from your postgres configuration:
shared_buffers = 8192 This is really too small for your 
configuration
sort_mem = 2048

wal_buffers = 128 This is really too small for your 
configuration

effective_cache_size = 16000
change this values in:
shared_buffers = 50000
sort_mem = 16084
wal_buffers = 1500
effective_cache_size = 32000
to bump up the shm usage you have to configure your OS in order to be
allowed to use that ammount of SHM.
This are the numbers that I feel good for your HW, the second step now is
analyze your queries
These changes have yielded some visible improvements, with load averages 
rarely going over the anything noticeable.   However, I do have a 
question on the matter, why do these values seem to be far higher then 
what a frequently pointed to document would indicate as necessary?

http://www.varlena.com/GeneralBits/Tidbits/perf.html
I am simply curious, as this clearly shows that my understanding of 
PostgreSQL is clearly lacking when it comes to tweaking for the hardware.
Unfortunately there is no a wizard tuning for postgres so each one of
us have a own school. The data I gave you are oversized to be sure
to achieve improvements. Now you can start to decrease these values
( starting from the wal_buffers ) in order to find the good compromise
with your HW.

FYI, my school of tuning is to change one thing at a time some
reasonable percentage (shared_buffers from 1000 to 2000) and measure the
change under simulated load.  Make another change, test it, chart the
shape of the change line.  It should look something like this for most
folks:
shared_buffers | q/s (more is better)
100 | 20
200 | 45
400 | 80
1000 | 100
... levels out here...
8000 | 110
10000 | 108
20000 | 40
30000 | 20
Note it going back down as we exceed our memory and start swapping
shared_buffers.  Where that happens on your machine is determined by
many things like your machine's memory, memory bandwidth, type of load,
etc... but it will happen on most machines and when it does, it often
happens at the worst times, under heavy parallel load.
Unless testing shows it's faster, 10000 or 25% of mem (whichever is
less) is usually a pretty good setting for shared_buffers.  Large data
sets may require more than 10000, but going over 25% on machines with
large memory is usually a mistake, especially servers that do anything
other than just PostgreSQL.
You're absolutely right about one thing, there's no automatic wizard for
tuning this stuff.
Which rather points out the crux of the problem.  This is a live system, 
meaning changes made need to be as informed as possible, and that 
changing values for the sake of testing can lead to potential problems 
in service.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Scott Marlowe
On Fri, 2004-08-06 at 22:02, Martin Foster wrote:
 Scott Marlowe wrote:
 
  On Fri, 2004-08-06 at 17:24, Gaetano Mendola wrote:
  
 Martin Foster wrote:
 
 
 Gaetano Mendola wrote:
 
 
 
 Let start from your postgres configuration:
 
 shared_buffers = 8192 This is really too small for your 
 configuration
 sort_mem = 2048
 
 wal_buffers = 128 This is really too small for your 
 configuration
 
 effective_cache_size = 16000
 
 change this values in:
 
 shared_buffers = 50000
 sort_mem = 16084
 
 wal_buffers = 1500
 
 effective_cache_size = 32000
 
 
 to bump up the shm usage you have to configure your OS in order to be
 allowed to use that ammount of SHM.
 
 This are the numbers that I feel good for your HW, the second step now is
 analyze your queries
 
 
 These changes have yielded some visible improvements, with load averages 
 rarely going over the anything noticeable.   However, I do have a 
 question on the matter, why do these values seem to be far higher then 
 what a frequently pointed to document would indicate as necessary?
 
 http://www.varlena.com/GeneralBits/Tidbits/perf.html
 
 I am simply curious, as this clearly shows that my understanding of 
 PostgreSQL is clearly lacking when it comes to tweaking for the hardware.
 
 Unfortunately there is no a wizard tuning for postgres so each one of
 us have a own school. The data I gave you are oversized to be sure
 to achieve improvements. Now you can start to decrease these values
 ( starting from the wal_buffers ) in order to find the good compromise
 with your HW.
  
  
  FYI, my school of tuning is to change one thing at a time some
  reasonable percentage (shared_buffers from 1000 to 2000) and measure the
  change under simulated load.  Make another change, test it, chart the
  shape of the change line.  It should look something like this for most
  folks:
  
  shared_buffers | q/s (more is better)
  100 | 20
  200 | 45
  400 | 80
  1000 | 100
  ... levels out here...
  8000 | 110
  10000 | 108
  20000 | 40
  30000 | 20
  
  Note it going back down as we exceed our memory and start swapping
  shared_buffers.  Where that happens on your machine is determined by
  many things like your machine's memory, memory bandwidth, type of load,
  etc... but it will happen on most machines and when it does, it often
  happens at the worst times, under heavy parallel load.
  
  Unless testing shows it's faster, 10000 or 25% of mem (whichever is
  less) is usually a pretty good setting for shared_buffers.  Large data
  sets may require more than 10000, but going over 25% on machines with
  large memory is usually a mistake, especially servers that do anything
  other than just PostgreSQL.
  
  You're absolutely right about one thing, there's no automatic wizard for
  tuning this stuff.
  
 
 Which rather points out the crux of the problem.  This is a live system, 
 meaning changes made need to be as informed as possible, and that 
 changing values for the sake of testing can lead to potential problems 
 in service.

But if you make those changes slowly, as I was showing, you should see
the small deleterious effects like I was showing long before they become
catastrophic.  To just jump shared_buffers to 50000 is not a good idea,
especially if the sweet spot is likely lower than that.  




Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Martin Foster
Scott Marlowe wrote:
On Fri, 2004-08-06 at 22:02, Martin Foster wrote:
Scott Marlowe wrote:

On Fri, 2004-08-06 at 17:24, Gaetano Mendola wrote:

Martin Foster wrote:

Gaetano Mendola wrote:

Let start from your postgres configuration:
shared_buffers = 8192 This is really too small for your 
configuration
sort_mem = 2048

wal_buffers = 128 This is really too small for your 
configuration

effective_cache_size = 16000
change this values in:
shared_buffers = 50000
sort_mem = 16084
wal_buffers = 1500
effective_cache_size = 32000
to bump up the shm usage you have to configure your OS in order to be
allowed to use that ammount of SHM.
This are the numbers that I feel good for your HW, the second step now is
analyze your queries
These changes have yielded some visible improvements, with load averages 
rarely going over the anything noticeable.   However, I do have a 
question on the matter, why do these values seem to be far higher then 
what a frequently pointed to document would indicate as necessary?

http://www.varlena.com/GeneralBits/Tidbits/perf.html
I am simply curious, as this clearly shows that my understanding of 
PostgreSQL is clearly lacking when it comes to tweaking for the hardware.
Unfortunately there is no a wizard tuning for postgres so each one of
us have a own school. The data I gave you are oversized to be sure
to achieve improvements. Now you can start to decrease these values
( starting from the wal_buffers ) in order to find the good compromise
with your HW.

FYI, my school of tuning is to change one thing at a time some
reasonable percentage (shared_buffers from 1000 to 2000) and measure the
change under simulated load.  Make another change, test it, chart the
shape of the change line.  It should look something like this for most
folks:
shared_buffers | q/s (more is better)
100 | 20
200 | 45
400 | 80
1000 | 100
... levels out here...
8000 | 110
10000 | 108
20000 | 40
30000 | 20
Note it going back down as we exceed our memory and start swapping
shared_buffers.  Where that happens on your machine is determined by
many things like your machine's memory, memory bandwidth, type of load,
etc... but it will happen on most machines and when it does, it often
happens at the worst times, under heavy parallel load.
Unless testing shows it's faster, 10000 or 25% of mem (whichever is
less) is usually a pretty good setting for shared_buffers.  Large data
sets may require more than 10000, but going over 25% on machines with
large memory is usually a mistake, especially servers that do anything
other than just PostgreSQL.
You're absolutely right about one thing, there's no automatic wizard for
tuning this stuff.
Which rather points out the crux of the problem.  This is a live system, 
meaning changes made need to be as informed as possible, and that 
changing values for the sake of testing can lead to potential problems 
in service.

But if you make those changes slowly, as I was showing, you should see
the small deleterious effects like I was showing long before they become
catastrophic.  To just jump shared_buffers to 50000 is not a good idea,
especially if the sweet spot is likely lower than that.  

While I agree, there is also the issue that getting consistent results 
from this site is very difficult, since load depends on the whims of 
users visiting one of three sites hosted on the same hardware.

Now that being said, having wal_buffers at 8 certainly would not be a 
good idea, since the database logs themselves were warning of excessive 
writes in that region.  I am not hoping for a perfect intermix of 
settings that will solve all my problems.

But a good baseline that allows the system to carry a fair load would 
certainly be a good start.   Right now, the load being handled is not 
much more than what a single-processor system did with half the memory. 
Certainly this architecture should be able to take more of a beating 
than this?

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Performance Bottleneck

2004-08-05 Thread Pierre-Frédéric Caillaud
Apache processes running for 30 minutes?
My advice: use frames and Javascript!
In your webpage, you have two frames: content and refresh.
	content starts empty (say, just a title on top of the page).
	refresh is refreshed every five seconds from a script on your server.  
This script generates a javascript which document.write()'s new entries  
in the content frame, thus adding new records in the upper frame.

	Thus, the refreshing uses a new request every 5 seconds, which terminates  
very fast, and does not hog an Apache process.

Turn keepalive timeout down.


Re: [PERFORM] Performance Bottleneck

2004-08-05 Thread Steinar H. Gunderson
On Thu, Aug 05, 2004 at 08:40:35AM +0200, Pierre-Frédéric Caillaud wrote:
   Apache processes running for 30 minutes ?.
 
   My advice : use frames and Javascript !

My advice: Stay out of frames and Javascript if you can avoid it. The first
is severely outdated technology, and the other one might well be disabled at
the client side.

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] Performance Bottleneck

2004-08-04 Thread Pierre-Frédéric Caillaud

The queries themselves are simple, normally drawing information from one
table with few conditions or in the most complex cases using joins on
two table or sub queries.   These behave very well and always have, the
problem is that these queries take place in rather large amounts due to
the dumb nature of the scripts themselves.
Hmm, maybe this dumb thing is where to look?
	I'm no expert, but I have had the same situation with a very dumb PHP  
application, namely osCommerce, which averaged about 140 (!) queries  
on a page!

	I added some traces to queries, and some logging, only to see that the  
stupid programmers did something like (pseudo code):

for id in id_list:
select stuff from database where id=id
Geee...
I replaced it by :
select stuff from database where id in (id_list)
	And this saved about 20 requests... The code was peppered by queries like  
that. In the end it went from 140 queries to about 20, which is still way  
too much IMHO, but I couldn't go lower without an extensive rewrite.
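In PostgreSQL terms the rewrite looks something like the following; the 
table and column names are invented purely for illustration:

-- before: one round trip per id, issued from a loop in the script
SELECT product_id, name, price FROM products WHERE product_id = 42;
-- ...repeated for every id in id_list...

-- after: a single grouped query
SELECT product_id, name, price
FROM products
WHERE product_id IN (42, 57, 103, 118);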

	If you have a script making many selects, it's worth grouping them, even  
using stored procedures.

	For instance using the classical tree in a table to store a tree of  
product categories :

create table categories
(
id serial primary key,
parent_id integer references categories(id)
-- etc.
);
You basically have these choices in order to display the tree :
- select for parent_id=0 (root)
- for each element, select its children
- and so on
OR
	- make a stored procedure which does that. At least 3x faster and a lot  
less CPU overhead.

OR (if you have say 50 rows in the table which was my case)
- select the entire table and build your tree in the script
It was a little bit faster than the stored procedure.
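A rough sketch of the stored-procedure option, written as one recursive 
PL/pgSQL set-returning function over the categories table above (modern 
dollar-quoting; names and the parent_id = 0 root convention follow the 
description and are illustrative only):

CREATE OR REPLACE FUNCTION category_tree(root integer)
RETURNS SETOF categories AS $$
DECLARE
    child categories%ROWTYPE;
    descendant categories%ROWTYPE;
BEGIN
    FOR child IN SELECT * FROM categories WHERE parent_id = root LOOP
        RETURN NEXT child;                      -- emit this node
        FOR descendant IN SELECT * FROM category_tree(child.id) LOOP
            RETURN NEXT descendant;             -- then everything below it
        END LOOP;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- one call instead of one query per tree level:
SELECT * FROM category_tree(0);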
	Could you give an example of your dumb scripts ? It's good to optimize a  
database, but it's even better to remove useless queries...








Re: [PERFORM] Performance Bottleneck

2004-08-04 Thread Martin Foster
Gaetano Mendola wrote:
Martin Foster wrote:
Gaetano Mendola wrote:
Martin Foster wrote:
I run a Perl/CGI driven website that makes extensive use of 
PostgreSQL (7.4.3) for everything from user information to 
formatting and display of specific sections of the site.   The 
server itself, is a dual processor AMD Opteron 1.4Ghz w/ 2GB Ram and 
2 x 120GB hard drives mirrored for redundancy running under FreeBSD 
5.2.1 (AMD64).

Recently loads on the site have increased during peak hours to the 
point of showing considerable loss in performance.This can be 
observed when connections move from the 120 concurrent connections 
to PostgreSQL to roughly 175 or more. Essentially, the machine 
seems to struggle to keep up with continual requests and slows down 
respectively as resources are tied down.

Code changes have been made to the scripts to essentially back off 
in high load working environments which have worked to an extent. 
However, as loads continue to increase the database itself is not 
taking well to the increased traffic taking place.

Having taken a look at 'Tuning PostgreSQL for Performance' 
(http://www.varlena.com/GeneralBits/Tidbits/perf.html), I used it as 
best I could to set my settings.  However, even with statistics 
disabled and every setting tweaked, things still continue to 
deteriorate.

Is there anything anyone can recommend in order to give the system a 
necessary speed boost?   It would seem to me that a modest dataset 
of roughly a Gig combined with that type of hardware should be able 
to handle substantially more load then what it is.  Can anyone 
provide me with clues as where to pursue?Would disabling 'fsync' 
provide more performance if I choose that information may be lost in 
case of a crash?

If anyone needs access to logs, settings et cetera.   Please ask, I 
simply wish to test the waters first on what is needed. Thanks!


Tell us about your tipical queries, show us your configuration file.
The access are only in read only mode or do you have concurrent writers
and readers ? During peak hours your processors are tied to 100% ?
What say the vmstat and the iostat ?
May be you are not using indexes some where, or may be yes but the
planner is not using it... In two words we needs other informations
in order to help you.

Regards
Gaetano Mendola

I included all the files in attachments, which will hopefully cut down 
on any replied to Emails.As for things like connection pooling, 
the web server makes use of Apache::DBI to pool the connections for 
the Perl scripts being driven on that server.For the sake of being 
thorough, a quick 'apachectl status' was thrown in when the database 
was under a good load.

Let start from your postgres configuration:
shared_buffers = 8192 This is really too small for your 
configuration
sort_mem = 2048

wal_buffers = 128 This is really too small for your configuration
effective_cache_size = 16000
change this values in:
shared_buffers = 50000
sort_mem = 16084
wal_buffers = 1500
effective_cache_size = 32000
to bump up the shm usage you have to configure your OS in order to be
allowed to use that ammount of SHM.
This are the numbers that I feel good for your HW, the second step now is
analyze your queries
The queries themselves are simple, normally drawing information from 
one table with few conditions or in the most complex cases using joins 
on two table or sub queries.   These behave very well and always have, 
the problem is that these queries take place in rather large amounts 
due to the dumb nature of the scripts themselves.

Show us the explain analyze of those queries and how many rows the tables
contain; the table schema could also be useful.
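For readers following along, the kind of output being asked for is produced 
by prefixing a query with EXPLAIN ANALYZE; for example, against the posting 
query discussed earlier (the exact query text is just an illustration):

EXPLAIN ANALYZE
SELECT Post.PostIDNumber
FROM Post
WHERE Post.PostTo = 'all'
ORDER BY Post.PostIDNumber
LIMIT 100;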

regards
Gaetano Mendola
I will look into moving up those values and seeing how they interact 
with the system once I get back from work.   Since it was requested, I 
have a visual representation of an older schema, one that was used under 
MySQL.  Note that all of the timestamps are now properly set to 
LOCALTIME on PostgreSQL.

http://prdownloads.sourceforge.net/ethereal-realms/ethereal-3_0_0.png?download
The amount of rows for tables of note are as follows:
 Puppeteer 1606
 Puppet 33176
 Realm 83
 Post  36156
 Audit 61961
The post table is continually cleared of old information since the 
nature of the information is very time-critical and archiving would only 
hinder performance.   As a result, this count will vary wildly based on 
time of day, since users (Puppeteers) tend to post more during peak hours.

NOTE:   The scripts make use of different schemas with the same
information in order to virtualize the script and support
more than one site on the same hardware.
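Presumably that per-site virtualization comes down to pointing each session 
at the right schema once the realm is known, along these lines (the schema 
name is invented for illustration):

SET search_path TO realm_ethereal, public;
-- unqualified table names such as Post now resolve inside realm_ethereal first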
On a side note, this would be a normal post-authentication session once 
in realm for getting new posts:
 * Script is executed and schema is determined through stored procedure;
 * Formatting information is fetched from Tag and RealmDesign as needed;
 

[PERFORM] Performance Bottleneck

2004-08-03 Thread Martin Foster
I run a Perl/CGI driven website that makes extensive use of PostgreSQL 
(7.4.3) for everything from user information to formatting and display 
of specific sections of the site.   The server itself, is a dual 
processor AMD Opteron 1.4Ghz w/ 2GB Ram and 2 x 120GB hard drives 
mirrored for redundancy running under FreeBSD 5.2.1 (AMD64).

Recently loads on the site have increased during peak hours to the point 
of showing considerable loss in performance.This can be observed 
when connections move from the 120 concurrent connections to PostgreSQL 
to roughly 175 or more. Essentially, the machine seems to struggle 
to keep up with continual requests and slows down respectively as 
resources are tied down.

Code changes have been made to the scripts to essentially back off in 
high load working environments which have worked to an extent. 
However, as loads continue to increase the database itself is not taking 
well to the increased traffic taking place.

Having taken a look at 'Tuning PostgreSQL for Performance' 
(http://www.varlena.com/GeneralBits/Tidbits/perf.html), I used it as best 
I could to set my settings.  However, even with statistics 
disabled and every setting tweaked, things still continue to deteriorate.

Is there anything anyone can recommend in order to give the system a 
necessary speed boost?   It would seem to me that a modest dataset of 
roughly a Gig combined with that type of hardware should be able to 
handle substantially more load then what it is.  Can anyone provide me 
with clues as where to pursue?Would disabling 'fsync' provide more 
performance if I choose that information may be lost in case of a crash?

If anyone needs access to logs, settings et cetera.   Please ask, I 
simply wish to test the waters first on what is needed. Thanks!

Martin Foster
[EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-03 Thread Dennis Bjorklund
On Tue, 3 Aug 2004, Martin Foster wrote:

 to roughly 175 or more. Essentially, the machine seems to struggle 
 to keep up with continual requests and slows down respectively as 
 resources are tied down.

I suggest you try to find queries that are slow and check to see if the 
plans are optimal for those queries.

There are some logging options for logging queries that run longer than a 
user-set limit. That can help to find the slow queries. Just doing some 
logging for some typical page fetches often shows things that can be done 
better. For example, it's not uncommon to see the same information being 
pulled several times by mistake.
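The option being alluded to is log_min_duration_statement (available from 
7.4 on); a possible postgresql.conf entry, with an arbitrary threshold:

# log every statement that runs longer than 250 ms
log_min_duration_statement = 250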

Maybe you can also try something like connection pooling. I'm not sure how
much that can give, but for small queries the connection time is usually
the big part.

 Would disabling 'fsync' provide more performance if I choose that
 information may be lost in case of a crash?

I would not do that. In most cases the performance increase is modest and
the data corruption risk after a crash is much bigger so it's not worth
it.

If you have a lot of small inserts then it might be faster with this, but
if possible it's much better to try to do more work in a single transaction 
than before.
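Concretely, that just means wrapping the group of small statements in one 
explicit transaction so they share a single commit (and a single WAL flush) 
instead of one each; the table and values are invented for illustration:

BEGIN;
INSERT INTO audit_log (message) VALUES ('first event');
INSERT INTO audit_log (message) VALUES ('second event');
-- ...many more small inserts...
COMMIT;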

-- 
/Dennis Björklund




Re: [PERFORM] Performance Bottleneck

2004-08-03 Thread Joshua D. Drake
Hello,
It sounds to me like you are IO bound. 2x120GB hard drives just isn't 
going to cut it with that many connections (as a general rule). Are you 
swapping ?

Sincerely,
Joshua D. Drake


Martin Foster wrote:
I run a Perl/CGI driven website that makes extensive use of PostgreSQL 
(7.4.3) for everything from user information to formatting and display 
of specific sections of the site.   The server itself, is a dual 
processor AMD Opteron 1.4Ghz w/ 2GB Ram and 2 x 120GB hard drives 
mirrored for redundancy running under FreeBSD 5.2.1 (AMD64).
--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL




Re: [PERFORM] Performance Bottleneck

2004-08-03 Thread Gaetano Mendola
Martin Foster wrote:
I run a Perl/CGI driven website that makes extensive use of PostgreSQL 
(7.4.3) for everything from user information to formatting and display 
of specific sections of the site.   The server itself, is a dual 
processor AMD Opteron 1.4Ghz w/ 2GB Ram and 2 x 120GB hard drives 
mirrored for redundancy running under FreeBSD 5.2.1 (AMD64).

Recently loads on the site have increased during peak hours to the point 
of showing considerable loss in performance.This can be observed 
when connections move from the 120 concurrent connections to PostgreSQL 
to roughly 175 or more. Essentially, the machine seems to struggle 
to keep up with continual requests and slows down respectively as 
resources are tied down.

Code changes have been made to the scripts to essentially back off in 
high load working environments which have worked to an extent. However, 
as loads continue to increase the database itself is not taking well to 
the increased traffic taking place.

Having taken a look at 'Tuning PostgreSQL for Performance' 
(http://www.varlena.com/GeneralBits/Tidbits/perf.html), I used it as best 
I could to set my settings.  However, even with statistics 
disabled and every setting tweaked, things still continue to deteriorate.

Is there anything anyone can recommend in order to give the system a 
necessary speed boost?   It would seem to me that a modest dataset of 
roughly a Gig combined with that type of hardware should be able to 
handle substantially more load then what it is.  Can anyone provide me 
with clues as where to pursue?Would disabling 'fsync' provide more 
performance if I choose that information may be lost in case of a crash?

If anyone needs access to logs, settings et cetera.   Please ask, I 
simply wish to test the waters first on what is needed. Thanks!
Tell us about your tipical queries, show us your configuration file.
The access are only in read only mode or do you have concurrent writers
and readers ? During peak hours your processors are tied to 100% ?
What say the vmstat and the iostat ?
May be you are not using indexes some where, or may be yes but the
planner is not using it... In two words we needs other informations
in order to help you.

Regards
Gaetano Mendola



Re: [PERFORM] Performance Bottleneck

2004-08-03 Thread Martin Foster
Gaetano Mendola wrote:
Martin Foster wrote:
I run a Perl/CGI driven website that makes extensive use of PostgreSQL 
(7.4.3) for everything from user information to formatting and display 
of specific sections of the site.   The server itself, is a dual 
processor AMD Opteron 1.4Ghz w/ 2GB Ram and 2 x 120GB hard drives 
mirrored for redundancy running under FreeBSD 5.2.1 (AMD64).

Recently loads on the site have increased during peak hours to the 
point of showing considerable loss in performance.This can be 
observed when connections move from the 120 concurrent connections to 
PostgreSQL to roughly 175 or more. Essentially, the machine seems 
to struggle to keep up with continual requests and slows down 
respectively as resources are tied down.

Code changes have been made to the scripts to essentially back off in 
high load working environments which have worked to an extent. 
However, as loads continue to increase the database itself is not 
taking well to the increased traffic taking place.

Having taken a look at 'Tuning PostgreSQL for Performance' 
(http://www.varlena.com/GeneralBits/Tidbits/perf.html), I used it as 
best I could to set my settings.  However, even with statistics 
disabled and every setting tweaked, things still continue to 
deteriorate.

Is there anything anyone can recommend in order to give the system a 
necessary speed boost?   It would seem to me that a modest dataset of 
roughly a Gig combined with that type of hardware should be able to 
handle substantially more load then what it is.  Can anyone provide me 
with clues as where to pursue?Would disabling 'fsync' provide more 
performance if I choose that information may be lost in case of a crash?

If anyone needs access to logs, settings et cetera.   Please ask, I 
simply wish to test the waters first on what is needed. Thanks!

Tell us about your tipical queries, show us your configuration file.
The access are only in read only mode or do you have concurrent writers
and readers ? During peak hours your processors are tied to 100% ?
What say the vmstat and the iostat ?
May be you are not using indexes some where, or may be yes but the
planner is not using it... In two words we needs other informations
in order to help you.

Regards
Gaetano Mendola

I included all the files in attachments, which will hopefully cut down 
on any replied to Emails.As for things like connection pooling, the 
web server makes use of Apache::DBI to pool the connections for the Perl 
scripts being driven on that server.For the sake of being thorough, 
a quick 'apachectl status' was thrown in when the database was under a 
good load.

Since it would rather slow things down to wait for the servers to really 
get bogged down with load averages of 20.00 and more, I opted to choose 
a period of time where we are a bit busier than normal.   You will be 
able to see how the system behaves under a light load and subsequently 
when reaching 125 or so concurrent connections.

The queries themselves are simple, normally drawing information from one 
table with few conditions or in the most complex cases using joins on 
two table or sub queries.   These behave very well and always have, the 
problem is that these queries take place in rather large amounts due to 
the dumb nature of the scripts themselves.

Over a year ago when I was still using MySQL for the project, the 
statistics generated would report well over 65 queries per second under 
loads ranging from 130 to 160 at peak but averaged over the weeks of 
operation.   Looking at the Apache status, one can see that it averages 
only roughly 2.5 requests per second giving you a slight indication as 
to what is taking place.

A quick run of 'systat -ifstat' shows the following graph:
   /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
Load Average   
Interface   Traffic   PeakTotal
  lo0  in  0.000 KB/s  0.000 KB/s   37.690 GB
   out 0.000 KB/s  0.000 KB/s   37.690 GB
  em0  in 34.638 KB/s 41.986 KB/s   28.998 GB
   out70.777 KB/s 70.777 KB/s   39.553 GB
Em0 is a full-duplex 100 Mbps connection to an internal switch that 
supports the servers directly.   Load on the loopback was cut down 
considerably once I stopped using pg_autovacuum, since its performance 
benefits under low load were buried under the hindrance it caused when 
traffic was high.
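With pg_autovacuum switched off, the heavily churned tables still need 
routine maintenance; a common substitute is a manual pass scheduled for 
quieter hours, for example (the table choice is illustrative):

VACUUM ANALYZE Post;
VACUUM ANALYZE Audit;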

I am sure that there are some places that could benefit from some 
optimization, especially in the case of indexes; however, as a whole the 
problem seems to be related more to the massive onslaught of queries 
than to anything else.

Also note that some of these scripts run for longer durations even if 
they are web based.   Some run as long as 30 minutes, making queries to 
the database at intervals of anywhere from five seconds to twenty-five 
seconds. Under high duress the