[PERFORM] insert speed - Mac OSX vs Redhat

2004-01-15 Thread Syd
I've read most of the threads on insert speed in this list and wanted 
to share some interesting observations and a question.

We've been benchmarking some dbs to implement Bayesian processing on an 
email server. This involves frequent inserts and updates to the 
following table:

create table bayes_token (
    username   varchar(200) not null default '',
    token      varchar(200) not null default '',
    spam_count integer not null default 0,
    ham_count  integer not null default 0,
    atime      integer not null default 0,
    primary key (username, token)
);
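
For a sense of the per-message workload, each benchmarked operation looks 
roughly like one of the following (the values are purely illustrative, not 
taken from the actual test):

insert into bayes_token (username, token, spam_count, ham_count, atime)
    values ('someuser', 'sometoken', 1, 0, 1074124800);

update bayes_token set spam_count = spam_count + 1, atime = 1074124801
    where username = 'someuser' and token = 'sometoken';
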
On a variety of hardware running Redhat, with various versions of postgres, 
we're not getting much better than 50 inserts per second. This is prior to 
moving WAL to another disk, and fsync is on.

However, with postgres 7.4 on Mac OSX 10.2.3, we're getting an amazing 
500 inserts per second.

We can only put this down to the OS.

Can anyone shed light on why Redhat appears to be so much poorer than 
Mac OS X in supporting postgres insert transactions? Or why MacOS 
appears to be so much better?

BTW, on the same hardware that postgres is running on to get 50 inserts 
per sec, MySQL (4.0.17) is getting an almost unbelievable 5,500 inserts 
per second.

-SL



Re: [PERFORM] insert speed - Mac OSX vs Redhat

2004-01-15 Thread Matt Clark
 On a variety of hardware running Redhat, with various versions of postgres,
 we're not getting much better than 50 inserts per second. This is prior to
 moving WAL to another disk, and fsync is on.

 However, with postgres 7.4 on Mac OSX 10.2.3, we're getting an amazing
 500 inserts per second.

 We can only put this down to the OS.
                              ^^^^^^

You haven't really produced much evidence to support that statement.  Given
that the differences in performance between Postgres running on *BSD and
Linux on Intel hardware are not large at all, it seems to be almost certainly
false in fact.

It may of course be due to some settings of the different OSes, but not the
OSes themselves.

It would help if you gave a straight PG7.4 comparison with hardware specs as
well, and config file differences if any.

One thought: assuming the Apple has IDE disks, then the disks probably have
write caching turned on, which is good for speed, but not crash-safe.

matt







[PERFORM] Trigger question

2004-01-15 Thread pginfo
Hi,

I am using pg 7.4.1 and have created a trigger on a table with 3M rows.

If I start a massive update on this table, pg executes this trigger on
every row and dramatically slows the system.

Is there any way in pg to have the trigger execute only if certain
fields have changed?

For example, I am able to declare this in Oracle.

My trigger is written in pl/pgSQL.

regards,
ivan.




Re: [PERFORM] insert speed - Mac OSX vs Redhat

2004-01-15 Thread Tom Lane
Syd [EMAIL PROTECTED] writes:
 However, with postgres 7.4 on Mac OSX 10.2.3, we're getting an amazing 
 500 inserts per second.

 We can only put this down to the OS.

As noted elsewhere, it's highly likely that this has nothing to do with
the OS, and everything to do with write caching in the disks being used.

I assume you are benchmarking small individual transactions (one insert
per xact).  In such scenarios it's essentially impossible to commit more
than one transaction per revolution of the WAL disk, because you have to
write the same WAL disk page repeatedly and wait for it to get down to
the platter.  When you get results that are markedly in excess of the
disk RPM figure, it's proof positive that the disk is lying about write
complete (or that you don't have fsync on).
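
For a rough sense of the ceiling this implies (assuming, purely for
illustration, a common 7,200 RPM drive; the actual drives weren't specified):

    7,200 rev/min / 60 = 120 revolutions/sec

so honest one-insert-per-transaction commits top out at roughly 120 per
second on such a disk. 50/sec is plausible; 500/sec is not, unless something
is caching the writes.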

The only way to get better performance and still have genuine ACID
behavior is to gang multiple insertions per WAL write.  You can do
multiple insertions per transaction, or if you are doing several
insertion transactions in parallel, you can try to commit them all in
one write (experiment with the commit_delay and commit_siblings
parameters).
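
A minimal sketch of the first approach, reusing the bayes_token table from
the original post (values are illustrative):

begin;
insert into bayes_token (username, token, spam_count, ham_count, atime)
    values ('someuser', 'token_a', 1, 0, 1074124800);
insert into bayes_token (username, token, spam_count, ham_count, atime)
    values ('someuser', 'token_b', 0, 1, 1074124800);
-- ... many more inserts ...
commit;  -- a single WAL flush covers everything above

For the second approach, the relevant postgresql.conf entries look like this
(the numbers are only a starting point and need tuning):

commit_delay = 100       # microseconds to wait for other commits to join the flush
commit_siblings = 5      # only delay if at least this many concurrent xacts are open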

 BTW, on the same hardware that postgres is running on to get 50 inserts 
 per sec, MySQL (4.0.17) is getting an almost unbelievable 5,500 inserts 
 per second.

I'll bet a good lunch that MySQL is not being ACID compliant in this
test.  Are you using a transaction-safe table type (InnoDB) and
committing after every insert?

If you don't in fact care about ACID safety, turn off fsync in Postgres
so that you have an apples-to-apples comparison (or at least
apples-to-oranges rather than apples-to-cannonballs).
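
For reference, the setting lives in postgresql.conf; this is only for making
the benchmark comparable, not for data you care about:

fsync = false        # skip forcing WAL to disk at commit: fast, but not crash-safe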

regards, tom lane



[PERFORM] Weird query speed

2004-01-15 Thread Rigmor Ukuhe

Almost identical queries have very different execution speeds (PostgreSQL
7.2.4).

query: select NP_ID from a WHERE NP_ID > '0'
Index Scan using NP_ID_a on a  (cost=0.00..13.01 rows=112 width=4) (actual
time=16.89..18.11 rows=93 loops=1)
Total runtime: 18.32 msec
-
query: select NP_ID from a WHERE NP_ID > '1'
Index Scan using NP_ID_a on a  (cost=0.00..13.01 rows=112 width=4) (actual
time=0.08..1.36 rows=93 loops=1)
Total runtime: 1.56 msec

Where does such a difference come from?

There are about 37K rows and only about 100 of them are not NP_ID = 0

For a workaround I use WHERE NP_ID >= '1' and it is as speedy as '> 1'

Rigmor Ukuhe




Re: [PERFORM] Trigger question

2004-01-15 Thread Richard Huxton
On Thursday 15 January 2004 13:13, pginfo wrote:
 Hi,

 I am using pg 7.4.1 and have created a trigger on a table with 3M rows.
 If I start a massive update on this table, pg executes this trigger on
 every row and dramatically slows the system.
 Is there any way in pg to have the trigger execute only if certain
 fields have changed?

Not at the moment (and I don't know of any plans for it).

 For example, I am able to declare this in Oracle.
 My trigger is written in pl/pgSQL.

Hmm - I can only think of two things you can try:
1. Check for the change as the first thing you do and exit if nothing changed
   (a minimal sketch follows below).
2. Do the same, but write the trigger function in 'C'.
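
A minimal sketch of option 1 in pl/pgSQL, assuming a hypothetical table "t"
whose trigger only cares about a column "some_field" (7.x-style, so the
function body is a single-quoted string):

create or replace function t_update_trig() returns trigger as '
begin
    -- bail out immediately if the interesting column did not change
    -- (add explicit NULL handling if some_field can be null)
    if new.some_field = old.some_field then
        return new;
    end if;
    -- ... the expensive work goes here ...
    return new;
end;
' language plpgsql;

create trigger t_update_trig before update on t
    for each row execute procedure t_update_trig();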

-- 
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Weird query speed

2004-01-15 Thread Tom Lane
Rigmor Ukuhe [EMAIL PROTECTED] writes:
 query: select NP_ID from a WHERE NP_ID > '0'  [is slow]

 query: select NP_ID from a WHERE NP_ID > '1'  [is fast]

 There are about 37K rows and only about 100 of them are not NP_ID = 0

Yeah, it's scanning over all the zero values when you say > 0 :-(

This is fixed for 7.5:

2003-12-20 20:23  tgl

* src/: backend/access/nbtree/nbtinsert.c,
backend/access/nbtree/nbtpage.c, backend/access/nbtree/nbtsearch.c,
include/access/nbtree.h: Improve btree's
initial-positioning-strategy code so that we never need to step
more than one entry after descending the search tree to arrive at
the correct place to start the scan.  This can improve the behavior
substantially when there are many entries equal to the chosen
boundary value.  Per suggestion from Dmitry Tkach, 14-Jul-03.

regards, tom lane



Re: [PERFORM] insert speed - Mac OSX vs Redhat

2004-01-15 Thread Syd
On 16/01/2004, at 2:44 AM, Tom Lane wrote:
...
As noted elsewhere, it's highly likely that this has nothing to do with
the OS, and everything to do with write caching in the disks being used.

I assume you are benchmarking small individual transactions (one insert
per xact).  In such scenarios it's essentially impossible to commit more
than one transaction per revolution of the WAL disk, because you have to
write the same WAL disk page repeatedly and wait for it to get down to
the platter.  When you get results that are markedly in excess of the
disk RPM figure, it's proof positive that the disk is lying about write
complete (or that you don't have fsync on).

Tom, thanks for this explanation - we'll check this out straight away, 
but it would explain a lot.



[PERFORM] shared_buffer value

2004-01-15 Thread Anjan Dave
Gurus,

I have defined the following values on a db:

shared_buffers = 10240          # 10240 = 80MB
max_connections = 100
sort_mem = 1024                 # 1024KB is 1MB per operation
effective_cache_size = 262144   # equals to 2GB for 8k pages

Rest of the values are unchanged from default.

The PowerEdge 2650 machine has 4GB RAM, and the size of the database (size of
the 'data' folder) is about 5GB. PG is 7.4, RH9.

The machine has been getting quite busy (when, say, 50 students log in at the
same time, when others have logged in already) and is maxing out at 100
connections (will increase this tonight, probably to 200). We have been
getting "too many clients" messages upon trying to connect. Once connected,
pgmonitor and 'pg_stat_activity' show connections reaching about 100.

There's a series of SELECT and UPDATE statements that get called when a group
of users log in simultaneously...and for some reason, many of them stay there
for a while...

During that time, if I do a 'top', I can see multiple postmaster processes,
each about 87MB in size. The memory utilization drops down to about 30MB
free, and I can see a little bit of swap utilization in vmstat then.

Question is, does the 80MB buffer allocation correspond to ~87MB per
postmaster instance? (With about 100 instances of postmaster, that would be
about 100 x 80MB = 8GB??)

Should I decrease the buffer value to about 50MB and monitor?

Interestingly, at one point we vacuumed the database, and the size reported
by 'df -k' on the pgsql slice dropped very significantly...guess it had been
using a lot of temp files?

Further steps will be to add more memory, and possibly drop/recreate a couple
of indexes that are used in the UPDATE statements.

Thanks in advance for any inputs.
-Anjan




Re: [PERFORM] shared_buffer value

2004-01-15 Thread Richard Huxton
On Thursday 15 January 2004 22:49, Anjan Dave wrote:
 Gurus,

 I have defined the following values on a db:

 shared_buffers = 10240  # 10240 = 80MB
 max_connections = 100
 sort_mem = 1024 # 1024KB is 1MB per operation
 effective_cache_size = 262144   # equals to 2GB for 8k pages

 Rest of the values are unchanged from default.


 The poweredge 2650 machine has 4GB RAM, and the size of the database
 (size of 'data' folder) is about 5GB. PG is 7.4, RH9.

OK - settings don't look unreasonable so far.

 The machine has been getting quite busy (when, say, 50 students login at
 the same time, when others have logged in already) and is maxing out at
 100 connections (will increase this tonight probably to 200). We have
 been getting too many clients message upon trying to connect. Once
 connected, the pgmonitor, and the 'pg_stat_activity' show connections
 reaching about 100.

 There's a series of SELECT and UPDATE statements that get called for
 when a group of users log in simultaneously...and for some reason, many
 of them stay there for a while...

 During that time, if i do a 'top', i can see multiple postmaster
 processes, each about 87MB in size. The Memory utilization drops down to
 about 30MB free, and i can see a little bit of swap utilization in
 vmstat then.

On Linux you'll see three values: SIZE, RSS and SHARE. SIZE is what you're 
looking at, RSS is resident set size (it's in main memory) and SHARE is how 
much is shared with other processes. So - 3 processes each with RSS=15MB, 
SHARE=10MB take up 10+5+5+5 = 25MB (the 10MB shared part counts once, plus 
5MB private in each process).
Don't worry about a tiny bit of swap - how is your buff/cache doing then?
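
A rough sketch of the arithmetic behind the original question (the ~7MB
private figure is an assumption for illustration, not a measurement): the
80MB of shared_buffers is one shared segment that every backend maps, so top
counts it in each process's SIZE even though it exists only once in RAM.

    100 backends x ~87MB SIZE  ~=  80MB shared (counted once) + 100 x ~7MB private
                               ~=  80MB + 700MB  =  roughly 780MB of real memory

and not 100 x 80MB = 8GB.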

 Should i decrease the buffer value to about 50MB and monitor?

That shared_buffers memory is shared between all backends. The sort_mem,
however, is *per sort*, not even per backend. So - if a complicated query
uses four sorts you could use 4MB in one backend.

 Interestingly, at one point, we vacuumed the database, and the size
 reported by 'df -k' on the pgsql slice dropped very
 significantly...guess, it had been using a lot of temp files?

You need to run VACUUM regularly to reclaim unused space. Since you're on 7.4, 
take a look at the pg_autovacuum utility, or start by running VACUUM ANALYZE 
from a cron job every evening. Perhaps a VACUUM FULL at weekends?
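
A minimal sketch of that schedule (the database name is made up; the psql
call in the comment is just one way to drive it from cron):

-- nightly:  psql -d mydb -c "VACUUM ANALYZE;"
vacuum analyze;

-- weekends, to compact tables and give space back to the OS:
vacuum full analyze;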

 Further steps will be to add more memory, and possibly drop/recreate a
 couple of indexes that are used in the UPDATE statements.

A REINDEX might be worthwhile. Details on this and VACUUM in the manuals.


-- 
  Richard Huxton
  Archonet Ltd



[PERFORM] IDE/SCSI disk tools to turn off write caching

2004-01-15 Thread Syd
We've found these tools
http://scsirastools.sourceforge.net/ and
http://www.seagate.com/support/seatools/ (for Seagate drives)
useful for checking the settings of SCSI disks and for changing settings on
Seagate drives.

What are people using for IDE disks?
Are you all using hdparm on linux
http://freshmeat.net/projects/hdparm/?topic_id=146%2C861
or are there other tools?



Re: [PERFORM] 100 simultaneous connections, critical limit?

2004-01-15 Thread Thomas Swan
scott.marlowe wrote:

On Wed, 14 Jan 2004, Adam Alkins wrote:

scott.marlowe wrote:

A few tips from an old PHP/Apache/Postgresql developer.

1: Avoid pg_pconnect unless you are certain you have load tested the 
system and it will behave properly.  pg_pconnect often creates as many 
issues as it solves.

I share the above view. I've had little success with persistent 
connections. The cost of pg_connect is minimal, pg_pconnect is not a 
viable solution IMHO. Connections are rarely actually reused.



I've found that for best performance with pg_pconnect, you need to 
restrict the apache server to a small number of backends, say 40 or 50, 
extend keep alive to 60 or so seconds, and use the same exact connection 
string all over the place.  Also, set max.persistant.connections or 
whatever it is in php.ini to 1 or 2.  Note that max.persistant.connections 
is PER BACKEND, not total, in php.ini, so 1 or 2 should be enough for most 
types of apps.  3 tops.  Then, setup postgresql for 200 connections, so 
you'll never run out.  Tis better to waste a little shared memory and be 
safe than it is to get the dreaded out of connections error from 
postgresql.

I disagree.  With the server I have been running for the last two years,
we found that the pconnect settings with long keep-alives in apache
consumed far more resources than you would imagine.  We found this
because some clients (older IE clients) would not support keep-alive
correctly: they would hammer the server with 20-30 individual requests,
and apache would keep those processes in keep-alive mode.  When the number
of apache processes was restricted, there were DoS problems.  The short
keep-alive pattern works best at serving a single page's related requests
efficiently.  In fact, the best performance and the greatest capacity in
real life came with a 3-second timeout for keep-alive requests.  A modem
connection normally won't have sufficient lag to time out on related loads,
and definitely not a broadband connection.

Also, depending on your machine, you should time how long it takes to
connect to the db.  This server averaged about 3-4 milliseconds to connect
without pconnect, and it was better to conserve memory so that
non-postgresql scripts and applications didn't carry the extra memory
footprint of a postgresql connection, preventing memory exhaustion and
excessive swapping.

Please keep in mind that this was on a dedicated server with apache and
postgresql and a slew of other processes running on the same machine.  
The results may be different for separate process oriented setups.

If you do all of the above, pg_pconnect can work pretty well, on things 
like dedicated app servers where only one thing is being done and it's 
being done a lot.  On general purpose servers with 60 databases and 120 
applications, it adds little, although extending the keep alive timeout 
helps.  

but if you just start using pg_pconnect without reconfiguring and then 
testing, it's quite likely your site will topple over under load with out 
of connection errors.

