[PERFORM] insert speed - Mac OSX vs Redhat
I've read most of the threads on insert speed in this list and wanted to share some interesting observations and a question.

We've been benchmarking some databases to implement Bayesian processing on an email server. This involves frequent inserts and updates to the following table:

    create table bayes_token (
        username   varchar(200) not null default '',
        token      varchar(200) not null default '',
        spam_count integer      not null default 0,
        ham_count  integer      not null default 0,
        atime      integer      not null default 0,
        primary key (username, token)
    );

On a variety of hardware running Redhat, across several versions of postgres, we're not getting much better than 50 inserts per second. This is prior to moving WAL to another disk, and fsync is on. However, with postgres 7.4 on Mac OSX 10.2.3, we're getting an amazing 500 inserts per second. We can only put this down to the OS.

Can anyone shed light on why Redhat appears to be so much poorer than Mac OS X at supporting postgres insert transactions? Or why MacOS appears to be so much better?

BTW, on the same hardware where postgres gets 50 inserts per second, MySQL (4.0.17) is getting an almost unbelievable 5,500 inserts per second.

-SL
Re: [PERFORM] insert speed - Mac OSX vs Redhat
> On a variety of hardware running Redhat, across several versions of
> postgres, we're not getting much better than 50 inserts per second.
> This is prior to moving WAL to another disk, and fsync is on. However,
> with postgres 7.4 on Mac OSX 10.2.3, we're getting an amazing 500
> inserts per second. We can only put this down to the OS.

You haven't really produced much evidence to support that statement. Given that the performance differences between Postgres running on *BSD and on Linux on Intel hardware are not large at all, it is almost certainly false. It may of course be due to some settings of the different OSes, but not the OSes themselves.

It would help if you gave a straight PG 7.4 comparison with hardware specs as well, and config file differences if any.

One thought: assuming the Apple has IDE disks, then the disks probably have write caching turned on, which is good for speed, but not crash-safe.

matt
[PERFORM] Trigger question
Hi,

I am using pg 7.4.1 and have created a trigger on a table with 3 M rows. If I start a massive update on this table, pg executes this trigger on every row and dramatically slows the system. Is there any way in pg to define the trigger to execute only if some fields have changed? For example, I am able to declare this in Oracle.

My trigger is written in plpgsql.

regards,
ivan.
Re: [PERFORM] insert speed - Mac OSX vs Redhat
Syd [EMAIL PROTECTED] writes:
> However, with postgres 7.4 on Mac OSX 10.2.3, we're getting an amazing
> 500 inserts per second. We can only put this down to the OS.

As noted elsewhere, it's highly likely that this has nothing to do with the OS, and everything to do with write caching in the disks being used.

I assume you are benchmarking small individual transactions (one insert per xact). In such scenarios it's essentially impossible to commit more than one transaction per revolution of the WAL disk, because you have to write the same WAL disk page repeatedly and wait for it to get down to the platter. When you get results that are markedly in excess of the disk RPM figure, it's proof positive that the disk is lying about write complete (or that you don't have fsync on).

The only way to get better performance and still have genuine ACID behavior is to gang multiple insertions per WAL write. You can do multiple insertions per transaction, or if you are doing several insertion transactions in parallel, you can try to commit them all in one write (experiment with the commit_delay and commit_siblings parameters).

> BTW, on the same hardware that postgres is running on to get 50 inserts
> per sec, MySQL (4.0.17) is getting an almost unbelievable 5,500 inserts
> per second.

I'll bet a good lunch that MySQL is not being ACID compliant in this test. Are you using a transaction-safe table type (InnoDB) and committing after every insert? If you don't in fact care about ACID safety, turn off fsync in Postgres so that you have an apples-to-apples comparison (or at least apples-to-oranges rather than apples-to-cannonballs).

			regards, tom lane
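To make the "gang multiple insertions" advice concrete, here is a minimal sketch of both approaches against the bayes_token table from the original post. The inserted values and the commit_delay/commit_siblings settings are illustrative assumptions, not tuned recommendations:

    -- Batch many inserts into one transaction so they share one WAL flush:
    BEGIN;
    INSERT INTO bayes_token VALUES ('user1', 'token1', 1, 0, 1074211200);
    INSERT INTO bayes_token VALUES ('user1', 'token2', 0, 1, 1074211200);
    -- ... hundreds more inserts ...
    COMMIT;

    -- Or, for several parallel single-insert clients, in postgresql.conf:
    commit_delay = 100      # microseconds to wait for other committers
    commit_siblings = 5     # only delay if >= 5 other xacts are active

With batching, every insert in the transaction rides down to the platter on a single fsync, so the cost of the WAL disk revolution is amortized across the whole batch.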
[PERFORM] Weird query speed
Almost identical queries are having very different exec speed (Postgresql 7.2.4).

query: select NP_ID from a WHERE NP_ID > '0'

Index Scan using NP_ID_a on a (cost=0.00..13.01 rows=112 width=4)
  (actual time=16.89..18.11 rows=93 loops=1)
Total runtime: 18.32 msec

query: select NP_ID from a WHERE NP_ID > '1'

Index Scan using NP_ID_a on a (cost=0.00..13.01 rows=112 width=4)
  (actual time=0.08..1.36 rows=93 loops=1)
Total runtime: 1.56 msec

Where does such a difference come from? There are about 37K rows, and only about 100 of them do not have NP_ID = 0.

As a workaround I use WHERE NP_ID >= '1', and it works as speedily as > '1'.

Rigmor Ukuhe
Re: [PERFORM] Trigger question
On Thursday 15 January 2004 13:13, pginfo wrote:
> Hi,
> I am using pg 7.4.1 and have created a trigger on a table with 3 M
> rows. If I start a massive update on this table, pg executes this
> trigger on every row and dramatically slows the system. Is there any
> way in pg to define the trigger to execute only if some fields have
> changed?

Not at the moment (and I don't know of any plans for it).

> For example, I am able to declare this in Oracle.
> My trigger is written in plpgsql.

Hmm - I can only think of two things you can try:
1. Check for the change as the first thing you do, and exit if it isn't there.
2. Do the same, but write the trigger function in 'C'.

-- 
Richard Huxton
Archonet Ltd
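A minimal sketch of option 1 in plpgsql, assuming the expensive work only matters when a hypothetical column some_field changes; the function, column, and table names are placeholders, and the body is a quoted string since 7.4 predates dollar-quoting:

    CREATE OR REPLACE FUNCTION big_table_trig() RETURNS trigger AS '
    BEGIN
        -- Bail out immediately if the column we care about is unchanged
        -- (the IS NULL tests treat NULL -> NULL as "no change").
        IF NEW.some_field = OLD.some_field
           OR (NEW.some_field IS NULL AND OLD.some_field IS NULL) THEN
            RETURN NEW;
        END IF;
        -- ... the expensive per-row work goes here ...
        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER big_table_check BEFORE UPDATE ON big_table
        FOR EACH ROW EXECUTE PROCEDURE big_table_trig();

The trigger still fires on every row, but the early RETURN keeps the cost per unaffected row down to a single comparison.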
Re: [PERFORM] Weird query speed
Rigmor Ukuhe [EMAIL PROTECTED] writes:
> query: select NP_ID from a WHERE NP_ID > '0'   [is slow]
> query: select NP_ID from a WHERE NP_ID > '1'   [is fast]
> There are about 37K rows, and only about 100 of them do not have
> NP_ID = 0.

Yeah, it's scanning over all the zero values when you say > '0' :-(

This is fixed for 7.5:

2003-12-20 20:23  tgl

    * src/: backend/access/nbtree/nbtinsert.c,
    backend/access/nbtree/nbtpage.c,
    backend/access/nbtree/nbtsearch.c, include/access/nbtree.h:
    Improve btree's initial-positioning-strategy code so that we never
    need to step more than one entry after descending the search tree
    to arrive at the correct place to start the scan.  This can improve
    the behavior substantially when there are many entries equal to the
    chosen boundary value.  Per suggestion from Dmitry Tkach, 14-Jul-03.

			regards, tom lane
Re: [PERFORM] insert speed - Mac OSX vs Redhat
On 16/01/2004, at 2:44 AM, Tom Lane wrote:
> ...
> As noted elsewhere, it's highly likely that this has nothing to do
> with the OS, and everything to do with write caching in the disks
> being used.
>
> I assume you are benchmarking small individual transactions (one
> insert per xact). In such scenarios it's essentially impossible to
> commit more than one transaction per revolution of the WAL disk,
> because you have to write the same WAL disk page repeatedly and wait
> for it to get down to the platter. When you get results that are
> markedly in excess of the disk RPM figure, it's proof positive that
> the disk is lying about write complete (or that you don't have
> fsync on).

Tom, thanks for this explanation - we'll check this out straight away, but it would explain a lot.
[PERFORM] shared_buffer value
Gurus,

I have defined the following values on a db:

shared_buffers = 10240         # 10240 = 80MB
max_connections = 100
sort_mem = 1024                # 1024KB is 1MB per operation
effective_cache_size = 262144  # equals 2GB for 8k pages

Rest of the values are unchanged from default. The PowerEdge 2650 machine has 4GB RAM, and the size of the database (size of the 'data' folder) is about 5GB. PG is 7.4, RH9.

The machine has been getting quite busy (when, say, 50 students log in at the same time, while others have logged in already) and is maxing out at 100 connections (will probably increase this tonight to 200). We have been getting "too many clients" messages upon trying to connect. Once connected, pgmonitor and pg_stat_activity show connections reaching about 100. There's a series of SELECT and UPDATE statements that get called when a group of users log in simultaneously... and for some reason, many of them stay there for a while.

During that time, if I do a 'top', I can see multiple postmaster processes, each about 87MB in size. Free memory drops down to about 30MB, and I can see a little bit of swap utilization in vmstat then.

Question is, does the 80MB buffer allocation correspond to ~87MB per postmaster instance? (With about 100 instances of postmaster, would that be about 100 x 80MB = 8GB??) Should I decrease the buffer value to about 50MB and monitor?

Interestingly, at one point we vacuumed the database, and the size reported by 'df -k' on the pgsql slice dropped very significantly... I guess it had been using a lot of temp files?

Further steps will be to add more memory, and possibly drop/recreate a couple of indexes that are used in the UPDATE statements.

Thanks in advance for any inputs.

-Anjan
Re: [PERFORM] shared_buffer value
On Thursday 15 January 2004 22:49, Anjan Dave wrote:
> Gurus,
>
> I have defined the following values on a db:
>
> shared_buffers = 10240         # 10240 = 80MB
> max_connections = 100
> sort_mem = 1024                # 1024KB is 1MB per operation
> effective_cache_size = 262144  # equals 2GB for 8k pages
>
> Rest of the values are unchanged from default. The PowerEdge 2650
> machine has 4GB RAM, and the size of the database (size of the 'data'
> folder) is about 5GB. PG is 7.4, RH9.

OK - settings don't look unreasonable so far.

> During that time, if I do a 'top', I can see multiple postmaster
> processes, each about 87MB in size. Free memory drops down to about
> 30MB, and I can see a little bit of swap utilization in vmstat then.

On linux you'll see three values: SIZE, RSS and SHARE. SIZE is what you're looking at, RSS is the resident set size (what's actually in main memory), and SHARE is how much is shared with other processes. So three processes each with RSS=15MB and SHARE=10MB take up 10+5+5+5 = 25MB in total. Don't worry about a tiny bit of swap - how is your buff/cache doing then?

> Question is, does the 80MB buffer allocation correspond to ~87MB per
> postmaster instance? (With about 100 instances of postmaster, would
> that be about 100 x 80MB = 8GB??) Should I decrease the buffer value
> to about 50MB and monitor?

That shared_buffers memory is shared between all backends, so it is counted once, not once per postmaster. The sort_mem, however, is *per sort*, not even per backend: if a complicated query uses four sorts, it could use 4MB in one backend.

> Interestingly, at one point we vacuumed the database, and the size
> reported by 'df -k' on the pgsql slice dropped very significantly...
> I guess it had been using a lot of temp files?

You need to run VACUUM regularly to reclaim unused space. Since you're on 7.4, take a look at the pg_autovacuum utility, or start by running VACUUM ANALYZE from a cron job every evening. Perhaps a VACUUM FULL at weekends?

> Further steps will be to add more memory, and possibly drop/recreate
> a couple of indexes that are used in the UPDATE statements.

A REINDEX might be worthwhile. Details on this and VACUUM in the manuals.

-- 
Richard Huxton
Archonet Ltd
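As a concrete starting point for the nightly VACUUM ANALYZE suggestion, a crontab sketch using the standard vacuumdb wrapper; the path, the times, and the weekend VACUUM FULL line are assumptions to adapt (VACUUM FULL takes exclusive locks, so schedule it with care):

    # In the postgres user's crontab:
    # Nightly VACUUM ANALYZE of all databases at 02:30
    30 2 * * * /usr/bin/vacuumdb --all --analyze --quiet
    # Optional Sunday VACUUM FULL at 03:30
    30 3 * * 0 /usr/bin/vacuumdb --all --full --quiet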
[PERFORM] IDE/SCSI disk tools to turn off write caching
We've found these tools to check the settings of SCSI disks and to change settings for Seagate drives:

    http://scsirastools.sourceforge.net/
    http://www.seagate.com/support/seatools/  (for Seagate drives)

What are people using for IDE disks? Are you all using hdparm on linux (http://freshmeat.net/projects/hdparm/?topic_id=146%2C861), or are there other tools?
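For reference, a minimal hdparm sketch for checking and disabling the on-drive write cache; /dev/hda is an assumed device name, and on many drives the -W setting does not survive a power cycle, so re-apply it from an init script:

    # Show drive identification, including the write-cache feature (as root):
    hdparm -I /dev/hda | grep -i 'write cache'
    # Disable the write cache so fsync'd WAL pages actually hit the platter:
    hdparm -W0 /dev/hda
    # Re-enable it later (fast, but not crash-safe):
    hdparm -W1 /dev/hda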
Re: [PERFORM] 100 simultaneous connections, critical limit?
scott.marlowe wrote:
> On Wed, 14 Jan 2004, Adam Alkins wrote:
>> scott.marlowe wrote:
>>> A few tips from an old PHP/Apache/Postgresql developer.
>>> 1: Avoid pg_pconnect unless you are certain you have load tested the
>>> system and it will behave properly. pg_pconnect often creates as
>>> many issues as it solves.
>>
>> I share the above view. I've had little success with persistent
>> connections. The cost of pg_connect is minimal, pg_pconnect is not a
>> viable solution IMHO. Connections are rarely actually reused.
>
> I've found that for best performance with pg_pconnect, you need to
> restrict the apache server to a small number of backends, say 40 or
> 50, extend keep-alive to 60 or so seconds, and use the same exact
> connection string all over the place. Also, set
> max.persistant.connections or whatever it is in php.ini to 1 or 2.
> Note that this is PER BACKEND, not total, so 1 or 2 should be enough
> for most types of apps. 3 tops. Then, set up postgresql for 200
> connections, so you'll never run out. 'Tis better to waste a little
> shared memory and be safe than it is to get the dreaded "out of
> connections" error from postgresql.

I disagree. With the server I have been running for the last two years, we found that the pconnect settings with long keep-alives in apache consumed far more resources than you would imagine. We found this because some clients (older IE clients) would not support keep-alive correctly: they would hammer the server with 20-30 individual requests, and apache would keep those processes in keep-alive mode. When the number of apache processes was restricted, there were DoS problems.

The short keep-alive pattern works best to serve a single page's related requests efficiently. In fact, the best performance and the greatest capacity in real life came with a 3 second timeout for keep-alive requests. A modem connection normally won't have enough lag to time out on related loads, and definitely not a broadband connection.

Also, depending on your machine, you should time how long it takes to connect to the db. This server averaged about 3-4 milliseconds to connect without pconnect, and it was better to conserve memory so that non-postgresql scripts and applications didn't carry the extra memory footprint of a postgresql connection, preventing memory exhaustion and excessive swapping. Please keep in mind that this was on a dedicated server running apache, postgresql, and a slew of other processes on the same machine. The results may be different for setups with separate servers per process.

> If you do all of the above, pg_pconnect can work pretty well, on
> things like dedicated app servers where only one thing is being done
> and it's being done a lot. On general purpose servers with 60
> databases and 120 applications, it adds little, although extending the
> keep-alive timeout helps. But if you just start using pg_pconnect
> without reconfiguring and then testing, it's quite likely your site
> will topple over under load with out-of-connection errors.
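To pin down the settings being argued over, a sketch of where each knob lives; the php.ini directive names are assumptions to verify against your PHP version (the poster above wasn't sure of the exact name either), and the values shown are the ones suggested in the thread, not universal recommendations:

    ; php.ini - cap persistent connections per Apache backend
    pgsql.allow_persistent = 1
    pgsql.max_persistent = 2

    # httpd.conf - small backend pool, short keep-alive
    MaxClients 50
    KeepAlive On
    KeepAliveTimeout 3

    # postgresql.conf - headroom so you never hit "too many clients"
    max_connections = 200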