Re: [PERFORM] PostgreSQL as a local in-memory cache
Chris Browne wrote:
> jgard...@jonathangardner.net writes:
>> My question is: how can I configure the database to run as quickly as
>> possible if I don't care about data consistency or durability? That
>> is, the data is updated so often and can be reproduced fairly rapidly,
>> so that if there is a server crash or random particles from space mess
>> up memory, we'd just restart the machine and move on.
>
> For such a scenario, I'd suggest you:
>
> - Set up a filesystem that is memory-backed. On Linux, RamFS or TmpFS
>   are reasonable options for this.
>
> - The complication would be that your "restart the machine and move on"
>   needs to consist of quite a few steps:
>   - recreating the filesystem
>   - fixing permissions as needed
>   - running initdb to set up a new PG instance
>   - automating any needful fiddling with postgresql.conf, pg_hba.conf
>   - starting up that PG instance
>   - creating users, databases, schemas, ...

Doesn't PG now support putting both WAL and user table files onto file
systems other than the one holding the PG config files and PG 'admin'
tables? Wouldn't doing so simplify the above considerably by allowing
just the WAL and user tables on the memory-backed file systems? I
wouldn't think the performance impact of leaving the rest of the stuff
on disk would be that large.

Or does losing WAL files mandate a new initdb?

--
Steve Wampler -- swamp...@noao.edu
The gods that smiled on your birth are now laughing out loud.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
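The "restart the machine and move on" steps above can be sketched as a
small script. This is only an illustration: the mount point, size, OS
user, and database name are all made-up, and by default it just prints
the commands it would run rather than executing anything.

```python
#!/usr/bin/env python3
"""Sketch of rebuilding a throwaway PG instance on a RAM-backed fs.
All paths/names below are assumptions; pass --run to actually execute."""
import subprocess
import sys

PGTMP = "/mnt/pgram"            # hypothetical tmpfs mount point
PGDATA = PGTMP + "/data"

STEPS = [
    # recreate the filesystem
    ["mount", "-t", "tmpfs", "-o", "size=2g", "tmpfs", PGTMP],
    # fix permissions as needed
    ["chown", "postgres:postgres", PGTMP],
    # run initdb to set up a new PG instance
    ["su", "postgres", "-c", "initdb -D " + PGDATA],
    # (any fiddling with postgresql.conf / pg_hba.conf would go here)
    # start up that PG instance
    ["su", "postgres", "-c", "pg_ctl -D %s -l %s/log start" % (PGDATA, PGTMP)],
    # create users, databases, schemas, ...
    ["su", "postgres", "-c", "createdb scratchdb"],
]

def rebuild(run=False):
    """Print (and optionally execute) each rebuild step; return the commands."""
    lines = []
    for cmd in STEPS:
        lines.append(" ".join(cmd))
        if run:
            subprocess.check_call(cmd)
    return lines

if __name__ == "__main__":
    for line in rebuild(run="--run" in sys.argv):
        print(line)
```

In practice you'd run this from an init script so the instance comes
back automatically after a reboot.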
Re: [PERFORM] Running on an NFS Mounted Directory
On Thu, Apr 27, 2006 at 08:57:51AM -0400, Ketema Harris wrote:
> OK. My thought process was that having non-local storage, say a big
> RAID 5 SAN (I am talking 5 TB with expansion capability up to 10),
> would allow me to have redundancy, expandability, and hopefully still
> retain decent performance from the db. I also would hopefully then not
> have to do periodic backups from the db server to some other type of
> storage. Is this not a good idea? How bad of a performance hit are we
> talking about? Also, in regards to the commit data integrity: as far
> as the db is concerned, once the data is sent to the SAN or NAS isn't
> it written? The storage may have that write in cache, but from my
> reading and understanding of how these various storage devices work,
> that is how they keep up performance. I would expect my bottleneck, if
> any, to be the actual Ethernet transfer to the storage, and I am going
> to try to compensate for that with a full gigabit backbone.

Well, if you have to have both the best performance and remote-attach
storage, I think you'll find that a fibre-channel SAN is still the king
of the hill. 4Gb FC switches are common now, though finding a 4Gb HBA
for your computer might be a trick. 2Gb HBAs are everywhere in FC land.

That's a premium-price solution, however, and I don't know anything
about how well PG would perform with a FC SAN. We use our SAN for bulk
science data and leave the PG DB on a separate machine with local disk.
Re: [PERFORM] Running on an NFS Mounted Directory
On Wed, Apr 26, 2006 at 10:06:58PM -0400, Ketema Harris wrote:
> I was wondering if there were any performance issues with having a
> data directory that was an NFS-mounted drive? Say like a SAN or NAS
> device? Has anyone done this before?

My understanding is that NFS is pretty poor in performance in general,
so I would expect it to be particularly bad for a DB. You might run
some (non-DB) performance tests to get a feel for how bad it might be.
(Someone once told me that NFS topped out at around 12MB/s, but I don't
know if that's really true [they were trying to sell a competitive
networked filesystem].) In any event, you're at least limited by
ethernet speeds, if not more.
Re: [pgsql-www] [PERFORM] Help speeding up delete
Christopher Kings-Lynne wrote:
>> That most people don't know they should use the new one I understand
>> though. But I don't see how this will help against that :-)
>
> It'll make it easy...

As the miscreant that caused this thread to get started, let me
*wholeheartedly* agree with Chris. An easy way to get the pg_dump for
the upgrade target to run with the upgradable source would work
wonders. (Instructions included, of course.)
Re: [PERFORM] Hardware/OS recommendations for large databases (
Joshua D. Drake wrote:
> The reason you want the dual core cpus is that PostgreSQL can only
> execute 1 query per cpu at a time, ...

Is that true? I knew that PG only used one cpu per query, but how does
PG know how many CPUs there are to limit the number of queries?
Re: [PERFORM] Hardware/OS recommendations for large databases (
David Boreham wrote:
> Steve Wampler wrote:
>> Joshua D. Drake wrote:
>>> The reason you want the dual core cpus is that PostgreSQL can only
>>> execute 1 query per cpu at a time, ...
>>
>> Is that true? I knew that PG only used one cpu per query, but how
>> does PG know how many CPUs there are to limit the number of queries?
>
> He means only one query can be executing on each cpu at any particular
> instant.

Got it - the cpu is only acting on one query in any instant but may be
switching between many 'simultaneous' queries. PG isn't really involved
in the decision. That makes sense.
Re: [PERFORM] Help speeding up delete
Arjen van der Meijden wrote:
> On 15-11-2005 15:18, Steve Wampler wrote:
>> Magnus Hagander wrote:
>> (This is after putting an index on the (id,name,value) tuple.) That
>> outer seq scan is still annoying, but maybe this will be fast enough.
>> I've passed this on, along with the (strong) recommendation that they
>> upgrade PG.
>
> Have you tried with an index on (name,value), and of course one on id?

Yes, although not with a unique index on (name,value) [possible, but
not so on the just-id index]. Anyway, it turns out the latest
incarnation is 'fast enough' for the user's need, so she's not doing
any more with it until after an upgrade.
Re: [PERFORM] Help speeding up delete
Magnus Hagander wrote:
>> Because I think we need to. The above would only delete rows that
>> have name = 'obsid' and value = 'oid080505'. We need to delete all
>> rows that have the same ids as those rows. However, from what you
>> note, I bet we could do:
>>
>>   DELETE FROM tmp_table2 WHERE id IN
>>     (SELECT id FROM temp_table2
>>        WHERE name = 'obsid' and value = 'oid080505');
>>
>> However, even that seems to have a much higher cost than I'd expect:
>>
>>   lab.devel.configdb=# explain delete from tmp_table2 where id in
>>     (select id from tmp_table2 where name='obsid' and value = 'oid080505');
>>   NOTICE: QUERY PLAN:
>>
>>   Seq Scan on tmp_table2 (cost=0.00..65705177237.26 rows=769844 width=6)
>>     SubPlan
>>       -> Materialize (cost=42674.32..42674.32 rows=38 width=50)
>>         -> Seq Scan on tmp_table2 (cost=0.00..42674.32 rows=38 width=50)
>>   EXPLAIN
>
> ... Earlier pg versions have always been bad at dealing with IN
> subqueries. Try rewriting it as (with fixing any broken syntax, I'm
> not actually testing this :P)
>
>   DELETE FROM tmp_table2 WHERE EXISTS
>     (SELECT * FROM tmp_table2 t2 WHERE t2.id=tmp_table2.id
>        AND t2.name='obsid' AND t2.value='oid080505')

Thanks - that looks *significantly* better:

  lab.devel.configdb=# explain delete from tmp_table2 where exists
    (select 1 from tmp_table2 t2 where t2.id=tmp_table2.id
       and t2.name='obsid' and t2.value='oid080505');
  NOTICE: QUERY PLAN:

  Seq Scan on tmp_table2 (cost=0.00..9297614.80 rows=769844 width=6)
    SubPlan
      -> Index Scan using inv_index_2 on tmp_table2 t2
           (cost=0.00..6.02 rows=1 width=0)
  EXPLAIN

(This is after putting an index on the (id,name,value) tuple.) That
outer seq scan is still annoying, but maybe this will be fast enough.

I've passed this on, along with the (strong) recommendation that they
upgrade PG.

Thanks!!
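The IN-to-EXISTS rewrite is easy to sanity-check outside PostgreSQL.
Here is a minimal sketch using Python's built-in sqlite3 (the table and
values are made up to mirror the thread's schema), showing that the
correlated-EXISTS form deletes every row sharing an id with a matching
row, not just the matching rows themselves:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tmp_table2 (id TEXT, name TEXT, value TEXT)")
rows = [
    ("i1", "obsid", "oid080505"),   # matching row
    ("i1", "exptime", "30"),        # same id as a matching row -> deleted too
    ("i2", "obsid", "other"),       # different value -> kept
]
con.executemany("INSERT INTO tmp_table2 VALUES (?, ?, ?)", rows)

# The correlated-EXISTS form suggested above: delete every row whose id
# also appears on some row with name='obsid' and value='oid080505'.
con.execute("""
    DELETE FROM tmp_table2 WHERE EXISTS
      (SELECT 1 FROM tmp_table2 t2 WHERE t2.id = tmp_table2.id
         AND t2.name = 'obsid' AND t2.value = 'oid080505')
""")
remaining = con.execute("SELECT id, name FROM tmp_table2").fetchall()
print(remaining)   # only the i2 row survives
```

The semantics match the WHERE id IN (...) form; only the plan the old
planner chose for it differs.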
Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware
Joost Kraaijeveld wrote:
> If I understand correctly (I have 4GB ram):
>
>   [EMAIL PROTECTED]:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=100
>   100+0 records in
>   100+0 records out
>   819200 bytes transferred in 304.085269 seconds (26939812 bytes/sec)
>
> Which looks suspicious: 26308 MB/sec???

Eh? That looks more like ~25.7 MB/sec, assuming 1MB = 1024*1024 bytes.
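The unit conversion is just a division. A quick check of the rate dd
reported:

```python
# dd reported its rate in bytes/sec; convert to MB/sec (1MB = 1024*1024).
bytes_per_sec = 26_939_812
mb_per_sec = bytes_per_sec / (1024 * 1024)
print(round(mb_per_sec, 1))   # ~25.7 MB/sec, nowhere near 26308
```

The "26308 MB/sec" reading came from treating bytes as kilobytes.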
[PERFORM] Help speeding up delete
We've got an older system in production (PG 7.2.4). Recently one of
the users has wanted to implement a selective delete, but is finding
that the time it appears to take exceeds her patience factor by several
orders of magnitude. Here's a synopsis of her report. It appears that
the WHERE id IN ... is resulting in a seq scan that is causing the
problem, but we're not SQL expert enough to know what to do about it.

Can someone point out what we're doing wrong, or how we could get a
(much) faster delete? Thanks!

Report:

This command yields results in only a few seconds:

  # SELECT at.id FROM tmp_table2 at, tmp_tabl2e a
  #   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505';

However, the following command does not seem to want to ever complete
(the person running this killed it after 1/2 hour).

  # DELETE FROM tmp_table2 WHERE id IN
  #   (SELECT at.id FROM tmp_table2 at, tmp_table2 a
  #      WHERE at.id=a.id and a.name='obsid' and a.value='oid080505');

==================

The table has four columns. There are 6175 rows satisfying the
condition given, and the table itself has 1539688 entries.
Layout is:

  lab.devel.configdb=# \d tmp_table2
             Table "tmp_table2"
   Column |           Type           | Modifiers
  --------+--------------------------+-----------
   id     | character varying(64)    |
   name   | character varying(64)    |
   units  | character varying(32)    |
   value  | text                     |
   time   | timestamp with time zone |

==================

  lab.devel.configdb=# EXPLAIN DELETE FROM tmp_table2 WHERE id IN
  lab.devel.configdb-#   (SELECT at.id FROM tmp_table2 at, tmp_table2 a
  lab.devel.configdb(#     WHERE at.id=a.id AND a.name='obsid' AND a.value='oid080505');
  NOTICE: QUERY PLAN:

  Seq Scan on tmp_table2 (cost=0.00..154893452082.10 rows=769844 width=6)
    SubPlan
      -> Materialize (cost=100600.52..100600.52 rows=296330 width=100)
        -> Hash Join (cost=42674.42..100600.52 rows=296330 width=100)
          -> Seq Scan on tmp_table2 at (cost=0.00..34975.88 rows=1539688 width=50)
          -> Hash (cost=42674.32..42674.32 rows=38 width=50)
            -> Seq Scan on tmp_table2 a (cost=0.00..42674.32 rows=38 width=50)
  EXPLAIN

  lab.devel.configdb=# EXPLAIN (SELECT at.id FROM tmp_table2 at, tmp_table2 a
  lab.devel.configdb(#   WHERE at.id=a.id AND a.name='obsid' AND a.value='oid080505');
  NOTICE: QUERY PLAN:

  Hash Join (cost=42674.42..100600.52 rows=296330 width=100)
    -> Seq Scan on tmp_table2 at (cost=0.00..34975.88 rows=1539688 width=50)
    -> Hash (cost=42674.32..42674.32 rows=38 width=50)
      -> Seq Scan on tmp_table2 a (cost=0.00..42674.32 rows=38 width=50)
  EXPLAIN
Re: [PERFORM] Help speeding up delete
Scott Lamb wrote:
> On Nov 14, 2005, at 2:07 PM, Steve Wampler wrote:
>> # SELECT at.id FROM tmp_table2 at, tmp_tabl2e a
>> #   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505';
>
> Isn't this equivalent?
>
>   select id from tmp_table2 where name = 'obsid' and value = 'oid080505';

Probably - the user based the above on a query designed to find all
rows with the same id as those rows that have a.name='obsid' and
a.value='oid080505'. However, I think the above would work to locate
all the ids, which is all we need for the delete (see below).

>> # DELETE FROM tmp_table2 WHERE id IN
>> #   (SELECT at.id FROM tmp_table2 at, tmp_table2 a
>> #      WHERE at.id=a.id and a.name='obsid' and a.value='oid080505');
>
> and this?
>
>   delete from tmp_table2 where name = 'obsid' and value = 'oid080505';
>
> Why are you doing a self-join using id, which I assume is a primary
> key?

Because I think we need to. The above would only delete rows that have
name = 'obsid' and value = 'oid080505'. We need to delete all rows that
have the same ids as those rows. However, from what you note, I bet we
could do:

  DELETE FROM tmp_table2 WHERE id IN
    (SELECT id FROM temp_table2 WHERE name = 'obsid' and value = 'oid080505');

However, even that seems to have a much higher cost than I'd expect:

  lab.devel.configdb=# explain delete from tmp_table2 where id in
    (select id from tmp_table2 where name='obsid' and value = 'oid080505');
  NOTICE: QUERY PLAN:

  Seq Scan on tmp_table2 (cost=0.00..65705177237.26 rows=769844 width=6)
    SubPlan
      -> Materialize (cost=42674.32..42674.32 rows=38 width=50)
        -> Seq Scan on tmp_table2 (cost=0.00..42674.32 rows=38 width=50)
  EXPLAIN

And, sure enough, it is taking an extraordinarily long time to run
(more than 10 minutes so far, compared to 10 seconds for the select).
Is this really typical of deletes? It appears (to me) to be the Seq
Scan on tmp_table2 that is the killer here. If we put an index on,
would it help? (The user claims she tried that and its EXPLAIN cost
went even higher, but I haven't checked that...)

Thanks!
Re: [PERFORM] Help speeding up delete
Tom Lane wrote:
> Steve Wampler <[EMAIL PROTECTED]> writes:
>> We've got an older system in production (PG 7.2.4). Recently one of
>> the users has wanted to implement a selective delete, but is finding
>> that the time it appears to take exceeds her patience factor by
>> several orders of magnitude. Here's a synopsis of her report. It
>> appears that the WHERE id IN ... is resulting in a seq scan that is
>> causing the problem, but we're not SQL expert enough to know what to
>> do about it. Can someone point out what we're doing wrong, or how we
>> could get a (much) faster delete? Thanks!
>
> Update to 7.4 or later ;-)

I was afraid you'd say that :-) I'm not officially involved in this
project anymore and was hoping for a fix that wouldn't drag me back in.

The security issues aren't a concern because this DB is *well* hidden
from the outside world (it's part of a telescope control system behind
several firewalls with no outside access). However, the
data-loss-grade bugs issue *is* important. We'll try to do the upgrade
as soon as we get some cloudy days to actually do it!

Is the performance behavior that we're experiencing a known problem
with 7.2 that has been addressed in 7.4? Or will the upgrade fix other
problems while leaving this one?

> Quite seriously, if you're still using 7.2.4 for production purposes
> you could justifiably be accused of negligence. There are three or
> four data-loss-grade bugs fixed in the later 7.2.x releases, not to
> mention security holes; and that was before we abandoned support for
> 7.2. You *really* need to be thinking about an update.

Thanks!
Steve
Re: [PERFORM] Help speeding up delete
Scott Lamb wrote:
> On Nov 14, 2005, at 3:52 PM, Steve Wampler wrote:
>> Scott Lamb wrote:
>>> On Nov 14, 2005, at 2:07 PM, Steve Wampler wrote:
>>>> # SELECT at.id FROM tmp_table2 at, tmp_tabl2e a
>>>> #   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505';
>>>
>>> Isn't this equivalent?
>>>
>>>   select id from tmp_table2 where name = 'obsid' and value = 'oid080505';
>>
>> Probably, the user based the above on a query designed to find all
>> rows with the same id as those rows that have a.name='obsid' and
>> a.value='oid080505'.
>
> Well, this indirection is only significant if those two sets can
> differ. If (A) you meant tmp_table2 when you wrote tmp_tabl2e, so this
> is a self-join, and (B) there is a primary key on id, I don't think
> that can ever happen.

I wasn't clear. The original query was:

  SELECT at.* FROM tmp_table2 at, tmp_table2 a
    WHERE at.id=a.id and a.name='obsid' and a.value='oid080505';

which is significantly different than:

  SELECT * FROM tmp_table2 WHERE name='obsid' and value='oid080505';

The user had adapted that query for her needs, but it would have been
better to just use the query that you suggested (as the subselect in
the DELETE FROM...). Unfortunately, that only improves performance
slightly - it is still way too slow on deletes.
Re: [PERFORM] batch inserts are slow
Tim Terlegård wrote:
>> Just as on Oracle you would use SQL*Loader for this application, you
>> should use the COPY syntax for PostgreSQL. You will find it a lot
>> faster. I have used it by building the input files and executing
>> 'psql' with a COPY command, and also by using it with a subprocess;
>> both are quite effective.
>
> I tried this now. Now it's down to 0.45 seconds. It feels a bit hacky
> to run /usr/bin/psql from java, but it sure works. Thanks for the
> hint!

There was a patch against 7.4 that provided direct JDBC access to
PostgreSQL's COPY. (I have it installed here and *love* it - it gives
outstanding performance.) However, it hasn't made it into an official
release yet. I don't know why; perhaps there's a problem yet to be
solved with it ('works for me', though)?

Is this still on the board? I won't upgrade past 7.4 without it.
Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application
Mohan, Ross wrote:
> I wish I had a Dell system and run case to show you Alex, but I
> don't... however... using Oracle's direct path feature, it's pretty
> straightforward. We've done 110,000 rows per second into index-less
> tables on a big system (IBM Power5 chips, Hitachi SAN). (Yes, I am
> sure: over 100K a second. Sustained for almost 9 minutes.) Yes, this
> is an exception, but Oracle's directpath/InsertAppend/BulkLoad feature
> enabled us to migrate a 4 TB database... really quickly.

How close to this is PG's COPY? I get surprisingly good results using
COPY with jdbc on smallish systems (now if that patch would make it
into the mainstream PG jdbc support!). I think COPY has a bit more
overhead than what a Bulkload feature may have, but I suspect it's not
that much more.

> Now... if you ask me "can this work without Power5 and Hitachi SAN?"
> my answer is... you give me a top-end Dell and SCSI III on 15K disks
> and I'll likely easily match it, yea. I'd love to see PG get into this
> range... I am a big fan of PG (just a rank newbie) but I gotta think
> the underlying code to do this has to be not-too-complex.

It may not be that far off if you can use COPY instead of INSERT. But
comparing Bulkload to INSERT is a bit apples-orangish.
Re: [PERFORM] Reading recommendations
Stefan Weiss wrote:
> On 2005-03-31 15:19, [EMAIL PROTECTED] wrote:
>>> Now *that* I want to see. Ought to be at least as interesting as the
>>> TCP/IP over carrier pigeon experiment - and more challenging to
>>> boot!
>>
>> Interestingly, we had a follow-on contract to investigate routing
>> optimization using flooding techniques. Oddly, it was commissioned by
>> a consortium of local car washes. Work stopped when the park service
>> sued us for the cost of cleaning all the statuary, and the company
>> went out of business. We were serving cornish game hens at our
>> frequent dinner parties for months.
>
> This method might have been safer (and it works great with Apaches):
> http://eagle.auc.ca/~dreid/

Aha - VOIPOBD as well as VOIPOBT! What more can one want? VOIPOCP, I
suppose...
Re: [PERFORM] Reading recommendations
Mohan, Ross wrote:
> VOIP over BitTorrent?

Now *that* I want to see. Ought to be at least as interesting as the
TCP/IP over carrier pigeon experiment - and more challenging to boot!
Re: [PERFORM] Reading recommendations
[EMAIL PROTECTED] wrote:
>> Mohan, Ross wrote:
>>> VOIP over BitTorrent?
>>
>> Now *that* I want to see. Ought to be at least as interesting as the
>> TCP/IP over carrier pigeon experiment - and more challenging to boot!
>
> It was very challenging. I worked on the credit window sizing and
> retransmission timer estimation algorithms. We took into account
> weather patterns, size and age of the bird, feeding times, and the
> average number of times a bird circles before determining magnetic
> north. Interestingly, packet size had little effect in the final
> algorithms. I would love to share them with all of you, but they're
> classified.

Ah, but VOIPOBT requires many people all saying the same thing at the
same time. The synchronization alone (since you need to distribute
these people adequately to avoid overloading a trunk line...) is
probably sufficiently hard to make it interesting. Then there are the
problems of different accents, dialects, and languages ;)
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
Hervé Piedvache wrote:
> No ... as I have said ... how will I manage a database getting a table
> of maybe 250 000 000 records? I'll need incredible servers ... to get
> quick access or index reading ... no? So what we would like to get is
> a pool of small servers able to make one virtual server ... for that
> is called a Cluster ... no? I know they are not using PostgreSQL ...
> but how does a company like Google get an incredible database in size
> and so quick access?

Probably by carefully partitioning their data. I can't imagine anything
being fast on a single table in the 250,000,000 tuple range. Nor can I
really imagine any database that efficiently splits a single table
across multiple machines (or even inefficiently, unless some internal
partitioning is being done). So you'll have to do some work at your end
and not just hope that a magic bullet is available.

Once you've got the data partitioned, the question becomes one of how
to enhance performance/scalability. Have you considered RAIDb?
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
Hervé Piedvache wrote:
> On Thursday 20 January 2005 16:23, Dave Cramer wrote:
>> Google uses something called the google filesystem, look it up in
>> google. It is a distributed file system.
>
> Yes, that's another point I'm working on ... make a cluster of servers
> using GFS ... and making PostgreSQL run with it ...

A few years ago I played around with GFS, but not for postgresql. I
don't think it's going to help - logically there's no difference
between putting PG on GFS and putting PG on NFS - in both cases the
filesystem doesn't provide any support for distributing the task at
hand - and a PG database server isn't written to be distributed across
hosts regardless of the distribution of the data across filesystems.
Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?
On Mon, 2004-06-07 at 02:26, Kris Jurka wrote:
> On Sat, 5 Jun 2004, Steve Wampler wrote:
>> [I want to use copy from JDBC]
>
> I made a patch to the driver to support COPY as a PG extension. ...
> http://archives.postgresql.org/pgsql-jdbc/2003-12/msg00186.php

Thanks Kris - that patch worked beautifully and bumped the insert rate
from ~1000 entries/second to ~9000 e/s in my test code. Here's hoping
it makes it into 7.5.

I do have a little concern about what's happening in the back end
during the copy - I suspect the entire table is locked, which may
impact the performance when multiple clients are saving entries into
the table. Anyone know if that's how COPY works? (For that matter,
would that also be true of a transaction consisting of a set of
inserts?)

Thanks again!
Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?
On Mon, 2004-06-07 at 10:40, Steve Wampler wrote:
> Thanks Kris - that patch worked beautifully and bumped the insert rate
> from ~1000 entries/second to ~9000 e/s in my test code.

As a followup - that 9000 e/s becomes ~21,000 e/s if I don't have the
java code also dump the message to standard output!
[PERFORM] Using a COPY...FROM through JDBC?
I've got a simple database (no indices, 6 columns) that I need to write
data quickly into through JDBC connections from multiple such
connections simultaneously in a distributed environment. (This is
going to be a message logging service for software-generated messages.)

Using a PreparedStatement, I can get about 400/s inserted. If I (on
the java side) buffer up the entries and dump them in large transaction
blocks, I can push this up to about 1200/s. I'd like to go faster.

One approach that I think might be promising would be to try using a
COPY command instead of an INSERT, but I don't have a file for input; I
have a Java collection, so COPY isn't quite right. Is there any way to
efficiently use COPY without having to create a file (remember that the
java apps are distributed on a LAN and aren't running on the DB
server)? Is this a dead end because of the way COPY is implemented to
only use a file?

Is there something else I can do? Ultimately, this will end up on a
machine running 1+0 RAID, so I expect that will give me some
performance boost as well, but I'd like to push it up as best I can
with my current hardware setup.

Thanks for any advice!
-Steve
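For what it's worth, COPY does not require a file on disk: its text
format is simple to generate from in-memory rows, and the stream can be
piped to psql's stdin ("COPY tbl FROM STDIN") or handed to a driver
that exposes COPY. A sketch of the row encoding (tab-delimited, \N for
NULL, backslash escapes for tab/newline/backslash; the table name in
the comment is hypothetical):

```python
def copy_encode(row):
    """Encode one row (a sequence of str-or-None) as a COPY text-format line."""
    out = []
    for field in row:
        if field is None:
            out.append(r"\N")                       # COPY's NULL marker
        else:
            out.append(field.replace("\\", "\\\\")  # escape backslash first
                            .replace("\t", "\\t")
                            .replace("\n", "\\n"))
    return "\t".join(out) + "\n"

# Build a payload from in-memory rows (values here are made up).
rows = [("host1", "app", None, "started ok")]
payload = "".join(copy_encode(r) for r in rows)
# payload could now be written to a psql subprocess's stdin, e.g.
#   psql mydb -c "COPY log_table FROM STDIN"   (names are hypothetical)
print(repr(payload))
```

This keeps the data entirely on the client side of the socket, which is
the same trick the JDBC COPY patch discussed above uses internally.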
Re: [PERFORM] Help tracking down problem with inserts slowing
On Sun, 2003-12-07 at 09:52, Tom Lane wrote:
> Steve Wampler <[EMAIL PROTECTED]> writes:
>> Hmmm, I have a feeling that's not as obvious as I thought... I can't
>> identify the index (named 'id_index') in the output of vacuum
>> verbose.
>
> In 7.2, the index reports look like
>   Index %s: Pages %u; Tuples %.0f.
> and should appear in the part of the printout that deals with their
> owning table.

Thanks, Tom. Are there any reasons why it would not appear?

  farm.devel.configdb=# vacuum verbose attributes_table;
  NOTICE: --Relation attributes_table--
  NOTICE: Pages 1389: Changed 0, Empty 0; Tup 111358: Vac 0, Keep 0, UnUsed 51.
          Total CPU 0.00s/0.02u sec elapsed 0.03 sec.
  NOTICE: --Relation pg_toast_1743942--
  NOTICE: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
          Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
  VACUUM

  farm.devel.configdb=# \d attributes_table
            Table "attributes_table"
   Column |           Type           | Modifiers
  --------+--------------------------+---------------
   id     | character varying(64)    | not null
   name   | character varying(64)    | not null
   units  | character varying(32)    |
   value  | text                     |
   time   | timestamp with time zone | default now()
  Indexes: id_index
  Primary key: attributes_table_pkey
  Triggers: trigger_insert

The odd thing is that I could have sworn it appeared yesterday...
Re: [PERFORM] Help tracking down problem with inserts slowing down...
On Fri, Dec 05, 2003 at 09:54:52PM -0500, Robert Treat wrote:
> ... A vacuum verbose could give you a good indication if you need to
> reindex; compare the # of pages in the index with the # in the table.

Hmmm, I have a feeling that's not as obvious as I thought... I can't
identify the index (named 'id_index') in the output of vacuum verbose.
The closest I can find is:

  NOTICE: --Relation pg_index--
  NOTICE: Pages 2: Changed 0, Empty 0; Tup 56: Vac 0, Keep 0, UnUsed 42.
          Total CPU 0.00s/0.00u sec elapsed 0.00 sec.

Which probably isn't correct, right (the name doesn't seem to match)?
The table's entry is:

  NOTICE: --Relation attributes_table--
  NOTICE: Pages 639: Changed 0, Empty 0; Tup 52846: Vac 0, Keep 0, UnUsed 48.
          Total CPU 0.00s/0.01u sec elapsed 0.01 sec.

Thanks!
Steve
Re: [PERFORM] Help tracking down problem with inserts slowing
On Fri, 2003-12-05 at 16:38, Neil Conway wrote:
> (1) Can you confirm that the VACUUM FULL on site B actually removed
> all the tuples you intended it to remove? Concurrent transactions can
> limit the amount of data that VACUUM FULL is able to reclaim. If you
> run contrib/pgstattuple (or compare the database's disk consumption
> with the number of live rows in it), you should be able to tell.

Hmmm, I installed 7.2.3 from RPMs, but the contrib package seems to be
missing the pgstattuple library code. According to the readme, I should
do:

  $ make
  $ make install
  $ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test

but the first two lines don't make sense with the binary rpm
distribution, and trying the last line as (for my world):

  psql -e -f /usr/share/pgsql/contrib/pgstattuple.sql farm.devel.configdb

yields:

  DROP FUNCTION pgstattuple(NAME);
  psql:/usr/share/pgsql/contrib/pgstattuple.sql:1: ERROR:  RemoveFunction: function 'pgstattuple(name)' does not exist
  CREATE FUNCTION pgstattuple(NAME) RETURNS FLOAT8 AS '$libdir/pgstattuple', 'pgstattuple' LANGUAGE 'c' WITH (isstrict);
  psql:/usr/share/pgsql/contrib/pgstattuple.sql:4: ERROR:  stat failed on file '$libdir/pgstattuple': No such file or directory

I don't need this right now (a reindex seems to have fixed our problem
for now...), but it sounds like it would be useful in the future.

Thanks!
Steve
[PERFORM] Help tracking down problem with inserts slowing down...
I need some help tracking down a sudden, massive slowdown in inserts in
one of our databases.

PG: 7.2.3 (RedHat 8.0)

Background: We currently run nearly identical systems at two sites.
Site A is a 'lab' site used for development; Site B is a production
site. The databases in question have identical structure:

  A simple table with 4 columns, with a trigger function on inserts
  (which checks to see if the entry already exists, and if so, changes
  the insert into an update...)

  A simple view with 4 columns into the above table.

All access is through jdbc (JDK 1.3.1, jdbc 7.1-1.3); postgresql.conf's
are identical.

The two sites were performing at comparable speeds until a few days
ago, when we deleted several million records from each database and
then did a "vacuum full; analyze" on both. Now inserts at Site B are
several orders of magnitude slower than at Site A. The odd thing is
that Site B's DB now has only 60,000 records while Site A's is up
around 3 million. Inserts at A average 63ms; inserts at B are now up
at 4.5 seconds! EXPLAIN doesn't show any difference between the two.

Can someone suggest ways to track this down? I don't know much about
postgresql internals/configuration.

Thanks!
Steve
Re: [PERFORM] [SQL] Replacing a simple nested query?
On Sun, 2003-07-13 at 14:50, Steve Wampler wrote:
> I've got a simple nested query:
>
>   select * from attributes where id in
>     (select id from attributes where (name='obsid') and (value='oid00066'));
>
> that performs abysmally. I've heard this described as the 'classic
> WHERE IN' problem. Is there a better way to obtain the same results?
> The inner select identifies a set of ids (2049 of them, to be exact)
> that are then used to locate records that have the same id (about
> 30-40K of those, including the aforementioned 2049).

For the record, Joe Conway and Hannu Krosing both provided the same
solution:

  select at.* from attributes_table at, attributes a
    where at.id = a.id and a.name='obsid' and a.value='oid00066';

which is several orders of infinity faster than my naive approach
above:

  lab.devel.configdb=# explain analyze select * from attributes_table
    where id in (select id from attributes where (name='obsid') and
    (value='oid00066')) order by id;
  NOTICE: QUERY PLAN:

  Index Scan using id_index on attributes_table
      (cost=0.00..8773703316.10 rows=241201 width=59)
      (actual time=136297.91..3418016.04 rows=32799 loops=1)
    SubPlan
      -> Materialize (cost=18187.48..18187.48 rows=15 width=25)
           (actual time=0.01..1.68 rows=1979 loops=482402)
        -> Index Scan using name_index on attributes_table
             (cost=0.00..18187.48 rows=15 width=25)
             (actual time=0.27..251.95 rows=2049 loops=1)
  Total runtime: 3418035.38 msec

  lab.devel.configdb=# explain analyze select at.* from attributes_table at,
    attributes a where at.id = a.id and a.name='obsid' and a.value='oid00066';
  NOTICE: QUERY PLAN:

  Nested Loop (cost=0.00..18739.44 rows=217 width=84)
      (actual time=0.76..1220.65 rows=32799 loops=1)
    -> Index Scan using name_index on attributes_table
         (cost=0.00..18187.48 rows=15 width=25)
         (actual time=0.47..507.31 rows=2049 loops=1)
    -> Index Scan using id_index on attributes_table at
         (cost=0.00..35.80 rows=12 width=59)
         (actual time=0.11..0.31 rows=16 loops=2049)
  Total runtime: 1235.42 msec

My thanks to both Joe and Hannu!

Steve