Re: [PERFORM] most bang for buck with ~ $20,000
On Aug 8, 2006, at 4:49 PM, Joshua D. Drake wrote:

> I am considering a setup such as this:
>
> - At least dual cpu (possibly with 2 cores each)
> - 4GB of RAM
> - 2 disk RAID 1 array for root disk
> - 4 disk RAID 1+0 array for PGDATA
> - 2 disk RAID 1 array for pg_xlog
>
> Does anyone know a vendor that might be able to provide such a setup?

Wouldn't it be preferable to put WAL on a multi-disk RAID 10 if you had the opportunity? This gives you the redundancy of RAID 1 but approaches the performance of RAID 0, especially as you add disks to the array. In benchmarking, I've seen consistent success with this approach.

--
Thomas F. O'Connell
Sitening, LLC
http://www.sitening.com/
3004B Poston Avenue
Nashville, TN 37203-1314
615-469-5150 x802
615-469-5151 (fax)

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org
Re: [PERFORM] most bang for buck with ~ $20,000
On Aug 8, 2006, at 5:28 PM, Joshua D. Drake wrote:

> Thomas F. O'Connell wrote:
>> On Aug 8, 2006, at 4:49 PM, Joshua D. Drake wrote:
>>> I am considering a setup such as this:
>>> - At least dual cpu (possibly with 2 cores each)
>>> - 4GB of RAM
>>> - 2 disk RAID 1 array for root disk
>>> - 4 disk RAID 1+0 array for PGDATA
>>> - 2 disk RAID 1 array for pg_xlog
>>> Does anyone know a vendor that might be able to provide such a setup?
>>
>> Wouldn't it be preferable to put WAL on a multi-disk RAID 10 if you had the opportunity? This gives you the redundancy of RAID 1 but approaches the performance of RAID 0, especially as you add disks to the array. In benchmarking, I've seen consistent success with this approach.
>
> WAL is written in order, so RAID 1 is usually fine. We also don't need journaling for WAL, so the speed is even faster.

In which case, which is theoretically better (since I don't have a convenient test bed at the moment) for WAL in a write-heavy environment? More disks in a RAID 10 (which should theoretically improve write throughput in general, to a point) or a 2-disk RAID 1?

Does it become a price/performance question, or is there virtually no benefit to throwing more disks at RAID 10 for WAL if you turn off journaling on the filesystem?

--
Thomas F. O'Connell
Sitening, LLC
http://www.sitening.com/
3004B Poston Avenue
Nashville, TN 37203-1314
615-469-5150 x802
615-469-5151 (fax)
Re: [PERFORM] most bang for buck with ~ $20,000
On Aug 8, 2006, at 6:24 PM, Joshua D. Drake wrote:

>> In which case, which is theoretically better (since I don't have a convenient test bed at the moment) for WAL in a write-heavy environment? More disks in a RAID 10 (which should theoretically improve write throughput in general, to a point) or a 2-disk RAID 1? Does it become a price/performance question, or is there virtually no benefit to throwing more disks at RAID 10 for WAL if you turn off journaling on the filesystem?
>
> Over 4 drives, I would gather that RAID 10 wouldn't gain you anything. Possibly over 6 or 8, however, it may be faster because you are writing smaller chunks of data, even if two copies of each.

Yeah, where I've seen the benefits in practice, the scenarios have involved the availability of a minimum of 6 drives for a RAID 10 for WAL. I really should do a comparison of a 2-disk RAID 1 with a variety of multi-disk RAID 10 configurations at some point.

--
Thomas F. O'Connell
Sitening, LLC
http://www.sitening.com/
3004B Poston Avenue
Nashville, TN 37203-1314
615-469-5150 x802
615-469-5151 (fax)
Re: [PERFORM] Very slow queries - please help
On Nov 24, 2005, at 12:14 PM, Bealach-na Bo wrote:

> The consensus seems to be that I need more indexes and I also need to look into the NOT IN statement as a possible bottleneck. I've introduced the indexes, which has led to a DRAMATIC change in response time. Now I have to experiment with INNER JOIN - OUTER JOIN variations and SET ENABLE_SEQSCAN=OFF.
>
> Forgive me for not mentioning each person individually and by name. You have all contributed to confirming what I had suspected (and hoped): that *I* have a lot to learn!
>
> I'm attaching table descriptions, the first few lines of top output while the queries were running, index lists, sample queries, and EXPLAIN ANALYSE output BEFORE and AFTER the introduction of the indexes. As I said, DRAMATIC :) I notice that the CPU usage does not vary very much, it's nearly 100% anyway, but the memory usage drops markedly, which is another very nice result of the index introduction.
>
> Any more comments and tips would be very welcome.

You might find the following resources from techdocs instructive:

http://techdocs.postgresql.org/redir.php?link=/techdocs/pgsqladventuresep2.php
http://techdocs.postgresql.org/redir.php?link=/techdocs/pgsqladventuresep3.php

These documents provide some guidance into the process of index selection. It seems like you could still stand to benefit from more indexes based on your queries, table definitions, and current indexes.

--
Thomas F. O'Connell
Database Architecture and Programming
Co-Founder
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005 (cell)
615-469-5150 (office)
615-469-5151 (fax)
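To make the kind of change discussed above concrete, here is a minimal sketch of adding an index to speed up a filtered query; the table and column names are hypothetical, not taken from the poster's schema:

```sql
-- Before: a filter on an unindexed column forces a sequential scan, e.g.
--   SELECT * FROM job_log WHERE job_name = 'nightly_sync';

-- Adding a b-tree index lets the planner switch to an index scan:
CREATE INDEX job_log_job_name_idx ON job_log (job_name);

-- Refresh planner statistics so the new index is costed correctly:
ANALYZE job_log;
```

Comparing EXPLAIN ANALYZE output before and after, as the poster did, is the right way to confirm the index is actually being used.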
Re: [PERFORM] Figuring out which command failed
On Nov 7, 2005, at 3:30 PM, Ralph Mason wrote:

> Hi,
>
> I have a transaction that has multiple separate commands in it (nothing unusual there). However, sometimes one of the sql statements will fail, and so the whole transaction fails. In some cases I could fix the failing statement if only I knew which one it was.
>
> Can anyone think of any way to get which statement actually failed from the error message? If the error message gave me the line of the failure, it would be excellent, but it doesn't. Perhaps it would be easy for me to patch my version of Postgres to do that?
>
> I realize I could do this with 2 phase commit, but that isn't ready yet!
>
> Any thoughts or ideas are much appreciated.
>
> Thanks,
> Ralph

2PC might not've been ready yesterday, but it's ready today!

http://www.postgresql.org/docs/whatsnew

--
Thomas F. O'Connell
Database Architecture and Programming
Co-Founder
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)
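For reference, the two-phase commit interface that arrived with 8.1 looks roughly like this (the transaction body and the identifier 'txn_001' are made up for illustration; max_prepared_transactions must be set above zero for this to work):

```sql
BEGIN;
INSERT INTO accounts (id, balance) VALUES (1, 100);

-- Instead of COMMIT, park the transaction under a global identifier.
-- Its effects become durable but remain invisible to other sessions:
PREPARE TRANSACTION 'txn_001';

-- Later, from this or any other session, finish it one way or the other:
COMMIT PREPARED 'txn_001';
-- or: ROLLBACK PREPARED 'txn_001';
```

Note that 2PC is aimed at distributed transaction managers; it identifies a whole transaction, not which individual statement inside it failed.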
Re: [PERFORM] Effects of cascading references in foreign keys
On Oct 29, 2005, at 9:48 AM, Bruno Wolff III wrote:

> On Sat, Oct 29, 2005 at 13:10:31 +0200, Martin Lesser [EMAIL PROTECTED] wrote:
>> Which effects have UPDATEs on REFERENCEd TABLEs when only columns in the referenced table are updated which are not part of the FOREIGN KEY constraint?
>
> In 8.1 there is a check to see if the foreign key value has changed, and if not, a trigger isn't queued. In the currently released versions, any update will fire triggers. The comment in trigger.c didn't say whether this optimization applies to both referencing and referenced keys or just one of those. If you need to know more, you can look at the code for trigger.c at:
> http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/

It seems like this warrants an item somewhere in the release notes, and I'm not currently seeing it (or a related item) anywhere. Perhaps E.1.3.1 (Performance Improvements)? For some of the more extreme UPDATE scenarios I've seen, this could be a big win.

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Open Source Solutions. Optimized Web Development.
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)
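To illustrate the case being discussed (the schema here is hypothetical): suppose child.parent_id REFERENCES parent (id). An UPDATE like the one below touches only a non-key column of the referenced table, so under the 8.1 optimization no referential-integrity trigger should be queued for it, whereas earlier releases fire the RI trigger on every update of parent regardless:

```sql
-- Hypothetical schema:
--   CREATE TABLE parent (id int PRIMARY KEY, description text);
--   CREATE TABLE child  (parent_id int REFERENCES parent (id));

-- The FK column (id) is untouched, only description changes,
-- so in 8.1 this should skip the RI trigger entirely:
UPDATE parent SET description = 'new text' WHERE id = 42;
```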
Re: [PERFORM] tuning seqscan costs
On Oct 19, 2005, at 9:51 AM, Katherine Stoovs wrote:

> I want to correlate two index rows of different tables to find an offset so that
>
>   table1.value = table2.value AND table1.id = table2.id + offset
>
> is true for a maximum number of rows. To achieve this, I have the two tables and a table with possible offset values and execute a query:
>
>   SELECT value,
>          (SELECT COUNT(*) FROM table1, table2
>            WHERE table1.value = table2.value
>              AND table1.id = table2.id + offset) AS matches
>     FROM offsets
>    ORDER BY matches;
>
> The query is very inefficient, however, because the planner doesn't use my indexes and executes seqscans instead. I can get it to execute fast by setting ENABLE_SEQSCAN to OFF, but I have read this will make the performance bad on other query types, so I want to know how to tweak the planner costs or possibly other stats so the planner will plan the query correctly and use index scans. There must be something wrong in the planning parameters after all if a plan that is slower by a factor of tens or hundreds becomes estimated better than the fast variant.
>
> I have already issued ANALYZE commands on the tables.
>
> Thanks for your help,
> Katherine Stoovs

Katherine,

If offset is a column in offsets, can you add an index on the expression table2.id + offset?

http://www.postgresql.org/docs/8.0/static/indexes-expressional.html

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Open Source Solutions. Optimized Web Development.
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)
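For reference, the general shape of an expression index is sketched below. Note one caveat with the suggestion above: an index on table2 can only involve table2's own columns and constants, so the offset column from the offsets table cannot literally appear in the index expression; the constant here is purely illustrative:

```sql
-- Expression index sketch: the doubled parentheses around the
-- expression are required syntax for non-trivial expressions.
CREATE INDEX table2_id_shifted_idx ON table2 ((id + 1));

-- The planner can then match a predicate written against the same
-- expression, e.g.  WHERE table1.id = table2.id + 1
```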
Re: [PERFORM] wal_buffers
On Oct 5, 2005, at 8:23 AM, Ian Westmacott wrote:

> Can anyone tell me what precisely a WAL buffer contains, so that I can compute an appropriate setting for wal_buffers (in 8.0.3)?
>
> I know the documentation suggests there is little evidence that supports increasing wal_buffers, but we are inserting a large amount of data that, I believe, easily exceeds the default 64K in a single transaction. We are also very sensitive to write latency.
>
> As background, we are doing a sustained insert of 2.2 billion rows in 1.3 million transactions per day. That's about 1700 rows per transaction, at (roughly) 50 bytes per row.

Ian,

The WAL Configuration chapter (25.2) has a pretty good discussion of how wal_buffers is used:

http://www.postgresql.org/docs/8.0/static/wal-configuration.html

You might also take a look at Josh Berkus' recent testing on this setting:

http://www.powerpostgresql.com/

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i™
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)
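As a concrete (and purely illustrative) example of raising the setting: in 8.0, wal_buffers is expressed as a count of 8 kB disk pages, and the default of 8 pages is where the 64K figure above comes from. A sketch of a larger setting, which would need to be validated by benchmarking against the actual workload:

```ini
# postgresql.conf -- illustrative value only; the right number depends
# on how much WAL a single transaction generates before commit.
# Units in 8.0 are 8 kB pages: 64 pages = 512 kB of WAL buffer space.
wal_buffers = 64
```

A server restart is required for this setting to take effect in 8.0.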
[PERFORM] Index Selection: ORDER BY vs. PRIMARY KEY
I have a query that looks roughly like this (I've removed irrelevant SELECT clause material and obfuscated names, trying to keep them consistent where altered in EXPLAIN output):

SELECT u.emma_member_id, h.action_ts
FROM user as u, history as h
WHERE u.user_id = h.user_id
AND h.action_id = '$constant_data'
ORDER BY h.action_ts DESC
LIMIT 100 OFFSET 0

The user table has ~25,000 rows. The history table has ~750,000 rows. Currently, there is an index on history.action_ts and a separate one on history.action_id. There's also a PRIMARY KEY on user.user_id.

If I run the query as such, I get a plan like this:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.00..2196.30 rows=100 width=925) (actual time=947.208..3178.775 rows=3 loops=1)
   ->  Nested Loop  (cost=0.00..83898.65 rows=3820 width=925) (actual time=947.201..3178.759 rows=3 loops=1)
         ->  Index Scan Backward using h_action_ts_idx on history h  (cost=0.00..60823.53 rows=3820 width=480) (actual time=946.730..3177.953 rows=3 loops=1)
               Filter: (action_id = $constant_data::bigint)
         ->  Index Scan using user_pkey on user u  (cost=0.00..6.01 rows=1 width=445) (actual time=0.156..0.161 rows=1 loops=3)
               Index Cond: (u.user_id = "outer".user_id)
 Total runtime: 3179.143 ms
(7 rows)

If I drop the index on the timestamp field, I get a plan like this:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=17041.41..17041.66 rows=100 width=925) (actual time=201.725..201.735 rows=3 loops=1)
   ->  Sort  (cost=17041.41..17050.96 rows=3820 width=925) (actual time=201.719..201.722 rows=3 loops=1)
         Sort Key: h.action_ts
         ->  Merge Join  (cost=13488.15..16814.13 rows=3820 width=925) (actual time=7.306..201.666 rows=3 loops=1)
               Merge Cond: ("outer".user_id = "inner".user_id)
               ->  Index Scan using user_pkey on user u  (cost=0.00..3134.82 rows=26802 width=445) (actual time=0.204..151.351 rows=24220 loops=1)
               ->  Sort  (cost=13488.15..13497.70 rows=3820 width=480) (actual time=0.226..0.234 rows=3 loops=1)
                     Sort Key: h.user_id
                     ->  Index Scan using h_action_id_idx on history h  (cost=0.00..13260.87 rows=3820 width=480) (actual time=0.184..0.195 rows=3 loops=1)
                           Index Cond: (action_id = $constant_data::bigint)
 Total runtime: 202.089 ms
(11 rows)

Clearly, if the index on the timestamp field is there, postgres wants to use it for the ORDER BY, even though the performance is worse. How is this preference made internally? If both indexes exist, will postgres always prefer the index on an ordered column? If I need the index on the timestamp field for other queries, is my best bet just to increase sort_mem for this query?

Here's my version string:
PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC 2.95.4

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i™
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)
Re: [PERFORM] poor VACUUM performance on large tables
On Sep 4, 2005, at 1:16 AM, Jan Peterson wrote:

> Hello,
>
> We have been experiencing poor performance of VACUUM in our production database. Relevant details of our implementation are as follows:
>
> 1. We have a database that grows to about 100GB.
> 2. The database is a mixture of large and small tables.
> 3. Bulk data (stored primarily in pg_largeobject, but also in various TOAST tables) comprises about 45% of our data.
> 4. Some of our small tables are very active, with several hundred updates per hour.
> 5. We have a rolling delete function that purges older data on a periodic basis to keep our maximum database size at or near 100GB.
>
> Everything works great until our rolling delete kicks in. Of course, we are doing periodic VACUUMs on all tables, with frequent VACUUMs on the more active tables. The problem arises when we start deleting the bulk data and have to VACUUM pg_largeobject and our other larger tables. We have seen VACUUM run for several hours (even tens of hours). During this VACUUM process, our smaller tables accumulate dead rows (we assume because of the transactional nature of the VACUUM) at a very rapid rate. Statistics are also skewed during this process, and we have observed the planner choosing sequential scans on tables where it is obvious that an index scan would be more efficient.
>
> We're looking for ways to improve the performance of VACUUM. We are already experimenting with Hannu Krosing's patch for VACUUM, but it's not really helping (we are still faced with doing a database-wide VACUUM about once every three weeks or so as we approach the transaction id rollover point... this VACUUM has been measured at 28 hours in an active environment).
>
> Other things we're trying are partitioning tables (rotating the table that updates happen to and using a view to combine the sub-tables for querying). Unfortunately, we are unable to partition the pg_largeobject table, and that table alone can take up 40+% of our database storage. We're also looking at somehow storing our large objects externally (as files in the local file system) and implementing a mechanism similar to Oracle's bfile functionality. Of course, we can't afford to give up the transactional security of being able to roll back if a particular update doesn't succeed.
>
> Does anyone have any suggestions to offer on good ways to proceed given our constraints? Thanks in advance for any help you can provide.
>
> -jan-

Do you have your Free Space Map settings configured appropriately? See section 16.4.3.2 of the docs:

http://www.postgresql.org/docs/8.0/static/runtime-config.html#RUNTIME-CONFIG-RESOURCE

You'll want to run a VACUUM VERBOSE and note the numbers at the end, which describe how many pages are used and how many are needed. max_fsm_pages should be set according to that, and you can set max_fsm_relations based on it, too, although typically one knows roughly how many relations are in a database.

http://www.postgresql.org/docs/8.0/static/sql-vacuum.html

Finally, have you experimented with pg_autovacuum, which is located in contrib in the source tarballs (and is integrated into the backend in 8.1 beta and beyond)? You don't really say how often you're running VACUUM, and it might be that you're not vacuuming often enough.

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i™
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)
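To sketch the advice above (the numbers here are invented placeholders; the real values should come from the "pages needed" figure that VACUUM VERBOSE prints at the end of its output):

```ini
# postgresql.conf -- hypothetical values for illustration only.
# max_fsm_pages must exceed the total "pages needed" reported by
# VACUUM VERBOSE, or free space in large tables goes untracked
# and the tables bloat despite regular vacuuming.
max_fsm_pages = 2000000

# Roughly the number of tables and indexes in the cluster:
max_fsm_relations = 1000
```

Both settings require a server restart to take effect in 8.0.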
Re: [PERFORM] Need indexes on empty tables for good performance ?
Rohan,

You should note that in Postgres, indexes are not inherited by child tables.

Also, it seems difficult to select from a child table whose name you don't know unless you access the parent. And if you are accessing the data via the parent, I'm reasonably certain that you will find that indexes aren't used (even if they exist on the children) as a result of the way the children are accessed.

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i™
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)

On Aug 22, 2005, at 10:41 PM, Lenard, Rohan (Rohan) wrote:

> I've read that indexes aren't used for COUNT(*), and I've noticed (7.3.x) with EXPLAIN that indexes never seem to be used on empty tables - is there any reason to have indexes on empty tables, or will postgresql never use them?
>
> This is not as silly as it sounds - with table inheritance you might have table children with the data and a parent that is empty. It'd be nice to make sure postgresql knows never to really look at the parent - especially if you don't know the names of all the children...
>
> Thoughts?
>
> thx,
> Rohan
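The non-inheritance of indexes can be sketched like this (table and index names are hypothetical):

```sql
-- Parent stays empty; data lives in the children.
CREATE TABLE parent (id int, payload text);
CREATE TABLE child_2005 () INHERITS (parent);

-- An index on the parent covers only the parent's own (empty) heap:
CREATE INDEX parent_id_idx ON parent (id);

-- Each child needs its own index, created by hand:
CREATE INDEX child_2005_id_idx ON child_2005 (id);
```

So a query through the parent can only benefit from whatever indexes each child happens to have been given individually.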
Re: [PERFORM] pgbench Comparison of 7.4.7 to 8.0.2
Interesting. I should've included standard deviation in my pgbench iteration patch. Maybe I'll go back and do that. I was seeing oscillation across the majority of iterations in the 25 clients/1000 transaction runs on both database versions.

I've got my box specs and configuration files posted. If you see anything obvious about the tuning parameters that should be tweaked, please let me know.

Thanks for the feedback!

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Apr 25, 2005, at 1:58 AM, Steve Poe wrote:

>> There was some interesting oscillation behavior in both versions of postgres that occurred with 25 clients and 1000 transactions at a scaling factor of 100. This was repeatable with the distribution version of pgbench run iteratively from the command line. I'm not sure how to explain this.
>
> Tom,
>
> When you see these oscillations, do they occur after so many generated results? Some oscillation is normal, in my opinion; 10-15% of the performance is noise-related. The key is to tune the server so that you either 1) minimize the oscillation and/or 2) increase your overall performance above the 10-15% baseline, and 3) find out the mean and standard deviation across all your results. If your results are within that range, this may be normal. I'll follow up with you later on what I do.
>
> Steve Poe
Re: [PERFORM] pgbench Comparison of 7.4.7 to 8.0.2
Steve,

Per your and Tom's recommendations, I significantly increased the number of transactions used for testing. See my last post. The database will have pretty heavy mixed use, i.e., both reads and writes. I performed 32 iterations per scenario this go-round.

I'll look into OSDB for further benchmarking. Thanks for the tip. Since pgbench is part of the postgres distribution, and I had it at hand, and it seems to be somewhat widely referenced, I figured I'd go ahead and post preliminary results from it.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Apr 15, 2005, at 4:24 PM, Steve Poe wrote:

> Tom,
>
> People's opinions on pgbench may vary, so take what I say with a grain of salt. Here are my thoughts:
>
> 1) Test with no less than 200 transactions per client. I've heard that with less than this, your results will vary too much with the direction of the wind blowing. A high enough value will help rule out some noise factor. If I am wrong, please let me know.
>
> 2) How is the database going to be used? What percentage will be read/write if you had to guess? Pgbench is like TPC-B, which will help gauge the potential throughput of your tps. However, it may not stress the server enough to help you make key performance changes. Then again, benchmarks are like statistics... full of lies <g>.
>
> 3) Run not just a couple of pgbench runs, but *many* (I do between 20-40 runs) so you can rule out noise and gauge improvement on median results.
>
> 4) Find something with which you can test OLTP-type transactions. I used OSDB since it is simple to implement and use, although OSDL's OLTP testing will be closer to reality.
>
> Steve Poe
[PERFORM] pgbench Comparison of 7.4.7 to 8.0.2
I'm in the fortunate position of having a newly built database server that's pre-production. I'm about to run it through the wringer with some simulations of business data and logic, but I wanted to post the results of some preliminary pgbench marking.

http://www.sitening.com/pgbench.html

To me, it looks like basic transactional performance is modestly improved at 8.0 across a variety of metrics. I think this bodes well for more realistic loads, but I'll be curious to see the results of some of the simulations.

I've still got a little bit of preparatory time with this box, so I can continue to do some experimentation. I'd be curious to see whether these numbers meet developer expectations and to see whether the developer and user community have insight into other pgbench options that would be useful to see.

Thanks!

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
Re: [PERFORM] Follow-Up: How to improve db performance with $7K?
Things might've changed somewhat over the past year, but this is from _the_ Linux guy at Dell...

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

Date: Mon, 26 Apr 2004 14:15:02 -0500
From: Matt Domsch [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: PERC3/Di failure workaround hypothesis

On Mon, Apr 26, 2004 at 11:10:36AM -0500, Sellek, Greg wrote:
> Short of ordering a Perc4 for every 2650 that I want to upgrade to RH ES, is there anything else I can do to get around the Perc3/Di problem?

Our working hypothesis for a workaround is to do as follows. In afacli, set:

  Read Cache: enabled
  Write Cache: enabled when protected

Then unplug the ROMB battery. A reboot is not necessary. The firmware will immediately drop into Write-Through Cache mode, which in our testing has not exhibited the problem. Setting the write cache to disabled in afacli doesn't seem to help - you've got to unplug the battery with it in the above settings.

We are continuing to search for the root cause of the problem and will update the list when we can.

Thanks,
Matt

--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions
linux.dell.com
www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

On Apr 5, 2005, at 11:44 PM, Kevin Brown wrote:

> Thomas F. O'Connell wrote:
>> I'd use two of your drives to create a mirrored partition where pg_xlog resides separate from the actual data. RAID 10 is probably appropriate for the remaining drives. Fortunately, you're not using Dell, so you don't have to worry about the Perc3/Di RAID controller, which is not so compatible with Linux...
>
> Hmm... I have to wonder how true this is these days. My company has a Dell 2500 with a Perc3/Di running Debian Linux with the 2.6.10 kernel. The controller seems to work reasonably well, though I wouldn't doubt that it's slower than a different one might be. But so far we haven't had any reliability issues with it.
>
> Now, the performance is pretty bad considering the setup -- a RAID 5 with five 73.6 gig SCSI disks (10K RPM, I believe). Reads through the filesystem come through at about 65 megabytes/sec, writes about 35 megabytes/sec (at least, so says bonnie -s 8192). This is on a system with a single 3 GHz Xeon and 1 gigabyte of memory. I'd expect much better read performance from what is essentially a stripe of 4 fast SCSI disks.
>
> While compatibility hasn't really been an issue, at least as far as the basics go, I still agree with your general sentiment -- stay away from the Dells, at least if they have the Perc3/Di controller. You'll probably get much better performance out of something else.
>
> --
> Kevin Brown [EMAIL PROTECTED]
Re: [PERFORM] Configuration/Tuning of server/DB
Reid,

There are a few very valuable tuning documents that are part of the established PostgreSQL-related literature. You don't mention which version of postgres you'll be running, but here are the documents you'll find useful:

postgresql.conf
7.4: http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html
8.0: http://www.powerpostgresql.com/Downloads/annotated_conf_80.html

general tuning
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Mar 24, 2005, at 1:46 PM, Reid Thompson wrote:

> Using information found on the web, I've come up with some configuration and tuning parameters for a server/db that we will be implementing. I was wondering if I could generate some feedback as to configuration and tuning so that I could compare my estimations with those of others.
>
> Host is AIX 5.1 with 4 cpus and 4 GB ram. Postgresql will be sharing this machine with other processes. Storage is an EMC storage array. The DB itself is very simple. Two tables: one with 40-45 columns (the largest column will likely contain no more than 32 chars of data), the other with fewer than 5 columns (the largest column will contain no more than 20 chars of data). Expected transactions will be on the order of ~600K +-100K inserts and ~600K +-200K updates per week.
>
> Thanks
Re: [PERFORM] [ADMIN] Too slow
Please post the results of that query as run through EXPLAIN ANALYZE.

Also, I'm going to reply to this on pgsql-performance, which is probably where it better belongs.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Mar 22, 2005, at 8:23 AM, Sabio - PSQL wrote:

> How can I improve speed on my queries? For example, this query has been running for a day and still has not finished:
>
>   create table tmp_partes as
>   select * from partes
>   where identificacion not in (select cedula from sujetos)
>
> partes has 1888000 rows, with an index on identificacion; sujetos has 550 rows, with an index on cedula.
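One standard rewrite for this kind of anti-join (not suggested in the thread itself, but a common technique for the era's planners, which often handled NOT IN with a subquery very poorly) is NOT EXISTS or an outer join with an IS NULL filter:

```sql
-- Equivalent anti-join formulations of the query from the thread:

-- NOT EXISTS form:
CREATE TABLE tmp_partes AS
SELECT p.*
  FROM partes p
 WHERE NOT EXISTS (SELECT 1 FROM sujetos s
                    WHERE s.cedula = p.identificacion);

-- LEFT JOIN ... IS NULL form:
CREATE TABLE tmp_partes AS
SELECT p.*
  FROM partes p
  LEFT JOIN sujetos s ON s.cedula = p.identificacion
 WHERE s.cedula IS NULL;
```

Both forms assume the compared columns have matching types; EXPLAIN ANALYZE on each variant would show which one the planner handles best.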
[PERFORM] Triggers During COPY
I'm involved in an implementation of doing trigger-based counting as a substitute for count(*) in real time in an application. My trigger-based counts seem to be working fine and dramatically improve the performance of the display of the counts in the application layer.

The problem comes in importing new data into the tables for which the counts are maintained. The current import process does some preprocessing and then does a COPY from the filesystem to one of the tables on which counts are maintained. This means that for each row being inserted by COPY, a trigger is fired. This didn't seem like a big deal to me until testing began on realistic data sets.

For a 5,000-record import, preprocessing plus the COPY took about 5 minutes. Once the triggers used for maintaining the counts were added, this grew to 25 minutes. While I knew there would be a slowdown per row affected, I expected something closer to 2x than to 5x.

It's not unrealistic for this system to require data imports on the order of 100,000 records. Whereas this would've taken at most an hour and a half before (preprocessing takes a couple of minutes, so the actual original COPY takes closer to 2-3 minutes, or just over 1500 rows per minute), the new version is likely to take more than 7 hours, which seems unreasonable to me. Additionally, the process is fairly CPU intensive.

I've examined the plans, and, as far as I can tell, the trigger functions are being prepared and using the indexes on the involved tables, which are hundreds of thousands of rows in the worst cases. The basic structure of the functions is a status lookup SELECT (to determine whether a count needs to be updated and which one) and one or two UPDATE statements (depending on whether both an increment and a decrement need to be performed). As I said, it looks like this basic format is using indexes appropriately.

Is there anything I could be overlooking that would tweak some more performance out of this scenario? Would it be absurd to drop the triggers during import and recreate them afterward and update the counts in a summary update based on information from the import process?

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
Re: [PERFORM] Triggers During COPY
I forgot to mention that I'm running 7.4.6. The README includes the caveat that pgmemcache is designed for use with 8.0. My instinct is to be hesitant about using something like that in a production environment without some confidence that people have done so with good and reliable success, or without more extensive testing than I'm likely to have time for, primarily because support for 7.4.x is never likely to increase.

Thanks for the tip, though. For the time being, it sounds like I'll probably try to implement the drop/create trigger setup during import.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jan 27, 2005, at 11:41 PM, Josh Berkus wrote:

> Thomas,
>
>> Would it be absurd to drop the triggers during import and recreate them afterward and update the counts in a summary update based on information from the import process?
>
> That's what I'd do.
>
> Also, might I suggest storing the counts in memcached (see the pgmemcache project on pgFoundry) rather than in a table?
>
> --
> --Josh
>
> Josh Berkus
> Aglio Database Solutions
> San Francisco
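The drop/create trigger approach settled on above might look roughly like the sketch below. All object names are hypothetical, and since this is 7.4 (which has no ALTER TABLE ... DISABLE TRIGGER, an 8.1 addition), the trigger is dropped and recreated outright:

```sql
BEGIN;

-- Remove the per-row counting trigger for the duration of the import:
DROP TRIGGER maintain_counts_trg ON counted_table;

COPY counted_table FROM '/path/to/import.dat';

-- Recompute the summary counts in one bulk statement instead of
-- once per imported row:
UPDATE counts c
   SET n = (SELECT count(*) FROM counted_table t WHERE t.kind = c.kind);

-- Restore the trigger so ongoing single-row changes stay counted:
CREATE TRIGGER maintain_counts_trg
  AFTER INSERT OR UPDATE OR DELETE ON counted_table
  FOR EACH ROW EXECUTE PROCEDURE maintain_counts();

COMMIT;
```

Doing all of this in one transaction keeps other sessions from writing to the table untriggered, though it also means they block on the import.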
Re: [PERFORM] Performance Anomalies in 7.4.5
I know, I know: I should've done this before I posted. REINDEXing and VACUUMing mostly fixed this problem. Which gets me back to where I was yesterday: reviewing a pre-existing import process that populates tables in this system and that seems to allow even small data sets to slow simple queries like this to a crawl.

Is there anything about general COPY/INSERT activity that can make postgres this severely slow on small data sets, other than a lack of diligent VACUUMing? I was hoping that pg_autovacuum along with post-import manual VACUUMs would be sufficient, but that doesn't necessarily seem to be the case. Granted, I haven't done a methodical and complete review of the process, but I'm still surprised at how quickly it seems able to debilitate postgres with even small amounts of data. I had a similar situation crawl yesterday based on a series of COPYs involving 5 rows!

Can I look for something to treat the cause rather than the symptoms? If not, should I be REINDEXing manually, as well as VACUUMing manually, after large data imports (whether via COPY or INSERT)? Or will a VACUUM FULL ANALYZE be enough? Thanks!

-tfo -- Thomas F. O'Connell Co-Founder, Information Architect Sitening, LLC http://www.sitening.com/ 110 30th Avenue North, Suite 6 Nashville, TN 37203-6320 615-260-0005

On Oct 21, 2004, at 3:36 PM, Thomas F. O'Connell wrote:

I'm seeing some weird behavior on a repurposed server that was wiped clean and set up to run as a database and application server with postgres and Apache, as well as some command-line PHP scripts. The box itself is a quad-processor (2.4 GHz Intel Xeon) Debian woody GNU/Linux (kernel 2.6.2) system. postgres is crawling on some fairly routine queries. I'm wondering if this could somehow be related to the fact that this isn't a database-only server, but Apache is not really using any resources when postgres slows to a crawl.
Here's an example of analysis of a recent query:

EXPLAIN ANALYZE
SELECT COUNT(DISTINCT u.id)
FROM userdata AS u, userdata_history AS h
WHERE h.id = '18181'
  AND h.id = u.id;

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=0.02..0.02 rows=1 width=8) (actual time=298321.421..298321.422 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..0.01 rows=1 width=8) (actual time=1.771..298305.531 rows=2452 loops=1)
         Join Filter: ("inner".id = "outer".id)
         ->  Seq Scan on userdata u  (cost=0.00..0.00 rows=1 width=8) (actual time=0.026..11.869 rows=2452 loops=1)
         ->  Seq Scan on userdata_history h  (cost=0.00..0.00 rows=1 width=8) (actual time=0.005..70.519 rows=41631 loops=2452)
               Filter: (id = 18181::bigint)
 Total runtime: 298321.926 ms
(7 rows)

userdata has a primary/foreign key on id, which references userdata_history.id, which is a primary key. At the time of analysis, the userdata table had 2,500 rows and userdata_history had 50,000 rows. I can't imagine how even a seq scan could result in a runtime of nearly 5 minutes in these circumstances. Also, doing a count(*) from each table individually returns nearly instantly.

I can provide details of postgresql.conf and kernel settings if necessary, but I'm using some pretty well-tested settings that I use any time I admin a postgres installation these days, based on box resources and database size. I'm more interested in knowing if there are any bird's-eye details I should be checking immediately. Thanks.

-tfo -- Thomas F. O'Connell Co-Founder, Information Architect Sitening, LLC http://www.sitening.com/ 110 30th Avenue North, Suite 6 Nashville, TN 37203-6320 615-260-0005
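One tell in the plan above: the planner estimates rows=1 for both seq scans while the actual row counts are in the thousands, the classic signature of missing or stale statistics (e.g. tables populated after the last ANALYZE). A quick check, as a sketch:

```sql
-- Refresh statistics (and reclaim dead tuples) after a bulk load,
-- then re-examine the plan:
VACUUM ANALYZE userdata;
VACUUM ANALYZE userdata_history;

EXPLAIN ANALYZE
SELECT COUNT(DISTINCT u.id)
FROM userdata AS u, userdata_history AS h
WHERE h.id = '18181'
  AND h.id = u.id;
```

With accurate row estimates, the planner should prefer index scans over the nested loop of seq scans shown above.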
Re: [PERFORM] Table UPDATE is too slow
What is the datatype of the id column? -tfo

On Aug 31, 2004, at 1:11 PM, Ron St-Pierre wrote:

We have a web-based application with data that is updated daily. The biggest bottleneck occurs when we try to update one of the tables. This table contains 58,000 rows and 62 columns, and EVERY column is indexed. Every column is queryable (?) by the users through the web interface, so we are reluctant to remove the indexes (recreating them would be time-consuming too). The primary key is an INT and the rest of the columns are a mix of NUMERIC, TEXT, and DATEs. A typical update is:

UPDATE dataTable SET field01=44.5, field02=44.5, field03='Bob', field04='foo', ... , field60='2004-08-30', field61='2004-08-29' WHERE id = 1234;

Also of note is that the update is run about 10 times per day; we get blocks of data from 10 different sources, so we pre-process the data and then update the table. We also run VACUUM FULL ANALYZE on a nightly basis. Does anyone have some idea of how we can increase speed, either by changing the updates, designing the database differently, etc.? This is currently a big problem for us.

Other notables: The UPDATE is run from within a function: FOR rec IN SELECT ... LOOP RETURN NEXT rec; UPDATE dataTable ... Postgres 7.4.3, Debian stable, 2 GB RAM, 80 GB IDE drive (we can't change it), shared_buffers = 2048, sort_mem = 1024, max_fsm_pages = 4, checkpoint_segments = 5, random_page_cost = 3.

Thanks Ron
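One reason the datatype question matters: in releases before 8.0, a BIGINT (or SMALLINT) column compared against a bare integer literal would not use its index, because the literal is typed as int4. If id turned out to be BIGINT rather than INT, quoting or casting the literal restores the index scan. A sketch, with hypothetical values:

```sql
-- Pre-8.0 quirk: if id is BIGINT, the bare int4 literal forces a seq scan
UPDATE dataTable SET field01 = 44.5 WHERE id = 1234;

-- Quoting or casting the literal lets the planner use the index
UPDATE dataTable SET field01 = 44.5 WHERE id = '1234';
UPDATE dataTable SET field01 = 44.5 WHERE id = 1234::int8;
```

Separately, because an UPDATE in this era writes a whole new row version, each update must also insert new entries into all 62 indexes, so the write cost is dominated by index maintenance regardless of how the row is located.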