Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10
[EMAIL PROTECTED] wrote:
> WAL is on a RAID 0 drive along with the OS

Isn't that just as unsafe as having the whole lot on RAID 0?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] What is the best way to storage music files in Postgresql
On 18/03/2008, Peter Koczan [EMAIL PROTECTED] wrote:
> available, and RAID 5 and RAID 6 are just huge pains and terribly slow
> for writes.

RAID 5 and RAID 6 are just huge pains and terribly slow for writes with small numbers of spindles ;} In my testing I found that once you hit 10 spindles in a RAID 5 the differences between it and a RAID 10 started to become negligible (around 6% slower on writes on average over 10 runs of bonnie++ on 10 spindles), while the read speed (if you're doing similar amounts of reads and writes it's a fair criterion) was about 10% faster. With 24 spindles I couldn't see any difference at all. Those were 73GB 15K SCAs, btw, and the SAN was connected via 2GB fibre.

Cheers, Andrej

--
Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.
http://www.american.edu/econ/notes/htmlmail.htm
Re: [PERFORM] What is the best way to storage music files in Postgresql
Andrej Ricnik-Bay wrote:
> In my testing I found that once you hit 10 spindles in a RAID 5 the
> differences between it and a RAID 10 started to become negligible ...
> With 24 spindles I couldn't see any difference at all.

Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a double disk failure (such as during the load imposed by a rebuild onto a spare)? I guess if you have good backups - as you must - it's not that big a deal, but I'd be pretty nervous with anything less than RAID 6 or RAID 10.

-- Craig Ringer
Re: [PERFORM] TB-sized databases
Ron Mayer [EMAIL PROTECTED] writes:
> Would another possible condition for considering Cartesian joins be:
> * Consider Cartesian joins when a unique constraint can prove that at
>   most one row will be pulled from one of the tables that would be
>   part of this join?

What for? That would still lead us to consider large numbers of totally useless joins.

regards, tom lane
Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10
On Sun, Mar 16, 2008 at 12:04:44PM -0700, Craig James wrote:
> Just out of curiosity: Last time I did research, the word seemed to be
> that xfs was better than ext2 or ext3. Is that not true? Why use
> ext2/3 at all if xfs is faster for Postgres?

For the WAL, the filesystem is largely irrelevant. (It's relatively small, the files are preallocated, the data is synced to disk so there's no advantage from write buffering, etc.) The best filesystem is one that does almost nothing and stays out of the way--ext2 is a good choice for that. The data is a different story, and a different filesystem is usually a better choice. (If for no other reason than to avoid long fsck times.)

Mike Stone
[PERFORM] Planner mis-estimation using nested loops followup
A number of weeks ago, I posted a request for help regarding join estimates in pg 8.2.6. In moderately complex to very complex ad hoc queries in our system, we were consistently having the system massively underestimate the number of rows coming out of a join at a low level, making these queries very slow and inefficient. At times the mis-estimation was 1000:1 - i.e., when it should have been 2000 returned rows from a join, the planner assumed 1 or 2 rows. Raising the stats on the join columns to the max made little difference (yes, we analyzed the tables in question after each change).

Since the planner sees only one row coming out of the low-level join, it uses nested loops all the way up the chain when it would be more efficient to use another join type. In our informal testing, we found that by disabling nested loops and forcing other join types, we could get fantastic speedups. The queries that seem to benefit most from this have a lot of sub-queries being built up into a final query set, as well as a fair number of joins in the sub-queries. Since these are user-created and then generated via our tools, they can be quite messy at times.

After doing this testing, we have since added some functionality to our ad hoc reporting tool that allows us to tune individual queries by turning individual join types on and off at runtime. As we hear of slow reports, we've been individually turning off the nested loops on those reports. Almost always, this has increased the performance of the reports, sometimes in a completely amazing fashion (many, many minutes down to seconds at times). It of course doesn't help everything, and turning off nested loops in general causes an overall slowdown in other parts of the system. As this has gone on over the last couple of weeks, it feels like we either have a misconfiguration on the server, or we are tickling a mis-estimation bug in the planner. I'm hoping it's the former.
The db server has 8G of memory and a RAID 1 (WAL), RAID 10 (data) disk configuration; the OS is Linux 2.6.9 and the db is 8.2.6. The db is a UTF-8 db, if that is of any bearing, and autovacuum and the bgwriter are on. Nondefault settings of interest from postgresql.conf:

shared_buffers = 1024MB          # min 128kB or max_connections*16kB
work_mem = 256MB                 # min 64kB
maintenance_work_mem = 256MB     # min 1MB
random_page_cost = 1.75          # same scale as above
effective_cache_size = 4096MB
default_statistics_target = 100  # range 1-1000

If nothing else, perhaps this will help somebody else who has run into the same problem. If explain analyze of a query shows a large mis-estimation of rows returned on a join (estimate=1, actual=2k) causing the planner to choose nested loops instead of another join type, you might try running the query with nested loops set to off and see if that helps with performance.

Thanks, -Chris
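The per-query workaround described above can be sketched as follows; in 8.2 the setting can be flipped per session, or per transaction with SET LOCAL (the report query and table names here are made up for illustration):

```sql
-- Hypothetical slow report query whose low-level join is estimated at
-- 1 row but actually returns ~2000.
BEGIN;
SET LOCAL enable_nestloop = off;  -- reverts automatically at COMMIT/ROLLBACK

EXPLAIN ANALYZE
SELECT r.region, count(*)
FROM order_line ol
JOIN orders o  ON o.order_id = ol.order_id
JOIN region r  ON r.region_id = o.region_id
GROUP BY r.region;

COMMIT;
```

Because SET LOCAL is scoped to the transaction, other queries running in the same session keep the default planner behaviour.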
Re: [PERFORM] Planner mis-estimation using nested loops followup
On Tue, 18 Mar 2008 11:35:08 -0400 Chris Kratz [EMAIL PROTECTED] wrote:
> If nothing else, perhaps this will help somebody else who has run into
> the same problem. If explain analyze of a query shows a large
> mis-estimation of rows returned on a join (estimate=1, actual=2k)
> causing the planner to choose nested loops instead of another join
> type, you might try running the query with nested loops set to off and
> see if that helps w/ performance.

Did you try that? Did it work?

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL political pundit | Mocker of Dolphins
Re: [PERFORM] Planner mis-estimation using nested loops followup
Yes, turning nested loops off in specific cases has increased performance greatly. It didn't fix the planner mis-estimation, just the plan it chose. It's certainly not a panacea, but it's something we now try early on when trying to speed up a query that matches these characteristics.

-Chris

On 3/18/08, Joshua D. Drake [EMAIL PROTECTED] wrote:
> Did you try that? Did it work?
Re: [PERFORM] Planner mis-estimation using nested loops followup
On Tue, 18 Mar 2008, Chris Kratz wrote:
> In moderately complex to very complex ad hoc queries in our system, we
> were consistently having the system massively underestimate the number
> of rows coming out of a join at a low level, making these queries very
> slow and inefficient.

I have long thought that perhaps Postgres should be a little more cautious about its estimates, and assume the worst case scenario sometimes, rather than blindly following the estimates from the statistics. The problem is that Postgres uses the statistics to generate best estimates of the cost. However, it does not take into account the consequences of being wrong. If it were more clever, it might be able to decide to use an algorithm that is non-optimal according to the best estimate, if the "optimal" algorithm has the possibility of blowing up to 1000 times the work when the estimates are off by a bit. Such cleverness would be very cool, but (I understand) a lot of work. It would hopefully solve this problem.

Matthew

--
Taking apron off And now you can say honestly that you have been to a lecture where you watched paint dry. - Computer Graphics Lecturer
Re: [PERFORM] What is the best way to storage music files in Postgresql
On 18/03/2008, Craig Ringer [EMAIL PROTECTED] wrote:
> Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a
> double disk failure (such as during the load imposed by rebuild onto a
> spare)?

I never said that we actually USED that set-up. I just said I did extensive testing with varied RAID setups. ;} We did go with the 10 in the end because of that very consideration. It's just that the mantra "RAID 5 = slow writes" isn't quite true.

Cheers, Andrej
Re: [PERFORM] TB-sized databases
Tom Lane wrote:
> Ron Mayer [EMAIL PROTECTED] writes:
>> Would another possible condition for considering Cartesian joins be:
>> * Consider Cartesian joins when a unique constraint can prove that at
>>   most one row will be pulled from one of the tables that would be
>>   part of this join?
> What for? That would still lead us to consider large numbers of
> totally useless joins.

Often I get order-of-magnitude better queries by forcing the Cartesian join even without multi-column indexes. Explain analyze results below.

Here's an example with your typical star schema: "fact" is the central fact table, "d_ref" is a dimension table for the referrer, and "d_uag" is a dimension table for the useragent. Forcing the Cartesian join using "offset 0" makes the query take 14 ms (estimated cost 7575). If I don't force the Cartesian join, the query takes over 100 ms (estimated cost 398919). Indexes are on each dimension, but no multi-column indexes (since the ad-hoc queries can hit any permutation of dimensions).
logs=# explain analyze select * from fact natural join (select * from d_ref natural join d_uag where ref_host = 'www.real.com' and ref_path = '/products/player/more_info/moreinfo.html' and ref_query = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV=' and useragent = 'Mozilla/4.08 [en] (WinNT; U ;Nav)' offset 0) as a;
                                   QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=6465.12..7575.91 rows=367 width=2096) (actual time=14.152..14.192 rows=4 loops=1)
   ->  Limit  (cost=0.00..14.22 rows=1 width=218) (actual time=0.084..0.102 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..14.22 rows=1 width=218) (actual time=0.082..0.096 rows=1 loops=1)
               ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=0.056..0.058 rows=1 loops=1)
                     Index Cond: (((ref_path)::text = '/products/player/more_info/moreinfo.html'::text) AND ((ref_host)::text = 'www.real.com'::text) AND ((ref_query)::text = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV='::text))
               ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.020..0.029 rows=1 loops=1)
                     Index Cond: ((useragent)::text = 'Mozilla/4.08 [en] (WinNT; U ;Nav)'::text)
   ->  Bitmap Heap Scan on fact  (cost=6465.12..7556.18 rows=367 width=32) (actual time=14.053..14.066 rows=4 loops=1)
         Recheck Cond: ((fact.uag_id = a.uag_id) AND (fact.ref_id = a.ref_id))
         ->  BitmapAnd  (cost=6465.12..6465.12 rows=367 width=0) (actual time=14.016..14.016 rows=0 loops=1)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=2.258..2.258 rows=7960 loops=1)
                     Index Cond: (fact.uag_id = a.uag_id)
               ->  Bitmap Index Scan on i__fact__ref_id  (cost=0.00..3581.50 rows=253913 width=0) (actual time=9.960..9.960 rows=13751 loops=1)
                     Index Cond: (fact.ref_id = a.ref_id)
 Total runtime: 14.332 ms
(15 rows)

logs=# explain analyze select * from fact natural join (select * from d_ref natural join d_uag where ref_host = 'www.real.com' and ref_path = '/products/player/more_info/moreinfo.html' and ref_query = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV=' and useragent = 'Mozilla/4.08 [en] (WinNT; U ;Nav)') as a;
                                   QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=2827.72..398919.05 rows=1 width=242) (actual time=78.777..107.038 rows=4 loops=1)
   Hash Cond: (fact.ref_id = d_ref.ref_id)
   ->  Nested Loop  (cost=2819.88..398908.65 rows=511 width=119) (actual time=6.311..101.843 rows=7960 loops=1)
         ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.021..0.029 rows=1 loops=1)
               Index Cond: ((useragent)::text = 'Mozilla/4.08 [en] (WinNT; U ;Nav)'::text)
         ->  Bitmap Heap Scan on fact  (cost=2819.88..396449.49 rows=196223 width=32) (actual time=6.273..91.645 rows=7960 loops=1)
               Recheck Cond: (fact.uag_id = d_uag.uag_id)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83
Re: [PERFORM] What is the best way to storage music files in Postgresql
On Wed, 19 Mar 2008, Andrej Ricnik-Bay wrote:
> On 18/03/2008, Craig Ringer [EMAIL PROTECTED] wrote:
>> Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a
>> double disk failure (such as during the load imposed by rebuild onto
>> a spare)?

that's why you should use raid6 (allowing for dual failures)

David Lang
Re: [PERFORM] Planner mis-estimation using nested loops followup
On Tue, Mar 18, 2008 at 9:58 AM, Chris Kratz [EMAIL PROTECTED] wrote:
> Y, turning nested loops off in specific cases has increased
> performance greatly. It didn't fix the planner mis-estimation, just
> the plan it chose. It's certainly not a panacea, but it's something we
> now try early on when trying to speed up a query that matches these
> characteristics.

I have to admit I've had one or two reporting queries in the past for which turning off nested_loop was the only reasonable fix, due to misestimation. I'd tried changing the stats targets etc. and nothing really worked reliably to prevent the nested_loop from showing up in the wrong places.
Re: [PERFORM] What is the best way to storage music files in Postgresql
On Tue, 18 Mar 2008, Gregory Stark wrote:
> You can have as many parity drives as you want with RAID 5 too.

you can? I've never seen a raid 5 setup with more than a single parity drive (or even the option of having more than one drive's worth of redundancy). you can have hot-spare drives, but that's a different thing. what controller/software lets you do this?

David Lang
Re: [PERFORM] What is the best way to storage music files in Postgresql
[EMAIL PROTECTED] writes:
> On Wed, 19 Mar 2008, Andrej Ricnik-Bay wrote:
> that's why you should use raid6 (allowing for dual failures)

You can have as many parity drives as you want with RAID 5 too.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
[PERFORM] question on TRUNCATE vs VACUUM FULL
Hi folks, We are running Postgres 8.2.5. I have 3 tables; call them A, B, and C.

Table A houses info on all emails that have ever been created for the purpose of being delivered to our end customers. Big table - about 23 million rows. Table B, the 'holding' table, is populated with Table A key information via an after trigger whenever Table A is updated or inserted to. Table C, the 'work' table, is populated by function D from table B. It is configured exactly like table B.

PL/pgSQL function D inserts a predefined number of rows from table B into table C - for purposes of discussion, say 500. Function D, after it does its thing, then deletes the 500 rows it processed from table B, and ALL 500 rows from table C. This entire process, after a sleep period of 10 seconds, repeats itself all day. After each fifth iteration of function D, we perform a VACUUM FULL on both tables B and C. It takes less than 5 seconds.

In terms of transaction processing: Table A is processed by many transactions (some read, some update). Table B is processed by any transaction updating or inserting to Table A via the after trigger (insert, update), and by function D (insert, update, delete). Table C is processed ONLY by function D (insert, update, delete). Nothing else touches it; the PG_LOCKS table verifies that this table is totally free of any transaction between iterations of function D.

So my question is this: shouldn't VACUUM FULL clean Table C and reclaim all its space? It doesn't. It usually reports the same number of pages before and after the vacuum. We have to resort to TRUNCATE to clean and reclaim this table, which must be empty at the beginning of function D. Any insights appreciated.
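For illustration, the two cleanup approaches being compared might be sketched like this (the table name is assumed, not the poster's actual schema):

```sql
-- VACUUM FULL can only reclaim space from dead rows that no running
-- transaction could still see, so a long-lived transaction elsewhere
-- in the database can keep it from shrinking the table.
VACUUM FULL VERBOSE table_c;

-- TRUNCATE instead discards the table's on-disk files and gives it
-- fresh empty ones, so all space comes back immediately.  It takes an
-- exclusive lock, which is fine here since only function D touches
-- table C between iterations.
TRUNCATE TABLE table_c;
```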
Thanks, Mark Steben
Senior Database Administrator
@utoRevenue - A Dominion Enterprises Company
480 Pleasant Street, Suite B200, Lee, MA 01238
413-243-4800 Home Office / 413-243-4809 Corporate Fax
msteben@autorevenue.com
Visit our new website at www.autorevenue.com
Re: [PERFORM] Planner mis-estimation using nested loops followup
Scott Marlowe wrote:
> I have to admit I've had one or two reporting queries in the past for
> which turning off nested_loop was the only reasonable fix due to
> misestimation. I'd tried changing the stats targets etc and nothing
> really worked reliably to prevent the nested_loop from showing up in
> the wrong places.

One cause of planner mis-estimation I've seen quite frequently is when there are a number of predicates on the data that filter the results in roughly the same manner. PostgreSQL, not knowing that the filters are highly correlated, multiplies the fractions of selected rows together. Making up an example using pseudo-code, suppose this is one of the subqueries:

select * from orders where order_date is recent and order_fulfilled is false

used in an application where the unfulfilled orders are the recent ones. If PostgreSQL estimates that 1% of the orders are recent, and 1% are unfulfilled, then it will assume that 0.01% are both recent and unfulfilled. If in reality it's more like 0.9%, your actual row count will be 90 times the estimate.

The only kind of simple behind-the-scenes fix for these situations that I know of is to add more indexes (such as a partial index on order_date where order_fulfilled is false), which slows down all your updates and only works for the simplest situations. A general fix would need to calculate, store, and look up a huge amount of correlation data - probably equal to the square of the number of rows in pg_stats, though this could possibly be generated as needed.
Perhaps the analyze command could be extended to take a command line like:

ANALYZE CARTESIAN CORRELATION orders(order_date, order_fulfilled);

which stores the fraction for each combination of most frequent values and domain buckets from order_date and order_fulfilled. The difficulty is whether the planner can quickly and easily determine whether appropriate correlation data exists for the query plan it is estimating.

Regards, Stephen Denne.
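The partial-index workaround mentioned above might look like the following, continuing the made-up orders example (whether it actually helps depends on the query shape):

```sql
-- Partial index restricted to unfulfilled orders.  It gives the
-- planner a cheap access path for the correlated predicate pair and
-- stays small because fulfilled orders are excluded, at the cost of
-- extra maintenance work on every update to orders.
CREATE INDEX orders_recent_unfulfilled
    ON orders (order_date)
    WHERE order_fulfilled = false;

-- A query whose WHERE clause matches the index predicate can use it:
SELECT * FROM orders
WHERE order_date > now() - interval '7 days'
  AND order_fulfilled = false;
```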
Re: [PERFORM] What is the best way to storage music files in Postgresql
Gregory Youngblood wrote:
> Also, a very informative read:
> http://research.google.com/archive/disk_failures.pdf
> In short, best thing to do is watch SMART and be prepared to try and
> swap a drive out before it fails completely. :)

I currently have four brand new 1TB disks (7200RPM SATA - they're for our backup server). Two of them make horrible clicking noises - they're rapidly parking and unparking or doing constant seeks. One of those two also spins up very loudly, and on spin-down rattles and buzzes. The internal SMART health check reports the two problem drives to be just fine, and both pass a short SMART self-test (smartctl -d ata -t short). Both have absurdly huge seek_error_rate values, but the SMART thresholds see nothing wrong with this. The one with the noisy spin-down is so defective that I can't even write data to it successfully, and the other problem disk has regular I/O errors and fails an extended SMART self-test.

I see this sort of thing regularly. Vendors are obviously setting the SMART health thresholds so that there's absolutely no risk of reporting an issue with a working drive, and in the process making the health check basically useless for detecting failing or faulty drives. I rely on manual examination of the vendor attributes like the seek error rate, ECC recovered sectors, offline uncorrectable sectors (usually a REALLY bad sign if this grows), etc., combined with regular extended SMART tests (which do a surface scan). Just using SMART - say, the basic health check - really isn't enough.

-- Craig Ringer
Re: [PERFORM] What is the best way to storage music files in Postgresql
[EMAIL PROTECTED] wrote:
> you can? I've never seen a raid 5 setup with more than a single parity
> drive (or even the option of having more than one drive's worth of
> redundancy). you can have hot-spare drives, but that's a different
> thing.

With RAID 4, where the parity drives are in fact dedicated to parity information, the controller could just store the parity data mirrored on more than one drive. Unfortunately write performance on RAID 4 is absolutely horrible, and a second or third parity disk would not help with that. I suppose there's nothing stopping a controller adding a second disk's worth of duplicate parity information when striping a four-or-more-disk RAID 5 array, but I thought that's basically what RAID 6 was.

-- Craig Ringer
Re: [PERFORM] What is the best way to storage music files in Postgresql
On Wed, 19 Mar 2008, Craig Ringer wrote:
> I suppose there's nothing stopping a controller adding a second disk's
> worth of duplicate parity information when striping a four or more
> disk RAID 5 array, but I thought that's basically what RAID 6 was.

just duplicating the RAID 4 or 5 parity information will not help you if the parity drive is not one of the drives that fail. raid 6 uses a different parity algorithm so that any two drives in the array can fail with no data loss.

even this isn't completely error proof. I just went through a scare with a 15 disk array where it reported 3 dead drives after a power outage. one of the dead drives ended up being the hot-spare, and another drive that acted up (seek errors) worked well enough to let me eventually recover all the data, but it was a very scary week while I worked through this.

David Lang
Re: [PERFORM] Planner mis-estimation using nested loops followup
At 00:24 08/03/19, Matthew wrote:
> I have long thought that perhaps Postgres should be a little more
> cautious about its estimates, and assume the worst case scenario
> sometimes, rather than blindly following the estimates from the
> statistics.

Just a crazy thought: if Postgres could check its own estimates or set some limits while executing the query and, upon finding that the estimates were way off, fall back to a less optimal plan immediately or the next time, that would be cool.

KC
Re: [PERFORM] What is the best way to storage music files in Postgresql
[EMAIL PROTECTED] writes:
> On Tue, 18 Mar 2008, Gregory Stark wrote:
>> You can have as many parity drives as you want with RAID 5 too.
> you can? I've never seen a raid 5 setup with more than a single parity
> drive (or even the option of having more than one drive's worth of
> redundancy). you can have hot-spare drives, but that's a different
> thing. what controller/software lets you do this?

Hm, some research shows I may have completely imagined this. I don't see why you couldn't, but I can't find any evidence that this feature exists. I could have sworn I've seen it before, though.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
Re: [PERFORM] What is the best way to storage music files in Postgresql
[EMAIL PROTECTED] wrote:
> just duplicating the Raid 4 or 5 parity information will not help you
> if the parity drive is not one of the drives that fail.

Good point - and no doubt why nothing supports extra disks' worth of parity on RAID 5, which would be entirely useless (still only protecting against a one-disk failure while wasting more space). Except, apparently, the earlier poster's RAID 5 controller that DOES support extra parity disks. It must just be hot spares; nothing else makes any sense.

> even this isn't completely error proof. I just went through a scare
> with a 15 disk array where it reported 3 dead drives after a power
> outage. ... it was a very scary week while I worked through this.

As file systems can be corrupted, files deleted, etc., I try to make sure that all my data is sufficiently well backed up that a week's worth of recovery effort is never needed. Dead array? Rebuild and restore from backups. Admittedly this practice has arisen because of a couple of scares much like you describe, but at least now it happens. I even test the backups ;-)

Big SATA 7200rpm disks are so cheap compared to high-performance SAS or even 10kRPM SATA disks that it seems like a really bad idea not to have a disk-based backup server with everything backed up quick to hand. For that reason I'm absolutely loving PostgreSQL's WAL archiving feature and support for a warm spare server. I can copy the WAL files to another machine and immediately restore them there (providing a certain level of inherent testing) as well as writing them to tape. It's absolutely wonderful. Sure, the warm spare will run like a man in knee-deep mud, but it'll do in an emergency.

The existing database used by the app I'm working to replace is an ISAM-based single-host shared-file DB where all the user processes access the DB directly.
Concurrency is only supported through locking; there's no transaction support, no referential integrity checking, no data typing, no SQL of any sort, AND it's prone to corruption and data loss if a user process is killed. User processes are killed when the user's terminal is closed or loses its connection. Backups are only possible once a day, when all users are logged off. It's not an application where losing half a day of data is fun. On top of all that it runs on SCO OpenServer 5.0.5 (which has, among other things, the most broken C toolchain I've ever seen). So ... hooray for up-to-date, well-tested backups and how easy PostgreSQL makes them. -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
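The parity point at the top of this thread can be illustrated with a toy example. RAID 5 parity is just an XOR across the data disks in a stripe, so any single missing disk can be rebuilt from the rest; a duplicate copy of the same parity block is the same equation twice and protects against nothing extra. A minimal sketch (block values are made up, not any real controller's layout):

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, as RAID 5 does across a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Hypothetical 4-byte blocks on three data disks.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Lose any ONE data disk: XOR of the survivors plus parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Lose TWO data disks: the survivor plus parity yields only the XOR of the
# two missing blocks -- one equation, two unknowns. A second copy of the
# same parity is the same equation again and adds nothing; only independent
# parity (RAID 6) or mirroring (RAID 10) covers a double failure.
combined = xor_blocks([data[0], parity])
assert combined == xor_blocks([data[1], data[2]])
```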
Re: [PERFORM] What is the best way to storage music files in Postgresql
Gregory Youngblood wrote: In my way of thinking, and what I was referring to above, was using those error conditions to identify drives to change before the reported complete failures. Yes, that will mean changing drives before SMART actually says there is a full failure, and you may have to fight to get a drive replaced under warranty when you do so, but you are protecting your data.

I actually find it surprisingly easy to get a disk replaced based on a printed SMART report showing uncorrectable sectors or just very high reallocated-sector counts. Almost suspiciously easy. I would not be at all surprised if the disk vendors are, at least for their 7200RPM SATA disks, recording a black mark against the serial number, doing a low-level reformat, and sending them back out as new disks to other customers. Some of the new disks I've received have lifetimes and logs that suggest they might be such refurbs - much longer self-test logs than most new drives, for example, as well as earlier serial numbers than others ordered at the same time. They're also much, much more likely to be DOA or to develop defects early.

I agree with you completely that waiting for SMART to actually indicate a true failure is pointless due to the thresholds set by manufacturers. But using SMART for early warning signs still has value IMO.

I could not agree more. smartmontools is right up there with tools like wireshark, mtr, and tcptraceroute in my most-vital toolbox, mostly because of its ability to examine the vendor attributes and kick off scheduled self-tests. I've saved a great deal of dead-disk-replacement hassle by ensuring that smartd is configured to run extended self-tests on the disks in all the machines I operate at least fortnightly, and short tests at least weekly. Being able to plan ahead to swap a dying disk is very nice indeed.
-- Craig Ringer
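For reference, a schedule like the one described above can be expressed with smartd's `-s` regex (the format is T/MM/DD/d/HH). This is only a sketch - device names, mail target, and test times are assumptions, and "fortnightly" is approximated as the 1st and 15th of each month:

```
# /etc/smartd.conf (sketch)
# -a: monitor all attributes; -m: mail warnings to root
# S/../../7/02        -> short self-test every Sunday at 02:00
# L/../(01|15)/./03   -> long self-test on the 1st and 15th at 03:00
/dev/sda -a -m root -s (S/../../7/02|L/../(01|15)/./03)
/dev/sdb -a -m root -s (S/../../7/02|L/../(01|15)/./03)
```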
Re: [PERFORM] TB-sized databases
Ron Mayer wrote:
Tom Lane wrote:
Ron Mayer [EMAIL PROTECTED] writes: Would another possible condition for considering Cartesian joins be: * Consider Cartesian joins when a unique constraint can prove that at most one row will be pulled from one of the tables that would be part of this join?
What for? That would still lead us to consider large numbers of totally useless joins.
Often I get order-of-magnitude better queries by forcing the Cartesian join even without multi-column indexes. Ah - and sometimes even two orders of magnitude: 1.1 seconds with the Cartesian join, 200 seconds if it doesn't use it.

logs=# explain analyze select * from fact natural join (select * from d_ref natural join d_uag where ref_host = 'www.real.com' and ref_path = '/products/player/more_info/moreinfo.html' and ref_query = '?ID=370DC=LANG=PN=RealOne%20PlayerPV=6.0.11.818PT=OS=CM=CMV=LS=RE=RA=RV=' and useragent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)' offset 0 ) as a;

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=6465.12..7575.91 rows=367 width=2096) (actual time=1118.741..1119.207 rows=122 loops=1)
   ->  Limit  (cost=0.00..14.22 rows=1 width=218) (actual time=0.526..0.542 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..14.22 rows=1 width=218) (actual time=0.524..0.537 rows=1 loops=1)
               ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=0.168..0.170 rows=1 loops=1)
                     Index Cond: (((ref_path)::text = '/products/player/more_info/moreinfo.html'::text) AND ((ref_host)::text = 'www.real.com'::text) AND ((ref_query)::text = '?ID=370DC=LANG=PN=RealOne%20PlayerPV=6.0.11.818PT=OS=CM=CMV=LS=RE=RA=RV='::text))
               ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.347..0.355 rows=1 loops=1)
                     Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)'::text)
   ->  Bitmap Heap Scan on fact  (cost=6465.12..7556.18 rows=367 width=32) (actual time=1118.196..1118.491 rows=122 loops=1)
         Recheck Cond: ((fact.uag_id = a.uag_id) AND (fact.ref_id = a.ref_id))
         ->  BitmapAnd  (cost=6465.12..6465.12 rows=367 width=0) (actual time=1115.565..1115.565 rows=0 loops=1)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=813.859..813.859 rows=1183470 loops=1)
                     Index Cond: (fact.uag_id = a.uag_id)
               ->  Bitmap Index Scan on i__fact__ref_id  (cost=0.00..3581.50 rows=253913 width=0) (actual time=8.667..8.667 rows=13751 loops=1)
                     Index Cond: (fact.ref_id = a.ref_id)
 Total runtime: 1122.245 ms
(15 rows)

logs=# explain analyze select * from fact natural join (select * from d_ref natural join d_uag where ref_host = 'www.real.com' and ref_path = '/products/player/more_info/moreinfo.html' and ref_query = '?ID=370DC=LANG=PN=RealOne%20PlayerPV=6.0.11.818PT=OS=CM=CMV=LS=RE=RA=RV=' and useragent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)' ) as a;

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2827.72..398919.05 rows=1 width=242) (actual time=114138.193..200622.416 rows=122 loops=1)
   Hash Cond: (fact.ref_id = d_ref.ref_id)
   ->  Nested Loop  (cost=2819.88..398908.65 rows=511 width=119) (actual time=1524.600..199522.182 rows=1183470 loops=1)
         ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.023..0.033 rows=1 loops=1)
               Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)'::text)
         ->  Bitmap Heap Scan on fact  (cost=2819.88..396449.49 rows=196223 width=32) (actual time=1524.562..197627.135 rows=1183470 loops=1)
               Recheck Cond: (fact.uag_id = d_uag.uag_id)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=758.888..758.888 rows=1183470 loops=1)
                     Index Cond: (fact.uag_id = d_uag.uag_id)
   ->  Hash  (cost=7.83..7.83 rows=1 width=127) (actual time=0.067..0.067 rows=1 loops=1)
         ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual
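The trick in the faster query above is the well-known PostgreSQL "OFFSET 0" optimization fence: the planner does not flatten a subquery containing OFFSET, so both single-row dimension lookups are resolved first (effectively the tiny Cartesian-style join being discussed) and the fact table is then probed once using both keys. Schematically, using the same table names as the plans above:

```sql
-- Without a fence, the planner may flatten the subquery and join the
-- large fact table against one dimension at a time (the slow plan).
-- With OFFSET 0, the subquery is evaluated first; its single joined
-- row supplies both uag_id and ref_id for one combined fact probe:
SELECT *
FROM fact
NATURAL JOIN (
    SELECT *
    FROM d_ref NATURAL JOIN d_uag
    WHERE ref_host  = 'www.real.com'
      AND useragent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)'
    OFFSET 0  -- optimization fence: prevents subquery flattening
) AS a;
```

This is a planner-version-dependent idiom rather than a documented guarantee, but it was the standard workaround on 8.x-era PostgreSQL.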
Re: [PERFORM] question on TRUNCATE vs VACUUM FULL
So my question is this: Shouldn't VACUUM FULL clean Table C and reclaim all its space?

You've got concepts mixed up. TRUNCATE deletes all of the data from a particular table (and works in most DBMSs). http://www.postgresql.org/docs/8.3/interactive/sql-truncate.html VACUUM FULL is a PostgreSQL-specific command that works behind the scenes to clean up MVCC left-overs. It does not touch any current data or records in the table; it's purely behind-the-scenes work. http://www.postgresql.org/docs/current/interactive/sql-vacuum.html The two have completely different uses and nothing to do with each other whatsoever. -- Postgresql php tutorials http://www.designmagick.com/
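Side by side, the distinction looks like this ("c" is a stand-in table name, not from the original question):

```sql
-- TRUNCATE: removes every row and returns the table's disk space to the
-- operating system immediately. Standard-ish SQL, supported by most DBMSs.
TRUNCATE TABLE c;

-- Plain VACUUM: marks dead row versions (MVCC left-overs) as reusable
-- space within the table's files; it rarely shrinks the files themselves.
VACUUM c;

-- VACUUM FULL: PostgreSQL-specific; keeps all live rows but compacts the
-- table, so space from dead rows can actually be reclaimed. It never
-- deletes current data.
VACUUM FULL c;
```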
Re: [PERFORM] What is the best way to storage music files in Postgresql
On Wed, 2008-03-19 at 07:44 +0900, Craig Ringer wrote: Gregory Youngblood wrote: Also, a very informative read: http://research.google.com/archive/disk_failures.pdf In short, the best thing to do is watch SMART and be prepared to try and swap a drive out before it fails completely. :)

I currently have four brand-new 1TB disks (7200RPM SATA - they're for our backup server). Two of them make horrible clicking noises - they're rapidly parking and unparking or doing constant seeks. One of those two also spins up very loudly, and on spin-down rattles and buzzes. The internal SMART health check reports the two problem drives to be just fine, and both pass a short SMART self-test (smartctl -d ata -t short). Both have absurdly huge seek_error_rate values, but the SMART thresholds see nothing wrong with this.

--8<-- snip -->8--

In that Google report, one of their conclusions was that after the first scan error, drives were 39 times more likely to fail within the next 60 days. First errors in reallocations, etc. also correlated with higher failure probabilities. In my way of thinking, and what I was referring to above, was using those error conditions to identify drives to change before the reported complete failures. Yes, that will mean changing drives before SMART actually says there is a full failure, and you may have to fight to get a drive replaced under warranty when you do so, but you are protecting your data. I agree with you completely that waiting for SMART to actually indicate a true failure is pointless due to the thresholds set by manufacturers. But using SMART for early warning signs still has value IMO.
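The "act on early error indicators, not SMART's overall verdict" policy can be sketched in a few lines. The sample below is fabricated for illustration, but the column layout follows what `smartctl -A` prints (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the idea is to flag nonzero raw counts on the attributes the Google paper found predictive, regardless of whether the vendor thresholds are tripped:

```python
# Hypothetical excerpt of `smartctl -A` output (not from a real drive).
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   094   094   036    Pre-fail  Always       -       142
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       30851464
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""

# Attributes whose raw value should be at or near zero on a healthy drive.
WATCH = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

def early_warnings(smart_output):
    """Return (attribute, raw_value) pairs worth replacing a drive over."""
    warnings = []
    for line in smart_output.splitlines():
        fields = line.split()
        # A data row has 10 columns; RAW_VALUE is the last one.
        if len(fields) >= 10 and fields[1] in WATCH and int(fields[9]) > 0:
            warnings.append((fields[1], int(fields[9])))
    return warnings

print(early_warnings(SAMPLE))
# [('Reallocated_Sector_Ct', 142), ('Current_Pending_Sector', 3)]
```

Note that attributes like Seek_Error_Rate are deliberately excluded: many vendors report large raw values there on perfectly healthy drives, which is exactly the ambiguity described above.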