Re: [PERFORM] FSM - per database or per installation?
On 20/11/2009 2:33 AM, Heikki Linnakangas wrote: Craig James wrote: Are the FSM parameters for each database, or the entire Postgres system? In other words, if I have 100 databases, do I need to increase max_fsm_pages and max_fsm_relations by a factor of 100, or keep them the same as if I just have one database? I suspect they're per-database, i.e. as I add databases, I don't have to increase the FSM parameters, but the documentation isn't 100% clear on this point. It's per cluster, ie *not* per-database. The parameter is gone in 8.4, BTW. See: http://www.postgresql.org/docs/8.4/static/release-8-4.html#AEN95067 for why they've been removed, which boils down to PostgreSQL manages the fsm automatically now and no longer requires all that RAM to do it, either. Thanks Heikki - the fsm _really_ simplify admin and remove a bunch of common gotchas for Pg users. -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Optimizer use of index slows down query by factor
Hi, we experience some strange performance problems, we've already found a workaround for us, but are curious if it's a known problem of the optimizer. Tested with the following Postgres Version: 8.2.15 and 8.3.9 AUTOVACUUM is enabled, explicit VACUUM and REINDEX both tables and the whole DB. effective_cache_size = 3096MB default_statistics_target = 100 shared_buffers = 1024MB work_mem = 64MB Table Schema: Table click Column | Type | Modifiers -+-+--- click_id| integer | not null member_id | integer | link_id | integer | click_timestamp | bigint | remote_host | text| user_agent | text| Indexes: click_pkey PRIMARY KEY, btree (click_id) idx_click_1 btree (link_id) idx_click_2 btree (click_timestamp) Table link Column | Type | Modifiers +-+--- link_id| integer | not null link_url | text| task_id| integer | link_type | integer | action_id | integer | link_alias | text| deleted| boolean | deletable | boolean | Indexes: link_pkey PRIMARY KEY, btree (link_id) idx_link_1 btree (task_id) Rows in click table contains: 22874089 Rows in link table contains:4220601 The following query is slow when index scan is enabled: SELECT link.link_alias,link.link_type,COUNT(click.click_id),COUNT(distinct click.member_id) FROM link LEFT JOIN click ON link.link_id=click.link_id WHERE (link.link_type=8 OR link.link_type=9) AND link.task_id=1556 AND (link.deletable IS NULL OR link.deletable=false)GROUP BY link.link_type,link.link_alias LIMIT 1000 Explain with index scan enabled: explain analyze SELECT link.link_alias,link.link_type,COUNT(click.click_id),COUNT(distinct click.member_id) FROM link LEFT JOIN click ON link.link_id=click.link_id WHERE (link.link_type=8 OR link.link_type=9) AND link.task_id=1556 AND (link.deletable IS NULL OR link.deletable=false)GROUP BY link.link_type,link.link_alias LIMIT 1000; QUERY PLAN - Limit (cost=1416936.47..1417849.48 rows=1 width=30) (actual time=277062.951..277073.144 rows=12 loops=1) - GroupAggregate (cost=1416936.47..1417849.48 rows=1 width=30) (actual time=277062.949..277073.126 rows=12 loops=1) - Sort (cost=1416936.47..1417119.07 rows=73040 width=30) (actual time=277062.820..277066.219 rows=6445 loops=1) Sort Key: link.link_type, link.link_alias Sort Method: quicksort Memory: 696kB - Merge Right Join (cost=1604.91..1411036.15 rows=73040 width=30) (actual time=277027.644..277050.946 rows=6445 loops=1) Merge Cond: (click.link_id = link.link_id) - Index Scan using idx_click_1 on click (cost=0.00..1351150.42 rows=22874088 width=12) (actual time=6.915..263327.439 rows=22409997 loops=1) - Sort (cost=1604.91..1638.61 rows=13477 width=26) (actual time=12.172..15.640 rows=6445 loops=1) Sort Key: link.link_id Sort Method: quicksort Memory: 33kB - Index Scan using idx_link_1 on link (cost=0.00..680.51 rows=13477 width=26) (actual time=5.707..12.043 rows=126 loops=1) Index Cond: (task_id = 1556) Filter: (((deletable IS NULL) OR (NOT deletable)) AND ((link_type = 8) OR (link_type = 9))) Total runtime: 277082.204 ms (15 rows) Explain with set enable_indexscan=false; explain analyze SELECT link.link_alias,link.link_type,COUNT(click.click_id),COUNT(distinct click.member_id) FROM link LEFT JOIN click ON link.link_id=click.link_id WHERE (link.link_type=8 OR link.link_type=9) AND link.task_id=1556 AND (link.deletable IS NULL OR link.deletable=false)GROUP BY link.link_type,link.link_alias LIMIT 1000; QUERY PLAN -- Limit (cost=2577764.28..2578677.29 rows=1 width=30) (actual time=51713.324..51723.517 rows=12 loops=1) - GroupAggregate (cost=2577764.28..2578677.29 rows=1 width=30) (actual time=51713.322..51723.499 rows=12 loops=1) - Sort (cost=2577764.28..2577946.88 rows=73040 width=30) (actual time=51713.191..51716.600 rows=6445 loops=1) Sort Key: link.link_type, link.link_alias Sort Method: quicksort Memory: 696kB - Hash Left Join (cost=1140942.18..2571863.96 rows=73040 width=30) (actual time=45276.194..51702.053 rows=6445 loops=1) Hash Cond: (link.link_id = click.link_id) - Bitmap Heap Scan on link (cost=253.20..34058.86 rows=13477 width=26) (actual time=0.044..0.168 rows=126
[PERFORM] Multicolumn index - WHERE ... ORDER BY
Hi there, I've got a small question about multicolumn indexes. I have a table with ~5M rows (43 bytes per column - is that relevant?) (but eventually it could grow up to 50M rows), used to store e-mail logs. I am trying to build a web frontend to search mails in this table. I usually want only the last mails processed by my mail system, so typically all my queries end with: ... ORDER BY time LIMIT 50; Before that, I have usually have a WHERE clause on a indexed column. Example of a query I might have: SELECT id FROM mail WHERE from_address LIKE 'bill%' ORDER BY time DESC LIMIT 50; I observed that the ordering adds a significant overhead to my queries - this seems quite logical, because of the ORDER BY which has to inspect every row matching the WHERE clause. The approach taken by the query planner is one of the following: 1) if it thinks there are not so much rows containg 'bill' as prefix of the 'from_address' column, it performs an index scan (or a bitmap index scan) using my index on 'from_address', then sorts all results according to the 'time' column. 2) if it thinks there are many rows containing 'bill' as prefix of the 'from_address' column, it performs an reverse index scan using my index on 'time', and looks sequentially if the 'from_address' column contains 'bill' as prefix. The problem is that not so much is in my case approx 10K rows sometimes. It seems to be pretty costly to perform an (bitmap) index scan over all these rows. As I only want the first few rows anyway (LIMIT 50), I thought that there had to be some better solution. The solution I had in mind was to create a multicolumn index over 'from_address' and 'time': CREATE INDEX idx_from_time ON mail (from_address, time DESC); so that it could directly use the 'time' ordering and lookup only the first 50 rows using the index. but... it doesn't work :-) i.e. my multicolumn index is never used. So: - do you guys have any ideas why it doesn't work? - do you see an alternative solution? Infos: - I use PostgreSQL 8.4.2 - I regularly VACUUM and ANALYZE my db. Statistics look OK. - I'm relatively new to PostgreSQL, so maybe this question is trivial? Thanks in advance, and happy holidays! -- lucas maystre trainee open systems ag raeffelstrasse 29 ch-8045 zurich t: +41 44 455 74 00 f: +41 44 455 74 01 l...@open.ch http://www.open.ch -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] SATA drives performance
Hi all, I'm trying to figure out which HW configuration with 3 SATA drives is the best in terms of reliability and performance for Postgres database. I'm thinking to connect two drives in RAID 0, and to keep the database (and WAL) on these disks - to improve the write performance of the SATA drives. The third drive will be used to reduce the cost of the RAID 0 failure without reducing the performance. Say, I could configure Postgres to use the third drive as backup for WAL files, with archive_timeout set to 15 minutes. Daily backups will be created on different server. Loss of last 15 minute updates is something the customer can afford. Also, one day restore time is case of failure is also affordable (to reinstall the OS, Postgres, restore backup, and load WALs). The server will be remotely administered, that is why I'm not going for RAID 1, 1+0 or some other solution for which, I beleive, the local administion is crucial. Server must be low budget, that is why I'm avoiding SAS drives. We will use CentOS Linux and Postgres 8.4. The database will have 90% of read actions, and 10% of writes. I would like to hear your opinion, is this reasonable or I should reconsider RAID 1? Regards, Ognjen -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Performance with partitions/inheritance and multiple tables
Hi, We currently have a large table (9 million rows) of which only the last couple of days worth of data is queried on a regular basis. To improve performance we are thinking of partitioning the table. One idea is: Current_data = last days worth archive_data today (goes back to 2005) The idea being proposed at work is: current_data = today's data prior years data - be broken down into one table per day archive_data - data older than a year. My question is: a) Does postgres suffer a performance hit say if there are 200 child tables. b) What about aggregation between dates in the last year. eg total sales for firm a for the last year. It will need to look up n number of tables. Any ideas, tips, gotchas in implementing partitioning would be welcome. It is a somewhat mission critical (not trading, so not as mission critical) system. How expensive is maintaining so many partitions both in terms of my writing / maintaining scripts and performance. Thanks in advance. Radhika
Re: [PERFORM] SATA drives performance
2009/12/24 Ognjen Blagojevic ogn...@etf.bg.ac.yu: Hi all, I'm trying to figure out which HW configuration with 3 SATA drives is the best in terms of reliability and performance for Postgres database. I'm thinking to connect two drives in RAID 0, and to keep the database (and WAL) on these disks - to improve the write performance of the SATA drives. The third drive will be used to reduce the cost of the RAID 0 failure without reducing the performance. Say, I could configure Postgres to use the third drive as backup for WAL files, with archive_timeout set to 15 minutes. Daily backups will be created on different server. Loss of last 15 minute updates is something the customer can afford. Also, one day restore time is case of failure is also affordable (to reinstall the OS, Postgres, restore backup, and load WALs). The server will be remotely administered, that is why I'm not going for RAID 1, 1+0 or some other solution for which, I beleive, the local administion is crucial. Server must be low budget, that is why I'm avoiding SAS drives. We will use CentOS Linux and Postgres 8.4. The database will have 90% of read actions, and 10% of writes. I would like to hear your opinion, is this reasonable or I should reconsider RAID 1? If you're running RAID-0 and suffer a drive failure, the system becomes somewhat less cheaper because you now have to rescue it and get it up and running again. I.e. you've moved your cost from hardware to your time. I'd recommend RAID-1 with a 3 disk mirror. Linux now knows to read from 1 drive at a time even for a single user to get very good read bandwidth ( I routinely see read speeds on a pair of WD Black 7200 RPM SATA drives approaching 200MB/s (they are ~100MB/s each). Your redundancy is increased, so that should one drive fail you're still completely redundant. Also 1TB drives are CHEAP nowadays, even the WD blacks and similar drives from other manufacturers. If you need more storage than a single 1TB drive can provide, then you'll need some other answer. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
A couple of thoughts occur to me: 1. For reads, RAID 1 should also be good: it will allow a read to occur from whichever disk can provide the data fastest. 2. Also, for reads, the more RAM you have, the better (for caching). I'd suspect that another 8GB of RAM is a better expenditure than a 2nd drive in many cases. 3. RAID 0 is twice as unreliable as no raid. I'd recommend using RAID 1 intead. If you use the Linux software mdraid, remote admin is easy. 4. If you can tolerate the risk of the most recent transactions being lost, look at asynchronous commit. Likewise, you *might* consider operating with a write cache enabled. Otherwise, the time for fdatasync() is what's critical. 5. For a 2-disk setup, I think that main DB on one, with WAL on the other will beat having everything on a single RAID0. 6. The WAL is relatively small: you might consider a (cheap) solid-state disk for it. 7. If you have 3 equal disks, try doing some experiments. My inclination would be to set them all up with ext4, then have the first disk set up as a split between OS and WAL; the 2nd disk set up for /var/lib/postgresql, and the 3rd disk as a backup for everything (and a spare OS with SSH access). 8. Lastly, if you need remote administration, and can justify another £100 or so, the HP iLO (integrated lights out) cards are rather useful: these effectively give you VNC without OS support, even for the BIOS. Best wishes, Richard Ognjen Blagojevic wrote: Hi all, I'm trying to figure out which HW configuration with 3 SATA drives is the best in terms of reliability and performance for Postgres database. I'm thinking to connect two drives in RAID 0, and to keep the database (and WAL) on these disks - to improve the write performance of the SATA drives. The third drive will be used to reduce the cost of the RAID 0 failure without reducing the performance. Say, I could configure Postgres to use the third drive as backup for WAL files, with archive_timeout set to 15 minutes. Daily backups will be created on different server. Loss of last 15 minute updates is something the customer can afford. Also, one day restore time is case of failure is also affordable (to reinstall the OS, Postgres, restore backup, and load WALs). The server will be remotely administered, that is why I'm not going for RAID 1, 1+0 or some other solution for which, I beleive, the local administion is crucial. Server must be low budget, that is why I'm avoiding SAS drives. We will use CentOS Linux and Postgres 8.4. The database will have 90% of read actions, and 10% of writes. I would like to hear your opinion, is this reasonable or I should reconsider RAID 1? Regards, Ognjen -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Optimizer use of index slows down query by factor
Michael Ruf m...@inxmail.de writes: we experience some strange performance problems, we've already found a workaround for us, but are curious if it's a known problem of the optimizer. I think you need to see about getting this rowcount estimate to be more accurate: - Index Scan using idx_link_1 on link (cost=0.00..680.51 rows=13477 width=26) (actual time=5.707..12.043 rows=126 loops=1) Index Cond: (task_id = 1556) Filter: (((deletable IS NULL) OR (NOT deletable)) AND ((link_type = 8) OR (link_type = 9))) If it realized there'd be only 126 rows out of that scan, it'd probably have gone for a nestloop join against the big table, which I think would be noticeably faster than either of the plans you show here. You already did crank up default_statistics_target, so I'm not sure if raising it further would help any. What I'd suggest is trying to avoid using non-independent AND/OR conditions. For instance recasting the first OR as just deletable is not true would probably result in a better estimate. The size of the error seems to be more than that would account for though, so I suspect that the deletable and link_type conditions are interdependent. Is it practical to recast your data representation to avoid that? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance with partitions/inheritance and multiple tables
Radhika, If the data is 9 million rows, then I would suggest that you leave it as it is, unless the server configuration and the number of users firing queries simultaneously is a matter of concern. Try creating indexes on often used fields and use EXPLAIN to speed performance of the queries ... and of course proper configuration of autovacuum. I have seen query results within a few ms. on similar amount of data on a 2GB RHEL RAID 5 system, so it should not have been an issue. HTH, Shrirang Chitnis -- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Radhika S Sent: Thursday, December 24, 2009 8:12 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] Performance with partitions/inheritance and multiple tables Hi, We currently have a large table (9 million rows) of which only the last couple of days worth of data is queried on a regular basis. To improve performance we are thinking of partitioning the table. One idea is: Current_data = last days worth archive_data today (goes back to 2005) The idea being proposed at work is: current_data = today's data prior years data - be broken down into one table per day archive_data - data older than a year. My question is: a) Does postgres suffer a performance hit say if there are 200 child tables. b) What about aggregation between dates in the last year. eg total sales for firm a for the last year. It will need to look up n number of tables. Any ideas, tips, gotchas in implementing partitioning would be welcome. It is a somewhat mission critical (not trading, so not as mission critical) system. How expensive is maintaining so many partitions both in terms of my writing / maintaining scripts and performance. Thanks in advance. Radhika The information contained in this message, including any attachments, is attorney privileged and/or confidential information intended only for the use of the individual or entity named as addressee. The review, dissemination, distribution or copying of this communication by or to anyone other than the intended addressee is strictly prohibited. If you have received this communication in error, please immediately notify the sender by replying to the message and destroy all copies of the original message. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
Richard Neill wrote: 3. RAID 0 is twice as unreliable as no raid. I'd recommend using RAID 1 intead. If you use the Linux software mdraid, remote admin is easy. The main thing to be wary of with Linux software RAID-1 is that you configure things so that both drives are capable of booting the system. It's easy to mirror the data, but not the boot loader and the like. 7. If you have 3 equal disks, try doing some experiments. My inclination would be to set them all up with ext4... I have yet to yet a single positive thing about using ext4 for PostgreSQL. Stick with ext3, where the problems you might run into are at least well understood and performance is predictable. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Multicolumn index - WHERE ... ORDER BY
Lucas Maystre l...@open.ch writes: Example of a query I might have: SELECT id FROM mail WHERE from_address LIKE 'bill%' ORDER BY time DESC LIMIT 50; The solution I had in mind was to create a multicolumn index over 'from_address' and 'time': CREATE INDEX idx_from_time ON mail (from_address, time DESC); so that it could directly use the 'time' ordering and lookup only the first 50 rows using the index. but... it doesn't work :-) i.e. my multicolumn index is never used. So: - do you guys have any ideas why it doesn't work? The from_address condition isn't simple equality, so the output of a scan wouldn't be sorted by time --- it would have subranges that are sorted, but that's no help overall. You still have to read the whole scan output and re-sort. So this index has no advantage over the smaller index on just from_address. - do you see an alternative solution? There might be some use in an index on (time, from_address). That gives the correct time ordering, and at least the prefix part of the from_address condition can be checked in the index without visiting the heap. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
Hello, Instead of using 3 disks in RAID-0 and one without RAID for archive, I would rather invest into one extra disk and have either a RAID 1+0 setup or use two disks in RAID-1 for the WAL and two disks in RAID-1 for the main database (I'm not sure which perform better between those two solutions). RAID-1 will give you about twice as fast reads as no RAID (and RAID 1+0 will give you twice as fast as RAID 0), with no significant penalty for writing, and it'll save a lot of manpower in case on disk dies. If you can afford hot-swappable disks, you can even replace a failed disk live, in a few minutes, with no failure at software level. Everything can be remotely setup, including adding/removing a disk from RAID array, if you use Linux software RAID (mdadm), except of course the physical swap of the disk, but that can be done by a non-technician. This solution costs only one extra disk (which is quite cheap nowadays) and will deliver enhanced performances and save a lot of manpower and downtime in case of disk breaking. Regards, -- Gaël Le Mignot - g...@pilotsystems.net Pilot Systems - 9, rue Desargues - 75011 Paris Tel : +33 1 44 53 05 55 - www.pilotsystems.net Gérez vos contacts et vos newsletters : www.cockpit-mailing.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
Gaël Le Mignot wrote: This solution costs only one extra disk (which is quite cheap nowadays) I would wager that the system being used here only has enough space to house 3 drives, thus the question, which means that adding a fourth drive probably requires buying a whole new server. Nowadays the drives themselves are rarely the limiting factor on how many people use, since you can get a stack of them for under $100 each. Instead the limit for small servers is always based on the physical enclosure and then potentially the number of available drive ports. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
On 12/24/2009 10:51 AM, Greg Smith wrote: 7. If you have 3 equal disks, try doing some experiments. My inclination would be to set them all up with ext4... I have yet to yet a single positive thing about using ext4 for PostgreSQL. Stick with ext3, where the problems you might run into are at least well understood and performance is predictable. Hi Greg: Can you be more specific? I am using ext4 without problems than I have discerned - but mostly for smaller databases (~10 databases, one almost about 1 Gbyte, most under 500 Mbytes). Is it the delayed allocation feature that is of concern? I believe this feature is in common with other file systems such as XFS, and provided that the caller is doing things properly according to POSIX and/or the file system authors understanding of POSIX, which includes fsync()/fdatasync()/O_DIRECT (which PostgreSQL does?), everything is fine? File systems failures have been pretty rare for me lately, so it's hard to say for sure whether my setup is really running well until it does fail one day and I find out. (Not too concerned, though, as I keep off site pg_dump backups of the database on a regular schedule - the databases are small enough to afford this :-) ) Cheers, mark -- Mark Mielkem...@mielke.cc -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
Mark Mielke wrote: Can you be more specific? I am using ext4 without problems than I have discerned - but mostly for smaller databases (~10 databases, one almost about 1 Gbyte, most under 500 Mbytes). Every time I do hear about ext4, so far it's always in the context of something that doesn't work well--not hearing about improvements yet. For example, there was a thread on this list earlier this month titled 8.4.1 ubuntu karmic slow createdb that had a number of people chime saying they weren't happy with ext4 for various reasons. Also, I have zero faith in the ability of the Linux kernel development process to produce stable code anymore, they're just messing with too many things every single day. Any major new features that come out of there I assume need a year or two to stabilize before I'll put a production server on them and feel safe, because that this point a stable release means nothing in terms of kernel QA. Something major like a filesystem introduction would be closer to the two year estimate side. We're not even remotely close to stable yet with ext4 when stuff like http://bugzilla.kernel.org/show_bug.cgi?id=14354 is still going on. My rough estimate is that ext4 becomes usable and free of major bugs in late 2010, best case. At this point anyone who deploys it is still playing with fire. File systems failures have been pretty rare for me lately, so it's hard to say for sure whether my setup is really running well until it does fail one day and I find out. All of the ext4 issues I've heard of that worry me are either a) performance related and due to the barrier code not doing what was expected, or b) crash related. No number of anecdotal it works for me reports can make up for those classes of issue because you will only see both under very specific circumstances. I'm glad you have a good backup plan though. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
Greg Smith wrote: Richard Neill wrote: 3. RAID 0 is twice as unreliable as no raid. I'd recommend using RAID 1 intead. If you use the Linux software mdraid, remote admin is easy. The main thing to be wary of with Linux software RAID-1 is that you configure things so that both drives are capable of booting the system. It's easy to mirror the data, but not the boot loader and the like. Good point. I actually did this on a home PC (2 disks in RAID 1). The solution is simple: just grub-install /dev/sda; grub-install /dev/sdb and that's all you have to do, provided that /boot is on the raid array. Of course, with a server machine, it's nearly impossible to use mdadm raid: you are usually compelled to use a hardware raid card. Those are a pain, and less configurable, but it will take care of the bootloader issue. Obviously, test it both ways. 7. If you have 3 equal disks, try doing some experiments. My inclination would be to set them all up with ext4... I have yet to yet a single positive thing about using ext4 for PostgreSQL. Stick with ext3, where the problems you might run into are at least well understood and performance is predictable. I did some measurements on fdatasync() performance for ext2,ext3,ext4. I found ext2 was fastest, ext4 was twice as slow as ext2, and ext3 was about 5 times slower than ext2. Also, ext4 is doesn't having an appallingly slow fsck. We've had pretty good results from ext4. Richard -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
On 12/24/2009 05:12 PM, Richard Neill wrote: Of course, with a server machine, it's nearly impossible to use mdadm raid: you are usually compelled to use a hardware raid card. Could you expand on that? - Jeremy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
Jeremy Harris wrote: On 12/24/2009 05:12 PM, Richard Neill wrote: Of course, with a server machine, it's nearly impossible to use mdadm raid: you are usually compelled to use a hardware raid card. Could you expand on that? Both of the last machines I bought (an IBM X3550 and an HP DL380) come with hardware raid solutions. These are an utter nuisance because: - they can only be configured from the BIOS (or with a bootable utility CD). Linux has very basic monitoring tools, but no way to reconfigure the array, or add disks to empty hot-swap slots while the system is running. - If there is a Linux raid config program, it's not part of the main packaged distro, but usually a pre-built binary, available for only one release/kernel of the wrong distro. - the IBM one had dodgy firmware, which, until updated, caused the disk to totally fail after a few days. - you pay a lot of money for something effectively pointless, and have less control and less flexibility. After my experience with the X3550, I hunted for any server that would ship without hardware raid, i.e. connect the 8 SATA hotswap slots direct to the motherboard, or where the hardware raid could be de-activated completely, and put into pass-through mode. Neither HP nor IBM make such a thing. Richard - Jeremy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
On Thu, Dec 24, 2009 at 11:09 AM, Richard Neill rn...@cam.ac.uk wrote: Jeremy Harris wrote: On 12/24/2009 05:12 PM, Richard Neill wrote: Of course, with a server machine, it's nearly impossible to use mdadm raid: you are usually compelled to use a hardware raid card. Could you expand on that? Both of the last machines I bought (an IBM X3550 and an HP DL380) come with hardware raid solutions. These are an utter nuisance because: - they can only be configured from the BIOS (or with a bootable utility CD). Linux has very basic monitoring tools, but no way to reconfigure the array, or add disks to empty hot-swap slots while the system is running. - If there is a Linux raid config program, it's not part of the main packaged distro, but usually a pre-built binary, available for only one release/kernel of the wrong distro. - the IBM one had dodgy firmware, which, until updated, caused the disk to totally fail after a few days. - you pay a lot of money for something effectively pointless, and have less control and less flexibility. After my experience with the X3550, I hunted for any server that would ship without hardware raid, i.e. connect the 8 SATA hotswap slots direct to the motherboard, or where the hardware raid could be de-activated completely, and put into pass-through mode. Neither HP nor IBM make such a thing. Yep. And that's why I never order servers from them. There are dozens of reputable white box builders (I use Aberdeen who give me a 5 year all parts warranty and incredible customer service, but there are plenty to choose from) and they build the machine I ask them to build. For hardware RAID I use Areca 1680 series, and they also provide me with machines with software RAID for lighter loads (slave dbs, reporting dbs, and stats dbs) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
Richard and others, thank you all for your answers. My comments inline. Richard Neill wrote: 2. Also, for reads, the more RAM you have, the better (for caching). I'd suspect that another 8GB of RAM is a better expenditure than a 2nd drive in many cases. The size of the RAM is already four time of the database size, so I believe I won't get any more benefit if it is increased. The number of simultaneous connections to the database is small -- around 5. What I'm trying to do with the hard disk configuration is to increase the write speed. 3. RAID 0 is twice as unreliable as no raid. I'd recommend using RAID 1 intead. If you use the Linux software mdraid, remote admin is easy. No, actually it is HP ML series server with HW RAID. I don't have too much experience with it, but I believe that the remote administration might be hard. And that was the main reason I was avoiding RAID 1. 5. For a 2-disk setup, I think that main DB on one, with WAL on the other will beat having everything on a single RAID0. 6. The WAL is relatively small: you might consider a (cheap) solid-state disk for it. These are exactly the thing I was also considering. -- but needed advice from people who tried it already. Regards, Ognjen -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
This isn't true. IBMs IPS series controllers can the checked and configured via the ipssend utility that works very well in 2.6.x LINUX. Scott Marlowe scott.marl...@gmail.com wrote: On Thu, Dec 24, 2009 at 11:09 AM, Richard Neill rn...@cam.ac.uk wrote: Jeremy Harris wrote: On 12/24/2009 05:12 PM, Richard Neill wrote: Of course, with a server machine, it's nearly impossible to use mdadm raid: you are usually compelled to use a hardware raid card. Could you expand on that? Both of the last machines I bought (an IBM X3550 and an HP DL380) come with hardware raid solutions. These are an utter nuisance because: - they can only be configured from the BIOS (or with a bootable utility CD). Linux has very basic monitoring tools, but no way to reconfigure the array, or add disks to empty hot-swap slots while the system is running. - If there is a Linux raid config program, it's not part of the main packaged distro, but usually a pre-built binary, available for only one release/kernel of the wrong distro. - the IBM one had dodgy firmware, which, until updated, caused the disk to totally fail after a few days. - you pay a lot of money for something effectively pointless, and have less control and less flexibility. After my experience with the X3550, I hunted for any server that would ship without hardware raid, i.e. connect the 8 SATA hotswap slots direct to the motherboard, or where the hardware raid could be de-activated completely, and put into pass-through mode. Neither HP nor IBM make such a thing. Yep. And that's why I never order servers from them. There are dozens of reputable white box builders (I use Aberdeen who give me a 5 year all parts warranty and incredible customer service, but there are plenty to choose from) and they build the machine I ask them to build. For hardware RAID I use Areca 1680 series, and they also provide me with machines with software RAID for lighter loads (slave dbs, reporting dbs, and stats dbs) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Message composed using K-9 mail on Android. Apologies for improper reply quoting (not supported) by client. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
Adam Tauno Williams wrote: This isn't true. IBMs IPS series controllers can the checked and configured via the ipssend utility that works very well in 2.6.x LINUX. Unfortunately, what we got (in the IBM) was the garbage ServeRaid 8kl card. This one is atrocious - it shipped with a hideous firmware bug. And there is no way to bypass it. The HP have the P400 cards, which are decent in themselves, just not as good as software raid. Richard Scott Marlowe scott.marl...@gmail.com wrote: On Thu, Dec 24, 2009 at 11:09 AM, Richard Neill rn...@cam.ac.uk wrote: Jeremy Harris wrote: On 12/24/2009 05:12 PM, Richard Neill wrote: Of course, with a server machine, it's nearly impossible to use mdadm raid: you are usually compelled to use a hardware raid card. Could you expand on that? Both of the last machines I bought (an IBM X3550 and an HP DL380) come with hardware raid solutions. These are an utter nuisance because: - they can only be configured from the BIOS (or with a bootable utility CD). Linux has very basic monitoring tools, but no way to reconfigure the array, or add disks to empty hot-swap slots while the system is running. - If there is a Linux raid config program, it's not part of the main packaged distro, but usually a pre-built binary, available for only one release/kernel of the wrong distro. - the IBM one had dodgy firmware, which, until updated, caused the disk to totally fail after a few days. - you pay a lot of money for something effectively pointless, and have less control and less flexibility. After my experience with the X3550, I hunted for any server that would ship without hardware raid, i.e. connect the 8 SATA hotswap slots direct to the motherboard, or where the hardware raid could be de-activated completely, and put into pass-through mode. Neither HP nor IBM make such a thing. Yep. And that's why I never order servers from them. There are dozens of reputable white box builders (I use Aberdeen who give me a 5 year all parts warranty and incredible customer service, but there are plenty to choose from) and they build the machine I ask them to build. For hardware RAID I use Areca 1680 series, and they also provide me with machines with software RAID for lighter loads (slave dbs, reporting dbs, and stats dbs) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Message composed using K-9 mail on Android. Apologies for improper reply quoting (not supported) by client. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
On Thu, Dec 24, 2009 at 3:51 PM, Richard Neill rn...@cam.ac.uk wrote: Adam Tauno Williams wrote: This isn't true. IBMs IPS series controllers can the checked and configured via the ipssend utility that works very well in 2.6.x LINUX. Unfortunately, what we got (in the IBM) was the garbage ServeRaid 8kl card. This one is atrocious - it shipped with a hideous firmware bug. And there is no way to bypass it. The HP have the P400 cards, which are decent in themselves, just not as good as software raid. Yeah, the HP400 gets pretty meh reviews here on the lists. The P600 is adequate and the P800 seems to be a good performer. Can you replace the IBM RAID controller with some other controller? Even just a simple 4 or 8 port SATA card with no RAID capability would be better than something that locks up. Personally I'd call my rep and ask him to come pick up his crap server and give me a check to replace it if it was that bad. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
Scott Marlowe wrote: On Thu, Dec 24, 2009 at 3:51 PM, Richard Neill rn...@cam.ac.uk wrote: Adam Tauno Williams wrote: This isn't true. IBMs IPS series controllers can the checked and configured via the ipssend utility that works very well in 2.6.x LINUX. Unfortunately, what we got (in the IBM) was the garbage ServeRaid 8kl card. This one is atrocious - it shipped with a hideous firmware bug. And there is no way to bypass it. Can you replace the IBM RAID controller with some other controller? Even just a simple 4 or 8 port SATA card with no RAID capability would be better than something that locks up. A replacement would have been nice, however the 8kl is very tightly integrated with the motherboard and the backplane. We'd have had to buy a PCI-X card, and then get out the soldering iron to fix the cables. To be fair, the 8kl is now working OK; also there was a note in the box mentioning that firmware updates should be applied if available. What I found unbelievable was that IBM shipped the server to me in a state with known crashing firmware (a sufficiently bad bug imho to merit a product recall), and hadn't bothered to flash it themselves in the factory. Usually BIOS updates are only applied by the end user if there is a specific issue to fix, and if the product line has been out for years, but that particular server was only assembled 3 weeks ago, why would one expect a company of IBM's standing to ship it in that state. Richard -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SATA drives performance
On Thu, Dec 24, 2009 at 5:15 PM, Richard Neill rn...@cam.ac.uk wrote: Scott Marlowe wrote: On Thu, Dec 24, 2009 at 3:51 PM, Richard Neill rn...@cam.ac.uk wrote: Adam Tauno Williams wrote: This isn't true. IBMs IPS series controllers can the checked and configured via the ipssend utility that works very well in 2.6.x LINUX. Unfortunately, what we got (in the IBM) was the garbage ServeRaid 8kl card. This one is atrocious - it shipped with a hideous firmware bug. And there is no way to bypass it. Can you replace the IBM RAID controller with some other controller? Even just a simple 4 or 8 port SATA card with no RAID capability would be better than something that locks up. A replacement would have been nice, however the 8kl is very tightly integrated with the motherboard and the backplane. We'd have had to buy a PCI-X card, and then get out the soldering iron to fix the cables. To be fair, the 8kl is now working OK; also there was a note in the box mentioning that firmware updates should be applied if available. What I found unbelievable was that IBM shipped the server to me in a state with known crashing firmware (a sufficiently bad bug imho to merit a product recall), and hadn't bothered to flash it themselves in the factory. Usually BIOS updates are only applied by the end user if there is a specific issue to fix, and if the product line has been out for years, but that particular server was only assembled 3 weeks ago, why would one expect a company of IBM's standing to ship it in that state. It does kind of knock the stuffing out of the argument that buying from the big vendors ensures good hardware experiences. I've had similar problems from all the big vendors in the past. I can't imagine getting treated that way by my current supplied. It's one thing for some obscure bug in a particular ubuntu kernel to interact poorly with a piece of equipment, but when a hardware RAID controller arrives in a basically broken state, that's inexcusable. It's really not too much to expect working hardware on arrival. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance