[PERFORM] Databases optimization
Hi, I have a database cluster running PostgreSQL 8.2, and I have a new Linux virtualized database environment running PostgreSQL 9.0. My question is: how do I ensure that the database schemas are always performant and scalable, that the databases are optimized, and that everything is entirely migrated? Thanks in advance! Hany
[PERFORM] Postgres for a data warehouse, 5-10 TB
I have been a MySQL user for years, including owning a few multi-gigabyte databases for my websites, and using it to host algebra.com (about a 12 GB database). I have had my ups and downs with MySQL. The ups were ease of use and decent performance for small databases such as algebra.com. The downs were things like twenty-hour REPAIR TABLE operations on a 35 GB table, etc.

Right now I have a personal (one user) project to create a 5-10 Terabyte data warehouse. The largest table will consume the most space and will hold, perhaps, 200,000,000 rows. I want to use it to obtain valuable business intelligence and to make money. I expect it to grow, never shrink, and to be accessed via batch queries. I do not need batch queries to be super fast; for example, an hour per query would be just fine. However, while an hour is fine, two weeks per query is NOT fine.

I have a server with about 18 TB of storage and 48 GB of RAM, and 12 CPU cores. My initial plan was to use MySQL, InnoDB, and deal with problems as they arise. Perhaps, say, I would implement my own joining procedures. After reading some disparaging stuff about InnoDB performance on large datasets, however, I am getting cold feet. I have a general feeling that, perhaps, I will not be able to succeed with MySQL, or, perhaps, with either MySQL or Postgres.

I do not know much about Postgres, but I am very eager to learn and see if I can use it for my purposes more effectively than MySQL. I cannot shell out $47,000 per CPU for Oracle for this project. To be more specific, the batch queries that I would do, I hope, would either use small JOINs of a small dataset to a large dataset, or just SELECTs from one big table.

So... Can Postgres support a 5-10 TB database with the use pattern stated above? Thanks! i
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Sun, Sep 11, 2011 at 6:35 AM, Igor Chudov ichu...@gmail.com wrote:

> I have a server with about 18 TB of storage and 48 GB of RAM, and 12 CPU cores.

1 or 2 fast cores is plenty for what you're doing. But the drive array and how it's configured etc. are very important. There's a huge difference between 10 2TB 7200RPM SATA drives in a software RAID-5 and 36 500G 15kRPM SAS drives in a RAID-10 (SW or HW would both be OK for a data warehouse).

> I do not know much about Postgres, but I am very eager to learn and see if I can use it for my purposes more effectively than MySQL. I cannot shell out $47,000 per CPU for Oracle for this project.

Hopefully, if need be, you can spend some small percentage of that on a fast IO subsystem.

> To be more specific, the batch queries that I would do, I hope, would either use small JOINs of a small dataset to a large dataset, or just SELECTs from one big table. So... Can Postgres support a 5-10 TB database with the use pattern stated above?

I use it on a ~3TB DB and it works well enough. Fast IO is the key here. Lots of drives in RAID-10, or HW RAID-6 if you don't do a lot of random writing.
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
For a 10 TB table scanned in about 3 hours, the disks would need a transfer rate of roughly 1 GB/s (seqscan).

2011/9/11, Scott Marlowe scott.marl...@gmail.com:
> I use it on a ~3TB DB and it works well enough. Fast IO is the key here. Lots of drives in RAID-10 or HW RAID-6 if you don't do a lot of random writing.

-- pasman
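As a rough back-of-the-envelope check of that figure (assuming a single pure sequential pass over the table, no caching, and ignoring filesystem overhead):

    10 TB in 3 hours = 10,000 GB / (3 * 3,600 s) = 10,000 GB / 10,800 s ≈ 0.93 GB/s

So "about 1 GB/s" of sustained sequential read is the right ballpark for a 3-hour full scan of a 10 TB table; a slower array simply stretches the scan time proportionally.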
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Sun, Sep 11, 2011 at 7:52 AM, Scott Marlowe scott.marl...@gmail.comwrote: On Sun, Sep 11, 2011 at 6:35 AM, Igor Chudov ichu...@gmail.com wrote: I have a server with about 18 TB of storage and 48 GB of RAM, and 12 CPU cores. 1 or 2 fast cores is plenty for what you're doing. I need those cores to perform other tasks, like image manipulation with imagemagick, XML forming and parsing etc. But the drive array and how it's configured etc are very important. There's a huge difference between 10 2TB 7200RPM SATA drives in a software RAID-5 and 36 500G 15kRPM SAS drives in a RAID-10 (SW or HW would both be ok for data warehouse.) Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6 configuration. They are managed by a 3WARE 9750 RAID CARD. I would say that I am not very concerned with linear relationship of read speed to disk speed. If that stuff is somewhat slow, it is OK with me. What I want to avoid is severe degradation of performance due to size (time complexity greater than O(1)), disastrous REPAIR TABLE operations etc. I do not know much about Postgres, but I am very eager to learn and see if I can use it for my purposes more effectively than MySQL. I cannot shell out $47,000 per CPU for Oracle for this project. To be more specific, the batch queries that I would do, I hope, Hopefully if needs be you can spend some small percentage of that for a fast IO subsystem is needed. I am actually open for suggestions here. would either use small JOINS of a small dataset to a large dataset, or just SELECTS from one big table. So... Can Postgres support a 5-10 TB database with the use pattern stated above? I use it on a ~3TB DB and it works well enough. Fast IO is the key here. Lots of drives in RAID-10 or HW RAID-6 if you don't do a lot of random writing. I do not plan to do a lot of random writing. My current design is that my perl scripts write to a temporary table every week, and then I do INSERT..ON DUPLICATE KEY UPDATE. By the way, does that INSERT UPDATE functionality or something like this exist in Postgres? i
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
2011/9/11 pasman pasmański pasma...@gmail.com:
> For a 10 TB table scanned in about 3 hours, the disks would need a transfer rate of roughly 1 GB/s (seqscan).

I have 6 Gb/s disk drives, so it should be not too far off; maybe 5 hours for a seqscan.

i
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov ichu...@gmail.com wrote:
> Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6 configuration. They are managed by a 3WARE 9750 RAID CARD. I would say that I am not very concerned with linear relationship of read speed to disk speed. If that stuff is somewhat slow, it is OK with me.

With RAID 6 you'll have abysmal performance on write operations. In data warehousing, there's lots of writes to temporary files, for sorting and stuff like that. You should either migrate to RAID 10, or set up a separate array for temporary files, perhaps RAID 0.
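If he goes the separate-temp-array route, PostgreSQL can be pointed at it directly. A minimal sketch (the mount point and tablespace name below are made up for illustration; temp_tablespaces is available in 8.3 and later):

-- as a superuser, after mounting the RAID-0 array somewhere like /mnt/raid0/pgtemp
CREATE TABLESPACE temp_space LOCATION '/mnt/raid0/pgtemp';

-- then either in postgresql.conf:
--   temp_tablespaces = 'temp_space'
-- or per session:
SET temp_tablespaces = 'temp_space';

Sorts and temporary tables that spill to disk will then land on the fast scratch array instead of the RAID-6 volume.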
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On 09/11/2011 07:35 AM, Igor Chudov wrote:
> So... Can Postgres support a 5-10 TB database with the use pattern stated above?

That is a scale or two larger than I have experience with. I converted my website database from mysql to PG, and it has several db's between 1 and 10 gig. There are parts of the website that were faster with mysql, and there are parts faster with PG.

In one spot, because PG has superior join support on select statements, I was able to change the code to generate a single more complicated sql statement vs. mysql, which had to fire off several simpler statements. It's a search screen where you can type in 15-ish different options. I was able to generate a single sql statement which joins 8-odd tables with plenty of where clauses. PG runs it in the blink of an eye. It's astonishing compared to the pain of mysql. If you ever have to write your own join, or your own lookup function, that's a failure of your database.

One spot that was slower was a batch insert of data. It's not so much slower that it was a problem. I use COPY on PG vs. prepared inserts on mysql. It was pretty close, but mysql still won.

Seeing as you can set up and test both databases, have you considered a trial run? Things to watch for:

I think the same amount of data will use more disk space in PG than in mysql.

Importing data into PG should use COPY and multiple connections at the same time. PG will only use multiple cores if you use multiple connections (each connection uses one core).

Huge result sets (like a select statement that returns 1,000,000 rows) will be slow.

PG is a much fuller database than mysql, and as such you can influence its join types and function calls (table scan vs. index, immutable function vs. stable, perl function vs. sql).
So if at first it appears slow, you have a million options. I think the only option you have in mysql is to pull the data back and code it yourself.

Upgrading to major versions of PG may or may not be painful. (mysql sometimes works seamlessly between versions, and it appears brilliant. But I have had problems with an update, and when it goes bad, you don't have a lot of options.) In the past, PG's only method of upgrade was a full backup of the old cluster and a restore into the new one. Things have gotten better: there is new pg_upgrade support (still kinda new though), and there is some 3rd-party replication support where you replicate your 9.0 database to a new 9.1 database, and at some point you promote the new 9.1 database as the new master. Or something like that. I've only read posts about it, never done it. But with that much data, you'll need an upgrade plan.

All in all, if I can summarize my personal view: mysql is fast at the expense of safety and usability (mysql still cannot do updates with subselects). PG is safe and usable at the expense of speed, and you won't be disappointed by the speed.

-Andy
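To make Andy's bulk-loading advice concrete, here is a minimal sketch (the table and file names are hypothetical, not from the thread). Each loader connection runs its own COPY against a different chunk of the input, so several of them can keep separate CPU cores and the IO subsystem busy at once:

-- connection 1
COPY facts FROM '/data/facts_chunk_01.csv' CSV;

-- connection 2 (run concurrently from another session/process)
COPY facts FROM '/data/facts_chunk_02.csv' CSV;

From a client that cannot read server-side files, psql's \copy does the same thing from the client side:

\copy facts from '/data/facts_chunk_01.csv' csv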
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Sun, Sep 11, 2011 at 9:16 AM, Claudio Freire klaussfre...@gmail.comwrote: On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov ichu...@gmail.com wrote: Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6 configuration. They are managed by a 3WARE 9750 RAID CARD. I would say that I am not very concerned with linear relationship of read speed to disk speed. If that stuff is somewhat slow, it is OK with me. With Raid 6 you'll have abysmal performance on write operations. In data warehousing, there's lots of writes to temporary files, for sorting and stuff like that. You should either migrate to raid 10, or set up a separate array for temporary files, perhaps raid 0. Thanks. I will rebuild the RAID array early next week and I will see if I have a Raid 10 option with that card. Quantitatively, what would you say is the write speed difference between RAID 10 and RAID 6?
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On 09/11/2011 08:59 AM, Igor Chudov wrote:
> I do not plan to do a lot of random writing. My current design is that my perl scripts write to a temporary table every week, and then I do INSERT..ON DUPLICATE KEY UPDATE. By the way, does that INSERT UPDATE functionality or something like this exist in Postgres? i

You have two options:

1) write a function like:

create function doinsert(_id integer, _value text) returns void as $$
begin
    update thetable set value = _value where id = _id;
    if not found then
        insert into thetable(id, value) values (_id, _value);
    end if;
end;
$$ language plpgsql;

2) use two sql statements:

-- update the existing
update realTable set value = (select value from tmp where tmp.id = realTable.id)
where exists (select value from tmp where tmp.id = realTable.id);

-- insert the missing
insert into realTable(id, value)
select id, value from tmp
where not exists (select 1 from realTable where tmp.id = realTable.id);

-Andy
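For completeness, a small usage sketch for option 1, assuming a hypothetical target table (none of these names come from the thread):

create table thetable (id integer primary key, value text);

select doinsert(42, 'first version');   -- row is missing, so this inserts
select doinsert(42, 'second version');  -- row exists now, so this updates

As Marti points out later in the thread, neither option is safe if several loader sessions can hit the same key at the same time; with a single weekly loader process that is not a concern.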
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Sun, Sep 11, 2011 at 4:16 PM, Andy Colson a...@squeakycode.net wrote: Upgrading to major versions of PG may or may not be painful. (mysql sometimes works seamlessly between versions, it appears brilliant. But I have had problems with an update, and when it goes bad, you dont have a lot of options). In the past PG's only method of upgrade was a full backup of old, restore in new. Things have gotten better, there is new pg_upgrade support (still kinda new though), and there is some 3rd party replication support where you replicate your 9.0 database to a new 9.1 database, and at some point you promote the new 9.1 database as the new master. Or something like that. I've only read posts about it, never done it. But with that much data, you'll need an upgrade plan. I have used slony to do database migration. It is a pain to set up, but it saves you hours of downtime. Basically, you replicate your 9.0 database into a 9.1 slave while the 9.0 is still hot and working, so you only have a very small downtime. It's an option, but it's a lot of work to set up, only warranted if you really cannot afford the downtime. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On 09/11/2011 09:21 AM, Igor Chudov wrote: On Sun, Sep 11, 2011 at 9:16 AM, Claudio Freire klaussfre...@gmail.com mailto:klaussfre...@gmail.com wrote: On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov ichu...@gmail.com mailto:ichu...@gmail.com wrote: Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6 configuration. They are managed by a 3WARE 9750 RAID CARD. I would say that I am not very concerned with linear relationship of read speed to disk speed. If that stuff is somewhat slow, it is OK with me. With Raid 6 you'll have abysmal performance on write operations. In data warehousing, there's lots of writes to temporary files, for sorting and stuff like that. You should either migrate to raid 10, or set up a separate array for temporary files, perhaps raid 0. Thanks. I will rebuild the RAID array early next week and I will see if I have a Raid 10 option with that card. Quantitatively, what would you say is the write speed difference between RAID 10 and RAID 6? Note that using RAID 10, while faster, cuts your usable space in half. 12 2TB drives in raid 10 == 6 drives * 2TB == 12 TB total space. That's not big enough, is it? -Andy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Sun, Sep 11, 2011 at 4:21 PM, Igor Chudov ichu...@gmail.com wrote:
> Quantitatively, what would you say is the write speed difference between RAID 10 and RAID 6?

https://support.nstein.com/blog/archives/73

There you can see a comparison with 4 drives, and RAID 10 is twice as fast. Since RAID 5/6 doesn't scale write performance at all (it performs as a single drive), it's quite expected. 12 drives would probably be around 6 times as fast as RAID 6. You definitely should do some benchmarks to confirm, though.

And Andy is right, you'll have a lot less space. If RAID 10 doesn't give you enough room, just leave two spare drives for a RAID 0 temporary partition. That will be at least twice as fast as doing temporary tables on the RAID 6. You'll obviously have to get creative; tons of options.
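Putting rough numbers on that rule of thumb for Igor's twelve 2 TB drives (back-of-the-envelope estimates based on the figures in this thread, not benchmarks):

    RAID 6 usable space:   (12 - 2) * 2 TB = 20 TB;  write throughput ~ 1 drive (per Claudio's rule of thumb)
    RAID 10 usable space:  (12 / 2) * 2 TB = 12 TB;  write throughput ~ 6 drives
    RAID 10 on 10 drives + RAID 0 on 2 drives:  10 TB usable + a fast scratch area for temp files

which is why the suggestions above trade capacity for write speed, or carve off a couple of drives for temporary files.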
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Sun, Sep 11, 2011 at 17:23, Andy Colson a...@squeakycode.net wrote:
> On 09/11/2011 08:59 AM, Igor Chudov wrote:
>> By the way, does that INSERT UPDATE functionality or something like this exist in Postgres?
> You have two options: 1) write a function like: create function doinsert(_id integer, _value text) returns void as ... 2) use two sql statements:

Unfortunately both of these options have caveats. Depending on your I/O speed, you might need to use multiple loader threads to saturate the write bandwidth. However, neither option is safe from race conditions. If you need to load data from multiple threads at the same time, they won't see each other's inserts (until commit) and thus will cause unique violations. If you could somehow partition their operation by some key, so threads are guaranteed not to conflict with each other, then that would be perfect. The 2nd option given by Andy is probably faster.

You *could* code a race-condition-safe function, but that would be a no-go on a data warehouse, since each call needs a separate subtransaction which involves allocating a transaction ID.

Which brings me to another important point: don't do lots of small write transactions, SAVEPOINTs or PL/pgSQL subtransactions. Besides being inefficient, they introduce a big maintenance burden. In PostgreSQL's MVCC, each tuple contains a reference to the 32-bit transaction ID that inserted it (xmin). After hitting the maximum 32-bit value, the transaction ID counter wraps around. To prevent old rows from appearing as new, a vacuum freeze process will run after passing autovacuum_freeze_max_age transactions (200 million by default) to update all old rows in your database. Using fewer transaction IDs means it runs less often.

On small databases, this is usually not important. But on a 10TB data warehouse, rewriting a large part of your database totally kills performance for any other processes. This is detailed in the documentation: http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

Regards, Marti
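For reference, the race-condition-safe variant Marti alludes to looks roughly like the sketch below (adapted from the well-known retry-loop pattern in the PostgreSQL documentation; the table and column names are illustrative). The EXCEPTION clause is what forces a subtransaction, and therefore an extra transaction ID, per call:

create function merge_row(_id integer, _value text) returns void as $$
begin
    loop
        -- first try to update an existing row
        update thetable set value = _value where id = _id;
        if found then
            return;
        end if;
        -- not there: try to insert, but another session may beat us to it
        begin
            insert into thetable(id, value) values (_id, _value);
            return;
        exception when unique_violation then
            -- someone else inserted the same key concurrently; loop and retry the update
        end;
    end loop;
end;
$$ language plpgsql;

This is exactly the per-row subtransaction overhead Marti warns against for bulk loading, so it suits occasional OLTP-style upserts better than a warehouse load.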
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Sep 11, 2011, at 9:21 AM, Igor Chudov wrote: On Sun, Sep 11, 2011 at 9:16 AM, Claudio Freire klaussfre...@gmail.com wrote: On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov ichu...@gmail.com wrote: Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6 configuration. They are managed by a 3WARE 9750 RAID CARD. I would say that I am not very concerned with linear relationship of read speed to disk speed. If that stuff is somewhat slow, it is OK with me. With Raid 6 you'll have abysmal performance on write operations. In data warehousing, there's lots of writes to temporary files, for sorting and stuff like that. You should either migrate to raid 10, or set up a separate array for temporary files, perhaps raid 0. Thanks. I will rebuild the RAID array early next week and I will see if I have a Raid 10 option with that card. Quantitatively, what would you say is the write speed difference between RAID 10 and RAID 6? As someone who migrated a RAID 5 installation to RAID 10, I am getting far better read and write performance on heavy calculation queries. Writing on the RAID 5 really made things crawl. For lots of writing, I think RAID 10 is the best. It should also be noted that I changed my filesystem from ext3 to XFS - this is something you can look into as well. Ogden
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Sun, Sep 11, 2011 at 1:36 PM, Ogden li...@darkstatic.com wrote: As someone who migrated a RAID 5 installation to RAID 10, I am getting far better read and write performance on heavy calculation queries. Writing on the RAID 5 really made things crawl. For lots of writing, I think RAID 10 is the best. It should also be noted that I changed my filesystem from ext3 to XFS - this is something you can look into as well. Ogden RAID 10 on XFS here, too, both in OLTP and Data-warehousing scenarios. Our largest OLTP is ~375 GB, and PostgreSQL performs admirably (we converted from MSSQL to PostgreSQL, and we've had more issues with network bottlenecks since converting (where MSSQL was always the bottleneck before)). Now that we have fiber interconnects between our two main datacenters, I'm actually having to work again haha. But yeah, we tried quite a few file systems, and XFS **for our workloads** performed better than everything else we tested, and RAID 10 is a given if you do any significant writing.
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
Sorry, meant to send this to the list. For really big data-warehousing, this document really helped us: http://pgexperts.com/document.html?id=49
[PERFORM] PostgreSQL performance tweaking on new hardware
I want to thank the members of this list who helped me benchmark and conclude that RAID 10 on an XFS filesystem was the way to go over what we had prior. We have been using PostgreSQL with Perl for the last 8 years and it has been nothing but outstanding for us. Things have definitely worked out much better and the writes are much, much faster.

Since I want the maximum performance from our new servers, I want to make sure the configuration is what is recommended. Things are running fine, and queries that would take seconds prior now only take one or two. I have read a lot of guides on tweaking PostgreSQL as well as a book; however, I would like someone to just review the settings I have and let me know if it's too crazy. It's for a considerably heavy write database with a lot of calculation queries (percentages, averages, sums, etc.).

This is my setup:

2 x Intel E5645 (12 CPU cores total)
64 GB RAM
RAID 10 (/var/lib/pgsql lives on its own RAID controller) on XFS
PostgreSQL 9.0.4 on Debian Squeeze
Database size about 200GB.

And in postgresql.conf:

max_connections = 200
shared_buffers = 8GB
temp_buffers = 128MB
work_mem = 40MB
maintenance_work_mem = 1GB
wal_buffers = 16MB
effective_cache_size = 48GB
seq_page_cost = 1.0
random_page_cost = 1.1
cpu_tuple_cost = 0.1
cpu_index_tuple_cost = 0.05
cpu_operator_cost = 0.01
default_statistics_target = 1000

With these settings, output from free -m (megabytes):

             total       used       free     shared    buffers     cached
Mem:         64550      56605       7945          0          0      55907
-/+ buffers/cache:        697      63852
Swap:         7628          6       7622

top shows:

Swap: 7812088k total, 6788k used, 7805300k free, 57343264k cached

Any suggestions would be awesome.

Thank you

Ogden
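One cheap sanity check to run alongside free/top (a sketch; interpret it loosely, since a write- and calculation-heavy workload legitimately reads a lot from disk) is the buffer cache hit ratio from the statistics collector:

SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
ORDER BY blks_hit + blks_read DESC;

If the percentage for the busy database sits very low even though ~48GB is available for caching, that points at shared_buffers/effective_cache_size or the query mix rather than the hardware.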
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
2011/9/11 pasman pasmański pasma...@gmail.com:
> For a 10 TB table scanned in about 3 hours, the disks would need a transfer rate of roughly 1 GB/s (seqscan).

Random data point. Our primary servers were built for OLTP with 48 cores and 32 15k SAS drives. We started out on Arecas, but the Supermicro 1Us we were using didn't provide enough cooling and the Arecas were burning out after 2 to 4 months, so on those machines we pulled the Arecas and replaced them with simple LSI SAS non-RAID cards. Both were RAID-10, the latter with Linux software RAID. With the Arecas the OLTP performance is outstanding, garnering us ~8500tps at 40 to 50 threads. However, sequential performance was just so-so, at around read / write speeds of 500/350MB/s. The SW RAID-10 can read AND write at right around 1GB/s. What it lacks in transactional throughput it more than makes up for in sequential read / write performance.

Another data point. We had a big Oracle installation at my last job, and OLAP queries were killing it midday, so I built a simple replication system to grab rows from the big iron Oracle SUN box and shove them into a single-core P IV 2.xGHz machine with 4 120G SATA drives in SW RAID-10. That machine handily beat the big iron Oracle machine at OLAP queries, running in 20 minutes what was taking well over an hour for the big Oracle machine to do, even during its (the Oracle machine's) off-peak load times.
Re: [PERFORM] PostgreSQL performance tweaking on new hardware
On Sun, Sep 11, 2011 at 1:50 PM, Ogden li...@darkstatic.com wrote:
> [Ogden's hardware and postgresql.conf settings from the previous message, quoted in full]
> Any suggestions would be awesome. Thank you Ogden

Well, what's your workload like? If you'd like to smooth out lots of heavy writing, then look at cranking up checkpoint_segments, increase checkpoint timeout to 2h, and play with checkpoint completion target. If you write a lot of the same rows over and over, then keep it down in the 0.5 range. If you tend to write all unique rows, then closer to 1.0 is better. We run at 0.8. As you increase checkpoint completion target, you'll increase the amount of writes that have to happen twice to the storage array, so unless you're 100% sure you don't write to the same blocks / tuples a lot, keep it below 1.0.

Also if you're NOT using a battery backed caching RAID controller look into upgrading to one.
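A concrete starting point for the checkpoint knobs Scott mentions might look like the postgresql.conf excerpt below. The exact numbers are illustrative guesses for a write-heavy 9.0 box with a battery-backed controller, not recommendations from this thread; raise checkpoint_segments further if the logs complain about checkpoints occurring too frequently:

# postgresql.conf (sketch)
checkpoint_segments = 64              # the default of 3 is far too low for heavy writes
checkpoint_timeout = 30min            # spread checkpoints out over time
checkpoint_completion_target = 0.8    # per Scott: ~0.8 when most writes touch unique rows
log_checkpoints = on                  # verify how often checkpoints actually happen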
Re: [PERFORM] Databases optimization
I doubt you'll get much useful feedback because your question is too broad for a mailing list answer. If you're looking for basic performance guidelines, I recommend Greg Smith's PostgreSQL 9.0 High Performance [1] (disclaimer: I used to work with Greg and got a free copy), although I don't think he spent much time on running virtualized (which certainly could affect things). Then if you have *specific* hardware or query questions, this list is a great resource. [1]: http://www.2ndquadrant.com/books/postgresql-9-0-high-performance/ --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 (650) 242-3500 Main www.truviso.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] RAID Controller (HP P400) beat by SW-RAID?
We've currently got PG 8.4.4 running on a whitebox hardware setup, with (2) 5410 Xeons and 16GB of RAM. It's also got (4) 7200RPM SATA drives, using the onboard IDE controller and ext3.

A few weeks back, we purchased two refurb'd HP DL360 G5's, and were hoping to set them up with PG 9.0.2, running replicated. These machines have (2) 5410 Xeons, 36GB of RAM, (6) 10k SAS drives, and are using the HP SA P400i with 512MB of BBWC. PG is running on an ext4 (noatime) partition, and the drives are configured as RAID 1+0 (it seems that with this controller, I cannot do JBOD). I've spent a few hours going back and forth benchmarking the new systems, and have set up the DWC, and the accelerator cache using hpacucli. I've tried accelerator caches of 25/75, 50/50, and 75/25.

To start with, I've set the relevant parameters in postgresql.conf the same on the new config as the old:

max_connections = 150
shared_buffers = 6400MB (have tried as high as 20GB)
work_mem = 20MB (have tried as high as 100MB)
effective_io_concurrency = 6
fsync = on
synchronous_commit = off
wal_buffers = 16MB
checkpoint_segments = 30 (have tried 200 when I was loading the db)
random_page_cost = 2.5
effective_cache_size = 10240MB (have tried as high as 16GB)

First thing I noticed is that it takes the same amount of time to load the db (about 40 minutes) on the new hardware as on the old hardware. I was really hoping that with the faster, additional drives and a hardware RAID controller, this would be faster. The database is only about 9GB with pg_dump (about 28GB with indexes).

Using pgfouine I've identified about 10 problematic SELECT queries that take anywhere from 0.1 seconds to 30 seconds on the old hardware. Running these same queries on the new hardware is giving me results in the 0.2 to 66 second range. I.e., it's twice as slow.

I've tried increasing the shared_buffers and some other parameters (work_mem), but haven't yet seen the new hardware perform even at the same speed as the old hardware. I was really hoping that with hardware RAID something would be faster (loading times, queries, etc...). What am I doing wrong?

About the only thing left that I know to try is to drop the RAID 1+0 and go to RAID 0 in hardware, and do RAID 1 in software. Any other thoughts?

Thanks!

-- Anthony
Re: [PERFORM] Databases optimization
Thanks Maciek. I really do not know where to start or how to explain my question; I am a newbie to Postgres. I will try to get more information from the development team and the SAs.

Cheers
Hany

On Mon, Sep 12, 2011 at 10:22 AM, Maciek Sakrejda msakre...@truviso.com wrote:
> If you're looking for basic performance guidelines, I recommend Greg Smith's PostgreSQL 9.0 High Performance [1] ... [1]: http://www.2ndquadrant.com/books/postgresql-9-0-high-performance/
Re: [PERFORM] RAID Controller (HP P400) beat by SW-RAID?
* Anthony Presley (anth...@resolution.com) wrote:
> I was really hoping that with hardware RAID that something would be faster (loading times, queries, etc...). What am I doing wrong?

ext3 and ext4 do NOT perform identically out of the box. You might be running into the write barriers problem here, with ext4 forcing the RAID controller to push commits all the way to the hard drive before returning (thus making the BBWC next to useless). You might try w/ ext3 on the new system instead.

Also, the p800's are definitely better than the p400's, but I don't know that it's the controller that's really the issue here.

Thanks, Stephen
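If the controller's write cache really is battery-backed, one way to take write barriers out of the picture on ext4 is to mount with barriers disabled; Mark mentions the same nobarrier option later in this thread. The device path below is hypothetical, and this is only safe with a working BBU, since without one it trades crash safety for speed:

# /etc/fstab (sketch) - ext4 data volume behind a BBU-protected Smart Array controller
/dev/cciss/c0d1p1   /var/lib/pgsql   ext4   noatime,nobarrier   0 0

Re-running the same load test with and without nobarrier is a quick way to confirm whether barriers are what is eating the BBWC's benefit.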
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
* Igor Chudov (ichu...@gmail.com) wrote:
> Right now I have a personal (one user) project to create a 5-10 Terabyte data warehouse. The largest table will consume the most space and will hold, perhaps, 200,000,000 rows.

I run data-warehouse databases on that order (current largest single instance is ~4TB running under 9.0.4). If the largest table is only 200M rows, PG should handle that quite well. Our data is partitioned by month and each month is about 200M records, and simple queries can run in 15-20 minutes (with a single thread), with complex windowing queries (split up and run in parallel) finishing in a couple of hours.

> However, while an hour is fine, two weeks per query is NOT fine.

What's really, really, really useful are two things: EXPLAIN, and this mailing list. :) Seriously, run EXPLAIN on your queries before you run them and see if how the query is going to be executed makes sense. Here's a real easy hint: if it says "External Sort" and has big numbers, come talk to us here- that's about one of the worst things you can possibly do. Of course, PG's going to avoid doing that, but you may have written a query (unintentionally) which forces PG to do a sort, or something else.

> I have a server with about 18 TB of storage and 48 GB of RAM, and 12 CPU cores.

If you partition up your data and don't mind things running in different transactions, you can definitely get a speed boost with PG by running things in parallel. PG will handle that very well; in fact, if two queries are running against the same table, PG will actually combine them and only actually read the data from disk once.

> I cannot shell out $47,000 per CPU for Oracle for this project.

The above data warehouse was migrated from an Oracle-based system. :)

> To be more specific, the batch queries that I would do, I hope, would either use small JOINs of a small dataset to a large dataset, or just SELECTs from one big table.

Make sure that you set your 'work_mem' correctly- PG will use that to figure out if it can hash the small table (you want that to happen, trust me..). If you do end up having sorts, it'll also use the work_mem value to figure out how much memory to use for sorting.

> So... Can Postgres support a 5-10 TB database with the use pattern stated above?

Yes, certainly.

Thanks, Stephen
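A quick illustration of the EXPLAIN habit Stephen recommends (the table and query are made up; real plan output will differ):

EXPLAIN SELECT customer_id, sum(amount)
FROM facts
WHERE sale_date >= '2011-01-01'
GROUP BY customer_id
ORDER BY sum(amount) DESC;

For a query you can afford to actually run, EXPLAIN ANALYZE shows what happened rather than what was planned; a line such as

    Sort Method:  external merge  Disk: 9253800kB

in the EXPLAIN ANALYZE output is the on-disk sort Stephen is warning about, and usually means work_mem was far too small for that step (or the query forced an avoidable sort).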
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Sun, Sep 11, 2011 at 6:01 PM, Stephen Frost sfr...@snowman.net wrote:
> I run data-warehouse databases on that order (current largest single instance is ~4TB running under 9.0.4). If the largest table is only 200M rows, PG should handle that quite well. Our data is partitioned by month and each month is about 200M records, and simple queries can run in 15-20 minutes (with a single thread), with complex windowing queries (split up and run in parallel) finishing in a couple of hours.

Which brings up a question. Can I partition data by month (or quarter), without that month being part of the PRIMARY KEY? If this question sounds weird, I am asking because MySQL enforces this, which does not fit my data. If I can keep my primary key to be the ID that I want (which comes with the data), but still partition it by month, I will be EXTREMELY happy.

> What's really, really, really useful are two things: EXPLAIN, and this mailing list. :) Seriously, run EXPLAIN on your queries before you run them and see if how the query is going to be executed makes sense. Here's a real easy hint: if it says "External Sort" and has big numbers, come talk to us here.

Very good, thanks.

> The above data warehouse was migrated from an Oracle-based system. :)

I am wondering, why?

> Make sure that you set your 'work_mem' correctly- PG will use that to figure out if it can hash the small table (you want that to happen, trust me..). If you do end up having sorts, it'll also use the work_mem value to figure out how much memory to use for sorting.

I could, say, set work_mem to 30 GB? (64 bit linux)

> > So... Can Postgres support a 5-10 TB database with the use pattern stated above?
> Yes, certainly.

That's great to know.

i
Re: [PERFORM] RAID Controller (HP P400) beat by SW-RAID?
On 12.9.2011 00:44, Anthony Presley wrote:
> [Anthony's hardware description and postgresql.conf settings quoted in full above]
> I've spent a few hours going back and forth benchmarking the new systems, and have set up the DWC, and the accelerator cache using hpacucli. I've tried accelerator caches of 25/75, 50/50, and 75/25.

What is an 'accelerator cache'? Is that the cache on the controller? Then give 100% to the write cache - the read cache does not need to be protected by the battery; the page cache at the OS level can do the same service.

Provide more details about the ext3/ext4 setup - there are various data modes (writeback, ordered, journal) and various other settings (barriers, stripe size, ...) that matter.

According to the benchmark I did a few days back, the performance difference between ext3 and ext4 is rather small when comparing equally configured file systems (i.e. data=journal vs. data=journal), etc. With a read-only workload (e.g. just SELECT statements), the config does not matter (e.g. journal is just as fast as writeback). See for example these comparisons:

read-only workload: http://bit.ly/q04Tpg
read-write workload: http://bit.ly/qKgWgn

ext4 is usually a bit faster than an equally configured ext3, but the difference should not be 100%.

> First thing I noticed is that it takes the same amount of time to load the db (about 40 minutes) on the new hardware as the old hardware. [...] I've tried increasing the shared_buffers, and some other parameters (work_mem), but haven't yet seen the new hardware perform even at the same speed as the old hardware.

In that case some of the assumptions are wrong. For example the new RAID is slow for some reason. Bad stripe size, slow controller, ... Do the basic hw benchmarking, i.e. use bonnie++ to benchmark the disk, etc. Only if this provides the expected results (i.e. the new hw performs better) does it make sense to mess with the database.

Tomas
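A minimal sketch of that kind of raw-hardware check (the paths are examples; bonnie++'s default test size is roughly twice RAM, which is what you want so the OS cache cannot hide the disks). The dd line is a cruder alternative to Tomas's bonnie++ suggestion, useful only for sequential write throughput:

# raw throughput test with bonnie++, run against the PG data volume
bonnie++ -d /var/lib/pgsql/bench -u postgres

# or a crude sequential write check with plain dd (about 32GB of writes)
dd if=/dev/zero of=/var/lib/pgsql/bench/ddtest bs=1M count=32768 conv=fdatasync

If the six-drive 10k SAS RAID 1+0 array does not clearly beat the old four-drive SATA box at this level, no amount of postgresql.conf tuning will fix it.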
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Mon, Sep 12, 2011 at 1:16 AM, Igor Chudov ichu...@gmail.com wrote:
> I could, say, set work_mem to 30 GB? (64 bit linux)

I don't think you'd want that. Remember, work_mem is the amount of memory *per sort*. Queries can request several times that much memory, once per sort they need to perform. You can set it really high, but not 60% of your RAM - that wouldn't be wise.
Re: [PERFORM] RAID Controller (HP P400) beat by SW-RAID?
On September 11, 2011 03:44:34 PM Anthony Presley wrote:
> First thing I noticed is that it takes the same amount of time to load the db (about 40 minutes) on the new hardware as the old hardware. I was really hoping with the faster, additional drives and a hardware RAID controller, that this would be faster. The database is only about 9GB with pg_dump (about 28GB with indexes).

Loading the DB is going to be CPU-bound (on a single core), unless your disks really suck, which they don't. Most of the time will be spent building indexes.

I don't know offhand why the queries are slower, though, unless you're not getting as much cached before testing as on the older box.
Re: [PERFORM] RAID Controller (HP P400) beat by SW-RAID?
On Sun, Sep 11, 2011 at 4:44 PM, Anthony Presley anth...@resolution.com wrote: We've currently got PG 8.4.4 running on a whitebox hardware set up, with (2) 5410 Xeon's, and 16GB of RAM. It's also got (4) 7200RPM SATA drives, using the onboard IDE controller and ext3. A few weeks back, we purchased two refurb'd HP DL360's G5's, and were hoping to set them up with PG 9.0.2, running replicated. These machines have (2) 5410 Xeon's, 36GB of RAM, (6) 10k SAS drives, and are using the HP SA P400i with 512MB of BBWC. PG is running on an ext4 (noatime) partition, and they Two issues here. One is that the onboard controller and disks on the old machine might not be obeying fsync properly, giving a speed boost at the expense of crash safeness. Two is that the P400 has gotten pretty horrible performance reviews on this list in the past. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
* Igor Chudov (ichu...@gmail.com) wrote:
> Can I partition data by month (or quarter), without that month being part of PRIMARY KEY?

The way partitioning works in PG is by using CHECK constraints. Not sure if you're familiar with those (not sure if MySQL has them), so here's a quick example:

Create a parent table. Then create two tables which inherit from that parent table (this is more of an implementation detail than anything else; the parent table is always empty, it's just there to be the single, combined table that you run your select queries against). On each of the two 'child' tables, create a CHECK constraint.

On table1, you do:
alter table table1 add check (date < '2000-01-01');
On table2, you do:
alter table table2 add check (date >= '2000-01-01');

Once those are done, you can query against the 'parent' table with something like:
select * from parent where date = '2010-01-01';
and PG will realize it only has to look at table2 to get the results for that query.

This means the partitioning can be more-or-less any check constraint that will be satisfied by the data in the table (and PG will check/enforce this) and that PG can figure out will eliminate a partition from possibly having the data that matches the request. Technically, this means that you could have all kinds of different ways your data is split across the partitions, but remember that all the constraints have to actually be TRUE. :)

Typically, people do split based on the PK, but it's not required (realize that PG doesn't support cross-table PKs, so if you don't have CHECK constraints which make sure that the tables don't cover the same PK value, you could end up with duplicate values across the tables...).

> If this question sounds weird, I am asking because MySQL enforces this, which does not fit my data.

That part is a little strange..

> If I can keep my primary key to be the ID that I want (which comes with data), but still partition it by month, I will be EXTREMELY happy.

As I said above, the actual PK is going to be independent and in the base/child tables. That said, yes, you could have the PK in each table be whatever you want and you use month to partition the 'main' table. You then have to come up with some other way to make sure your PK is enforced, however, or figure out a way to deal with things if it's not. Based on what you've been describing, I'm afraid you'd have to actually search all the partitions for a given ID on an update, to figure out if you're doing an UPDATE or an INSERT... Unless, of course, the month is included in the PK somewhere, or is in the incoming data and you can be 100% confident that the incoming data is never wrong.. :)

> I am wondering, why?

Cost, and we had a real hard time (this was a while ago..) getting Oracle to run decently on Linux, and the Sun gear was just too damn expensive. Also, ease of maintenance- it takes a LOT less effort to keep a PG database set up and running smoothly than an Oracle one, imv.

> I could, say, set work_mem to 30 GB? (64 bit linux)

You can, but you have to be careful with it, because PG will think it can use 30GB for EACH sort in a given query, and in EACH hash in a given query. What I would recommend is setting the default to something like 256MB and then looking at specific queries and bumping it up for those queries when it's clear that it'll help the query and won't cause the system to go into swap.
Note that you can set work_mem for a given session after you connect to the database, just do:

set work_mem = '1GB';

in your session before running other queries. Doing that won't impact other sessions.

Thanks, Stephen
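To make the month-partitioning scheme Stephen describes concrete, here is a minimal sketch in the 8.4/9.0-era style (all table and column names are illustrative, not from the thread; the PK on id lives in each child, so uniqueness across children is not enforced by PG itself, exactly as Stephen warns):

create table facts (
    id         bigint  not null,
    sale_month date    not null,
    amount     numeric
);

create table facts_2011_08 (
    check (sale_month >= date '2011-08-01' and sale_month < date '2011-09-01')
) inherits (facts);
alter table facts_2011_08 add primary key (id);

create table facts_2011_09 (
    check (sale_month >= date '2011-09-01' and sale_month < date '2011-10-01')
) inherits (facts);
alter table facts_2011_09 add primary key (id);

-- constraint_exclusion = partition   (the 8.4/9.0 default) lets the planner skip partitions

select * from facts where sale_month = date '2011-09-15';  -- constraint exclusion skips facts_2011_08

An UPDATE-or-INSERT loader then has to know the month (or search every child) to find the right partition, which is the caveat Stephen raises above.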
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
* Claudio Freire (klaussfre...@gmail.com) wrote:
> I don't think you'd want that. Remember, work_mem is the amount of memory *per sort*. Queries can request several times that much memory, once per sort they need to perform. You can set it really high, but not 60% of your RAM - that wouldn't be wise.

Oh, I dunno.. It's only used by the planner, so sometimes you have to bump it up, especially when PG thinks the number of rows returned from something will be a lot more than it really will be. :)

/me has certain queries where it's been set to 100GB... ;)

I agree that it shouldn't be the default, however. That's asking for trouble. Do it for the specific queries that need it.

Thanks, Stephen
Re: [PERFORM] Databases optimization
On Sun, Sep 11, 2011 at 5:22 PM, Maciek Sakrejda msakre...@truviso.comwrote: performance guidelines, I recommend Greg Smith's PostgreSQL 9.0 High Performance [1] (disclaimer: I used to work with Greg and got a free copy) I'll second that. PostgreSQL 9.0 High Performance is an excellent resource (I recommend it even for non-PostgreSQL admins because it goes so in-depth on Linux tuning) so whether you get it for free or not, it's worth the time it takes to read and absorb the info. I've never run PostgreSQL virtualized, but I can say that if it's anything like running SQL Server virtualized, it's not a terribly good idea.
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
Hi,

On 12 September 2011 12:28, Stephen Frost sfr...@snowman.net wrote:
> Once those are done, you can query against the 'parent' table with something like: select * from parent where date = '2010-01-01'; and PG will realize it only has to look at table2 to get the results for that query. This means the partitioning can be more-or-less any check constraint that will be satisfied by the data in the table (and PG will check/enforce this) and that PG can figure out will eliminate a partition from possibly having the data that matches the request.

Theory is nice, but there are a few gotchas (in 8.4):

- The planner can use constant expressions only. You will get scans across all partitions when you use a function (like now(), or an immutable function with non-constant arguments), a sub-query (like part_col = (select x from ...)), or anything else which can't be evaluated to a constant during query planning.

- Partition constraints are not pushed down into joins (assuming tables partitioned by primary key):

select ... from X left join Y on X.primary_key = Y.primary_key
where part_col = ... and X.primary_key = ... and X.primary_key ...

must be rewritten like

select ... from X left join Y on X.primary_key = Y.primary_key
    and Y.primary_key = ... and Y.primary_key ...
where part_col = ... and X.primary_key = ... and X.primary_key ...

in order to avoid scanning the entire Y table (instead of only the relevant partitions).

- The ORDER BY / LIMIT X issue is fixed in 9.1 ("Allow inheritance table scans to return meaningfully-sorted results").

Moreover, all queries should have a WHERE clause on the column which is used for partitioning, otherwise partitioning is not very useful (although it can still simplify data management -- drop partition vs. delete from X where part_col between A and B).

-- Ondrej Ivanic (ondrej.iva...@gmail.com)
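A small illustration of the first gotcha, using the hypothetical facts table sketched earlier (the behaviour described is for 8.4/9.0-era constraint exclusion, where pruning happens at plan time):

-- scans ALL partitions: now() is not a constant known at plan time
select count(*) from facts where sale_month >= now() - interval '1 month';

-- prunes to the matching partition: the value is a literal the planner can compare against the CHECK constraints
select count(*) from facts where sale_month >= date '2011-09-01';

Loader and reporting scripts therefore usually interpolate literal dates into the SQL text rather than relying on now()/CURRENT_DATE inside the query.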
Re: [PERFORM] RAID Controller (HP P400) beat by SW-RAID?
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Anthony Presley
Sent: Sunday, September 11, 2011 4:45 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] RAID Controller (HP P400) beat by SW-RAID?

> [Anthony's hardware description quoted in full above] I've spent a few hours going back and forth benchmarking the new systems, and have set up the DWC, and the accelerator cache using hpacucli. I've tried accelerator caches of 25/75, 50/50, and 75/25.

I would start off by recommending a more current version of 9.0... like 9.0.4, since you are building a new box. The rumor mill says 9.0.5 and 9.1.0 might be out soon (days?), but that is just the rumor mill. Don't bank on it.

What kernel are you on?

Long time HP user here, for better and worse... so here are a few other little things I recommend.

Check the BIOS power management. Make sure it is set where you want it. (IIRC the G5s have this; I know G6s and G7s do.) This can help with nasty latency problems if the box has been idle for a while and then needs to start doing work.

The P400i is not a great card compared to more modern ones, but you should be able to beat the old setup with what you have: faster-clocked CPUs, more spindles, faster-RPM spindles.

Assuming the battery is working, with XFS or ext4 you can use the nobarrier mount option and you should see some improvement.

Make sure the RAID card's firmware is current. I can't stress this enough. HP fixed a nasty bug with RAID 1+0 a few months ago where you could eat your data... They also seem to be fixing a lot of other bugs along the way as well. So do yourself a big favor and make sure that firmware is current. It might just head off a headache down the road.

Also make sure you have an 8.10.? (IIRC the version number right) or better version of hpacucli... there have been some fixes to that utility as well. IIRC most of the fixes in this have been around recognizing newer cards (812s and 410s), but some interface bugs have been fixed as well. You may need new packages for HP health (I don't recall the official name, but new versions of hpacucli might not play well with old versions of HP health). It's HP, so they have a new version about every month for firmware and their CLI utility... that's HP for us.

Anyways, that is my fast input.

Best of luck,

-Mark