[PERFORM] Hardware advice for scalable warehouse db
Hi list,

My employer will be donated a NetApp FAS 3040 SAN [1] and we want to run our warehouse DB on it. The PG 9.0 DB currently comprises ~1.5TB of tables, 200GB of indexes, and grows ~5%/month. The DB is not update-critical, but frequently undergoes large read and insert operations.

My employer is a university with little funds and we have to find a cheap way to scale for the next 3 years, so the SAN seems a good chance to us. We are now looking for the remaining server parts to maximize DB performance with costs of at most $4000. I dug up the following configuration with the discount we receive from Dell:

1 x Intel Xeon X5670, 6C, 2.93GHz, 12M Cache
16 GB (4x4GB) Low Volt DDR3 1066MHz
PERC H700 SAS RAID controller
4 x 300 GB 10k SAS 6Gbps 2.5", RAID 10

I was thinking of putting the WAL and the indexes on the local disks, and the rest on the SAN. If funds allow, we might downgrade the disks to SATA and add a 50 GB SATA SSD for the WAL (a SAS/SATA mix is not possible).

Any comments on the configuration? Any experiences with iSCSI vs. Fibre Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a cheap alternative for connecting as many as 16 x 2TB disks as DAS?

Thanks so much!

Best,
Chris

[1]: http://www.b2net.co.uk/netapp/fas3000.pdf

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
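The local-disk split Chris describes maps onto tablespaces; a minimal sketch, assuming hypothetical mount point /local/pg_index and index/table names:

```sql
-- Indexes go to the local RAID 10; table data stays in the default
-- tablespace, which in this layout lives on the SAN.
CREATE TABLESPACE local_idx LOCATION '/local/pg_index';

-- Move an existing index (rebuilds it in the new tablespace):
ALTER INDEX sales_ts_idx SET TABLESPACE local_idx;

-- New indexes can be placed there directly:
CREATE INDEX sales_customer_idx ON sales (customer_id)
    TABLESPACE local_idx;
```

The WAL itself is not governed by tablespaces; on 9.0 it is typically relocated by stopping the server and replacing $PGDATA/pg_xlog with a symlink to a directory on the local disks.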
Re: [PERFORM] Hardware advice for scalable warehouse db
chris wrote:
> My employer is a university with little funds and we have to find a
> cheap way to scale for the next 3 years, so the SAN seems a good
> chance to us.

A SAN is rarely ever the cheapest way to scale anything; you're paying extra for reliability instead.

> I was thinking to put the WAL and the indexes on the local disks, and
> the rest on the SAN. If funds allow, we might downgrade the disks to
> SATA and add a 50 GB SATA SSD for the WAL (SAS/SATA mixup not possible).

If you want to keep the bulk of the data on the SAN, this is a reasonable way to go, performance-wise. But be aware that losing the WAL means your database is likely corrupted. That means that much of the reliability benefit of the SAN is lost in this configuration.

> Any experiences with iSCSI vs. Fibre Channel for SANs and PostgreSQL?
> If the SAN setup sucks, do you see a cheap alternative how to connect
> as many as 16 x 2TB disks as DAS?

I've never heard anyone recommend iSCSI if you care at all about performance, while FC works fine for this sort of job.

The physical dimensions of 3.5" drives make getting 16 of them into one reasonably sized enclosure normally just out of reach. But a Dell PowerVault MD1000 will give you 15 x 2TB as inexpensively as possible in a single 3U space (well, as cheaply as you want to go--you might build your own giant box cheaper, but I wouldn't recommend it). I've tested MD1000, MD1200, and MD1220 arrays before, and always gotten seriously good performance relative to the dollars spent with that series. The only one of these Dell storage arrays I've heard disappointing results about (but not tested directly yet) is the MD3220.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
Re: [PERFORM] Inoptimal query plan for max() and multicolumn index
On 20/06/2011 07:35, Vladimir Kulev wrote:
> But this does not work:
> # explain analyze select max(timestamp) from sms where number in
> ('5502712','5802693','5801981');

Try to rewrite that query this way:

explain analyze select timestamp from sms where number in ('5502712','5802693','5801981') order by timestamp desc limit 1;

Regards
Gaetano Mendola
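A sketch of the two forms side by side, assuming a multicolumn index like the one below exists (the index name is hypothetical); the rewritten query can often be answered from a backward index scan that stops at the first matching row, instead of the seq scan the max() form was getting:

```sql
-- Assumed index on the sms table:
CREATE INDEX sms_number_ts_idx ON sms (number, timestamp);

-- Original form, which the 9.x planner does not optimize well
-- for an IN list:
SELECT max(timestamp) FROM sms
 WHERE number IN ('5502712','5802693','5801981');

-- Suggested rewrite, same result for non-empty matches:
SELECT timestamp FROM sms
 WHERE number IN ('5502712','5802693','5801981')
 ORDER BY timestamp DESC
 LIMIT 1;
```

Note the two forms differ on an empty match: max() returns NULL, while the LIMIT 1 form returns no row.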
Re: [PERFORM] Hardware advice for scalable warehouse db
> 1 x Intel Xeon X5670, 6C, 2.93GHz, 12M Cache
> 16 GB (4x4GB) Low Volt DDR3 1066MHz
> PERC H700 SAS RAID controller
> 4 x 300 GB 10k SAS 6Gbps 2.5", RAID 10

Apart from Greg's excellent recommendations, I would strongly suggest more memory. 16GB in 2011 is really on the low side. PG uses memory (either shared_buffers or the OS cache) to keep frequently accessed data in. Good recommendations are hard without knowledge of the data and access patterns, but 64, 128, and 256GB systems are quite frequent when you have data that can't all be in memory at once.

SANs are nice, but I think you can buy a good DAS unit each year for just the support cost of a NetApp -- though you might have gotten a really good deal there too. On the other hand, you are getting a huge amount of advanced configuration features and potential ways of sharing storage -- just see the specs. If you need those, the SAN is a good way to go, but they do come with a huge price tag.

Jesper
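As a rough illustration of how the memory-related settings scale with RAM, here is a hedged postgresql.conf sketch for a hypothetical 64GB dedicated box; the values are common 9.0-era rules of thumb, not measurements from this workload:

```
# postgresql.conf fragment -- assumed 64GB RAM, dedicated DB host
shared_buffers = 8GB          # often ~25% of RAM as a starting point
effective_cache_size = 48GB   # planner hint: shared_buffers + OS cache
work_mem = 64MB               # per sort/hash node, per backend -- keep modest
maintenance_work_mem = 1GB    # index builds, VACUUM
```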
Re: [PERFORM] Hardware advice for scalable warehouse db
On 7/15/2011 2:10 AM, Greg Smith wrote:
> chris wrote:
>> My employer is a university with little funds and we have to find a
>> cheap way to scale for the next 3 years, so the SAN seems a good
>> chance to us.
> A SAN is rarely ever the cheapest way to scale anything; you're paying
> extra for reliability instead.
>> I was thinking to put the WAL and the indexes on the local disks, and
>> the rest on the SAN. If funds allow, we might downgrade the disks to
>> SATA and add a 50 GB SATA SSD for the WAL (SAS/SATA mixup not possible).
> If you want to keep the bulk of the data on the SAN, this is a
> reasonable way to go, performance-wise. But be aware that losing the
> WAL means your database is likely corrupted. That means that much of
> the reliability benefit of the SAN is lost in this configuration.
>> Any experiences with iSCSI vs. Fibre Channel for SANs and PostgreSQL?
>> If the SAN setup sucks, do you see a cheap alternative how to connect
>> as many as 16 x 2TB disks as DAS?
> I've never heard anyone recommend iSCSI if you care at all about
> performance, while FC works fine for this sort of job. The physical
> dimensions of 3.5" drives make getting 16 of them in one reasonably
> sized enclosure normally just out of reach. But a Dell PowerVault
> MD1000 will give you 15 x 2TB as inexpensively as possible in a single
> 3U space (well, as cheaply as you want to go--you might build your own
> giant box cheaper but I wouldn't recommend it).

I'm curious what people think of these:
http://www.pc-pitstop.com/sas_cables_enclosures/scsase166g.asp

I currently have my database on two of these, and for my purpose they seem to be fine and are quite a bit less expensive than the Dell MD1000. I actually have three more of the 3G versions with expanders for mass storage arrays (RAID0) and haven't had any issues with them in the three years I've had them.

Bob
Re: [PERFORM] Unexpected seq scans when expected result is 1 row out of milions
Svetlin Manavski <svetlin.manav...@gmail.com> writes:
> The question is: why do we get a seq scan on appqosdata.tcpsessiondata_1
> and appqosdata.tcpsessiondata_2 even if the planner estimates correctly
> 1 row out of millions could potentially be selected? As you can see ~90%
> of the time is spent on those 2 partitions even if they are not
> apparently different from any of the others.

Well, there must be *something* different about them. Are you sure they've got the same indexes as the others? It would be useful to see psql's \d report for those partitions, as well as for one of the partitions that's behaving as expected. You might also compare the EXPLAIN results for doing the query on just one child table between the normal and misbehaving partitions.

regards, tom lane
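The index comparison Tom suggests can also be done in one catalog query instead of eyeballing \d output; a sketch, with the partition names taken from the thread and '_3' standing in for any well-behaved sibling:

```sql
-- List index definitions for the misbehaving partitions alongside
-- a normal one; any missing or differing indexdef stands out.
SELECT tablename, indexname, indexdef
  FROM pg_indexes
 WHERE schemaname = 'appqosdata'
   AND tablename IN ('tcpsessiondata_1',
                     'tcpsessiondata_2',
                     'tcpsessiondata_3')
 ORDER BY tablename, indexname;
```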
Re: [PERFORM] Hardware advice for scalable warehouse db
On Fri, Jul 15, 2011 at 12:34 AM, chris <chri...@gmx.net> wrote:
> I was thinking to put the WAL and the indexes on the local disks, and
> the rest on the SAN. If funds allow, we might downgrade the disks to
> SATA and add a 50 GB SATA SSD for the WAL (SAS/SATA mixup not possible).

Just to add to the conversation, there's no real advantage to putting WAL on SSD. Indexes can benefit from them, but WAL is mostly sequential throughput, and for that a pair of SATA 1TB drives at 7200RPM works just fine for most folks. For example, in one big server we're running we have 24 drives in a RAID-10 for the /data/base dir with 4 drives in a RAID-10 for pg_xlog, and those 4 drives tend to have the same IO util % under iostat as the 24 drives under normal usage. It takes a special kind of load (lots of inserts happening in large transactions quickly) for the 4-drive RAID-10 to ever exceed 50% util.
Re: [PERFORM] Hardware advice for scalable warehouse db
On Fri, Jul 15, 2011 at 10:39 AM, Robert Schnabel <schnab...@missouri.edu> wrote:
> I'm curious what people think of these:
> http://www.pc-pitstop.com/sas_cables_enclosures/scsase166g.asp
> I currently have my database on two of these and for my purpose they
> seem to be fine and are quite a bit less expensive than the Dell MD1000.
> I actually have three more of the 3G versions with expanders for mass
> storage arrays (RAID0) and haven't had any issues with them in the
> three years I've had them.

I have a co-worker who's familiar with them, and they seem a lot like the 16-drive units we use from Aberdeen, which fully outfitted with 15k SAS drives run $5k to $8k depending on the drives etc.
Re: [PERFORM] Hardware advice for scalable warehouse db
> Just to add to the conversation, there's no real advantage to putting
> WAL on SSD. Indexes can benefit from them, but WAL is mostly sequential
> throughput and for that a pair of SATA 1TB drives at 7200RPM work just
> fine for most folks.

Actually, there's a strong disadvantage to putting WAL on SSD. SSDs are very prone to fragmentation if you're doing a lot of deleting and replacing of files. I've implemented data warehouses where the database was on SSD but WAL was still on HDD.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [PERFORM] Hardware advice for scalable warehouse db
Hi list,

Thanks a lot for your very helpful feedback!

> I've tested MD1000, MD1200, and MD1220 arrays before, and always
> gotten seriously good performance relative to the dollars spent

Great hint, but I'm afraid that's too expensive for us. Still, it's a great way to scale over the years; I'll keep that in mind. I had a look at other server vendors who offer 4U servers with slots for 16 disks for $4k in total (w/o disks), so maybe that's an even cheaper/better solution for us.

If you had the choice between 16 x 2TB SATA vs. a server with some SSDs for WAL/indexes plus a SAN (with SATA disks) for data, what would you choose performance-wise?

Again, thanks so much for your help.

Best,
Chris
Re: [PERFORM] Hardware advice for scalable warehouse db
On Fri, Jul 15, 2011 at 11:49 AM, chris r. <chri...@gmx.net> wrote:
> Hi list,
> Thanks a lot for your very helpful feedback!
>> I've tested MD1000, MD1200, and MD1220 arrays before, and always
>> gotten seriously good performance relative to the dollars spent
> Great hint, but I'm afraid that's too expensive for us. But it's a great
> way to scale over the years, I'll keep that in mind. I had a look at
> other server vendors who offer 4U servers with slots for 16 disks for
> 4k in total (w/o disks), maybe that's an even cheaper/better solution
> for us.
> If you had the choice between 16 x 2TB SATA vs. a server with some SSDs
> for WAL/indexes and a SAN (with SATA disk) for data, what would you
> choose performance-wise?

SATA drives can easily flip bits, and postgres does not checksum data, so it will not automatically detect corruption for you. I would steer well clear of SATA unless you are going to be using a fs like ZFS which checksums data. I would hope that a SAN would detect this for you, but I have no idea.

--
Rob Wultsch
wult...@gmail.com
Re: [PERFORM] Hardware advice for scalable warehouse db
On 7/14/11 11:34 PM, chris wrote:
> Any comments on the configuration? Any experiences with iSCSI vs.
> Fibre Channel for SANs and PostgreSQL? If the SAN setup sucks, do you
> see a cheap alternative how to connect as many as 16 x 2TB disks as DAS?

Here's the problem with iSCSI: on gigabit ethernet, your maximum possible throughput is 100MB/s, which means that your likely maximum database throughput (for a seq scan or vacuum, for example) is 30MB/s. That's about a third of what you can get with good internal RAID. While multichannel iSCSI is possible, it's hard to configure, and it doesn't really allow you to spread a *single* request across multiple channels. So: go with Fibre Channel if you're using a SAN.

iSCSI also has horrible lag times, but you don't care about that so much for DW.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [PERFORM] Statistics and Multi-Column indexes
On 07/10/2011 02:31 PM, Samuel Gendler wrote:
> What about partitioning tables by tenant id and then maintaining
> indexes on each partition independent of tenant id, since constraint
> exclusion should handle filtering by tenant id for you. That seems
> like a potentially more tolerable variant of #5. How many tenants are
> we talking about? I gather partitioning starts to become problematic
> when the number of partitions gets large.

I thought I had replied... apparently I didn't. The database can grow in two dimensions: the number of tenants and the number of rows per tenant. We have many tenants with relatively little data and a few with a lot of data.

So the number of tenants is not known ahead of time and might be in the 1000s.

-- Lars
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/14/2011 04:03 PM, Jeff Janes wrote:
> On Wed, Jul 13, 2011 at 3:41 PM, lars <lhofha...@yahoo.com> wrote:
>> On 07/13/2011 11:42 AM, Kevin Grittner wrote:
>>> So transactions without an XID *are* sensitive to synchronous_commit.
>>> That's likely a useful clue. How much did it help the run time of the
>>> SELECT which followed the UPDATE?
>> It has surprisingly little impact on the SELECT side:
> If your fsync is truly fsyncing, it seems like it should have
> considerable effect. Could you strace with both -ttt and -T, with and
> without synchronous commit?
>
> Cheers, Jeff

Ok, here we go:

Q\0\0\0_select count(*) from test w..., 8192, 0, NULL, NULL) = 96 5.357152
1310774187.750791 gettimeofday({1310774187, 750809}, NULL) = 0 0.22
1310774187.751023 lseek(12, 0, SEEK_END) = 329908224 0.23
1310774187.751109 lseek(15, 0, SEEK_END) = 396607488 0.22
1310774187.751186 lseek(18, 0, SEEK_END) = 534175744 0.22
1310774187.751360 lseek(12, 0, SEEK_END) = 329908224 0.23
1310774187.753389 brk(0x248e000) = 0x248e000 0.26
1310774187.753953 brk(0x24ce000) = 0x24ce000 0.23
1310774187.755158 brk(0x254e000) = 0x254e000 0.24
1310774187.766605 brk(0x245) = 0x245 0.000170
1310774187.766852 lseek(23, 4513792, SEEK_SET) = 4513792 0.23
1310774187.766927 write(23, f\320\1\0\1\0\0\0\320\0\0\0\0\340D-\22\0\0\0\30@!0..., 32768) = 32768 0.75
1310774187.767071 fdatasync(23) = 0 0.002618
1310774187.769760 gettimeofday({1310774187, 769778}, NULL) = 0 0.22
1310774187.769848 sendto(5, \2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\r\201\0\0\0\2\0..., 232, 0, NULL, 0) = 232 0.64
1310774187.769993 sendto(6, T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D..., 66, 0, NULL, 0) = 66 0.000199

(23 is the WAL fd)

vs.

Q\0\0\0_select count(*) from test w..., 8192, 0, NULL, NULL) = 96 7.343720
1310774306.978767 gettimeofday({1310774306, 978785}, NULL) = 0 0.21
1310774306.978994 lseek(12, 0, SEEK_END) = 330883072 0.24
1310774306.979080 lseek(15, 0, SEEK_END) = 397131776 0.21
1310774306.979157 lseek(18, 0, SEEK_END) = 534732800 0.22
1310774306.979332 lseek(12, 0, SEEK_END) = 330883072 0.22
1310774306.983096 brk(0x248e000) = 0x248e000 0.26
1310774306.983653 brk(0x24ce000) = 0x24ce000 0.23
1310774306.984667 brk(0x254e000) = 0x254e000 0.23
1310774306.996040 brk(0x245) = 0x245 0.000168
1310774306.996298 gettimeofday({1310774306, 996317}, NULL) = 0 0.21
1310774306.996388 sendto(5, \2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\r\201\0\0\0\2\0..., 232, 0, NULL, 0) = 232 0.78
1310774306.996550 sendto(6, T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D..., 66, 0, NULL, 0) = 66 0.000202

So the difference is only 2ms. The size of the WAL buffers written is only 32k.

Here's an example with more dirty rows (I basically let the updater run for a while, dirtying very many rows):

Q\0\0\0_select count(*) from test w..., 8192, 0, NULL, NULL) = 96 23.690018
1310775141.398780 gettimeofday({1310775141, 398801}, NULL) = 0 0.28
1310775141.399018 lseek(12, 0, SEEK_END) = 372514816 0.23
1310775141.399105 lseek(15, 0, SEEK_END) = 436232192 0.22
1310775141.399185 lseek(18, 0, SEEK_END) = 573620224 0.23
1310775141.399362 lseek(12, 0, SEEK_END) = 372514816 0.24
1310775141.414017 brk(0x249) = 0x249 0.28
1310775141.414575 brk(0x24d) = 0x24d 0.25
1310775141.415600 brk(0x255) = 0x255 0.24
1310775141.417757 semop(229383, {{0, -1, 0}}, 1) = 0 0.24
...
1310775141.448998 semop(229383, {{0, -1, 0}}, 1) = 0 0.25
1310775141.453134 brk(0x2452000) = 0x2452000 0.000167
1310775141.453377 fadvise64(22, 0, 0, POSIX_FADV_DONTNEED) = 0 0.25
1310775141.453451 close(22) = 0 0.32
1310775141.453537 open(pg_xlog/000100D100C2, O_RDWR) = 22 0.59
1310775141.453696 write(22, f\320\3\0\1\0\0\0\321\0\0\0\0\0\0\3023\356\17N\23l\vN\0\0\0\1\0 \0\0..., 5365760) = 5365760 0.005991
1310775141.459798 write(22, f\320\1\0\1\0\0\0\321\0\0\0\0\340Q\302`\5\0\915!000..., 9019392) = 9019392 0.010062
1310775141.469965 fdatasync(22) = 0 0.231385
1310775141.701424 semop(229383, {{2, 1, 0}}, 1) = 0 0.31
1310775141.702657 gettimeofday({1310775141, 702682}, NULL) = 0 0.28
1310775141.702765 sendto(5, \2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\r\201\0\0\0\2\0..., 232, 0, NULL, 0) = 232 0.71
1310775141.702942 sendto(6, T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D..., 66, 0, NULL, 0) = 66 0.000220

vs.

Q\0\0\0_select count(*) from test w..., 8192, 0, NULL, NULL) = 96 55.595425
1310775406.842823 gettimeofday({1310775406, 842842}, NULL) = 0 0.26
1310775406.843092 lseek(12, 0, SEEK_END) = 382787584 0.23
1310775406.843179 lseek(15, 0, SEEK_END) = 457596928 0.42
1310775406.843280 lseek(18, 0, SEEK_END)