Re: [PERFORM] Hardware/OS recommendations for large databases (5TB)
> Does anyone have recommendations for hardware and/or OS to work with
> around 5TB datasets?

Hardware-wise I'd say dual-core Opterons. One dual-core Opteron performs
better than two single-cores at the same clock speed. Tyan makes some
boards that have four sockets, thereby giving you 8 CPUs (if you need
that many). Sun and HP also make nice hardware, although the Tyan boards
are more competitively priced.

OS-wise I would choose the FreeBSD amd64 port, but partitions larger
than 2 TB need some special care: use gpt rather than disklabel etc.,
and tools like fsck may not be able to completely check partitions that
size. Linux or Solaris with either LVM or Veritas FS sound like
candidates.

> I have been working with datasets no bigger than around 30GB, and that
> (I'm afraid to admit) has been in MSSQL.

Well, our data are just below 30 GB so I can't help you there :-)

regards
Claus
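For the Linux + LVM route mentioned above, a minimal sketch of building
a >2TB volume might look like the following (device names, mount point
and the 4TB size are placeholders, not from this thread):

    # Pool four disks/arrays into one large XFS volume with LVM2
    pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde       # mark devices for LVM
    vgcreate pgvg /dev/sdb /dev/sdc /dev/sdd /dev/sde  # create the volume group
    lvcreate -L 4T -n pgdata pgvg                      # carve out a 4TB logical volume
    mkfs.xfs /dev/pgvg/pgdata                          # XFS copes well with large volumes
    mount /dev/pgvg/pgdata /var/lib/pgsql              # illustrative mount point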
Re: [PERFORM] Help speeding up delete
> Because I think we need to. The above would only delete rows that have
> name = 'obsid' and value = 'oid080505'. We need to delete all rows
> that have the same ids as those rows. However, from what you note, I
> bet we could do:
>
>   DELETE FROM tmp_table2 WHERE id IN
>     (SELECT id FROM tmp_table2 WHERE name = 'obsid' AND value = 'oid080505');
>
> However, even that seems to have a much higher cost than I'd expect:
>
>   lab.devel.configdb=# explain delete from tmp_table2 where id in
>       (select id from tmp_table2 where name='obsid' and value = 'oid080505');
>   NOTICE:  QUERY PLAN:
>
>   Seq Scan on tmp_table2  (cost=0.00..65705177237.26 rows=769844 width=6)
>     SubPlan
>       ->  Materialize  (cost=42674.32..42674.32 rows=38 width=50)
>             ->  Seq Scan on tmp_table2  (cost=0.00..42674.32 rows=38 width=50)
>
> And, sure enough, it is taking an extraordinarily long time to run
> (more than 10 minutes so far, compared to 10 seconds for the select).
> Is this really typical of deletes? It appears (to me) to be the Seq
> Scan on tmp_table2 that is the killer here. If we put an index on,
> would it help? (The user claims she tried that and its EXPLAIN cost
> went even higher, but I haven't checked that...)

Earlier pg versions have always been bad at dealing with IN subqueries.
Try rewriting it as follows (fixing any broken syntax -- I'm not
actually testing this :P):

  DELETE FROM tmp_table2
  WHERE EXISTS (SELECT * FROM tmp_table2 t2
                WHERE t2.id = tmp_table2.id
                  AND t2.name = 'obsid' AND t2.value = 'oid080505')

I assume you do have an index on tmp_table2.id :-) And that it's
non-unique? (If it were unique, the previous simplification of the
query really should've worked..) Do you also have an index on
(name, value) or something like that, so you get an index scan from it?

//Magnus
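For reference, the indexes Magnus is asking about would be created
roughly like this (index names and column types are assumed, not given
in the thread):

    CREATE INDEX tmp_table2_id_idx ON tmp_table2 (id);          -- drives the EXISTS probe
    CREATE INDEX tmp_table2_name_value_idx ON tmp_table2 (name, value);
    ANALYZE tmp_table2;  -- refresh statistics so the planner notices them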
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Adam,

Claus Guttesen wrote:
> > Does anyone have recommendations for hardware and/or OS to work with
> > around 5TB datasets?
>
> Hardware-wise I'd say dual-core Opterons. [snip]

I agree - you can get a very good one from www.acmemicro.com or
www.rackable.com with 8x 400GB SATA disks and the new 3Ware 9550SX SATA
RAID controller for about $6K, with two Opteron 272 CPUs and 8GB of RAM
on a Tyan 2882 motherboard. We get about 400MB/s sustained disk read
performance on these (with tuning) on Linux using the xfs filesystem,
which is one of the most critical factors for large databases.

Note that you want to have your DBMS use all of the CPU and disk
channel bandwidth you have on each query, which takes a parallel
database like Bizgres MPP to achieve.

Regards,

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases (5TB)
> Hardware-wise I'd say dual-core Opterons. One dual-core Opteron
> performs better than two single-cores at the same clock speed. Tyan
> makes some boards that have four sockets, thereby giving you 8 CPUs
> (if you need that many). [snip]

Just FYI: Tyan makes an 8-socket motherboard (up to 16 cores!):
http://www.swt.com/vx50.html

It can be loaded with up to 128 GB of memory if all the sockets are
filled :).

Merlin
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Luke,

Have you tried the Areca cards? They are slightly faster yet.

Dave

On 15-Nov-05, at 7:09 AM, Luke Lonergan wrote:
> I agree - you can get a very good one from www.acmemicro.com or
> www.rackable.com with 8x 400GB SATA disks and the new 3Ware 9550SX
> SATA RAID controller for about $6K, with two Opteron 272 CPUs and 8GB
> of RAM on a Tyan 2882 motherboard. We get about 400MB/s sustained disk
> read performance on these (with tuning) on Linux using the xfs
> filesystem, which is one of the most critical factors for large
> databases. [snip]
Re: [PERFORM] Help speeding up delete
Magnus Hagander wrote:
> [snip]
> Earlier pg versions have always been bad at dealing with IN
> subqueries. Try rewriting it as follows (fixing any broken syntax --
> I'm not actually testing this :P):
>
>   DELETE FROM tmp_table2
>   WHERE EXISTS (SELECT * FROM tmp_table2 t2
>                 WHERE t2.id = tmp_table2.id
>                   AND t2.name = 'obsid' AND t2.value = 'oid080505')

Thanks - that looks *significantly* better:

  lab.devel.configdb=# explain delete from tmp_table2 where exists
      (select 1 from tmp_table2 t2 where t2.id=tmp_table2.id
       and t2.name='obsid' and t2.value='oid080505');
  NOTICE:  QUERY PLAN:

  Seq Scan on tmp_table2  (cost=0.00..9297614.80 rows=769844 width=6)
    SubPlan
      ->  Index Scan using inv_index_2 on tmp_table2 t2
            (cost=0.00..6.02 rows=1 width=0)

(This is after putting an index on the (id,name,value) tuple.) That
outer seq scan is still annoying, but maybe this will be fast enough.

I've passed this on, along with the (strong) recommendation that they
upgrade PG.

Thanks!!

--
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.
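A handy way to time such a rewrite without actually losing rows (EXPLAIN
ANALYZE really executes the statement, so wrap it in a transaction and
roll back):

    BEGIN;
    EXPLAIN ANALYZE
      DELETE FROM tmp_table2
      WHERE EXISTS (SELECT 1 FROM tmp_table2 t2
                    WHERE t2.id = tmp_table2.id
                      AND t2.name = 'obsid' AND t2.value = 'oid080505');
    ROLLBACK;  -- the delete really ran, so undo it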
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Dave,

On 15-Nov-05, Dave Cramer wrote:
> Have you tried the Areca cards? They are slightly faster yet.

No, I've been curious since I read an earlier posting here. I've had a
lot more experience with the 3Ware cards, mostly good, and they've been
doing a lot of volume with Rackable/Yahoo, which gives me some more
confidence.

The new 3Ware 9550SX cards use a PowerPC for checksumming, so their
write performance is now up to par with the best cards, I believe. We
find that you still need to set Linux readahead to at least 8MB
(blockdev --setra) to get maximum read performance on them -- is that
your experience with the Arecas?

We get about 260MB/s read on 8 drives in RAID5 without the readahead
tuning and about 400MB/s with it.

- Luke
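The readahead setting Luke describes is applied per block device;
--setra counts 512-byte sectors, so 8MB is 16384 sectors (/dev/sda is a
placeholder for the RAID device):

    blockdev --getra /dev/sda        # show current readahead (in 512-byte sectors)
    blockdev --setra 16384 /dev/sda  # set ~8MB of readahead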
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Merlin,

> Just FYI: Tyan makes an 8-socket motherboard (up to 16 cores!):
> http://www.swt.com/vx50.html
> It can be loaded with up to 128 GB of memory if all the sockets are
> filled :).

Cool! Just remember that you can't get more than 1 CPU working on a
query at a time without a parallel database.

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Merlin,

> Just FYI: Tyan makes an 8-socket motherboard (up to 16 cores!):
> http://www.swt.com/vx50.html

Another thought - I priced out a maxed-out machine with 16 cores, 128GB
of RAM and 1.5TB of usable disk: $71,000. You could instead buy 8
machines that total 16 cores, 128GB RAM and 28TB of disk for $48,000,
and it would be 16 times faster in scan rate (8 machines at 400MB/s
each versus one at 200MB/s), which is the most important factor for
large databases. The size would be 16 rack units instead of 5, and
you'd have to add a GigE switch for $1500.

Scan rate for the above SMP: 200MB/s
Scan rate for the above cluster: 3,200MB/s

You could even go dual-core and double the memory on the cluster and
you'd about match the price of the god box.

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
On Tue, Nov 15, 2005 at 09:33:25AM -0500, Luke Lonergan wrote:
> write performance is now up to par with the best cards, I believe. We
> find that you still need to set Linux readahead to at least 8MB
> (blockdev --setra) to get maximum read performance on them

What on earth does that do to your seek performance?

Mike Stone
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
> Another thought - I priced out a maxed-out machine with 16 cores,
> 128GB of RAM and 1.5TB of usable disk: $71,000. You could instead buy
> 8 machines that total 16 cores, 128GB RAM and 28TB of disk for
> $48,000, and it would be 16 times faster in scan rate, which is the
> most important factor for large databases. The size would be 16 rack
> units instead of 5, and you'd have to add a GigE switch for $1500.

It's hard to say what would be better. My gut says the 5U box would be
a lot better at handling high-CPU/high-concurrency problems... like
your typical business ERP backend. This is pure speculation of
course... I'll defer to the experts here.

Merlin
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Luke,

Luke Lonergan wrote:
> [snip]
> Note that you want to have your DBMS use all of the CPU and disk
> channel bandwidth you have on each query, which takes a parallel
> database like Bizgres MPP to achieve.

The "What's New" FAQ for PostgreSQL 8.1 says the buffer manager for 8.1
has been enhanced to scale almost linearly with the number of
processors, leading to significant performance gains on 8-way, 16-way,
dual-core, and multi-core CPU servers. Why not just use it as-is?

Cheers,
Adam
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Because only 1 CPU is used on each query.

- Luke

--
Sent from my BlackBerry Wireless Device

-Original Message-
From: Adam Weisberg

> The "What's New" FAQ for PostgreSQL 8.1 says the buffer manager for
> 8.1 has been enhanced to scale almost linearly with the number of
> processors, leading to significant performance gains on 8-way, 16-way,
> dual-core, and multi-core CPU servers. Why not just use it as-is?
Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware
Hi Dave,

On Mon, 2005-11-14 at 18:51 -0500, Dave Cramer wrote:
> Joost,
>
> I've got experience with these controllers - which version do you
> have? I'd expect to see higher than 50MB/s, although I've never tried
> RAID 5. I routinely see closer to 100MB/s with RAID 1+0 on their 9000
> series.

OK, then there must be hope.

> I would also suggest that shared buffers should be higher than 7500,
> closer to 30000, and effective cache should be up around 200k.

In my current 8.1 situation I use shared_buffers = 40000,
effective_cache_size = 131072.

> work_mem is awfully high. Remember that this will be given to each
> and every connection, and can be more than 1x this number per
> connection depending on the number of sorts done in the query.

I use such a high number because I am the only user querying, and my
queries do sorted joins etc.

> fsync=false ? I'm not even sure why we have this option, but I'd
> never set it to false.

I want as much speed as possible for a database conversion that MUST be
handled in 1 weekend (at the current speed it would last almost 7
centuries; I may be off by a millennium). If it fails because of a
hardware problem (the only reason we want and need fsync?) we will try
the next weekend, until it finally goes right.

What I can see is that only the *write* performance of *long updates*
(and not inserts) is slow, and they get slower over time: the first few
thousand go relatively fast, after that PostgreSQL crawls to a halt
(other benchmarks like bonnie++ or just dd'ing a big file don't show
this behavior).

I did notice that changing the I/O scheduler's nr_requests from the
default 128 to 1024 or even 4096 made a remarkable performance
improvement. I suspect that experimenting with other I/O schedulers
could improve performance. But it is hard to find any useful
documentation about I/O schedulers.

--
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl
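For anyone wanting to reproduce Joost's scheduler and queue-depth
experiments on a 2.6 kernel, the knobs live in sysfs (sda is a
placeholder device name):

    cat /sys/block/sda/queue/scheduler               # e.g. noop anticipatory [deadline] cfq
    echo deadline > /sys/block/sda/queue/scheduler   # switch I/O scheduler at runtime
    echo 1024 > /sys/block/sda/queue/nr_requests     # raise queue depth from the default 128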
Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware
Joost,

On 11/15/05 8:35 AM, Joost Kraaijeveld wrote:
> thousand go relatively fast, after that PostgreSQL crawls to a halt
> (other benchmarks like bonnie++ or just dd'ing a big file don't have
> this behavior).

With RAID5, it could matter a lot what block size you run your "dd
bigfile" test with. You should run "dd if=/dev/zero of=bigfile bs=8k
count=500000" for a 2GB main memory machine; multiply the count by
(your mem / 2GB).

It is very important with the 3Ware cards to match the driver to the
firmware revision.

> I did notice that changing the I/O scheduler's nr_requests from the
> default 128 to 1024 or even 4096 made a remarkable performance
> improvement. I suspect that experimenting with other I/O schedulers
> could improve performance. But it is hard to find any useful
> documentation about I/O schedulers.

You could try deadline, there's no harm, but I've found that when you
reach the point of experimenting with schedulers, you are probably not
addressing the real problem.

On a 3Ware 9500 with HW RAID5 and 4 or more disks I think you should
get a 100MB/s write rate, which is double what Postgres can use. We
find that Postgres, even with fsync=false, will only run at a net COPY
speed of about 8-12 MB/s, where 12 is the Bizgres number. 8.1 might do
10. But to get the 10 or 12, the WAL writing and other writing is about
4-5X more than the net write speed, or the speed at which the input
file is parsed and read into the database.

So, if you can get your "dd bigfile" test to write data at 50MB/s+ with
a blocksize of 8KB, you should be doing well enough.

Incidentally, we also find that using the XFS filesystem and setting
the readahead to 8MB or more is extremely beneficial for performance
with the 3Ware cards (and with others, but especially for the older
3Ware cards).

Regards,

- Luke
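Luke's sizing rule as a sketch (MEM_GB is filled in by hand; writing 2x
your RAM keeps the page cache from flattering the result):

    MEM_GB=4   # adjust to your machine
    dd if=/dev/zero of=bigfile bs=8k count=$((MEM_GB * 250000))
    # 8kB x 250000 blocks is roughly 2GB written per GB of RAM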
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Mike,

On 11/15/05 6:55 AM, Michael Stone wrote:
> What on earth does that do to your seek performance?

We're in decision support, as is our poster here, so seek isn't the
issue; it's sustained sequential transfer rate that we need. At 8MB I'd
not expect too much damage, though the default is 1.5MB.

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Merlin,

On 11/15/05 7:20 AM, Merlin Moncure wrote:
> It's hard to say what would be better. My gut says the 5U box would
> be a lot better at handling high-CPU/high-concurrency problems...
> like your typical business ERP backend. This is pure speculation of
> course... I'll defer to the experts here.

With Oracle RAC, which is optimized for OLTP and uses a shared memory
caching model, maybe or maybe not. I'd put my money on the SMP in that
case as you suggest, but what happens when the OS dies?

For data warehousing, OLAP and decision support applications, RAC and
other shared memory/disk architectures don't do you any good and the
SMP machine is better by a bit. However, if you have an MPP database,
where disk and memory are not shared, then the SMP machine is tens or
hundreds of times slower than the cluster of the same price.

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Merlin Moncure wrote:
> > You could instead buy 8 machines that total 16 cores, 128GB RAM and
> > 28TB of disk for $48,000...
>
> It's hard to say what would be better. My gut says the 5U box would
> be a lot better at handling high-CPU/high-concurrency problems...
> like your typical business ERP backend. This is pure speculation of
> course... I'll defer to the experts here.

In this specific case (data warehouse app), multiple machines is the
better bet. Load data on 1 machine, copy to the other servers, and then
use a middleman to spread out SQL statements to each machine.

I was going to suggest pgpool as the middleman, but I believe it's
limited to 2 machines max at this time. I suppose you could daisy-chain
pgpools running on every machine.
Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware
Hi Luke,

On Tue, 2005-11-15 at 10:42 -0800, Luke Lonergan wrote:
> With RAID5, it could matter a lot what block size you run your "dd
> bigfile" test with. You should run "dd if=/dev/zero of=bigfile bs=8k
> count=500000" for a 2GB main memory machine; multiply the count by
> (your mem / 2GB).

If I understand correctly (I have 4GB ram):

[EMAIL PROTECTED]:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 304.085269 seconds (26939812 bytes/sec)

Which looks suspicious: 26308 MB/sec???

> It is very important with the 3Ware cards to match the driver to the
> firmware revision.

OK, I am running 1 driver behind the firmware.

> You could try deadline, there's no harm, but I've found that when you
> reach the point of experimenting with schedulers, you are probably
> not addressing the real problem.

It depends. I/O schedulers (I assume) have a purpose: some schedulers
should be more appropriate for some circumstances. And maybe my
specific circumstance (converting a database with *many updates*) is
such a circumstance. I really don't know.

> On a 3Ware 9500 with HW RAID5 and 4 or more disks I think you should
> get a 100MB/s write rate, which is double what Postgres can use.

As I have an (almost) separate WAL disk: iostat does not show any
significant writing on the WAL disk.

> So, if you can get your "dd bigfile" test to write data at 50MB/s+
> with a blocksize of 8KB, you should be doing well enough.

See above.

> Incidentally, we also find that using the XFS filesystem and setting
> the readahead to 8MB or more is extremely beneficial for performance
> with the 3Ware cards (and with others, but especially for the older
> 3Ware cards).

I don't have problems with my read performance but *only* with my
*update* performance (and not even insert performance). But then again,
I am not the only one with these problems:

http://www.issociate.de/board/goto/894541/3ware_+_RAID5_+_xfs_performance.html#msg_894541
http://lkml.org/lkml/2005/4/20/110
http://seclists.org/lists/linux-kernel/2005/Oct/1171.html

I am happy to share the tables against which I am running my checks.

--
Groeten,
Joost Kraaijeveld
Askesis B.V.
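To see where writes actually land during such a conversion, the sysstat
tools give a per-device view (intervals in seconds):

    iostat -x 5   # per-device throughput and utilization every 5s
    vmstat 5      # blocks in/out, context switches, CPU I/O wait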
[PERFORM] Too Many OR's?
I have a query that's making the planner do the wrong thing (for my
definition of wrong) and I'm looking for advice on what to tune to make
it do what I want.

The query consists of SELECT'ing a few fields from a table for a large
number of rows. The table has about seventy thousand rows and the user
is selecting some subset of them. I first do a SELECT...WHERE to
determine the unique identifiers I want (works fine) and then I do a
SELECT ... WHERE ... IN giving the list of ids I need additional data
on (which I see from EXPLAIN just gets translated into a very long list
of OR's).

Everything works perfectly until I get to 65301 rows. At 65300 rows, it
does an index scan and takes 2197.193 ms. At 65301 rows it switches to
a sequential scan and takes 778951.556 ms.

Values known not to affect this: work_mem, effective_cache_size.
Setting random_page_cost from 4 to 1 helps (79543.214 ms), but I'm not
really sure what '1' means, except that it's relative. Of course,
setting 'enable_seqscan false' helps immensely (2337.289 ms), but
that's as inelegant a solution as I've found - if there were other
databases on this install, that wouldn't be the right approach.

Now I can break this down into multiple SELECT's in code, capping each
query at 65300 rows, and that's a usable workaround, but academically
I'd like to know how to convince the planner to do it my way. It's
making a bad guess about something but I'm not sure what. I didn't see
any hard-coded limits grepping through the source (though it is close
to the 16-bit unsigned boundary - probably coincidental), so if anyone
has ideas or pointers to how I might figure out what's going wrong,
that would be helpful.

Thanks,
-Bill

-----
Bill McGonigle, Owner            Work: 603.448.4440
BFC Computing, LLC               Home: 603.448.1668
[EMAIL PROTECTED]                Mobile: 603.252.2606
http://www.bfccomputing.com/     Pager: 603.442.1833
Jabber: [EMAIL PROTECTED]        Text: [EMAIL PROTECTED]
Blog: http://blog.bfccomputing.com/
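A common workaround for planner trouble with huge IN() lists
(illustrative table and column names, not Bill's schema) is to stage
the ids in a temporary table and join, so the planner sees one join
instead of 65000 OR clauses:

    BEGIN;
    CREATE TEMP TABLE wanted_ids (id integer PRIMARY KEY);
    INSERT INTO wanted_ids               -- or load the ids with COPY
      SELECT id FROM big_table WHERE category = 'interesting';
    SELECT b.field1, b.field2
      FROM big_table b JOIN wanted_ids w ON w.id = b.id;
    COMMIT;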
Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware
Joost Kraaijeveld wrote:
> If I understand correctly (I have 4GB ram):
>
> [EMAIL PROTECTED]:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
> 1000000+0 records in
> 1000000+0 records out
> 8192000000 bytes transferred in 304.085269 seconds (26939812 bytes/sec)
>
> Which looks suspicious: 26308 MB/sec???

Eh? That looks more like ~25.7 MB/sec, assuming 1MB = 1024*1024 bytes.

--
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.
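The arithmetic behind Steve's correction, as a quick shell check
(1MB = 1048576 bytes):

    echo "scale=2; 26939812 / 1048576" | bc   # prints 25.69 (MB/sec)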
Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware
On Tue, 2005-11-15 at 12:41 -0700, Steve Wampler wrote:
> Eh? That looks more like ~25.7 MB/sec, assuming 1MB = 1024*1024 bytes.

Oooops. This calculation error is not typical for my testing (I think
;-)).

--
Groeten,
Joost Kraaijeveld
Askesis B.V.
Re: [PERFORM] Too Many OR's?
On Tue, 2005-11-15 at 13:12, Bill McGonigle wrote:
> I have a query that's making the planner do the wrong thing (for my
> definition of wrong) and I'm looking for advice on what to tune to
> make it do what I want.
> [snip]
> Now I can break this down into multiple SELECT's in code, capping
> each query at 65300 rows, and that's a usable workaround, but
> academically I'd like to know how to convince the planner to do it
> my way.

OK, there IS a point at which switching to a sequential scan will be
faster, i.e. when you're getting everything in the table. But the
database is picking a switchover point that is too low.

First, we need to know if the statistics are giving the query planner a
good enough idea of how many rows it's really gonna get versus how many
it expects. Do an EXPLAIN <your query> and see how many rows it thinks
it's gonna get. Since you've actually run it, you know how many it
really is going to get, so there's no need for an EXPLAIN ANALYZE
<your query> just yet.

Now, as long as the approximation is pretty close, fine. But if it's
off by factors, then we need to increase the statistics target on that
column, with:

  ALTER TABLE name ALTER columnname SET STATISTICS xxx;

where xxx is the new number. The default is set in your postgresql.conf
file, and is usually pretty low, say 10. You can go up to 1000, but
that makes query planning take longer. Try some incremental increase to
say 20 or 40 or even 100, run ANALYZE on that table, then do an EXPLAIN
on the query again, until the estimate is close.

Once the estimate is close, change random_page_cost to get the query
planner to switch at the right time. Change the number of values in the
IN() list and play with random_page_cost to see where that sweet spot
is. Note that what seems right on a single table for a single user may
not be best as you increase load or access other tables.
random_page_cost represents the cost of a random access relative to a
sequential access. As long as your data fit into RAM, the difference is
pretty much none (i.e. random_page_cost=1), so don't set it too low, or
accessing REALLY large data sets could become REALLY slow, as it uses
indexes when it should have been sequentially scanning.

Also, check what you've got effective_cache_size set to.
This tells PostgreSQL how much memory your kernel is using for cache,
and so lets it know how likely it is that the data under your current
query is in there.

Also, read this:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
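The workflow above, condensed into a sketch (big_table and id are
placeholder names, not from the thread):

    ALTER TABLE big_table ALTER COLUMN id SET STATISTICS 100;
    ANALYZE big_table;                 -- rebuild stats under the new target
    EXPLAIN SELECT id FROM big_table
      WHERE id IN (1, 2, 3);           -- compare estimated rows to reality
    SET random_page_cost = 2;          -- per-session experiment, not permanent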
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Unless there was a way to guarantee consistency, it would be hard at
best to make this work. Convergence on large data sets across boxes is
non-trivial, and diffing databases is difficult at best. Unless there
was some form of automated way to ensure consistency, going 8 ways into
separate boxes is *very* hard. I do suppose that if you have fancy
storage (EMC, Hitachi) you could do BCV or shadow copies. But in terms
of commodity stuff, I'd have to agree with Merlin.

-Original Message-
From: William Yu

> In this specific case (data warehouse app), multiple machines is the
> better bet. Load data on 1 machine, copy to the other servers, and
> then use a middleman to spread out SQL statements to each machine.
>
> I was going to suggest pgpool as the middleman, but I believe it's
> limited to 2 machines max at this time. I suppose you could
> daisy-chain pgpools running on every machine.
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
On 11/15/05, Luke Lonergan [EMAIL PROTECTED] wrote:
> I agree - you can get a very good one from www.acmemicro.com or
> www.rackable.com with 8x 400GB SATA disks and the new 3Ware 9550SX
> SATA RAID controller for about $6K, with two Opteron 272 CPUs and 8GB
> of RAM on a Tyan 2882 motherboard. We get about 400MB/s sustained
> disk read performance on these (with tuning) on Linux using the xfs
> filesystem, which is one of the most critical factors for large
> databases.

Spend a fortune on dual-core CPUs and then buy crappy disks... I bet
for most applications this system will be I/O bound, and you will see a
nice lot of drive failures in the first year of operation with
consumer-grade drives.

Spend your money on better disks, and don't bother with dual-core IMHO
unless you can prove the need for it.

Alex
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Not at random access in RAID 10 they aren't, and anyone with their head
screwed on right is using RAID 10. The 9500S will still beat the Areca
cards at the RAID 10 database access pattern.

Alex.

On 11/15/05, Dave Cramer [EMAIL PROTECTED] wrote:
> Luke,
>
> Have you tried the Areca cards? They are slightly faster yet.
> [snip]
Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware
Joost,

On 11/15/05 11:51 AM, Joost Kraaijeveld wrote:
> Oooops. This calculation error is not typical for my testing (I think
> ;-)).

Summarizing the two facts of note: the write result is 1/4 of what you
should be getting, and you are running 1 driver behind the firmware.

You might update your driver, rerun the test, and if you still have the
slow result, verify that your filesystem isn't fragmented (multiple
undisciplined apps on the same filesystem will do that).

WAL on a separate disk, on a separate controller? What is the write
performance there?

Regards,

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
James,

On 11/15/05 11:07 AM, James Mello wrote:
> Unless there was a way to guarantee consistency, it would be hard at
> best to make this work. Convergence on large data sets across boxes
> is non-trivial, and diffing databases is difficult at best. Unless
> there was some form of automated way to ensure consistency, going 8
> ways into separate boxes is *very* hard.

It's a matter of good software that handles the distribution / parallel
query optimization / distributed transactions and management features.
Combine that with a gigabit ethernet switch and it works: we routinely
get a 50x speedup over SMP on OLAP / decision support workloads.

Regards,

- Luke
Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware
Hi Luke,

On Tue, 2005-11-15 at 22:07 -0800, Luke Lonergan wrote:
> You might update your driver,

I will do that (but I admit that I am not looking forward to it. When I
was young and did not make money with my computer, I liked challenges
like compiling kernels and not being able to boot the computer. Not any
more :-)).

> WAL on a separate disk, on a separate controller? What is the write
> performance there?

WAL is on a separate disk and a separate controller. Write performance:

[EMAIL PROTECTED]:/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 166.499230 seconds (49201429 bytes/sec)

The quest continues...

--
Groeten,
Joost Kraaijeveld
Askesis B.V.