[PERFORM] NAS, SAN or any alternate solution ?
Hi all,

I've been searching the list for a while but couldn't find any up-to-date information relating to my problem.

We have a production server with PostgreSQL on Cygwin that currently deals with about 200 GB of data (1 big IDE drive). We plan to move to Linux for reasons I don't have to explain. Our aim is also to be able to increase our storage capacity up to approximately 1 or 2 terabytes and to speed up our production process.

As we are a small, Microsoft-addicted company, we have some difficulty choosing the configuration that would best meet our needs. Our production process is transaction-based (mostly huge inserts) and disk access is the main bottleneck.

Our main concern is hardware-related: would NAS or SAN be good solutions? (I've read that NAS uses NFS, which could slow down the transfer rate?) Has anyone ever tried one of these with PostgreSQL?

I would appreciate any comments. Thanks in advance.

Benjamin.

Benjamin Simon - Development Engineer, Cartography
http://www.loxane.com
Tel: 01 30 40 24 00  Fax: 01 30 40 24 04
LOXANE, 271 Chaussée Jules César, 95250 Beauchamp, France
Re: [PERFORM] NAS, SAN or any alternate solution ?
On Tue, 2004-07-20 at 01:52, [EMAIL PROTECTED] wrote:
> We have a production server with PostgreSQL on Cygwin that currently
> deals with about 200 GB of data (1 big IDE drive). We plan to move to
> Linux. Our aim is also to increase our storage capacity up to
> approximately 1 or 2 terabytes and to speed up our production process.
> Our production process is transaction-based (mostly huge inserts) and
> disk access is the main bottleneck. Would NAS or SAN be good solutions?
> (I've read that NAS uses NFS, which could slow down the transfer rate?)
> Has anyone ever tried one of these with PostgreSQL?

Your best bet would likely be a large external RAID system with lots of cache. Next would be a fast internal RAID card like the LSI MegaRAID cards, with lots of drives and battery-backed cache. Next would be a SAN, but be careful: there may be issues with some cards and their drivers under Linux, so research them well before deciding. NFS is right out if you want good performance AND reliability.

The cheapest solution that is likely to meet your needs would be the internal RAID card with battery-backed cache.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] NAS, SAN or any alternate solution ?
Thanks a lot Scott. It seems that we were totally wrong when considering a network storage solution. I've read your techdoc http://techdocs.postgresql.org/guides/DiskTuningGuide and found many interesting remarks.

We will now focus on external RAID systems, which seem relatively affordable compared to NAS or SAN (we would have had the budget for one of these). As we don't plan to have more than 5 connections (i.e. processes), we think SATA drives would fit our requirements. Could this be an issue for after-crash recovery?

We also hesitate concerning the RAID level to use. We are currently comparing RAID 1+0 and RAID 5 but have no firm idea which one to use. Our priorities are: 1) performance, 2) recovery, 3) price, 4) back-up.

It would be nice to have comments from people who have already set up a similar platform, giving some precise details of the hardware configuration:
- brand of the RAID device,
- technology used (SCSI/IDE, RAID level, ...),
- size of the database, number of disks / size of disks, ...

Such a knowledge base may be useful to convince people to migrate to cheap, reliable open-source solutions.

Thanks again.
Benjamin.

Scott Marlowe [EMAIL PROTECTED] wrote on 20/07/2004:
> Your best bet would likely be a large external RAID system with lots of
> cache. [snip]
Re: [PERFORM] NAS, SAN or any alternate solution ?
[EMAIL PROTECTED] wrote:
> As we don't plan to have more than 5 connections (i.e. processes), we
> think SATA drives would fit our requirements. Could this be an issue for
> after-crash recovery?

If you can disable the ATA write cache, then you have safety. Unfortunately many cards under Linux show up as SCSI devices, and you can't access this setting. Does anyone know if the newer SATA cards let you control this?

You might want to keep an eye on the upcoming native Windows port in 7.5. It will come with a fearsome array of caveats... but you have been running Cygwin in production! I am inclined to think the native port will be more solid than this configuration.

regards

Mark
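Mark's warning matters because PostgreSQL's crash safety rests on fsync() actually pushing the WAL to stable storage; with a write cache that lies, the call returns before the data is durable. A minimal Python sketch of the write-then-fsync pattern at issue (the file name and payload are purely illustrative):

```python
import os
import tempfile

def durable_append(path, data):
    """Append data and ask the OS to force it out to stable storage.

    os.fsync() flushes the kernel's buffers for this file descriptor,
    which is the same guarantee PostgreSQL relies on when flushing the
    WAL at commit time.  Whether the bytes then survive a power cut
    depends on the drive honouring the flush -- exactly what a lying
    ATA write cache breaks.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)  # returns only after the OS has handed the data down
    finally:
        os.close(fd)

fd0, tmp = tempfile.mkstemp()
os.close(fd0)
durable_append(tmp, b"commit record\n")
```

On ATA drives that expose the setting, the write cache is typically toggled with `hdparm -W0`, though, as Mark notes, many RAID cards hide the underlying device from the OS.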
Re: [PERFORM] NAS, SAN or any alternate solution ?
I must say that Cygwin did well (there exists good software on Windows; I've found one)... as a prototype. When I look at the PostgreSQL poll (http://www.postgresql.org/survey.php?View=1&SurveyID=11), it seems I'm not alone!

Actually, the major problem was the limit on available allocatable memory imposed by Cygwin. We don't plan to wait for the 7.5 native Windows version of PostgreSQL; it was hard enough to decide to move to Linux, and I don't want to roll everything back :)

Thanks for the advice. I will definitely have a look at the new version anyway as soon as it is released.

Regards,
Benjamin.

Mark Kirkwood [EMAIL PROTECTED] wrote on 20/07/2004:
> If you can disable the ATA write cache, then you have safety. [snip]
Re: [PERFORM] Working on huge RAM based datasets
Sorry for the late reply; I've been away, and I've had problems posting too :(

Merlin, I'd like to come back with a few more points!

> That's the whole point: memory is a limited resource. If pg is crawling,
> then the problem is simple: you need more memory.

My posting only relates to the scenario where RAM is not a limiting factor, a scenario which will become increasingly common over the next few years, as 64-bit processors and OSes allow the exploitation of ever larger, ever cheaper RAM. Incidentally, if PG is crawling, memory might be the problem... but not necessarily; it could be disk-bound on writes.

> The question is: is it postgresql's responsibility to manage that
> resource?

I think you are confusing the issue of RAM and address space. Any application can acquire a piece of address space for its own use, and it is the responsibility of the application to do what it needs with that address space. I'm interested in how PG could do something better in its address space when it knows that it can fit all the data it operates on within that address space. Though the OS is responsible for determining whether that address space is RAM-resident or not, in my scenario this is irrelevant, because there *will* be enough RAM for everything, and the OS will, in that scenario, allow all the address space to become RAM-resident.

I am not advocating undermining the OS in any way. It would be stupid to make PG take over the running of the hardware; I've learned the hard way that bypassing the OS is just a big pain up the backside!

> Pg is a data management tool, not a memory management tool.

I'm not criticising PG. PG is actually a 'disk/memory' data management tool: it manages data which lives on disks, but it can only operate on that data in memory, and it goes to some lengths to try to fit bits of disk data into a defined piece of memory and push them back out again. At the moment, this model assumes that RAM is a scarce resource.

The model still 'sort of' works when RAM is actually not scarce, because the OS effectively uses that extra RAM to make IO *appear* quicker, and indeed, I've found that a hint has been added to PG to tell it how much the OS is likely to be caching.

But the question is this: if you wrote a DB from scratch with the assumption that *all* the data could fit in the address space allocated by the postmaster, and you were confident that the OS had enough RAM so that you never suffered vmem page misses, couldn't you make things go much faster? A more pertinent question is: could PG be extended to have a flag which, when enabled, told it to operate on the assumption that it can fit all the disk data in RAM, and implement the data-organisation optimisations that rely on the persistence of data in address space?

> The same 'let's manage everything' argument also frequently gets brought
> up wrt file i/o, because people assume the o/s sucks at file management.

Well, I'm not saying this. I have substantial experience with high-performance file IO through a filesystem. But if you are interested in high-speed IO, a naive 'let the OS do everything' approach often isn't good enough. You, the application, have to be aware that the order and timing of IO requests, along with the size of IO block you cause to be triggered, have a dramatic impact on the speed with which the data reaches your app, OS or no OS. Most high-speed storage still relies on spinning things containing data that can only be accessed in a certain way, and data movement is page-boundary sensitive. The OS may hide these details from you, but you, the app writer, have to have an understanding of the underlying reality if you want to optimise performance.

I want to stress that at no point am I advocating *not* using the OS. PG should do ALL IO and memory allocation through the OS, otherwise you end up with a platform-specific product that is of little use.
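As a toy illustration of the point about IO request size (a sketch of the mechanics, not a benchmark): the application decides how many requests it issues for the same data, OS or no OS.

```python
import os
import tempfile

def count_reads(path, block_size):
    """Read the whole file unbuffered; return (total_bytes, read_calls)."""
    total = calls = 0
    # buffering=0 gives raw IO, so each read() maps to one request to the OS
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
            calls += 1
    return total, calls

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB of test data

small = count_reads(path, 4 * 1024)    # many small requests
large = count_reads(path, 256 * 1024)  # far fewer large requests
```

Both calls move the same megabyte, but the 4 KiB version issues roughly 64 times as many requests; on real spinning storage each request also pays seek and rotational costs, which is the effect being described.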
That given, there is still the opportunity for PG to operate far more efficiently in my high-memory scenario. Wouldn't your backend processes like to have the entire database sitting ready in address space (RAM-resident, of course!), indexes all fully built? No tuple more than a few machine instructions away? Imagine the postmaster not having to frantically decide which bits of data to kick out of the workspace in order to keep the backends happy. Imagine the postmaster not having to build structures to keep track of newly read-in blocks of data from 'disk' (or OS cache). Imagine that everything was just there... Is this not a compelling scenario?

> At some point, hard disks will be replaced by solid state memory
> technologies...

This is irrelevant to my scenario. The optimisations I crave have to do with getting the entire database, in a query-optimised form, near to the CPUs, i.e. in fast RAM. (I'd expect solid-state disk RAM to be much slower than the RAM that sits nearer the CPU.) The speed of the persistent storage system (whether spinning
Re: [PERFORM] Odd sorting behaviour
On Thu, Jul 15, 2004 at 02:08:54PM +0200, Steinar H. Gunderson wrote:
> sort_mem is already 16384, which I thought would be plenty -- I tried
> increasing it to 65536, which made exactly zero difference. :-)

I've tried some further tweaking, but I'm still unable to force it into doing a hash join -- any ideas how I can find out why it chooses a merge join?

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] NAS, SAN or any alternate solution ?
...and on Tue, Jul 20, 2004 at 09:52:56AM +0200, [EMAIL PROTECTED] used the keyboard:
> We have a production server with PostgreSQL on Cygwin that currently
> deals with about 200 GB of data (1 big IDE drive). [snip] Would NAS or
> SAN be good solutions? Has anyone ever tried one of these with
> PostgreSQL?

Hello Simon,

We're testing the 3ware Escalade 9000, which is a hardware-RAID SATA controller with VERY good support for Linux (including direct access for S.M.A.R.T. applications, which is a serious problem with other RAID controllers), featuring RAID levels 0, 1, 10, 5 and JBOD, up to 12 SATA channels (that's the 3ware Escalade 9500S-12; they also come in 4- and 8-channel versions, and up to four cards can be fitted into a system), up to 1GB of battery-backed ECC RAM (128MB out of the box) and, most of all, excellent tuning guides that actually manage to exceed the scope of merely making you come up with good benchmark results for that controller in a specific test environment.

Our preliminary tests show that a RAID5 setup of four 250GB SATA Maxtors, which aren't really qualified as fast drives, can deliver block writes of 50MB/s, rewrites at about 35MB/s and reads of approximately 180MB/s, which is roughly 2.5 times the performance of previous Escalades.
You can find more info on the Escalade 9000 series, benchmarks and other stuff here:

http://www.3ware.com/products/serial_ata9000.asp
http://www.3ware.com/products/benchmarks_sata.asp
http://www.3ware.dk/fileadmin/3ware/documents/Benchmarks/Linux_kernel_2.6_Benchmarking.pdf

Oh, and not to forget: the price for a 3ware 9500S-12, the version we're testing, ranges between EUR1000 and EUR1500, depending on the contract you have with the reseller and the intended use of the device. SATA disks are dirt-cheap nowadays, as has been mentioned before.

I do share the doubts about the cache-usage setting those drives report, though; it may or may not be true. But one never knows that for sure with SCSI drives either. At least you can ensure, with proper controller cache sizing, that even the largest amount of data that could fit into the disk caches of the entire array (96MB, with drives that usually feature 8MB (!!!) of cache) will still be available in the controller cache after a power failure, so it can be re-checked and confirmed properly written.

Hope this helps,
--
Grega Bremec
Senior Administrator
Noviforum Ltd., Software & Media
http://www.noviforum.si/
Re: [PERFORM] NAS, SAN or any alternate solution ?
> Oh, and not to forget: the price for a 3ware 9500S-12, the version we're
> testing, ranges between EUR1000 and EUR1500, depending on the contract
> you have with the reseller and the intended use of the device. SATA disks
> are dirt-cheap nowadays, as has been mentioned before.

Correction: EUR500 to EUR1000, VAT not included. :) Sorry for the mix-up.

--
Grega Bremec
Senior Administrator
Noviforum Ltd., Software & Media
http://www.noviforum.si/
Re: [PERFORM] NAS, SAN or any alternate solution ?
On Tue, 2004-07-20 at 03:32, [EMAIL PROTECTED] wrote:
> As we don't plan to have more than 5 connections (i.e. processes), we
> think SATA drives would fit our requirements. Could this be an issue for
> after-crash recovery?

If you're looking at (S)ATA RAID, definitely look at Escalade, as another poster mentioned. Last year I and a few other folks on the lists were testing RAID controllers for survival of the power-plug pull test, and the Escalade passed (someone else did that testing; I tested the LSI MegaRAID 320-2 controller with battery-backed cache).

> We also hesitate concerning the RAID level to use. We are currently
> comparing RAID 1+0 and RAID 5 but have no firm idea which one to use.
> Our priorities are: 1) performance, 2) recovery, 3) price, 4) back-up

Basically, for a smaller number of drives, RAID 1+0 is almost always a win over RAID 5. As the number of drives in the array grows, RAID 5 usually starts to pull back into the lead. RAID 5 definitely gives you the most storage for your dollar of any of the redundant array types.

The more important point of a RAID controller is that it have battery-backed cache, to make sure that the database server isn't waiting for WAL writes all the time. A single-port LSI MegaRAID 320-1 controller was only about $500 or less the last time I checked (order it WITH the battery and cache, otherwise you may have a hard time finding the right parts later on). It supports hot spares for automatic rebuild.
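Scott's storage trade-off can be put in back-of-envelope numbers (a rough sketch that ignores hot spares and formatting overhead): with n drives of size s, RAID 1+0 yields (n/2)*s usable while RAID 5 yields (n-1)*s, which is why RAID 5 pulls ahead on capacity as the array grows.

```python
def usable_capacity_gb(level, n_drives, drive_gb):
    """Rough usable capacity for the two RAID levels being compared."""
    if level == "raid10":
        if n_drives < 4 or n_drives % 2:
            raise ValueError("RAID 1+0 needs an even number of drives, >= 4")
        return (n_drives // 2) * drive_gb  # half the spindles hold mirror copies
    if level == "raid5":
        if n_drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (n_drives - 1) * drive_gb   # one drive's worth of parity
    raise ValueError("unknown level: %s" % level)

# Eight of the 250GB SATA drives mentioned earlier in the thread:
raid10_gb = usable_capacity_gb("raid10", 8, 250)  # 1000 GB usable
raid5_gb = usable_capacity_gb("raid5", 8, 250)    # 1750 GB usable
```

What RAID 1+0 gives up in capacity it buys back in random-write behaviour, which is why it still wins for smaller arrays in the trade-off described above.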
Re: [PERFORM] NAS, SAN or any alternate solution ?
[EMAIL PROTECTED] wrote:
> Would NAS or SAN be good solutions? (I've read that NAS uses NFS, which
> could slow down the transfer rate?) Has anyone ever tried one of these
> with PostgreSQL?

Not (yet) with Postgres, but my company has run a ~100GB Oracle database on NAS (NetApp) for the past couple of years. We've found it to outperform local attached storage, and it has been extremely reliable and flexible. Our DBAs wouldn't give it up without a fight.

Joe
Re: [PERFORM] Odd sorting behaviour
Steinar,

> I've tried some further tweaking, but I'm still unable to force it into
> doing a hash join -- any ideas how I can find out why it chooses a merge
> join?

I'm sorry, I can't really give your issue the attention it deserves. At this point, I'd have to get a copy of your database and play around with alternate query structures, and I don't have time. Sorry!

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Odd sorting behaviour
Steinar,

> I've tried some further tweaking, but I'm still unable to force it into
> doing a hash join -- any ideas how I can find out why it chooses a merge
> join?

Actually, quick question -- have you tried setting enable_mergejoin = false to see the plan the system comes up with? Is it in fact faster?

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Unbearably slow cascading deletes
On Tue, 20 Jul 2004, [EMAIL PROTECTED] wrote:
> I have (among other things) a parent table with 200 records and a child
> table with 20MM or more. I set up referential integrity on the FK with
> ON DELETE CASCADE. It appears that when a DELETE is done on the parent
> table, the child-table deletion is done with a sequential scan. I say
> this because it took over four minutes to delete a parent record THAT
> HAD NO CHILDREN. The DB is recently analyzed and SELECTs on the child
> table are done by the appropriate index on the FK.
>
> Let me guess: the cascade trigger's query plan is decided at schema load
> time, when the optimizer has no clue. Is there a way to fix this without
> writing my own triggers, using PL/pgSQL EXECUTE to delay the planner?

The query plan should be decided at the first cascaded delete for the key in the session. However, IIRC, it's using $arguments for the key values, so it's possible that that is giving it a different plan than it would get if the value were known. What do you get if you prepare the query with an argument for the key and use EXPLAIN EXECUTE?
Re: [PERFORM] Unbearably slow cascading deletes
On Tue, 20 Jul 2004, Stephan Szabo wrote:
> What do you get if you prepare the query with an argument for the key
> and use EXPLAIN EXECUTE?

To be clear, I mean prepare/explain execute an example select/delete from the fk.
Re: [PERFORM] Unbearably slow cascading deletes
PREPARE c(int4) AS DELETE FROM childtable WHERE fk = $1;
EXPLAIN EXECUTE c(-1);

gives an index scan.

PREPARE c2(int4) AS DELETE FROM parenttable WHERE key = $1;
EXPLAIN EXECUTE c2(1);

gives a seq scan on the parent table (itself a little curious) and no explanation of what the triggers are doing.
Re: [PERFORM] Unbearably slow cascading deletes
I FOUND IT! A second trigger that doesn't belong. OK, we're set now, and thanks for showing me some ways to check what the planner is up to. Is there a way of seeing what the triggers will do?
Re: [PERFORM] Unbearably slow cascading deletes
> I have (among other things) a parent table with 200 records and a child
> table with 20MM or more. I set up referential integrity on the FK with
> ON DELETE CASCADE. It appears that when a DELETE is done on the parent
> table, the child-table deletion is done with a sequential scan.

Do you have an index on the foreign key field?

Chris
Re: [PERFORM] Odd sorting behaviour
> I could of course post the updated query plan if anybody is interested;
> let me know. (The data is still available if anybody needs it as well,
> of course.)

I've taken a look and managed to cut out quite a bit of the time used. You'll need to confirm it returns the same results, though; I didn't check beyond confirming it is the same number of results (query below).

First off, DROP INDEX prodid_index;. It doesn't help anything, since the primary key is just as usable, but it does take enough space that it causes thrashing in the buffer cache. Any queries based on prodid will use the index for the PRIMARY KEY instead.

Secondly, I had no luck getting the hash join, but this probably doesn't matter. I've assumed that the number of users will climb faster than the product set offered, and generated additional data via the below command, run 4 times:

INSERT INTO opinions
SELECT prodid, uid + (SELECT max(uid) FROM opinions), opinion
FROM opinions;

I found that by this point the hash join and merge join have essentially the same performance; in other words, as you grow you'll want the merge join eventually, so I wouldn't worry about it too much.

The new query cuts about 1/3rd of the time; forcing the hash join gets another 1/3rd, but see the above note:

SELECT o3.prodid
     , SUM(o3.opinion * o12.correlation) AS total_correlation
  FROM opinions o3
       -- Plain join okay since o12.correlation > 0
       -- eliminates any NULLs anyway.
       -- Was: RIGHT JOIN
  JOIN (SELECT o2.uid
             , SUM(o1.opinion * o2.opinion) / SQRT(count(*)::numeric)
               AS correlation
          FROM opinions AS o1
          JOIN opinions AS o2 USING (prodid)
         WHERE o1.uid = 1355
         GROUP BY o2.uid
       ) AS o12 USING (uid)
       -- Was: old LEFT JOIN
 WHERE o3.prodid NOT IN (SELECT prodid FROM opinions AS o4 WHERE uid = 1355)
   AND o3.opinion > 0
   AND o12.correlation > 0
 GROUP BY o3.prodid
 ORDER BY total_correlation DESC;