Re: [PERFORM] SSD + RAID
Scott Carey wrote:
> On Feb 20, 2010, at 3:19 PM, Bruce Momjian wrote:
>> Dan Langille wrote:
>>> Bruce Momjian wrote:
>>>> Matthew Wakeling wrote:
>>>>> On Fri, 13 Nov 2009, Greg Smith wrote:
>>>>>> In order for a drive to work reliably for database use such as for
>>>>>> PostgreSQL, it cannot have a volatile write cache. You either need a
>>>>>> write cache with a battery backup (and a UPS doesn't count), or to turn
>>>>>> the cache off. The SSD performance figures you've been looking at are
>>>>>> with the drive's write cache turned on, which means they're completely
>>>>>> fictitious and exaggerated upwards for your purposes. In the real
>>>>>> world, that will result in database corruption after a crash one day.
>>>>>
>>>>> Seagate are claiming to be on the ball with this one.
>>>>> http://www.theregister.co.uk/2009/12/08/seagate_pulsar_ssd/
>>>>
>>>> I have updated our documentation to mention that even SSD drives often
>>>> have volatile write-back caches. Patch attached and applied.
>>>
>>> Hmmm. That got me thinking: consider ZFS and an HDD with a volatile
>>> cache. Do the characteristics of ZFS avoid this issue entirely?
>>
>> No, I don't think so. ZFS only avoids partial page writes. ZFS still
>> assumes something sent to the drive is permanent, or it would have no way
>> to operate.
>
> ZFS is write-back cache aware, and safe provided the drive's cache
> flushing and write barrier related commands work. It will flush data in
> 'transaction groups' and flush the drive write caches at the end of those
> transactions. Since it's copy-on-write, it can ensure that all the changes
> in the transaction group appear on disk, or all are lost. This all works
> so long as the cache flush commands do.

Agreed, though I thought the problem was that SSDs lie about their cache flush like SATA drives do, or is there something I am missing?

--
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +
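The transaction-group ordering Scott describes can be sketched in a few lines. This is an illustration of the barrier ordering, not ZFS itself; the file layout and names (`.data`, `.uber`) are invented for the example, and `os.fsync` stands in for the drive's cache-flush command.

```python
import os

def flush(fd):
    """Stand-in for the drive cache-flush / write-barrier command."""
    os.fsync(fd)

def commit_txg(path, blocks, txg_number):
    """Commit one transaction group: data first, flush, then the
    commit record, flush again.  After a crash, the group is either
    fully published (commit record present) or fully absent."""
    data_fd = os.open(path + ".data", os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    try:
        for block in blocks:          # 1. write new copies of the blocks
            os.write(data_fd, block)
        flush(data_fd)                # 2. barrier: data durable before commit
    finally:
        os.close(data_fd)

    commit_fd = os.open(path + ".uber", os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        os.write(commit_fd, str(txg_number).encode())  # 3. publish the group
        flush(commit_fd)              # 4. barrier: commit record durable
    finally:
        os.close(commit_fd)
```

If the drive acknowledges the flush in step 2 without actually draining its cache, the commit record can reach the platter before the data it points to, which is exactly the failure mode the thread is worried about.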
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SSD + RAID
Bruce Momjian wrote:
> Agreed, though I thought the problem was that SSDs lie about their cache
> flush like SATA drives do, or is there something I am missing?

There's exactly one case I can find[1] where this century's IDE drives lied more than any other drive with a cache: under-120GB Maxtor drives from late 2003 to early 2004, and it's apparently been worked around for years. Those drives claimed to support the FLUSH_CACHE_EXT feature (IDE command 0xEA), but did not support sending the 48-bit commands needed to issue that cache-flushing command. For that case a workaround for Linux was quickly identified, by checking for *both* support for 48-bit commands and support for the flush cache extension[2].

Beyond those 2004-drive + 2003-kernel systems, I think most of the rest of such reports have been various misfeatures in some of Linux's filesystems (like ext3, which only wants to send drives cache-flushing commands when inodes change[3]) and Linux software RAID misfeatures... and ISTM those would affect SSDs the same way they'd affect SATA drives.

[1] http://lkml.org/lkml/2004/5/12/132
[2] http://lkml.org/lkml/2004/5/12/200
[3] http://www.mail-archive.com/linux-ker...@vger.kernel.org/msg272253.html
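The kernel workaround described above boils down to trusting FLUSH_CACHE_EXT only when the drive also advertises 48-bit command support. A minimal sketch of that check against `hdparm -I` output; the capability strings are what typical `hdparm` versions print, but the exact wording is an assumption worth verifying against your own tools.

```python
# Mirror of the Linux workaround for the lying Maxtor firmware:
# require BOTH 48-bit command support and the flush-cache extension
# before believing FLUSH_CACHE_EXT will actually work.

def supports_safe_flush_ext(hdparm_output: str) -> bool:
    has_48bit = "48-bit Address feature set" in hdparm_output
    has_flush_ext = "FLUSH CACHE EXT" in hdparm_output
    # The affected drives advertised FLUSH CACHE EXT without being able
    # to accept the 48-bit command that carries it, so one check alone
    # is not enough.
    return has_48bit and has_flush_ext
```

In practice you would feed this the output of `hdparm -I /dev/sdX` run as root.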
Re: [PERFORM] Auto Vacuum out of memory
On Mon, Feb 15, 2010 at 3:20 PM, Rose Zhou r...@anatec.com wrote:
> We bought a new WinXP x64 Professional machine with 12GB of memory. I
> installed postgresql-8.4.1-1-windows on this PC, along with a .NET
> application which reads data from a TCP port and inserts/updates the
> database. The data volume is large, with heavy writing and updating on a
> partitioned table. I configured PostgreSQL as below:
>
> shared_buffers = 1024MB
> effective_cache_size = 5120MB
> work_mem = 32MB
> maintenance_work_mem = 200MB
>
> But I got the autovacuum out-of-memory error. The detailed configuration
> is as follows; can anyone suggest the best configuration from a
> performance perspective?

Please see http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

You haven't provided very much detail here - for example, the error message that you got is conspicuously absent.

...Robert
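For comparison, a hedged starting point for a 12GB Windows box might look like the fragment below. These are illustrative values only, not a fix for the reported error (as Robert says, the actual error message is needed first), and Windows-specific behavior is an assumption to verify.

```ini
# postgresql.conf -- illustrative starting values for 12GB of RAM on Windows
shared_buffers = 512MB            # very large values help less on Windows
effective_cache_size = 8GB        # an estimate of OS cache, not an allocation
work_mem = 32MB                   # per sort/hash node, multiplied by sessions
maintenance_work_mem = 256MB      # used by each autovacuum worker as well
```

Note that `effective_cache_size` only informs the planner; the memory autovacuum actually allocates is governed by `maintenance_work_mem` (or `autovacuum_work_mem` in later releases).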
Re: [PERFORM] SSD + RAID
Ron Mayer wrote:
> Bruce Momjian wrote:
>> Agreed, though I thought the problem was that SSDs lie about their cache
>> flush like SATA drives do, or is there something I am missing?
>
> There's exactly one case I can find[1] where this century's IDE drives
> lied more than any other drive with a cache:

Ron is correct that the problem of mainstream SATA drives accepting the cache flush command but not actually doing anything with it is long gone at this point. If you have a regular SATA drive, it almost certainly supports proper cache flushing. And if your whole software/storage stack understands all that, you should not end up with corrupted data just because there's a volatile write cache in there.

But the point of this whole testing exercise coming back into vogue again is that SSDs have returned this negligent behavior to the mainstream. See http://opensolaris.org/jive/thread.jspa?threadID=121424 for a discussion of this in a ZFS context just last month. There are many documented cases of Intel SSDs that will fake a cache flush, such that the only way to get good reliable writes is to totally disable their write caches--at which point performance is so bad you might as well have gotten a RAID10 setup instead (and longevity is toast too).

This whole area remains a disaster area, and extreme distrust of all the SSD storage vendors is advisable at this point. Basically, if I don't see the capacitor responsible for flushing outstanding writes, and get a clear description from the manufacturer of how the cached writes are going to be handled in the event of a power failure, at this point I have to assume the answer is "badly" and your data will be eaten. And the prices for SSDs that meet that requirement are still quite steep. I keep hoping somebody will address this market at something lower than the standard enterprise prices.
The upcoming SandForce designs seem to have thought this through correctly: http://www.anandtech.com/storage/showdoc.aspx?i=3702&p=6

But the product's not out to the general public yet (just like the Seagate units that claim to have capacitor backups--I heard a rumor those are also SandForce designs actually, so they may be the only ones doing this right and aiming at a lower price).

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.us
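A quick empirical check for the lying-flush behavior Greg describes is to time write+fsync cycles, in the spirit of tools like diskchecker.pl or PostgreSQL's pg_test_fsync. A sketch, with the interpretation thresholds as rough rules of thumb rather than hard limits:

```python
import os
import time

def fsyncs_per_second(path, duration=2.0):
    """Count how many 512-byte write+fsync cycles the storage under
    `path` completes per second.

    An honest 7200rpm disk tops out near its rotation rate (~120/s).
    A result in the thousands or tens of thousands means a cache is
    absorbing the flush -- fine if it's battery/capacitor backed,
    data loss on power failure if it is not.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    count = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, b"x" * 512)
            os.fsync(fd)      # ask the OS (and drive) to make it durable
            count += 1
    finally:
        os.close(fd)
    return count / duration
```

Note this only measures the acknowledgment rate; the definitive test for faked flushes is still writing from one machine and literally pulling the plug, then checking which acknowledged writes survived.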
Re: [PERFORM] SSD + RAID
On 22-2-2010 6:39 Greg Smith wrote:
> But the point of this whole testing exercise coming back into vogue again
> is that SSDs have returned this negligent behavior to the mainstream
> again. See http://opensolaris.org/jive/thread.jspa?threadID=121424 for a
> discussion of this in a ZFS context just last month. There are many
> documented cases of Intel SSDs that will fake a cache flush, such that
> the only way to get good reliable writes is to totally disable their
> write caches--at which point performance is so bad you might as well have
> gotten a RAID10 setup instead (and longevity is toast too).

That's weird. Intel's SSDs didn't have a write cache, afaik:

> I asked Intel about this and it turns out that the DRAM on the Intel
> drive isn't used for user data because of the risk of data loss; instead
> it is used as memory by the Intel SATA/flash controller for deciding
> exactly where to write data (I'm assuming for the wear
> leveling/reliability algorithms).
http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403&p=10

But that is the old version; perhaps the second generation does have a bit of write caching.

I can understand that an SSD might do unexpected things when it loses power all of a sudden. It will probably try to group writes to fill a single block (those blocks vary in size but are normally much larger than those of a normal spinning disk, at values like 256 or 512KB), and it might lose the data it was holding while waiting for a full block to accumulate, or perhaps it just couldn't complete a full block-write due to the power failure. Although that behavior isn't really what you want, it would be incorrect to blame write caching for it if the device doesn't even have a write cache ;)

Best regards,

Arjen