Re: [PERFORM] SSD + RAID

2010-02-21 Thread Bruce Momjian
Scott Carey wrote:
 On Feb 20, 2010, at 3:19 PM, Bruce Momjian wrote:
 
  Dan Langille wrote:
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1
 
  Bruce Momjian wrote:
  Matthew Wakeling wrote:
  On Fri, 13 Nov 2009, Greg Smith wrote:
  In order for a drive to work reliably for database use such as for
  PostgreSQL, it cannot have a volatile write cache.  You either need a
  write cache with a battery backup (and a UPS doesn't count), or to turn
  the cache off.  The SSD performance figures you've been looking at are
  with the drive's write cache turned on, which means they're completely
  fictitious and exaggerated upwards for your purposes.  In the real
  world, that will result in database corruption after a crash one day.
  Seagate are claiming to be on the ball with this one.
 
  http://www.theregister.co.uk/2009/12/08/seagate_pulsar_ssd/
 
  I have updated our documentation to mention that even SSD drives often
  have volatile write-back caches.  Patch attached and applied.
 
  Hmmm.  That got me thinking: consider ZFS and HDD with volatile cache.
  Do the characteristics of ZFS avoid this issue entirely?
 
  No, I don't think so.  ZFS only avoids partial page writes.  ZFS still
  assumes something sent to the drive is permanent or it would have no way
  to operate.
 
 
 ZFS is write-back cache aware, and safe provided the drive's
 cache flushing and write barrier related commands work.  It will
 flush data in 'transaction groups' and flush the drive write
 caches at the end of those transactions.  Since it's copy-on-write,
 it can ensure that all the changes in the transaction group appear
 on disk, or all are lost.  This all works so long as the cache flush
 commands do.
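The transaction-group behavior described above can be sketched in a few lines (a toy model for illustration only, nothing like ZFS's actual code): new blocks are written and flushed first, and only then is the root pointer (uberblock) updated and flushed, so a power cut leaves either the old tree or the new one on disk.

```python
# Toy model of ZFS-style copy-on-write transaction groups.  Data blocks
# are written and flushed before the root pointer (uberblock) is
# updated, so a power cut never exposes a half-written mix -- provided
# the drive's cache flush command actually works.

class Disk:
    def __init__(self, honor_flush=True):
        self.cache = {}        # volatile write cache
        self.platter = {}      # durable storage
        self.honor_flush = honor_flush

    def write(self, addr, data):
        self.cache[addr] = data

    def flush(self):
        if self.honor_flush:
            self.platter.update(self.cache)
            self.cache.clear()
        # a drive that fakes the flush returns success and does nothing

    def power_cut(self):
        self.cache.clear()     # volatile cache contents are lost

def commit_txg(disk, txg_blocks, new_root):
    # 1. write the new copy-on-write blocks, 2. flush, 3. write the
    # new root pointer, 4. flush again.
    for addr, data in txg_blocks.items():
        disk.write(addr, data)
    disk.flush()
    disk.write('uberblock', new_root)
    disk.flush()

disk = Disk(honor_flush=True)
disk.platter['uberblock'] = 'root-v1'

disk.write('blk1', 'new-data')    # transaction group in flight...
disk.power_cut()                  # ...power fails before the root update
assert disk.platter['uberblock'] == 'root-v1'   # old tree still intact

commit_txg(disk, {'blk2': 'more-data'}, 'root-v2')
assert disk.platter['uberblock'] == 'root-v2'   # new tree fully on disk
```

Run the same sequence with `honor_flush=False` and the old root can survive alongside lost data blocks, which is exactly the failure mode being discussed.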

Agreed, though I thought the problem was that SSDs lie about their
cache flush like SATA drives do, or is there something I am missing?

--
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2010-02-21 Thread Ron Mayer
Bruce Momjian wrote:
 Agreed, though I thought the problem was that SSDs lie about their
 cache flush like SATA drives do, or is there something I am missing?

There's exactly one case I can find[1] where this century's IDE
drives lied about their cache flushes:

  Sub-120GB Maxtor drives from late 2003 to early 2004.

and it has apparently been worked around for years.

Those drives claimed to support the FLUSH_CACHE_EXT feature (IDE
command 0xEA), but did not support the 48-bit commands needed to
actually send that cache flushing command.

And for that case a workaround for Linux was quickly identified by
checking for *both* the support for 48-bit commands and support for the
flush cache extension[2].
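The same two-part check can be done from `hdparm -I` output on a modern Linux box; a rough sketch (the sample text below is hypothetical, hdparm-style output, not from any particular drive):

```python
# Check that a drive advertises both the 48-bit Address feature set and
# FLUSH_CACHE_EXT, mirroring the kernel workaround described above:
# only trust the extended flush command if 48-bit commands also work.
# The sample is hypothetical hdparm -I style output.

sample_hdparm_output = """
Commands/features:
   Enabled Supported:
      *    48-bit Address feature set
      *    Mandatory FLUSH_CACHE
      *    FLUSH_CACHE_EXT
"""

def flush_ext_usable(hdparm_text):
    has_48bit = '48-bit Address feature set' in hdparm_text
    has_flush_ext = 'FLUSH_CACHE_EXT' in hdparm_text
    return has_48bit and has_flush_ext

assert flush_ext_usable(sample_hdparm_output)
# EXT advertised without 48-bit support: the broken Maxtor case
assert not flush_ext_usable("supports FLUSH_CACHE_EXT only")
```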


Beyond those 2004-drive + 2003-kernel systems, I think most of the rest
of such reports come down to misfeatures in some of Linux's
filesystems (like ext3, which only wants to send drives cache-flushing
commands when inodes change[3]) and in Linux software RAID...

...and ISTM those would affect SSDs the same way they'd affect SATA drives.


[1] http://lkml.org/lkml/2004/5/12/132
[2] http://lkml.org/lkml/2004/5/12/200
[3] http://www.mail-archive.com/linux-ker...@vger.kernel.org/msg272253.html





Re: [PERFORM] Auto Vacuum out of memory

2010-02-21 Thread Robert Haas
On Mon, Feb 15, 2010 at 3:20 PM, Rose Zhou r...@anatec.com wrote:
 We bought a new WinXP x64 Professional machine; it has 12GB memory.

 I installed the postgresql-8.4.1-1-windows version on this PC, and also
 installed a .Net application which reads in data from a TCP port and
 inserts/updates the database. The data volume is large, with heavy
 writing and updating on a partitioned table.

 I configured the PostgreSQL as below:

 Shared_buffers=1024MB
 effective_cache_size=5120MB
 work_mem=32MB
 maintenance_work_mem=200MB

 But I got the Auto Vacuum out-of-memory error. The detailed configuration is
 as follows, can anyone suggest what is the best configuration from the
 performance perspective?

Please see http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

You haven't provided very much detail here - for example, the error
message that you got is conspicuously absent.
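For what it's worth, a minimal postgresql.conf logging setup (parameter names per the 8.4 defaults) that would capture the full autovacuum error text for a report like this:

```
logging_collector = on
log_directory = 'pg_log'          # relative to the data directory
log_min_messages = warning
log_min_error_statement = error
```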

...Robert



Re: [PERFORM] SSD + RAID

2010-02-21 Thread Greg Smith

Ron Mayer wrote:
 Bruce Momjian wrote:
  Agreed, though I thought the problem was that SSDs lie about their
  cache flush like SATA drives do, or is there something I am missing?

 There's exactly one case I can find[1] where this century's IDE
 drives lied about their cache flushes:

Ron is correct that the problem of mainstream SATA drives accepting the 
cache flush command but not actually doing anything with it is long gone 
at this point.  If you have a regular SATA drive, it almost certainly 
supports proper cache flushing.  And if your whole software/storage 
stack understands all that, you should not end up with corrupted data 
just because there's a volatile write cache in there.


But the point of this whole testing exercise coming back into vogue 
is that SSDs have returned this negligent behavior to the mainstream.  
See http://opensolaris.org/jive/thread.jspa?threadID=121424 for a 
discussion of this in a ZFS context just last month.  There are many 
documented cases of Intel SSDs that will fake a cache flush, such that 
the only way to get good reliable writes is to totally disable their 
write caches--at which point performance is so bad you might as well 
have gotten a RAID10 setup instead (and longevity is toast too).
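The usual way to catch a drive like that (in the spirit of Brad Fitzpatrick's diskchecker.pl) is to write sequenced records, fsync after each one, cut power, and then check that every acknowledged write actually survived. A toy simulation of why a faked flush fails that test (illustrative only, not real drive firmware):

```python
# Simulate the diskchecker.pl-style test: every write acknowledged
# after an fsync must survive a power cut.  A drive that fakes its
# cache flush returns success from fsync() but fails the survival
# check afterward.

class Drive:
    def __init__(self, fakes_flush):
        self.cache, self.media = [], []
        self.fakes_flush = fakes_flush

    def write(self, rec):
        self.cache.append(rec)

    def fsync(self):
        if not self.fakes_flush:
            self.media.extend(self.cache)
            self.cache.clear()
        # a lying drive reports success without moving anything

    def power_cut(self):
        self.cache.clear()     # volatile cache contents vanish

def run_test(drive, n=100):
    acked = []
    for i in range(n):
        drive.write(i)
        drive.fsync()          # the OS reports success either way
        acked.append(i)        # the application now believes i is durable
    drive.power_cut()
    return [r for r in acked if r not in drive.media]   # lost records

assert run_test(Drive(fakes_flush=False)) == []        # honest drive
assert len(run_test(Drive(fakes_flush=True))) == 100   # every "durable" record gone
```

On real hardware the equivalent is running the checker against the drive, pulling the plug, and comparing; any acknowledged-but-missing record means the flush is being faked.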


This whole area remains a disaster area, and extreme distrust of all 
the SSD storage vendors is advisable at this point.  Basically, if I 
don't see the capacitor responsible for flushing outstanding writes, 
and get a clear description from the manufacturer of how the cached 
writes are going to be handled in the event of a power failure, at 
this point I have to assume the answer is "badly" and your data will 
be eaten.  And the prices for SSDs that meet that requirement are 
still quite steep.  I keep hoping somebody will address this market 
at something lower than the standard enterprise prices.  The upcoming 
SandForce designs seem to have thought this through correctly:  
http://www.anandtech.com/storage/showdoc.aspx?i=3702&p=6  But the 
product's not out to the general public yet (just like the Seagate 
units that claim to have capacitor backups--I heard a rumor those are 
also SandForce designs actually, so they may be the only ones doing 
this right and aiming at a lower price).


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us



Re: [PERFORM] SSD + RAID

2010-02-21 Thread Arjen van der Meijden

On 22-2-2010 6:39 Greg Smith wrote:

But the point of this whole testing exercise coming back into vogue
is that SSDs have returned this negligent behavior to the mainstream.
See http://opensolaris.org/jive/thread.jspa?threadID=121424 for a
discussion of this in a ZFS context just last month. There are many
documented cases of Intel SSDs that will fake a cache flush, such that
the only way to get good reliable writes is to totally disable their
write caches--at which point performance is so bad you might as well
have gotten a RAID10 setup instead (and longevity is toast too).


That's weird. Intel's SSDs didn't have a write cache, afaik:

"I asked Intel about this and it turns out that the DRAM on the Intel 
drive isn't used for user data because of the risk of data loss; 
instead it is used as memory by the Intel SATA/flash controller for 
deciding exactly where to write data (I'm assuming for the wear 
leveling/reliability algorithms)."

http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403&p=10

But that is the old version; perhaps the second generation does have a 
bit of write caching.


I can understand an SSD might do unexpected things when it loses power 
all of a sudden. It will probably try to group writes to fill a single 
block (those blocks vary in size but are normally way larger than those 
of a normal spinning disk, values like 256 or 512KB), and it might lose 
the data it was holding while waiting for a full block to accumulate, 
or perhaps it just couldn't complete a full block-write due to the 
power failure.
Although that behavior isn't really what you want, it would be 
incorrect to blame write caching for it if the device doesn't even 
have a write cache ;)
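To put rough numbers on that grouping (illustrative figures only): a single 512KB erase block being filled at power-cut time can hold quite a few database pages in flight at once.

```python
# Rough arithmetic on how much "waiting to fill an erase block" data an
# SSD might hold in flight.  Figures are illustrative, not measured.

erase_block = 512 * 1024   # 512KB erase block, the upper figure above
pg_page     = 8 * 1024     # PostgreSQL's 8KB page size

pages_per_block = erase_block // pg_page
assert pages_per_block == 64

# So even one partially-filled erase block buffered at power-cut time
# can take out dozens of pages the OS already considers written.
print(f"{pages_per_block} 8KB pages fit in one {erase_block // 1024}KB erase block")
```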


Best regards,

Arjen

