Re: [PERFORM] Contemplating SSD Hardware RAID

2011-06-22 Thread Shaun Thomas

On 06/21/2011 05:17 PM, Greg Smith wrote:


If they just do the same style of write cache and reliability rework
to the enterprise line, but using better flash, I agree that the
first really serious yet affordable product for the database market
may finally come out of that.


After we started our research in this area and finally settled on 
FusionIO PCI cards (which survived several controlled and uncontrolled 
failures completely intact), a consultant tried telling us he could 
build us a cage of SSDs for much cheaper, and with better performance.


Once I'd stopped laughing, I quickly shooed him away. One of the reasons 
the PCI cards do so well is that they operate in a directly 
memory-addressable manner, and always include capacitors. You lose some 
overhead due to the CPU running the driver, and you can't boot off of 
them, but they're leagues ahead in terms of safety.


But like you said, they're certainly not what most people would call 
affordable. 640GB for two orders of magnitude more than an equivalent 
hard drive would cost? Ouch. Most companies are familiar---and hence 
comfortable---with RAIDs of various flavors, so they see SSD performance 
numbers and think to themselves "What if that were in a RAID?" Right 
now, drives aren't quite there yet, or the ones that are cost more than 
most want to spend.


It's a shame, really. But I'm willing to wait it out for now.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com




Re: [PERFORM] Contemplating SSD Hardware RAID

2011-06-21 Thread Greg Smith

On 06/20/2011 11:54 PM, Dan Harris wrote:
I understand that the majority of consumer grade SSD drives lack the 
required capacitor to complete a write on a sudden power loss.  But, 
what about pairing up with a hardware controller with BBU write 
cache?  Can the write cache be disabled at the drive and result in a 
safe setup?


Sometimes, but not always, and you'll be playing a risky and 
unpredictable game to try it.  See 
http://wiki.postgresql.org/wiki/Reliable_Writes for some anecdotes.  And 
even if the reliability works out, you'll kill the expected longevity 
and performance of the drive.


I'm exploring the combination of an Areca 1880ix-12 controller with 6x 
OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10.  Has 
anyone tried this combination?  What nasty surprise am I overlooking 
here?


You can expect database corruption the first time something unexpected 
interrupts the power to the server.  That's nasty, but it's not 
surprising--that's well documented as what happens when you run 
PostgreSQL on hardware with this feature set.  You have to get a Vertex 3 
Pro to get one of the reliable 3rd gen designs from them with a 
supercap.  (I don't think those are even out yet though)  We've had 
reports here of the earlier Vertex 2 Pro being fully stress tested and 
working out well.  I wouldn't even bother with a regular Vertex 3, 
because I don't see any reason to believe it could be reliable for 
database use, just like the Vertex 2 failed to work in that role.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] Contemplating SSD Hardware RAID

2011-06-21 Thread Yeb Havinga

On 2011-06-21 08:33, Greg Smith wrote:

On 06/20/2011 11:54 PM, Dan Harris wrote:

I'm exploring the combination of an Areca 1880ix-12 controller with 
6x OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10.  
Has anyone tried this combination?  What nasty surprise am I 
overlooking here?


You can expect database corruption the first time something unexpected 
interrupts the power to the server.  That's nasty, but it's not 
surprising--that's well documented as what happens when you run 
PostgreSQL on hardware with this feature set.  You have to get a Vertex 
3 Pro to get one of the reliable 3rd gen designs from them with a 
supercap.  (I don't think those are even out yet though)  We've had 
reports here of the earlier Vertex 2 Pro being fully stress tested and 
working out well.  I wouldn't even bother with a regular Vertex 3, 
because I don't see any reason to believe it could be reliable for 
database use, just like the Vertex 2 failed to work in that role.




I've tested the Vertex 2, Vertex 2 Pro and Vertex 3. The Vertex 3 
Pro is not yet available. The Vertex 3 I tested with pgbench didn't 
outperform the Vertex 2 (yes, it was attached to a SATA III port). Also, 
the Vertex 3 didn't work in my designated system until a firmware 
upgrade that became available ~2.5 months after I purchased it. The 
support call I had with OCZ failed to mention it; by pure 
coincidence, when I did some more testing at a later time, I ran the 
firmware upgrade tool (which rather hides which firmwares are available, 
if any) and it did an update. After that it was compatible with the 
designated motherboard.


Another disappointment was that after I had purchased the Vertex 3 
drive, OCZ announced a max-iops Vertex 3. Did that actually mean I 
had bought an inferior version? Talk about a bad out-of-the-box 
experience. -1 OCZ fanboy.


When putting such an SSD up for database use I'd only consider a Vertex 2 
Pro (for the supercap), paired with another SSD of a different brand 
with a supercap (e.g. the recent Intels). When this is done on a 
motherboard with more than one SATA controller, you'd have controller 
redundancy and could also survive single drive failures when a drive 
wears out. Having two different SSD models decreases the chance of both 
wearing out at the same time, and makes you a bit more resilient against 
firmware bugs. It would be great if there were yet another supercapped 
SSD brand, with a modified md software raid that reads all three drives 
at once and compares results, instead of the occasional check: if at 
least two drives agree on the contents, return the data.
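
For illustration, the 2-of-3 voting read could be sketched in a few lines of
Python, outside md entirely; the device paths and block size below are just
assumptions for the example, not a tested setup:

    DEVICES = ["/dev/sda", "/dev/sdb", "/dev/sdc"]  # three hypothetical supercapped SSDs
    BLOCK_SIZE = 4096

    def voted_read(block_no):
        """Read one block from all three drives and return the content
        that at least two of them agree on."""
        copies = []
        for dev in DEVICES:
            with open(dev, "rb") as f:
                f.seek(block_no * BLOCK_SIZE)
                copies.append(f.read(BLOCK_SIZE))
        for candidate in copies:
            if sum(1 for c in copies if c == candidate) >= 2:
                return candidate
        raise IOError("no two drives agree on block %d" % block_no)

A real implementation would of course live in the md driver itself, but the
voting rule is this simple.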


--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data




Re: [PERFORM] Contemplating SSD Hardware RAID

2011-06-21 Thread Yeb Havinga

On 2011-06-21 09:51, Yeb Havinga wrote:

On 2011-06-21 08:33, Greg Smith wrote:

On 06/20/2011 11:54 PM, Dan Harris wrote:

I'm exploring the combination of an Areca 1880ix-12 controller with 
6x OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10.  
Has anyone tried this combination?  What nasty surprise am I 
overlooking here?


I forgot to mention that with an SSD it's important to watch the 
remaining lifetime. These values can be read with smartctl. When putting 
the disk behind a hardware RAID controller, you might not be able to 
read them from the OS, and the hardware RAID firmware might be too old 
to know about the SSD lifetime indicator, or might not even show it.
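
As a rough sketch of what such monitoring could look like (the device path
and the attribute IDs here are assumptions; vendors expose the lifetime
indicators under different attribute numbers), something like this parses
the smartctl output:

    import subprocess

    WEAR_ATTRIBUTES = {"231", "232", "233"}  # e.g. Life Left / Reserved Space / Wear Leveling

    def read_wear(device="/dev/sda"):
        out = subprocess.check_output(["smartctl", "-A", device], text=True)
        values = {}
        for line in out.splitlines():
            fields = line.split()
            # Attribute lines in "smartctl -A" output start with the numeric
            # ID and attribute name, and end with the raw value.
            if fields and fields[0] in WEAR_ATTRIBUTES:
                values[fields[1]] = fields[-1]
        return values

    if __name__ == "__main__":
        print(read_wear())

Behind a hardware RAID controller this only works if the controller offers
SMART pass-through, which is discussed further along in the thread.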


--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data




Re: [PERFORM] Contemplating SSD Hardware RAID

2011-06-21 Thread Florian Weimer
* Yeb Havinga:

 I forgot to mention that with an SSD it's important to watch the
 remaining lifetime. These values can be read with smartctl. When
 putting the disk behind a hardware RAID controller, you might not be
 able to read them from the OS, and the hardware RAID firmware might be
 too old to know about the SSD lifetime indicator, or might not even
 show it.

3ware controllers offer SMART pass-through, and smartctl supports it.
I'm sure there's something similar for Areca controllers.
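
For reference, a hedged sketch of driving that pass-through from a script;
the device nodes and disk numbers are assumptions for the example and depend
on the controller and driver in use:

    import subprocess

    def smart_behind_controller(kind, disk_no):
        if kind == "3ware":
            # 3ware disks are addressed through the controller device node
            cmd = ["smartctl", "-a", "-d", "3ware,%d" % disk_no, "/dev/twa0"]
        elif kind == "areca":
            # Areca pass-through goes through the SCSI generic device
            cmd = ["smartctl", "-a", "-d", "areca,%d" % disk_no, "/dev/sg0"]
        else:
            raise ValueError("unknown controller type: %s" % kind)
        return subprocess.check_output(cmd, text=True)

Whether the attributes come back usable still depends on the controller
firmware, as noted elsewhere in the thread.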

-- 
Florian Weimer            fwei...@bfk.de
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99



Re: [PERFORM] Contemplating SSD Hardware RAID

2011-06-21 Thread Greg Smith

On 06/21/2011 07:19 AM, Florian Weimer wrote:

3ware controllers offer SMART pass-through, and smartctl supports it.
I'm sure there's something similar for Areca controllers.
   


Depends on the model, drives, and how you access the management 
interface.  For both manufacturers actually.  Check out 
http://notemagnet.blogspot.com/2008/08/linux-disk-failures-areca-is-not-so.html 
for example.  There I talk about problems with a specific Areca 
controller, as well as noting in a comment at the end that there are 
limitations with 3ware not supporting SMART reports against 
SAS drives.


Part of the whole evaluation chain for new server hardware, especially 
for SSD, needs to be a look at what SMART data you can get.  Yeb, I'd be 
curious to get more details about what you've been seeing here if you 
can share it.  You have more different models around than I have access 
to, especially the OCZ ones, which I still can't get my clients to 
consider.  (Their concerns about compatibility and support from a 
relatively small vendor are not completely unfounded)


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] Contemplating SSD Hardware RAID

2011-06-21 Thread Anton Rommerskirchen
On Tuesday, 21 June 2011 05:54:26, Dan Harris wrote:
 I'm looking for advice from the I/O gurus who have been in the SSD game
 for a while now.

 I understand that the majority of consumer grade SSD drives lack the
 required capacitor to complete a write on a sudden power loss.  But,
 what about pairing up with a hardware controller with BBU write cache?
 Can the write cache be disabled at the drive and result in a safe setup?

 I'm exploring the combination of an Areca 1880ix-12 controller with 6x
 OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10.  Has
 anyone tried this combination?  What nasty surprise am I overlooking here?

 Thanks
 -Dan

Won't work.

Period.

Long story: the loss of the writes in the SSD cache is substantial. 

You could lose the whole system.

I have been testing SSDs since 2006 - an Adtron 2GB for 1200 Euro at first ... 

I can only advise using an enterprise-ready SSD. 

Candidates: the new Intel series, SandForce Pro discs.

I tried to submit a request at APC to construct a device similar to a 
buffered drive frame (a capacitor holds up the 5 V until the cache is written 
back), but they have not answered. So no luck in using a mainstream SSD for 
the job. 

Loss of the cache - or, for mainstream SandForce, loss of the connection - 
will result in loss of changed frames (i.e. 16 MBytes of data per frame) on 
the SSD.

If this is the root of your filesystem - forget the disk.

By the way: over the last 2 years I have tested 16 discs, for speed only. I 
sell the discs after the test. I got 6 returned for failure within those 2 
years - it's really happening with mainstream discs.
 
-- 
Kind regards
Anton Rommerskirchen



Re: [PERFORM] Contemplating SSD Hardware RAID

2011-06-21 Thread Yeb Havinga

On 2011-06-21 17:11, Greg Smith wrote:

On 06/21/2011 07:19 AM, Florian Weimer wrote:

3ware controllers offer SMART pass-through, and smartctl supports it.
I'm sure there's something similar for Areca controllers.


Depends on the model, drives, and how you access the management 
interface.  For both manufacturers actually.  Check out 
http://notemagnet.blogspot.com/2008/08/linux-disk-failures-areca-is-not-so.html 
for example.  There I talk about problems with a specific Areca 
controller, as well as noting in a comment at the end that there are 
limitations with 3ware not supporting SMART reports against 
SAS drives.


Part of the whole evaluation chain for new server hardware, especially 
for SSD, needs to be a look at what SMART data you can get.  Yeb, I'd 
be curious to get more details about what you've been seeing here if 
you can share it.  You have more different models around than I have 
access to, especially the OCZ ones, which I still can't get my clients to 
consider.  (Their concerns about compatibility and support from 
a relatively small vendor are not completely unfounded)




This is what a Windows OCZ tool reports for the different SMART 
values (excuse the lack of markup) on a Vertex 2 Pro.


SMART READ DATA
Revision: 10
Attributes List
  1: SSD Raw Read Error Rate                  Normalized Rate: 120 (total ECC and RAISE errors)
  5: SSD Retired Block Count                  Reserve blocks remaining: 100%
  9: SSD Power-On Hours                       Total hours power on: 451
 12: SSD Power Cycle Count                    Count of power on/off cycles: 61
 13: SSD Soft Read Error Rate                 Normalized Rate: 120
100: SSD GBytes Erased                        Flash memory erases across the entire drive: 128 GB
170: SSD Number of Remaining Spares           Number of reserve Flash memory blocks: 17417
171: SSD Program Fail Count                   Total number of Flash program operation failures: 0
172: SSD Erase Fail Count                     Total number of Flash erase operation failures: 0
174: SSD Unexpected power loss count          Total number of unexpected power losses: 13
177: SSD Wear Range Delta                     Delta between most-worn and least-worn Flash blocks: 0
181: SSD Program Fail Count                   Total number of Flash program operation failures: 0
182: SSD Erase Fail Count                     Total number of Flash erase operation failures: 0
184: SSD End to End Error Detection           I/O errors detected during reads from flash memory: 0
187: SSD Reported Uncorrectable Errors        Uncorrectable RAISE errors reported to the host for all data access: 0
194: SSD Temperature Monitoring               Current: 26  High: 37  Low: 0
195: SSD ECC On-the-fly Count                 Normalized Rate: 120
196: SSD Reallocation Event Count             Total number of reallocated Flash blocks: 0
198: SSD Uncorrectable Sector Count           Total number of uncorrectable errors when reading/writing a sector: 0
199: SSD SATA R-Errors Error Count            Current SATA R-Error count: 0
201: SSD Uncorrectable Soft Read Error Rate   Normalized Rate: 120
204: SSD Soft ECC Correction Rate (RAISE)     Normalized Rate: 120
230: SSD Life Curve Status                    Current state of drive operation based upon the Life Curve: 100
231: SSD Life Left                            Approximate SSD life remaining: 99%
232: SSD Available Reserved Space             Amount of Flash memory space in reserve (GB): 17
235: SSD Supercap Health                      Condition of an external SuperCapacitor Health in mSec: 0
241: SSD Lifetime writes from host            Number of bytes written to SSD: 448 GB
242: SSD Lifetime reads from host             Number of bytes read from SSD: 192 GB


Same tool for a Vertex 3 (not pro)

SMART READ DATA
Revision: 10
Attributes List
  1: SSD Raw Read Error Rate                  Normalized Rate: 120 (total ECC and RAISE errors)
  5: SSD Retired Block Count                  Reserve blocks remaining: 100%
  9: SSD Power-On Hours                       Total hours power on: 7
 12: SSD Power Cycle Count                    Count of power on/off cycles: 13
171: SSD Program Fail Count                   Total number of Flash program operation failures: 0
172: SSD Erase Fail Count                     Total number of Flash erase operation failures: 0
174: SSD Unexpected power loss count          Total number of unexpected power losses: 10
177: SSD Wear Range Delta                     Delta between most-worn and least-worn Flash blocks: 0
181: SSD Program Fail Count                   Total number of Flash program operation failures: 0
182: SSD Erase Fail Count                     Total number of Flash erase operation failures: 0
187: SSD Reported Uncorrectable Errors        Uncorrectable RAISE errors reported to the host for all data 

Re: [PERFORM] Contemplating SSD Hardware RAID

2011-06-21 Thread Yeb Havinga

On 2011-06-21 22:10, Yeb Havinga wrote:



There's some info buried in 
http://archives.postgresql.org/pgsql-performance/2011-03/msg00350.php 
where two Vertex 2 Pros are compared; the first had been really 
hammered with pgbench, the second had a few months of duty in a 
workstation. The raw value of SSD Available Reserved Space seems to be 
a good candidate to watch go to 0, since the pgbench-hammered drive has 
16GB left and the workstation disk 17GB. Would be cool to graph with 
e.g. symon (http://i.imgur.com/T4NAq.png).




I forgot to mention that both the newest drive firmware and an svn 
version of smartmontools are advisable before trying to figure out what 
all those strange values mean. It's too bad, however, that OCZ doesn't let 
the user choose which firmware to run (the tool always picks the 
newest), so after every upgrade it'll be a surprise which values are 
supported, or whether any of the values are reset or interpreted differently. 
Even though disks in production might not be upgraded eagerly, replacing a 
faulty drive means the new one probably needs to be upgraded first, and it 
would be nice to have a uniform SMART value readout for the monitoring 
tools.
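
On the graphing idea quoted above, here is a minimal sketch of sampling the
raw Available Reserved Space value (attribute 232 on these drives) so it can
be plotted over time; the device path and output file are assumptions for
the example:

    import csv, subprocess, time

    def reserved_space_gb(device="/dev/sda"):
        out = subprocess.check_output(["smartctl", "-A", device], text=True)
        for line in out.splitlines():
            fields = line.split()
            if fields and fields[0] == "232":
                return int(fields[-1])   # raw value: GB of reserve flash left
        return None

    # Append one timestamped sample; run periodically from cron and graph the file.
    with open("reserved_space.csv", "a", newline="") as f:
        csv.writer(f).writerow([int(time.time()), reserved_space_gb()])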


--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data




Re: [PERFORM] Contemplating SSD Hardware RAID

2011-06-21 Thread Scott Marlowe
On Tue, Jun 21, 2011 at 2:25 PM, Yeb Havinga yebhavi...@gmail.com wrote:

 strange values mean. It's too bad however that OCZ doesn't let the user
 choose which firmware to run (the tool always picks the newest), so after
 every upgrade it'll be a surprise what values are supported or if any of the

That right there pretty much eliminates them from consideration for
enterprise applications.



Re: [PERFORM] Contemplating SSD Hardware RAID

2011-06-21 Thread Merlin Moncure
On Tue, Jun 21, 2011 at 3:32 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Tue, Jun 21, 2011 at 2:25 PM, Yeb Havinga yebhavi...@gmail.com wrote:

 strange values mean. It's too bad however that OCZ doesn't let the user
 choose which firmware to run (the tool always picks the newest), so after
 every upgrade it'll be a surprise what values are supported or if any of the

 That right there pretty much eliminates them from consideration for
 enterprise applications.

As much as I've been irritated with Intel for being intentionally
oblique on the write caching issue -- I think they remain more or less
the only game in town for enterprise use.  The x25-e has been the only
drive up until recently to seriously consider for write heavy
applications (and Greg is pretty skeptical about that even).   I have
directly observed vertex pro drives burning out in ~ 18 months in
constant duty applications (which if you did the math is about right
on schedule) -- not good enough IMO.

ISTM Intel is clearly positioning the 710 Lyndonville as the main
drive to go with in database environments for most cases.   At 3300
IOPS (see 
http://www.anandtech.com/show/4452/intel-710-and-720-ssd-specifications)
and with tinkering that results in 65 times greater longevity than
standard MLC, I expect the drive will be a huge hit as long as it can
sustain those numbers while writing durably and it comes in at under the
$10/GB price point.

merlin



Re: [PERFORM] Contemplating SSD Hardware RAID

2011-06-21 Thread Greg Smith

On 06/21/2011 05:35 PM, Merlin Moncure wrote:

On Tue, Jun 21, 2011 at 3:32 PM, Scott Marlowe scott.marl...@gmail.com  wrote:
   

On Tue, Jun 21, 2011 at 2:25 PM, Yeb Havinga yebhavi...@gmail.com  wrote:

 

It's too bad however that OCZ doesn't let the user
choose which firmware to run (the tool always picks the newest), so after
every upgrade it'll be a surprise what values are supported or if any of the
   

That right there pretty much eliminates them from consideration for
enterprise applications.
 

As much as I've been irritated with Intel for being intentionally
oblique on the write caching issue -- I think they remain more or less
the only game in town for enterprise use.


That's at the core of why I have been so consistently cranky about 
them.  The sort of customers I deal with who are willing to spend money 
on banks of SSD will buy Intel, and the enterprise feature set seems 
complete enough that it doesn't set off any alarms for them.  The same 
is not true of OCZ, which unfortunately means I never even get them onto 
the vendor grid in the first place.  Everybody runs out to buy the Intel 
units instead, they get burned by the write cache issues, lose data, and 
sometimes they even blame PostgreSQL for it.


I have a customer who has around 50 X25-E drives, a little stack of them 
in six servers running two similar databases.  They each run about a 
terabyte, and refill about every four months (old data eventually ages 
out, replaced by new).  At the point I started working with them, they 
had lost the entire recent history twice--terabyte gone, 
whoosh!--because the power reliability is poor in their area.  And 
network connectivity is bad enough that they can't ship this volume of 
updates elsewhere either.


It happened again last month, and for the first time the database was 
recoverable.  I had converted one server to be a cold spare that just 
archives the WAL files.  And that's the only one that lived through the nasty 
power spike+outage that corrupted the active databases on both the 
master and the warm standby of each set.  All four of the servers where 
PostgreSQL was writing data and expecting proper fsync guarantees were 
gone from one power issue.  At the point I got involved, they were about 
to cancel the entire PostgreSQL experiment because they assumed the 
database had to be garbage, given that this kept happening; until I told them 
about this known issue they never considered that the drives were the 
problem.  That's what I think of when people ask me about the Intel X25-E.


I'm very happy with the little 3rd generation consumer grade SSD I 
bought from Intel though (320 series).  If they just do the same style 
of write cache and reliability rework to the enterprise line, but using 
better flash, I agree that the first really serious yet affordable 
product for the database market may finally come out of that.  We're 
just not there yet, and unfortunately for the person who started this 
round of discussion, throwing hardware RAID at the problem doesn't make 
this go away either.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




[PERFORM] Contemplating SSD Hardware RAID

2011-06-20 Thread Dan Harris
I'm looking for advice from the I/O gurus who have been in the SSD game 
for a while now.


I understand that the majority of consumer grade SSD drives lack the 
required capacitor to complete a write on a sudden power loss.  But, 
what about pairing up with a hardware controller with BBU write cache?  
Can the write cache be disabled at the drive and result in a safe setup?
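
As a point of reference, on Linux the drive-level write cache is typically
toggled with hdparm; a minimal sketch, with the device path an assumption
for illustration:

    import subprocess

    def set_drive_write_cache(device="/dev/sda", enabled=False):
        # hdparm -W0 disables the drive's volatile write cache, -W1 enables it.
        flag = "-W1" if enabled else "-W0"
        subprocess.check_call(["hdparm", flag, device])

Whether the drive honors it, and what it does to SSD longevity and
performance, is what the replies earlier in this thread discuss.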


I'm exploring the combination of an Areca 1880ix-12 controller with 6x 
OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10.  Has 
anyone tried this combination?  What nasty surprise am I overlooking here?


Thanks
-Dan
