Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-24 Thread Greg Smith

On 5/22/13 2:45 PM, Shaun Thomas wrote:

That read rate and that throughput suggest 8k reads. The queue size is
270+, which is pretty high for a single device, even when it's an SSD.
Some SSDs seem to break down on queue sizes over 4, and 15 sectors
spread across a read queue of 270 is pretty harsh. The drive tested here
basically fell over on servicing a huge diverse read queue, which
suggests a firmware issue.


That's basically it.  I don't know that I'd put the blame specifically
on a firmware issue without further evidence that that's the case, though.
The last time I chased down an SSD performance issue like this it ended
up being a Linux scheduler bug.  One thing I plan to do for future SSD
tests is to try and replicate this issue better, starting by increasing 
the number of clients to at least 300.
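For anyone who wants to try reproducing that kind of read pile-up, a rough
pgbench recipe (scale factor and client count here are only illustrative, and
max_connections has to be raised to match) would look something like this:

  pgbench -i -s 4000 pgbench                           # ~60GB database, well past RAM
  pgbench -S -M prepared -c 300 -j 16 -T 600 pgbench   # select-only, deep read queue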


Related:  if anyone read my Seeking PostgreSQL talk last year, some of 
my Intel 320 results there were understating the drive's worst-case 
performance due to a testing setup error.  I have a blog entry talking 
about what was wrong and how it slipped past me at 
http://highperfpostgres.com/2013/05/seeking-revisited-intel-320-series-and-ncq/


With that loose end sorted, I'll be kicking off a brand new round of SSD 
tests on a 24 core server here soon.  All those will appear on my blog. 
 The 320 drive is returning as the bang for buck champ, along with a DC 
S3700 and a Seagate 1TB Hybrid drive with NAND durable write cache.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-23 Thread Andrea Suisani

On 05/22/2013 03:30 PM, Merlin Moncure wrote:

On Tue, May 21, 2013 at 7:19 PM, Greg Smith g...@2ndquadrant.com wrote:

On 5/20/13 6:32 PM, Merlin Moncure wrote:


[cut]


The only really huge gain to be had using SSD is commit rate at a low client
count.  There you can easily do 5,000/second instead of a spinning disk that
is closer to 100, for less than what the battery-backed RAID card alone
costs to speed up mechanical drives.  My test server's 100GB DC S3700 was
$250.  That's still not two orders of magnitude faster though.


That's most certainly *not* the only gain to be had: random read rates
of large databases (a very important metric for data analysis) can
easily hit 20k tps.  So I'll stand by the figure. Another point: that
5,000/second commit rate is sustained, whereas a RAID card will spectacularly
degrade until the cache overflows; it's not fair to compare burst with
sustained performance.  To hit a 5,000/second sustained commit rate along with
good random read performance, you'd need a very expensive storage
system.  Right now I'm working (not by choice) with a tier-1 storage
system (let's just say it rhymes with 'weefax') and I would trade it
for direct attached SSD in a heartbeat.

Also, note that 3rd party benchmarking is showing the 3700 completely
smoking the 710 in database workloads (for example, see
http://www.anandtech.com/show/6433/intel-ssd-dc-s3700-200gb-review/6).


[cut]

Sorry for interrupting, but on a related note I would like to know your
opinion of what the AnandTech review said about the S3700's poor performance
on Oracle Swingbench. I'm quoting the relevant part, which you can find here (*):

quote

[..] There are two components to the Swingbench test we're running here:
the database itself, and the redo log. The redo log stores all changes that
are made to the database, which allows the database to be reconstructed in
the event of a failure. In good DB design, these two would exist on separate
storage systems, but in order to increase IO we combined them both for this test.
Accesses to the DB end up being 8KB and random in nature, a definite strong suit
of the S3700 as we've already shown. The redo log however consists of a bunch
of 1KB - 1.5KB, QD1, sequential accesses. The S3700, like many of the newer
controllers we've tested, isn't optimized for low queue depth, sub-4KB,
sequential workloads like this. [..]

/quote

Does this kind of scenario apply to the PostgreSQL WAL file storage?

Thanks
andrea


(*) http://www.anandtech.com/show/6433/intel-ssd-dc-s3700-200gb-review/5




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-23 Thread Merlin Moncure
On Thu, May 23, 2013 at 1:56 AM, Andrea Suisani sick...@opinioni.net wrote:
 On 05/22/2013 03:30 PM, Merlin Moncure wrote:

 On Tue, May 21, 2013 at 7:19 PM, Greg Smith g...@2ndquadrant.com wrote:

 On 5/20/13 6:32 PM, Merlin Moncure wrote:


 [cut]


 The only really huge gain to be had using SSD is commit rate at a low
 client
 count.  There you can easily do 5,000/second instead of a spinning disk
 that
 is closer to 100, for less than what the battery-backed RAID card alone
 costs to speed up mechanical drives.  My test server's 100GB DC S3700 was
 $250.  That's still not two orders of magnitude faster though.


 That's most certainly *not* the only gain to be had: random read rates
 of large databases (a very important metric for data analysis) can
 easily hit 20k tps.  So I'll stand by the figure. Another point: that
 5,000/second commit rate is sustained, whereas a RAID card will spectacularly
 degrade until the cache overflows; it's not fair to compare burst with
 sustained performance.  To hit a 5,000/second sustained commit rate along with
 good random read performance, you'd need a very expensive storage
 system.  Right now I'm working (not by choice) with a tier-1 storage
 system (let's just say it rhymes with 'weefax') and I would trade it
 for direct attached SSD in a heartbeat.

 Also, note that 3rd party benchmarking is showing the 3700 completely
 smoking the 710 in database workloads (for example, see
 http://www.anandtech.com/show/6433/intel-ssd-dc-s3700-200gb-review/6).


 [cut]

 Sorry for interrupting, but on a related note I would like to know your
 opinion of what the AnandTech review said about the S3700's poor performance
 on Oracle Swingbench. I'm quoting the relevant part, which you can find here (*):

 quote

 [..] There are two components to the Swingbench test we're running here:
 the database itself, and the redo log. The redo log stores all changes that
 are made to the database, which allows the database to be reconstructed in
 the event of a failure. In good DB design, these two would exist on separate
 storage systems, but in order to increase IO we combined them both for this
 test.
 Accesses to the DB end up being 8KB and random in nature, a definite strong
 suit
 of the S3700 as we've already shown. The redo log however consists of a
 bunch
 of 1KB - 1.5KB, QD1, sequential accesses. The S3700, like many of the newer
 controllers we've tested, isn't optimized for low queue depth, sub-4KB,
 sequential
 workloads like this. [..]

 /quote

 Does this kind of scenario apply to the PostgreSQL WAL file storage?

huh -- I don't think so.  WAL file segments are 8kB aligned, ditto
clog, etc.  In XLogWrite():

  /* OK to write the page(s) */
  from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
  nbytes = npages * (Size) XLOG_BLCKSZ;   <--
  errno = 0;
  if (write(openLogFile, from, nbytes) != nbytes)
  {

AFAICT, that's the only way you write out xlog.  One thing I would
definitely advise though is to disable full page writes (full_page_writes)
if it's enabled.   The S3700 is aligned on 8kB blocks internally -- hm.
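
A quick sanity check of the WAL geometry and the full page write setting on a
given server -- these are standard pg_settings entries, shown here only as an
illustration:

  psql -c "SELECT name, setting, unit FROM pg_settings
           WHERE name IN ('wal_block_size', 'wal_segment_size', 'full_page_writes')"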

merlin




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-23 Thread Andrea Suisani

On 05/23/2013 03:47 PM, Merlin Moncure wrote:

[cut]


quote

[..] There are two components to the Swingbench test we're running here:
the database itself, and the redo log. The redo log stores all changes that
are made to the database, which allows the database to be reconstructed in
the event of a failure. In good DB design, these two would exist on separate
storage systems, but in order to increase IO we combined them both for this
test.
Accesses to the DB end up being 8KB and random in nature, a definite strong
suit
of the S3700 as we've already shown. The redo log however consists of a
bunch
of 1KB - 1.5KB, QD1, sequential accesses. The S3700, like many of the newer
controllers we've tested, isn't optimized for low queue depth, sub-4KB,
sequential
workloads like this. [..]

/quote

Does this kind of scenario apply to the PostgreSQL WAL file storage?


huh -- I don't think so.  WAL file segments are 8kB aligned, ditto
clog, etc.  In XLogWrite():

   /* OK to write the page(s) */
   from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
   nbytes = npages * (Size) XLOG_BLCKSZ;   <--
   errno = 0;
   if (write(openLogFile, from, nbytes) != nbytes)
   {

AFAICT, that's the only way you write out xlog.  One thing I would
definitely advise though is to disable full page writes (full_page_writes)
if it's enabled.   The S3700 is aligned on 8kB blocks internally -- hm.


many thanks merlin for both the explanation and the good advice :)

andrea






Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Merlin Moncure
On Tue, May 21, 2013 at 7:19 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 5/20/13 6:32 PM, Merlin Moncure wrote:

 When it comes to databases, particularly in the open source postgres
 world, hard drives are completely obsolete.  SSD are a couple of
 orders of magnitude faster and this (while still slow in computer
terms) is fast enough to put storage into the modern era by anyone
 who is smart enough to connect a sata cable.


 You're skirting the edge of vendor Kool-Aid here.  I'm working on a very
 detailed benchmark vs. real world piece centered on Intel's 710 models, one
 of the few reliable drives on the market.  (Yes, I have a DC S3700 too, just
 not as much data yet)  While in theory these drives will hit two orders of
 magnitude speed improvement, and I have benchmarks where that's the case, in
 practice I've seen them deliver less than 5X better too.  You get one guess
 which I'd consider more likely to happen on a difficult database server
 workload.

 The only really huge gain to be had using SSD is commit rate at a low client
 count.  There you can easily do 5,000/second instead of a spinning disk that
 is closer to 100, for less than what the battery-backed RAID card alone
 costs to speed up mechanical drives.  My test server's 100GB DC S3700 was
 $250.  That's still not two orders of magnitude faster though.

That's most certainly *not* the only gain to be had: random read rates
of large databases (a very important metric for data analysis) can
easily hit 20k tps.  So I'll stand by the figure. Another point: that
5,000/second commit rate is sustained, whereas a RAID card will spectacularly
degrade until the cache overflows; it's not fair to compare burst with
sustained performance.  To hit a 5,000/second sustained commit rate along with
good random read performance, you'd need a very expensive storage
system.  Right now I'm working (not by choice) with a tier-1 storage
system (let's just say it rhymes with 'weefax') and I would trade it
for direct attached SSD in a heartbeat.

Also, note that 3rd party benchmarking is showing the 3700 completely
smoking the 710 in database workloads (for example, see
http://www.anandtech.com/show/6433/intel-ssd-dc-s3700-200gb-review/6).

Anyways, SSD installation in the post-capacitor era has been 100.0%
correlated in my experience (admittedly, around a dozen or so systems)
with removal of storage as the primary performance bottleneck, and
I'll stand by that.  I'm not claiming to work with extremely high
transaction rate systems but then again neither are most of the people
reading this list.  Disk drives are obsolete for database
installations.

merlin




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Greg Smith

On 5/22/13 9:30 AM, Merlin Moncure wrote:

That's most certainly *not* the only gain to be had: random read rates
of large databases (a very important metric for data analysis) can
easily hit 20k tps.  So I'll stand by the figure.


They can easily hit that number.  Or they can do this:

Device:     r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sdd     2702.80   19.40  19.67   0.16    14.91   273.68  71.74   0.37 100.00
sdd     2707.60   13.00  19.53   0.10    14.78   276.61  90.34   0.37 100.00

That's an Intel 710 being crushed by a random read database server 
workload, unable to deliver even 3000 IOPS / 20MB/s.  I have hours of 
data like this from several servers.  Yes, the DC S3700 drives are at 
least twice as fast on average, but I haven't had one for long enough to 
see what its worst case really looks like yet.


Here's a mechanical drive hitting its limits on the same server as the 
above:


Device:     r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb      100.80  220.60   1.06   1.79    18.16   228.78  724.11   3.11 100.00
sdb      119.20  220.40   1.09   1.77    17.22   228.36  677.46   2.94 100.00


Giving around 3MB/s.  I am quite happy saying the SSD is delivering 
about a single order of magnitude improvement, in both throughput and 
latency.  But that's it, and a single order of magnitude improvement is 
sometimes not good enough to solve all storage issues.


If all you care about is speed, the main situation where I've found there
to still be value in tier 1 storage is extremely write-heavy
workloads.  The best write numbers I've seen out of Postgres are still
going into a monster EMC unit, simply because the unit I was working 
with had 16GB of durable cache.  Yes, that only supports burst speeds, 
but 16GB absorbs a whole lot of writes before it fills.  Write 
re-ordering and combining can accelerate traditional disk quite a bit 
when it's across a really large horizon like that.



Anyways, SSD installation in the post-capacitor era has been 100.0%
correlated in my experience (admittedly, around a dozen or so systems)
with removal of storage as the primary performance bottleneck, and
I'll stand by that.


I wish it were that easy for everyone, but that's simply not true.  Are 
there lots of systems where SSD makes storage look almost free it's so 
fast?  Sure.  But presuming all systems will look like that is 
optimistic, and it sets unreasonable expectations.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Merlin Moncure
On Wed, May 22, 2013 at 9:18 AM, Greg Smith g...@2ndquadrant.com wrote:
 On 5/22/13 9:30 AM, Merlin Moncure wrote:

 That's most certainly *not* the only gain to be had: random read rates
 of large databases (a very important metric for data analysis) can
 easily hit 20k tps.  So I'll stand by the figure.


 They can easily hit that number.  Or they can do this:

 Device:     r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
 sdd     2702.80   19.40  19.67   0.16    14.91   273.68  71.74   0.37 100.00
 sdd     2707.60   13.00  19.53   0.10    14.78   276.61  90.34   0.37 100.00

yup -- I've seen this too...the high transaction rates quickly fall
over when there is concurrent writing (but for bulk 100% read OLAP
queries I see the higher figure more often than not).   Even so, it's
a huge difference over 100.   unfortunately, I don't have a s3700 to
test with, but based on everything i've seen it looks like it's a
mostly solved problem. (for example, see here:
http://www.storagereview.com/intel_ssd_dc_s3700_series_enterprise_ssd_review).
  Tests that drive the 710 to 3k iops were not able to take the 3700
down under 10k at any queue depth.  Take a good look at the 8k
preconditioning curve latency chart -- everything you need to know is
right there; it's a completely different controller and offers much
better worst case performance.

merlin




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Greg Smith

On 5/22/13 11:05 AM, Merlin Moncure wrote:

unfortunately, I don't have a s3700 to
test with, but based on everything i've seen it looks like it's a
mostly solved problem. (for example, see here:
http://www.storagereview.com/intel_ssd_dc_s3700_series_enterprise_ssd_review).
   Tests that drive the 710 to 3k iops were not able to take the 3700
down under 10k at any queue depth.


I have two weeks of real-world data from DC S3700 units in production 
and a pile of synthetic test results.  The S3700 drives are at least 2X 
as fast as the 710 models, and there are synthetic tests where it's 
closer to 10X.


On a 5,000 IOPS workload that crushed a pair of 710 units, the new 
drives are only hitting 50% utilization now.  Does that make worst-case 
10K?  Maybe.  I can't just extrapolate from the 50% figures and predict 
the throughput I'll see at 100% though, so I'm still waiting for more 
data before I feel comfortable saying exactly what the worst case looks 
like.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Shaun Thomas

On 05/22/2013 08:30 AM, Merlin Moncure wrote:


I'm not claiming to work with extremely high transaction rate systems
but then again neither are most of the people reading this list.
Disk drives are obsolete for database installations.


Well, you may not be able to make that claim, but I can. While we don't
use Intel SSDs, our first-gen FusionIO cards can deliver about 20k
PostgreSQL TPS on our real-world data right off the device before
caching effects start boosting the numbers. These days, devices like
this make our current batch look like rusty old hulks in comparison, so 
the gap is just widening. Hard drives stand no chance at all.


An 8-drive 15k RPM RAID-10 gave us about 1800 TPS back when we switched 
to FusionIO about two years ago. So, while Intel drives themselves may 
not be able to hit sustained 100x speeds over spindles, it's pretty 
clear that that's a firmware or implementation limitation.


The main issue is that the sustained sequential scan speeds are
generally less than an order of magnitude faster than drives. So as soon 
as you hit something that isn't limited by random IOPS, spindles get a 
chance to catch up. But those situations are few and far between in a 
heavy transactional setting. Having used NVRAM/SSDs, I could never go 
back so long as the budget allows us to procure them.


A data warehouse? Maybe spindles still have a place there. Heavy 
transactional system? Not a chance.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com





Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread David Boreham

On 5/22/2013 8:18 AM, Greg Smith wrote:


They can easily hit that number.  Or they can do this:

Device:     r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sdd     2702.80   19.40  19.67   0.16    14.91   273.68  71.74   0.37 100.00
sdd     2707.60   13.00  19.53   0.10    14.78   276.61  90.34   0.37 100.00

That's an Intel 710 being crushed by a random read database server 
workload, unable to deliver even 3000 IOPS / 20MB/s.  I have hours of 
data like this from several servers.


This is interesting. Do you know what it is about the workload that 
leads to the unusually low rps ?








Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Greg Smith

On 5/22/13 12:56 PM, Shaun Thomas wrote:

Well, you may not be able to make that claim, but I can. While we don't
use Intel SSDs, our first-gen FusionIO cards can deliver about 20k
PostgreSQL TPS on our real-world data right off the device before
caching effects start boosting the numbers.


I've seen FusionIO hit that 20K commit number, as well as hitting 75K 
IOPS on random reads (600MB/s).  They are roughly 5 to 10X faster than 
the Intel 320/710 drives.  There's a corresponding price hit though, and 
having to provision PCI-E cards is a pain in some systems.


A claim that a FusionIO drive in particular is capable of 100X the 
performance of a spinning drive, that I wouldn't dispute.  I even made 
that claim myself with some benchmark numbers to back it up: 
http://www.fusionio.com/blog/fusion-io-boosts-postgresql-performance/ 
That's not just a generic SSD anymore though.



An 8-drive 15k RPM RAID-10 gave us about 1800 TPS back when we switched
to FusionIO about two years ago. So, while Intel drives themselves may
not be able to hit sustained 100x speeds over spindles, it's pretty
clear that that's a firmware or implementation limitation.


1800 TPS to 20K TPS is just over a 10X speedup.

As for Intel vs. FusionIO, rather than implementation quality it's more 
what architecture you're willing to pay for.  If you test a few models 
across Intel's product line, you can see there's a rough size vs. speed 
correlation.  The larger units have more channels of flash going at the 
same time.  FusionIO has architected such that there is a wide write 
path even on their smallest cards.  That 75K IOPS number I got even out 
of their little 80GB card.  (since dropped from the product line)


I can buy a good number of Intel DC S3700 drives for what a FusionIO 
card costs though.



The main issue is that the sustained sequential scan speeds are
generally less than an order of magnitude faster than drives. So as soon
as you hit something that isn't limited by random IOPS, spindles get a
chance to catch up.


I have some moderately fast SSD based transactional systems that are 
still using traditional drives with battery-backed cache for the 
sequential writes of the WAL volume, where the data volume is on Intel 
710 disks.  WAL writes really burn through flash cells, too, so keeping 
them on traditional drives can be cost effective in a few ways.  That 
approach is lucky to hit 10K TPS though, so it can't compete against 
what a PCI-E card like the FusionIO drives are capable of.
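
For reference, a minimal sketch of that kind of split -- the mount points are
only illustrative, and the cluster has to be stopped first:

  pg_ctl -D /ssd/pgdata stop
  mv /ssd/pgdata/pg_xlog /bbwc-raid1/pg_xlog      # WAL onto the BBU-backed mirror
  ln -s /bbwc-raid1/pg_xlog /ssd/pgdata/pg_xlog
  pg_ctl -D /ssd/pgdata start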


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Shaun Thomas

On 05/22/2013 12:31 PM, David Boreham wrote:


Device:     r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sdd     2702.80   19.40  19.67   0.16    14.91   273.68  71.74   0.37 100.00
sdd     2707.60   13.00  19.53   0.10    14.78   276.61  90.34   0.37 100.00

That's an Intel 710 being crushed by a random read database server
workload, unable to deliver even 3000 IOPS / 20MB/s.  I have hours of
data like this from several servers.


This is interesting. Do you know what it is about the workload that
leads to the unusually low rps ?


That read rate and that throughput suggest 8k reads. The queue size is 
270+, which is pretty high for a single device, even when it's an SSD. 
Some SSDs seem to break down on queue sizes over 4, and 15 sectors 
spread across a read queue of 270 is pretty harsh. The drive tested here 
basically fell over on servicing a huge diverse read queue, which 
suggests a firmware issue.


Often this is because the device was optimized for sequential reads and
posts lower IOPS than is theoretically possible, so the vendor can advertise
higher numbers alongside consumer-grade disks. They're Greg's disks
though. :)


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com





Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Joshua D. Drake


On 05/22/2013 11:06 AM, Greg Smith wrote:


I have some moderately fast SSD based transactional systems that are
still using traditional drives with battery-backed cache for the
sequential writes of the WAL volume, where the data volume is on Intel
710 disks.  WAL writes really burn through flash cells, too, so keeping
them on traditional drives can be cost effective in a few ways.  That
approach is lucky to hit 10K TPS though, so it can't compete against
what a PCI-E card like the FusionIO drives are capable of.


Greg, can you elaborate on the SSD + Xlog issue? What type of burn 
through are we talking about?


JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Greg Smith

On 5/22/13 3:06 PM, Joshua D. Drake wrote:

Greg, can you elaborate on the SSD + Xlog issue? What type of burn
through are we talking about?


You're burning through flash cells at a multiple of the total WAL write 
volume.  The system I gave iostat snapshots from upthread (with the 
Intel 710 hitting its limit) archives about 1TB of WAL each week.  The 
actual amount of WAL written in terms of erased flash blocks is even 
higher though, because sometimes the flash is hit with partial page 
writes.  The write amplification of WAL is much worse than the main 
database.


I gave a rough intro to this on the Intel drives at 
http://blog.2ndquadrant.com/intel_ssds_lifetime_and_the_32/ and there's 
a nice Write endurance table at 
http://www.tomshardware.com/reviews/ssd-710-enterprise-x25-e,3038-2.html


The cheapest of the Intel SSDs I have here only guarantees 15TB of total 
write endurance.  Eliminating 1TB of writes per week by moving the WAL 
off SSD is a pretty significant change, even though the burn rate isn't 
a simple linear thing--you won't burn the flash out in only 15 weeks.


The production server is actually using the higher grade 710 drives that 
aim for 900TB instead.  But I do have standby servers using the low 
grade stuff, so anything I can do to decrease SSD burn rate without 
dropping performance is useful.  And only the top tier of transaction 
rates will outrun a RAID1 pair of 15K drives dedicated to WAL.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Merlin Moncure
On Wed, May 22, 2013 at 2:30 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 5/22/13 3:06 PM, Joshua D. Drake wrote:

 Greg, can you elaborate on the SSD + Xlog issue? What type of burn
 through are we talking about?


 You're burning through flash cells at a multiple of the total WAL write
 volume.  The system I gave iostat snapshots from upthread (with the Intel
 710 hitting its limit) archives about 1TB of WAL each week.  The actual
 amount of WAL written in terms of erased flash blocks is even higher though,
 because sometimes the flash is hit with partial page writes.  The write
 amplification of WAL is much worse than the main database.

 I gave a rough intro to this on the Intel drives at
 http://blog.2ndquadrant.com/intel_ssds_lifetime_and_the_32/ and there's a
 nice Write endurance table at
 http://www.tomshardware.com/reviews/ssd-710-enterprise-x25-e,3038-2.html

 The cheapest of the Intel SSDs I have here only guarantees 15TB of total
 write endurance.  Eliminating 1TB of writes per week by moving the WAL off
 SSD is a pretty significant change, even though the burn rate isn't a simple
 linear thing--you won't burn the flash out in only 15 weeks.

Certainly, intel 320 is not designed for 1tb/week workloads.

 The production server is actually using the higher grade 710 drives that aim
 for 900TB instead.  But I do have standby servers using the low grade stuff,
 so anything I can do to decrease SSD burn rate without dropping performance
 is useful.  And only the top tier of transaction rates will outrun a RAID1
 pair of 15K drives dedicated to WAL.

The S3700 is rated for 10 drive writes/day for 5 years.  So, for a 200GB drive, that's
200GB * 10/day * 365 days * 5, which is 3.65 million gigabytes or ~3.5 petabytes.

1TB/week would take 67 years to burn through, divided by whatever you assume for
write amplification, and whatever extra penalty you give if you are
shooting for a > 5 year duty cycle (flash degrades faster the older it
is) -- *for a single 200GB device*.  Write endurance is not a problem
for this drive; in fact it's a very reasonable assumption that the
faster worst-case random performance is directly related to reduced
write amplification.  BTW, cost/PB of this drive is less than half of
the 710's (which IMO was obsolete the day the S3700 hit the street).

merlin




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Shaun Thomas

On 05/22/2013 02:51 PM, Merlin Moncure wrote:


s3700 is rated for 10 drive writes/day for 5 years. so, for 200gb
drive, that's 200gb * 10/day * 365 days * 5, that's 3.65 million
gigabytes or ~ 3.5 petabytes.


Nice. And on that note:

http://www.tomshardware.com/reviews/ssd-dc-s3700-raid-0-benchmarks,3480.html

They actually over-saturated the backplane with 24 of these drives in a 
giant RAID-0, tipping the scales at around 3.1M IOPS. Not bad for 
consumer-level drives. I'd love to see a RAID-10 of these.


I'm having a hard time coming up with a database workload that would run 
into performance problems with a (relatively inexpensive) setup like this.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com





Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Greg Smith

On 5/22/13 3:51 PM, Merlin Moncure wrote:

s3700 is rated for 10 drive writes/day for 5 years. so, for 200gb drive, that's
200gb * 10/day * 365 days * 5, that's 3.65 million gigabytes or ~ 3.5 petabytes.


Yes, they've improved on the 1.5PB that the 710 drives topped out at. 
For that particular drive, this is unlikely to be a problem.  But I'm 
not willing to toss out longevity issues as therefore irrelevant in all 
cases.  Some flash still costs a lot more than Intel's SSDs do, like the 
FusionIO products.  Chop even a few percent of the wear out of the price 
tag on a RAMSAN and you've saved some real money.


And there are some other products with interesting 
price/performance/capacity combinations that are also sensitive to 
wearout.  Seagate's hybrid drives have turned interesting now that they 
cache writes safely for example.  There's no cheaper way to get 1TB with 
flash write speeds for small commits than that drive right now.  (Test 
results on that drive coming soon, along with my full DC S3700 review)



btw,  cost/pb of this drive is less than half of
the 710 (which IMO was obsolete the day the s3700 hit the street).


You bet, and I haven't recommended anyone buy a 710 since the 
announcement.  However, "hit the street" is still an issue.  No one has 
been able to keep DC S3700 drives in stock very well yet.  It took me 
three tries through Newegg before my S3700 drive actually shipped.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Merlin Moncure
On Wed, May 22, 2013 at 3:06 PM, Greg Smith g...@2ndquadrant.com wrote:
 You bet, and I haven't recommended anyone buy a 710 since the announcement.
 However, hit the street is still an issue.  No one has been able to keep
 DC S3700 drives in stock very well yet.  It took me three tries through
 Newegg before my S3700 drive actually shipped.

Well, let's look at the facts:
*) 2x write endurance vs 710 (500x 320)
*) 2-10x performance depending on workload specifics
*) much better worst case/average latency
*) half the cost of the 710!?

After obsoleting hard drives with the introduction of the 320/710,
intel managed to obsolete their *own* entire lineup with the s3700
(with the exception of the pcie devices and the ultra low cost
notebook 1$/gb segment).  I'm amazed these drives were sold at that
price point: they could have been sold at 3-4x the current price and
still have a willing market (note, please don't do this).  Presumably
most of the inventory is being bought up by small channel resellers
for a quick profit.

Even by the fast moving standards of the SSD world this product is an
absolute game changer and has ushered in the new era of fast storage
with a loud 'gong'. Oh, the major vendors will still keep their
rip-off going on a little longer selling their storage trays, raid
controllers, entry/mid level SANS, SAS HBAs etc at huge markup to
customers who don't need them (some will still need them, but the bar
suddenly just got spectacularly raised before you have to look into
enterprise gear).  CRT was overtaken by LCD monitor in mind 2004 in
terms of sales: I'd say it's late 2002/early 2003, at least for new
deployments.

merlin




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread CSS
On May 22, 2013, at 4:06 PM, Greg Smith wrote:

 And there are some other products with interesting price/performance/capacity 
 combinations that are also sensitive to wearout.  Seagate's hybrid drives 
 have turned interesting now that they cache writes safely for example.  
 There's no cheaper way to get 1TB with flash write speeds for small commits 
 than that drive right now.  (Test results on that drive coming soon, along 
 with my full DC S3700 review)

I am really looking forward to that.  Will you announce here or just post on 
the 2ndQuadrant blog?

Another hybrid solution is to run ZFS on some decent hard drives and then put 
the ZFS intent log on SSDs.  With very synthetic benchmarks, the random write 
performance is excellent.

All of these discussions about alternate storage media are great - everyone has 
different needs and there are certainly a number of deployments that can get 
away with spending much less money by adding some solid state storage.  
There's really an amazing number of options today…

Thanks,

Charles



Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Joshua D. Drake


On 05/22/2013 01:57 PM, Merlin Moncure wrote:


On Wed, May 22, 2013 at 3:06 PM, Greg Smith g...@2ndquadrant.com wrote:

You bet, and I haven't recommended anyone buy a 710 since the announcement.
However, hit the street is still an issue.  No one has been able to keep
DC S3700 drives in stock very well yet.  It took me three tries through
Newegg before my S3700 drive actually shipped.


Well, let's look a the facts:
*) 2x write endurance vs 710 (500x 320)
*) 2-10x performance depending on workload specifics
*) much better worst case/average latency
*) half the cost of the 710!?


I am curious how the 710 or S3700 stacks up against the new M500 from
Crucial?  I know Intel is kind of the go-to for these things, but the M500
is power-off protected and rated at "Endurance: 72TB total bytes written
(TBW), equal to 40GB per day for 5 years".


Granted, it isn't the fastest pig in the poke, but it sure seems like a very
reasonable drive for the price:


http://www.newegg.com/Product/Product.aspx?Item=20-148-695&ParentOnly=1&IsVirtualParent=1

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Merlin Moncure
On Wed, May 22, 2013 at 5:42 PM, Joshua D. Drake j...@commandprompt.com wrote:
 I am curious how the 710 or S3700 stacks up against the new M500 from
 Crucial? I know Intel is kind of the goto for these things but the m500 is
 power off protected and rated at: Endurance: 72TB total bytes written (TBW),
 equal to 40GB per day for 5 years .

I don't think the m500 is power safe (nor is any drive at the 1$/gb
price point).  This drive is positioned as a desktop class disk drive.
 AFAIK, the s3700 strongly outclasses all competitors on price,
performance, or both.  Once you give up enterprise features of
endurance and iops you have many options (samsung 840 is another one).
 Pretty soon these types of drives are going to be standard kit in
workstations (and we'll be back to the IDE era of corrupted data,
ha!).  I would recommend none of them for server class use, they are
inferior in terms of $/iop and $/gb written.

for server class drives, see:
hitachi ssd400m (10$/gb, slower!)
kingston e100,
etc.

merlin




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Joshua D. Drake


On 05/22/2013 04:37 PM, Merlin Moncure wrote:


On Wed, May 22, 2013 at 5:42 PM, Joshua D. Drake j...@commandprompt.com wrote:

I am curious how the 710 or S3700 stacks up against the new M500 from
Crucial? I know Intel is kind of the goto for these things but the m500 is
power off protected and rated at: Endurance: 72TB total bytes written (TBW),
equal to 40GB per day for 5 years .


I don't think the m500 is power safe (nor is any drive at the 1$/gb
price point).


According to the data sheet it is power safe.

http://investors.micron.com/releasedetail.cfm?ReleaseID=732650
http://www.micron.com/products/solid-state-storage/client-ssd/m500-ssd

Sincerely,

JD




--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Mark Kirkwood

On 23/05/13 13:01, Joshua D. Drake wrote:


On 05/22/2013 04:37 PM, Merlin Moncure wrote:


On Wed, May 22, 2013 at 5:42 PM, Joshua D. Drake 
j...@commandprompt.com wrote:

I am curious how the 710 or S3700 stacks up against the new M500 from
Crucial? I know Intel is kind of the goto for these things but the 
m500 is
power off protected and rated at: Endurance: 72TB total bytes 
written (TBW),

equal to 40GB per day for 5 years .


I don't think the m500 is power safe (nor is any drive at the 1$/gb
price point).


According to the data sheet it is power safe.

http://investors.micron.com/releasedetail.cfm?ReleaseID=732650
http://www.micron.com/products/solid-state-storage/client-ssd/m500-ssd




Yeah - they apparently have a capacitor on board.

Their write endurance is where they don't compare so favorably to the 
S3700 (they are *much* cheaper mind you):


- M500 120GB drive: 40GB per day for 5 years
- S3700 100GB drive: 1000GB per day for 5 years
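
Spelled out as rated totals, that works out to roughly:

- M500 120GB:   40 GB/day * 365 * 5 ~=  73 TB written
- S3700 100GB: 1000 GB/day * 365 * 5 ~= 1.8 PB written

i.e. about 25x the endurance from a slightly smaller drive.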

But great to see more reasonably priced SSD with power off protection.

Cheers

Mark




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Greg Smith

On 5/22/13 6:42 PM, Joshua D. Drake wrote:

I am curious how the 710 or S3700 stacks up against the new M500 from
Crucial? I know Intel is kind of the goto for these things but the m500
is power off protected and rated at: Endurance: 72TB total bytes written
(TBW), equal to 40GB per day for 5 years .


The M500 is fine on paper, I had that one on my list of things to 
evaluate when I can.  The general reliability of Crucial's consumer SSD 
has looked good recently.  I'm not going to recommend that one until I 
actually see one work as expected though.  I'm waiting for one to pass 
by or I reach a new toy purchasing spree.


What makes me step very carefully here is watching what Intel went 
through when they released their first supercap drive, the 320 series. 
If you look at the nastiest of the firmware bugs they had, like the 
infamous 8MB bug, a lot of them were related to the new clean shutdown 
feature.  It's the type of firmware that takes some exposure to the real 
world to flush out the bugs.  The last of the enthusiast SSD players who 
tried to take this job on was OCZ with the Vertex 3 Pro, and they never 
got that model quite right before abandoning it altogether.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Greg Smith

On 5/22/13 4:57 PM, Merlin Moncure wrote:

Oh, the major vendors will still keep their
rip-off going on a little longer selling their storage trays, raid
controllers, entry/mid level SANS, SAS HBAs etc at huge markup to
customers who don't need them (some will still need them, but the bar
suddenly just got spectacularly raised before you have to look into
enterprise gear).


The angle to distinguish enterprise hardware is moving on to error 
related capabilities.  Soon we'll see SAS drives with the 520 byte 
sectors and checksumming for example.


And while SATA drives have advanced a long way, they haven't caught up 
with SAS for failure handling.  It's still far too easy for a single 
crazy SATA device to force crippling bus resets for example.  Individual 
SATA ports don't expect to share things with others, while SAS chains 
have a much better protocol for handling things.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Mark Kirkwood

On 23/05/13 13:32, Mark Kirkwood wrote:

On 23/05/13 13:01, Joshua D. Drake wrote:


On 05/22/2013 04:37 PM, Merlin Moncure wrote:


On Wed, May 22, 2013 at 5:42 PM, Joshua D. Drake 
j...@commandprompt.com wrote:

I am curious how the 710 or S3700 stacks up against the new M500 from
Crucial? I know Intel is kind of the goto for these things but the 
m500 is
power off protected and rated at: Endurance: 72TB total bytes 
written (TBW),

equal to 40GB per day for 5 years .


I don't think the m500 is power safe (nor is any drive at the 1$/gb
price point).


According to the data sheet it is power safe.

http://investors.micron.com/releasedetail.cfm?ReleaseID=732650
http://www.micron.com/products/solid-state-storage/client-ssd/m500-ssd




Yeah - they apparently have a capacitor on board.



Make that quite a few capacitors (top right corner):

http://regmedia.co.uk/2013/05/07/m500_4.jpg




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Merlin Moncure
On Wednesday, May 22, 2013, Joshua D. Drake j...@commandprompt.com wrote:

 On 05/22/2013 04:37 PM, Merlin Moncure wrote:

 On Wed, May 22, 2013 at 5:42 PM, Joshua D. Drake j...@commandprompt.com
wrote:

 I am curious how the 710 or S3700 stacks up against the new M500 from
 Crucial? I know Intel is kind of the goto for these things but the m500
is
 power off protected and rated at: Endurance: 72TB total bytes written
(TBW),
 equal to 40GB per day for 5 years .

 I don't think the m500 is power safe (nor is any drive at the 1$/gb
 price point).

 According to the data sheet it is power safe.

 http://investors.micron.com/releasedetail.cfm?ReleaseID=732650
 http://www.micron.com/products/solid-state-storage/client-ssd/m500-ssd

Wow, that seems like a pretty good deal then assuming it works and performs
decently.

merlin


Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Greg Smith

On 5/22/13 10:04 PM, Mark Kirkwood wrote:

Make that quite a few capacitors (top right corner):
http://regmedia.co.uk/2013/05/07/m500_4.jpg


There are some more shots and descriptions of the internals in the 
excellent review at 
http://techreport.com/review/24666/crucial-m500-ssd-reviewed


That also highlights the big problem with this drive that's kept me from 
buying one so far:


Unlike rivals Intel and Samsung, Crucial doesn't provide utility 
software with a built-in health indicator. The M500's payload of SMART 
attributes doesn't contain any references to flash wear or bytes 
written, either. Several of the SMART attributes are labeled 
Vendor-specific, but you'll need to guess what they track and read the 
associated values using third-party software.


That's a serious problem for most business use of this sort of drive.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Mark Kirkwood

On 23/05/13 14:22, Greg Smith wrote:

On 5/22/13 10:04 PM, Mark Kirkwood wrote:

Make that quite a few capacitors (top right corner):
http://regmedia.co.uk/2013/05/07/m500_4.jpg


There are some more shots and descriptions of the internals in the 
excellent review at 
http://techreport.com/review/24666/crucial-m500-ssd-reviewed


That also highlights the big problem with this drive that's kept me 
from buying one so far:


Unlike rivals Intel and Samsung, Crucial doesn't provide utility 
software with a built-in health indicator. The M500's payload of SMART 
attributes doesn't contain any references to flash wear or bytes 
written, either. Several of the SMART attributes are labeled 
Vendor-specific, but you'll need to guess what they track and read 
the associated values using third-party software.


That's a serious problem for most business use of this sort of drive.



Agreed - I was thinking the same thing!

Cheers

Mark




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Mark Kirkwood

On 23/05/13 14:26, Mark Kirkwood wrote:

On 23/05/13 14:22, Greg Smith wrote:

On 5/22/13 10:04 PM, Mark Kirkwood wrote:

Make that quite a few capacitors (top right corner):
http://regmedia.co.uk/2013/05/07/m500_4.jpg


There are some more shots and descriptions of the internals in the 
excellent review at 
http://techreport.com/review/24666/crucial-m500-ssd-reviewed


That also highlights the big problem with this drive that's kept me 
from buying one so far:


Unlike rivals Intel and Samsung, Crucial doesn't provide utility 
software with a built-in health indicator. The M500's payload of 
SMART attributes doesn't contain any references to flash wear or 
bytes written, either. Several of the SMART attributes are labeled 
Vendor-specific, but you'll need to guess what they track and read 
the associated values using third-party software.


That's a serious problem for most business use of this sort of drive.



Agreed - I was thinking the same thing!




Having said that, there does seem to be a wear leveling counter in its 
SMART attributes - but, yes - I'd like to see indicators more similar to
the level of detail that Intel provides.


Cheers

Mark





Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Joshua D. Drake


On 05/22/2013 07:17 PM, Merlin Moncure wrote:


  According to the data sheet it is power safe.
 
  http://investors.micron.com/releasedetail.cfm?ReleaseID=732650
  http://www.micron.com/products/solid-state-storage/client-ssd/m500-ssd

Wow, that seems like a pretty good deal then assuming it works and
performs decently.


Yeah that was my thinking. Sure it isn't an S3700 but for the money it 
is still faster than the comparable spindle configuration.


JD



merlin






Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-21 Thread Greg Smith

On 5/20/13 6:32 PM, Merlin Moncure wrote:


When it comes to databases, particularly in the open source postgres
world, hard drives are completely obsolete.  SSD are a couple of
orders of magnitude faster and this (while still slow in computer
terms) is fast enough to put storage into the modern era by anyone
who is smart enough to connect a sata cable.


You're skirting the edge of vendor Kool-Aid here.  I'm working on a very 
detailed benchmark vs. real world piece centered on Intel's 710 models, 
one of the few reliable drives on the market.  (Yes, I have a DC S3700 
too, just not as much data yet)  While in theory these drives will hit 
two orders of magnitude speed improvement, and I have benchmarks where 
that's the case, in practice I've seen them deliver less than 5X better 
too.  You get one guess which I'd consider more likely to happen on a 
difficult database server workload.


The only really huge gain to be had using SSD is commit rate at a low 
client count.  There you can easily do 5,000/second instead of a 
spinning disk that is closer to 100, for less than what the 
battery-backed RAID card alone costs to speed up mechanical drives.  My 
test server's 100GB DC S3700 was $250.  That's still not two orders of 
magnitude faster though.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-20 Thread Tomas Vondra
On 20.5.2013 05:00, Greg Smith wrote:
 On 5/16/13 8:06 PM, Tomas Vondra wrote:
 Have you considered using a UPS? That would make the SSDs about as
 reliable as SATA/SAS drives - the UPS may fail, but so may a BBU unit on
 the SAS controller.
 
 That's not true at all.  Any decent RAID controller will have an option
 to stop write-back caching when the battery is bad.  Things will slow
 badly when that happens, but there is zero data risk from a short-term
 BBU failure.  The only serious risk with a good BBU setup are that
 you'll have a power failure lasting so long that the battery runs down
 before the cache can be flushed to disk.

That's true, no doubt about that. What I was trying to say is that a
controller with BBU (or an SSD with proper write cache protection) is
about as safe as a UPS when it comes to power outages, assuming both
are properly configured / watched / checked.

Sure, there are scenarios where a UPS is not going to help (e.g. a PSU
failure), so a controller with BBU is better from this point of view.
I've seen crashes with both options (BBU / UPS), both because of
misconfiguration and hw issues. BTW I don't know what controller we are
talking about here - it might be as crappy as the SSD drives.

What I was thinking about in this case is using two SSD-based systems
with UPSes. That'd allow fast failover (which may not be possible with
the SAS based replica, as it does not handle the load).

But yes, I do agree that the provider should be ashamed for not
providing reliable SSDs in the first place. Getting reliable SSDs should
be the first option - all these suggestions are really just workarounds
of this rather simple issue.

Tomas




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-19 Thread Tomas Vondra
On 17.5.2013 03:34, Mark Kirkwood wrote:
 On 17/05/13 12:06, Tomas Vondra wrote:
 Hi,

 On 16.5.2013 16:46, Cuong Hoang wrote:
 
 Pro for the master server. I'm aware of write cache issue on SSDs in
 case of power loss. However, our hosting provider doesn't offer any
 other choices of SSD drives with supercapacitor. To minimise risk, we
 will also set up another RAID 10 SAS in streaming replication mode. For
 our application, a few seconds of data loss is acceptable.

 Streaming replication allows zero data loss if used in synchronous mode.

 
 I'm not sure synchronous replication is really an option here as it will
 slow the master down to spinning disk io speeds, unless the standby is
 configured with SSDs as well - which probably defeats the purpose of
 this setup.

The master waits for the standby to confirm reception of the data, not
for the data to be written to disk. The standby will have to write it
eventually (and that might cause issues), but I'm not really sure it's
that simple.
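
(For reference, a minimal sketch of the settings involved on a 9.2
master, assuming the standby connects with application_name 'standby1';
remote_write is the mode that waits for the standby to receive/write
the WAL rather than flush it:)

synchronous_standby_names = 'standby1'   # must match the standby's application_name
synchronous_commit = remote_write        # wait for receive/write on the standby (9.2+)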

 On the other hand, if the system is so loaded that a pure SAS (spinning
 drive) solution can't keep up, then the standby lag may get to be way 
 more than a few seconds...which means look out for huge data loss.

Don't forget the slave does not have to perform all of the I/O the
master does (searching for the rows to modify, etc.). It's difficult to
say how much that will save, though.

Tomas




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-19 Thread Tomas Vondra
Do you really need a running standby for fast failover? What about doing
plain WAL archiving? I'd definitely consider that, because even if you
set up a SAS-based replica, you can't use it in production as it can't
handle the load.

I think you could set up WAL archiving and, in case of a crash, just use
the base backup and replay the WAL from the archive.

This means the SAS-based system is purely for WAL archiving, i.e. it
performs only sequential writes, which should not be a big deal.

The recovery will be performed on the SSD system, which should handle it
fine. If you need faster recovery, you may perform it incrementally on
the SAS system (it will take some time, but it won't affect the
master). You might do that daily or something like that.

The only problem with this is that it is file based, which could mean
some lag (up to 16MB of WAL or archive_timeout, whichever comes first).
But this should not be a problem if you place the WAL on SAS drives
behind a controller. If you use RAID, you should be perfectly fine.

So this is what I'd suggest:

  1) use SSD for data files, SAS RAID1 for WAL on the master
  2) set up WAL archiving (base backup + archive on the SAS system;
     a rough sketch follows below)
  3) update the base backup daily (incremental recovery)
  4) in case of a crash, recover using the WAL from the archive plus
     the pg_xlog on the SAS RAID (on the master)
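
A rough sketch of 1), 2) and 4), assuming a 9.1+ master (for
pg_basebackup) and assuming the SAS system's archive directory is
mounted on the master at /mnt/sas-archive - the host name and paths
here are just examples:

# postgresql.conf on the master
wal_level = archive
archive_mode = on
archive_command = 'test ! -f /mnt/sas-archive/%f && cp %p /mnt/sas-archive/%f'
archive_timeout = 60      # force a segment switch at least once a minute

# daily base backup, written onto the SAS system
pg_basebackup -h master -D /mnt/sas-archive/base -Ft -z -P

# recovery.conf used when restoring onto the SSD system after a crash
restore_command = 'cp /mnt/sas-archive/%f %p'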


Tomas


On 17.5.2013 02:21, Cuong Hoang wrote:
 Hi Tomas,
 
 We have a lot of small updates and some inserts. The database size is at
 35GB including indexes and TOAST. We think it will keep growing to about
 200GB. We usually have a burst of about 500k writes in about 5-10
 minutes which basically cripples IO on the current servers. I've tried
 to increase the checkpoint_segments, checkpoint_timeout etc. as
 recommended in PostgreSQL 9.0 Performance book. However, it seems like
 our server just couldn't handle the current load.
 
 Here is the server specs:
 
 Dual E5620, 32GB RAM, 4x1TB SAS 15k in RAID10
 
 Here are some core PostgreSQL configs:
 
 shared_buffers = 2GB# min 128kB
 work_mem = 64MB # min 64kB
 maintenance_work_mem = 1GB  # min 1MB
 wal_buffers = 16MB
 checkpoint_segments = 128
 checkpoint_timeout = 30min
 checkpoint_completion_target = 0.7
 
 
 Thanks,
 Cuong
 
 
 On Fri, May 17, 2013 at 10:06 AM, Tomas Vondra t...@fuzzy.cz wrote:
 
 Hi,
 
 On 16.5.2013 16:46, Cuong Hoang wrote:
  Hi all,
 
  Our application is heavy write and IO utilisation has been the problem
  for us for a while. We've decided to use RAID 10 of 4x500GB
 Samsung 840
 
 What does heavy write mean in your case? Does that mean a lot of small
 transactions or few large ones?
 
 What have you done to tune the server?
 
  Pro for the master server. I'm aware of write cache issue on SSDs in
  case of power loss. However, our hosting provider doesn't offer any
  other choices of SSD drives with supercapacitor. To minimise risk, we
  will also set up another RAID 10 SAS in streaming replication
 mode. For
  our application, a few seconds of data loss is acceptable.
 
 Streaming replication allows zero data loss if used in synchronous mode.
 
  My question is, would corrupted data files on the primary server
 affect
  the streaming standby? In other word, is this setup acceptable in
 terms
  of minimising deficiency of SSDs?
 
 It should be.
 
 Have you considered using a UPS? That would make the SSDs about as
 reliable as SATA/SAS drives - the UPS may fail, but so may a BBU unit on
 the SAS controller.
 
 Tomas
 
 
 
 





Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-19 Thread Cuong Hoang
Thanks for the suggestion, Tomas. We're about to set up WAL backups to
Amazon S3. I think this should cover all of our bases. At least for the
moment, the SAS-based standby seems to keep up with the master because
that's its sole purpose - we're not sending queries to the hot standby.
We'll also consider switching from the hot standby to the WAL-archiving
approach you suggested. I guess for now we should stick to streaming
replication because the slave is still keeping up with the master.

Btw, after switching to SSD, performance has improved vastly. IO
utilisation has dropped from 100% to 6% in peak periods. That's an order
of magnitude faster!

Cheers,
Cuong


On Mon, May 20, 2013 at 8:34 AM, Tomas Vondra t...@fuzzy.cz wrote:

 Do you really need a running standby for fast failover? What about doing
 plain WAL archiving? I'd definitely consider that, because even if you
 set up a SAS-based replica, you can't use it in production as it can't
 handle the load.

 I think you could setup WAL archiving and in case of crash just use the
 base backup and replay the WAL from the archive.

 This means the SAS-based system is purely for WAL archiving, i.e.
 performs only sequential writes which should not be a big deal.

 The recovery will be performed on the SSD system, which should handle it
 fine. If you need faster recovery, you may perform it incrementally on
 the SAS system (it will take some time, but it won't influence the
 master). You might do that daily or something like that.

 The only problem with this is that this is file based, and could mean
 lag (up to 16MB or archive_timeout). But this should not be problem if
 you place the WAL on SAS drives with controller. If you use RAID, you
 should be perfectly fine.

 So this is what I'd suggest:

   1) use SSD for data files, SAS RAID1 for WAL on the master
   2) setup WAL archiving (base backup + archive on SAS system)
   3) update the base backup daily (incremental recovery)
   4) in case of crash, keep WAL from the archive and pg_xlog on the
  SAS RAID (on master)


 Tomas


 On 17.5.2013 02:21, Cuong Hoang wrote:
  Hi Tomas,
 
  We have a lot of small updates and some inserts. The database size is at
  35GB including indexes and TOAST. We think it will keep growing to about
  200GB. We usually have a burst of about 500k writes in about 5-10
  minutes which basically cripples IO on the current servers. I've tried
  to increase the checkpoint_segments, checkpoint_timeout etc. as
  recommended in PostgreSQL 9.0 Performance book. However, it seems like
  our server just couldn't handle the current load.
 
  Here is the server specs:
 
  Dual E5620, 32GB RAM, 4x1TB SAS 15k in RAID10
 
  Here are some core PostgreSQL configs:
 
  shared_buffers = 2GB# min 128kB
  work_mem = 64MB # min 64kB
  maintenance_work_mem = 1GB  # min 1MB
  wal_buffers = 16MB
  checkpoint_segments = 128
  checkpoint_timeout = 30min
  checkpoint_completion_target = 0.7
 
 
  Thanks,
  Cuong
 
 
  On Fri, May 17, 2013 at 10:06 AM, Tomas Vondra t...@fuzzy.cz wrote:
 
  Hi,
 
  On 16.5.2013 16:46, Cuong Hoang wrote:
   Hi all,
  
   Our application is heavy write and IO utilisation has been the
 problem
   for us for a while. We've decided to use RAID 10 of 4x500GB
  Samsung 840
 
  What does heavy write mean in your case? Does that mean a lot of
 small
  transactions or few large ones?
 
  What have you done to tune the server?
 
   Pro for the master server. I'm aware of write cache issue on SSDs
 in
   case of power loss. However, our hosting provider doesn't offer any
   other choices of SSD drives with supercapacitor. To minimise risk,
 we
   will also set up another RAID 10 SAS in streaming replication
  mode. For
   our application, a few seconds of data loss is acceptable.
 
  Streaming replication allows zero data loss if used in synchronous
 mode.
 
   My question is, would corrupted data files on the primary server
  affect
   the streaming standby? In other word, is this setup acceptable in
  terms
   of minimising deficiency of SSDs?
 
  It should be.
 
  Have you considered using a UPS? That would make the SSDs about as
  reliable as SATA/SAS drives - the UPS may fail, but so may a BBU
 unit on
  the SAS controller.
 
  Tomas
 
 
 
 






Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-19 Thread Greg Smith

On 5/16/13 8:06 PM, Tomas Vondra wrote:

Have you considered using a UPS? That would make the SSDs about as
reliable as SATA/SAS drives - the UPS may fail, but so may a BBU unit on
the SAS controller.


That's not true at all.  Any decent RAID controller will have an option 
to stop write-back caching when the battery is bad.  Things will slow 
badly when that happens, but there is zero data risk from a short-term 
BBU failure.  The only serious risk with a good BBU setup is that 
you'll have a power failure lasting so long that the battery runs down 
before the cache can be flushed to disk.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-19 Thread Greg Smith

On 5/16/13 7:52 PM, Cuong Hoang wrote:

The standby host will be disk-based so it
will be less vulnerable to power loss.


If it can keep up with replay from the faster master, that sounds like a 
decent backup.  Make sure you set up all write caches very carefully on 
that system, because it's going to be your best hope to come back up 
quickly after a real crash.
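
(For what it's worth, on Linux the drive-level write caches can usually
be checked and disabled along these lines; the device names are examples
only, and a RAID controller's own cache policy has to be checked through
the vendor's tool instead:)

hdparm -W /dev/sda            # SATA: report the volatile write cache state
hdparm -W 0 /dev/sda          # SATA: disable it
sdparm --get=WCE /dev/sdb     # SAS: report the WCE (write cache enable) bit
sdparm --clear=WCE /dev/sdb   # SAS: clear it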


Any vendor that pushes Samsung 840 drives for database use should be 
ashamed of themselves.  Those drives are turning into the new 
incarnation of what we saw with the Intel X25-E/X25-M:  they're very 
popular, but any system built with them will corrupt itself on the first 
failure.  I expect a new spike in people needing data recovery help 
after losing their Samsung 840 based servers to start soon.



I forgot to mention that we'll set up WAL-E
(https://github.com/wal-e/wal-e) to ship base backups and WALs to Amazon
S3 continuously as another safety measure. Again, the loss of a few WALs
would not be a big issue for us.


That's a useful plan.  Just make sure you ship new base backups fairly 
often.  If you have to fall back to that copy of the data, you'll need 
to replay everything that's happened since the last base backup was 
taken.  That can easily result in a week of downtime if you're only 
shipping backups once per month, for example.
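
One way to do that with WAL-E is to pair the usual archive_command with
a daily cron entry for the base backup, so the WAL that has to be
replayed never covers more than about a day.  The env directory and the
data directory path below are hypothetical:

# postgresql.conf: continuous WAL shipping to S3
archive_mode = on
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

# /etc/cron.d/wal-e: fresh base backup every night at 02:00
0 2 * * * postgres envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.2/main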


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-17 Thread David Rees
On Thu, May 16, 2013 at 7:46 AM, Cuong Hoang climbingr...@gmail.com wrote:
 For our application, a few seconds of data loss is acceptable.

If a few seconds of data loss is acceptable, I would seriously look at
the synchronous_commit setting and think about turning that off rather
than risk silent corruption with non-enterprise SSDs.

http://www.postgresql.org/docs/9.2/interactive/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT

Unlike fsync, setting this parameter to off does not create any risk
of database inconsistency: an operating system or database crash might
result in some recent allegedly-committed transactions being lost, but
the database state will be just the same as if those transactions had
been aborted cleanly. So, turning synchronous_commit off can be a
useful alternative when performance is more important than exact
certainty about the durability of a transaction.

With a default wal_writer_delay setting of 200ms, you will only be at
risk of losing at most 600ms of transactions in the event of an
unexpected crash or power loss, but write performance should go up a
huge amount, especially if there are a lot of small writes as you
describe.
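
In postgresql.conf terms that is just the following (the delay shown is
the default):

synchronous_commit = off      # commits return before the WAL is flushed
wal_writer_delay = 200ms      # default; worst-case loss is about 3x this window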

-Dave




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-17 Thread Merlin Moncure
On Fri, May 17, 2013 at 1:34 AM, David Rees dree...@gmail.com wrote:
 On Thu, May 16, 2013 at 7:46 AM, Cuong Hoang climbingr...@gmail.com wrote:
 For our application, a few seconds of data loss is acceptable.

 If a few seconds of data loss is acceptable, I would seriously look at
 the synchronous_commit setting and think about turning that off rather
 than risk silent corruption with non-enterprise SSDs.

That is not going to help.  Since the drives lie about fsync, upon a
power event you must assume the database is corrupt.  I think his
proposed configuration is the best bet (although I would strongly
consider putting SSDs on the standby as well).  Personally, I think
non-SSD drives are obsolete for database purposes and will not
recommend them for any configuration.  Ideally, though, the OP would be
using an S3700 and we wouldn't be having this conversation.

merlin




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-17 Thread Merlin Moncure
On Fri, May 17, 2013 at 8:17 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, May 17, 2013 at 1:34 AM, David Rees dree...@gmail.com wrote:
 On Thu, May 16, 2013 at 7:46 AM, Cuong Hoang climbingr...@gmail.com wrote:
 For our application, a few seconds of data loss is acceptable.

 If a few seconds of data loss is acceptable, I would seriously look at
 the synchronous_commit setting and think about turning that off rather
 than risk silent corruption with non-enterprise SSDs.

 That is not going to help.


whoops -- misread your post heh (you were suggesting to use classic
hard drives).  yeah, that might work, but it only buys you so much,
particularly if there is a lot of random activity in the heap.

merlin




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-16 Thread Merlin Moncure
On Thu, May 16, 2013 at 9:46 AM, Cuong Hoang climbingr...@gmail.com wrote:
 Hi all,

 Our application is heavy write and IO utilisation has been the problem for
 us for a while. We've decided to use RAID 10 of 4x500GB Samsung 840 Pro for
 the master server. I'm aware of write cache issue on SSDs in case of power
 loss. However, our hosting provider doesn't offer any other choices of SSD
 drives with supercapacitor. To minimise risk, we will also set up another
 RAID 10 SAS in streaming replication mode. For our application, a few
 seconds of data loss is acceptable.

 My question is, would corrupted data files on the primary server affect the
 streaming standby? In other word, is this setup acceptable in terms of
 minimising deficiency of SSDs?

Data corruption caused by a sudden power event on the master will not
cross over.  Basically, with this configuration you must switch over to
the standby in that case.  Corruption caused by other issues, say a
faulty drive, will transfer over, however.  The block checksum feature
of 9.3 can serve as a strategy to reduce the risk of that class of issue.
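
(Worth noting: in 9.3 those checksums can only be enabled cluster-wide
at initdb time, for example - the data directory path is just an
illustration:)

initdb --data-checksums -D /var/lib/postgresql/9.3/main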

merlin




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-16 Thread Jeff Janes
On Thu, May 16, 2013 at 7:46 AM, Cuong Hoang climbingr...@gmail.com wrote:

 Hi all,

 Our application is heavy write and IO utilisation has been the problem for
 us for a while. We've decided to use RAID 10 of 4x500GB Samsung 840 Pro for
 the master server. I'm aware of write cache issue on SSDs in case of power
 loss. However, our hosting provider doesn't offer any other choices of SSD
 drives with supercapacitor. To minimise risk, we will also set up another
 RAID 10 SAS in streaming replication mode. For our application, a few
 seconds of data loss is acceptable.

 My question is, would corrupted data files on the primary server affect
 the streaming standby? In other word, is this setup acceptable in terms of
 minimising deficiency of SSDs?



That seems rather scary to me for two reasons.

If the data center has a sudden power failure, why would it not take out
both machines either simultaneously or in short succession?  Can you verify
that the hosting provider does not have them on the same UPS (or even
worse, as two virtual machines on the same physical host)?

The other issue is that you'd have to make sure the master does not restart
after a crash.  If your init.d scripts just blindly start postgresql, then
after a sudden OS restart it will automatically enter recovery and then
open as usual, even though it might be silently corrupt.  At that point it
will be generating WAL based on corrupt data (and incorrect query results),
and propagating that to the standby.   So you have to be paranoid that if
the master ever crashes, it is shot in the head and then reconstructed from
the standby.
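
Disabling the automatic start is a one-liner on most distributions of
that era; the service name depends on the packaging, so treat these as
examples:

update-rc.d postgresql disable    # Debian/Ubuntu (sysvinit)
chkconfig postgresql off          # RHEL/CentOS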

Cheers,

Jeff


Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-16 Thread Merlin Moncure
On Thu, May 16, 2013 at 1:34 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Thu, May 16, 2013 at 7:46 AM, Cuong Hoang climbingr...@gmail.com wrote:

 Hi all,

 Our application is heavy write and IO utilisation has been the problem for
 us for a while. We've decided to use RAID 10 of 4x500GB Samsung 840 Pro for
 the master server. I'm aware of write cache issue on SSDs in case of power
 loss. However, our hosting provider doesn't offer any other choices of SSD
 drives with supercapacitor. To minimise risk, we will also set up another
 RAID 10 SAS in streaming replication mode. For our application, a few
 seconds of data loss is acceptable.

 My question is, would corrupted data files on the primary server affect
 the streaming standby? In other word, is this setup acceptable in terms of
 minimising deficiency of SSDs?



 That seems rather scary to me for two reasons.

 If the data center has a sudden power failure, why would it not take out
 both machines either simultaneously or in short succession?  Can you verify
 that the hosting provider does not have them on the same UPS (or even worse,
 as two virtual machines on the same physical host)?

I took it to mean that his standby's RAID 10 SAS meant a disk-drive
based standby.  Agreed that the server should not be configured to
autostart through init.d.

merlin




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-16 Thread Jeff Janes
On Thu, May 16, 2013 at 11:46 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Thu, May 16, 2013 at 1:34 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  On Thu, May 16, 2013 at 7:46 AM, Cuong Hoang climbingr...@gmail.com
 wrote:
 
  Hi all,
 
  Our application is heavy write and IO utilisation has been the problem
 for
  us for a while. We've decided to use RAID 10 of 4x500GB Samsung 840 Pro
 for
  the master server. I'm aware of write cache issue on SSDs in case of
 power
  loss. However, our hosting provider doesn't offer any other choices of
 SSD
  drives with supercapacitor. To minimise risk, we will also set up
 another
  RAID 10 SAS in streaming replication mode. For our application, a few
  seconds of data loss is acceptable.
 
  My question is, would corrupted data files on the primary server affect
  the streaming standby? In other word, is this setup acceptable in terms
 of
  minimising deficiency of SSDs?
 
 
 
  That seems rather scary to me for two reasons.
 
  If the data center has a sudden power failure, why would it not take out
  both machines either simultaneously or in short succession?  Can you
 verify
  that the hosting provider does not have them on the same UPS (or even
 worse,
  as two virtual machines on the same physical host)?

 I took it to mean that his standby's raid 10 SAS meant disk drive
 based standby.


I had not considered that.  If the master can't keep up with IO using disk
drives, wouldn't a replica using them probably fall infinitely far behind
trying to keep up with the workload?

Maybe the best choice would just be to stick with the current set-up (one
server, spinning rust) and simply turn off synchronous_commit, since he is
already willing to take the loss of a few seconds of transactions.

Cheers,

Jeff


Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-16 Thread Cuong Hoang
Thank you for your advice, guys. We'll definitely turn off the init.d
script for PostgreSQL on the master. The standby host will be disk-based
so it will be less vulnerable to power loss.

I forgot to mention that we'll set up WAL-E
(https://github.com/wal-e/wal-e) to ship base backups and WALs to Amazon
S3 continuously as another safety measure. Again, the loss of a few WALs
would not be a big issue for us.

Do you think that this setup will be acceptable for our purposes?

Thanks,
Cuong


On Fri, May 17, 2013 at 8:39 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Thu, May 16, 2013 at 11:46 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Thu, May 16, 2013 at 1:34 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  On Thu, May 16, 2013 at 7:46 AM, Cuong Hoang climbingr...@gmail.com
 wrote:
 
  Hi all,
 
  Our application is heavy write and IO utilisation has been the problem
 for
  us for a while. We've decided to use RAID 10 of 4x500GB Samsung 840
 Pro for
  the master server. I'm aware of write cache issue on SSDs in case of
 power
  loss. However, our hosting provider doesn't offer any other choices of
 SSD
  drives with supercapacitor. To minimise risk, we will also set up
 another
  RAID 10 SAS in streaming replication mode. For our application, a few
  seconds of data loss is acceptable.
 
  My question is, would corrupted data files on the primary server affect
  the streaming standby? In other word, is this setup acceptable in
 terms of
  minimising deficiency of SSDs?
 
 
 
  That seems rather scary to me for two reasons.
 
  If the data center has a sudden power failure, why would it not take out
  both machines either simultaneously or in short succession?  Can you
 verify
  that the hosting provider does not have them on the same UPS (or even
 worse,
  as two virtual machines on the same physical host)?

 I took it to mean that his standby's raid 10 SAS meant disk drive
 based standby.


 I had not considered that.   If the master can't keep up with IO using
 disk drives, wouldn't a replica using them probably fall infinitely far
 behind trying to keep up with the workload?

  Maybe the best choice would just be to stick with the current set-up (one
  server, spinning rust) and simply turn off synchronous_commit, since he is
  already willing to take the loss of a few seconds of transactions.

 Cheers,

 Jeff



Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-16 Thread Tomas Vondra
Hi,

On 16.5.2013 16:46, Cuong Hoang wrote:
 Hi all,
 
 Our application is heavy write and IO utilisation has been the problem
 for us for a while. We've decided to use RAID 10 of 4x500GB Samsung 840

What does heavy write mean in your case? Does that mean a lot of small
transactions or few large ones?

What have you done to tune the server?

 Pro for the master server. I'm aware of write cache issue on SSDs in
 case of power loss. However, our hosting provider doesn't offer any
 other choices of SSD drives with supercapacitor. To minimise risk, we
 will also set up another RAID 10 SAS in streaming replication mode. For
 our application, a few seconds of data loss is acceptable.

Streaming replication allows zero data loss if used in synchronous mode.

 My question is, would corrupted data files on the primary server affect
 the streaming standby? In other word, is this setup acceptable in terms
 of minimising deficiency of SSDs?

It should be.

Have you considered using a UPS? That would make the SSDs about as
reliable as SATA/SAS drives - the UPS may fail, but so may a BBU unit on
the SAS controller.

Tomas




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-16 Thread Cuong Hoang
Hi Tomas,

We have a lot of small updates and some inserts. The database size is
about 35GB including indexes and TOAST. We think it will keep growing to
about 200GB. We usually have bursts of about 500k writes in 5-10 minutes,
which basically cripples IO on the current servers. I've tried increasing
checkpoint_segments, checkpoint_timeout etc. as recommended in the
PostgreSQL 9.0 High Performance book. However, it seems like our server
just can't handle the current load.

Here is the server specs:

Dual E5620, 32GB RAM, 4x1TB SAS 15k in RAID10

Here are some core PostgreSQL configs:

shared_buffers = 2GB# min 128kB
work_mem = 64MB # min 64kB
maintenance_work_mem = 1GB  # min 1MB
wal_buffers = 16MB
checkpoint_segments = 128
checkpoint_timeout = 30min
checkpoint_completion_target = 0.7


Thanks,
Cuong


On Fri, May 17, 2013 at 10:06 AM, Tomas Vondra t...@fuzzy.cz wrote:

 Hi,

 On 16.5.2013 16:46, Cuong Hoang wrote:
  Hi all,
 
  Our application is heavy write and IO utilisation has been the problem
  for us for a while. We've decided to use RAID 10 of 4x500GB Samsung 840

 What does heavy write mean in your case? Does that mean a lot of small
 transactions or few large ones?

 What have you done to tune the server?

  Pro for the master server. I'm aware of write cache issue on SSDs in
  case of power loss. However, our hosting provider doesn't offer any
  other choices of SSD drives with supercapacitor. To minimise risk, we
  will also set up another RAID 10 SAS in streaming replication mode. For
  our application, a few seconds of data loss is acceptable.

 Streaming replication allows zero data loss if used in synchronous mode.

  My question is, would corrupted data files on the primary server affect
  the streaming standby? In other word, is this setup acceptable in terms
  of minimising deficiency of SSDs?

 It should be.

 Have you considered using a UPS? That would make the SSDs about as
 reliable as SATA/SAS drives - the UPS may fail, but so may a BBU unit on
 the SAS controller.

 Tomas





Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-16 Thread Mark Kirkwood

On 17/05/13 12:06, Tomas Vondra wrote:

Hi,

On 16.5.2013 16:46, Cuong Hoang wrote:



Pro for the master server. I'm aware of write cache issue on SSDs in
case of power loss. However, our hosting provider doesn't offer any
other choices of SSD drives with supercapacitor. To minimise risk, we
will also set up another RAID 10 SAS in streaming replication mode. For
our application, a few seconds of data loss is acceptable.


Streaming replication allows zero data loss if used in synchronous mode.



I'm not sure synchronous replication is really an option here, as it will 
slow the master down to spinning-disk IO speeds, unless the standby is 
configured with SSDs as well - which probably defeats the purpose of 
this setup.


On the other hand, if the system is so loaded that a pure SAS (spinning 
drive) solution can't keep up, then the standby lag may get to be way 
more than a few seconds... which means look out for huge data loss.


I'd be inclined to apply more leverage to the hosting provider to source 
SSDs suitable for your needs, or to change hosting providers.


Regards

Mark

