Re: [ceph-users] Changing SSD Landscape

2017-06-08 Thread Reed Dier
I did stumble across Samsung PM1725/a in both AIC and 2.5” U.2 form factor.

AIC starts at 1.6T and goes up to 6.4T, while 2.5” goes from 800G up to 6.4T.

The thing that caught my eye with this model is the x8 lanes in AIC, and the 
5DWPD over 5 years.

No idea how available it is, or how it compares price-wise, but against the 
Micron 9100 you get 5 DWPD instead of 3 DWPD, which, for journal devices, 
could be a big difference in lifespan.
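To put that DWPD gap in rough numbers, here is a quick sketch (the 5-year 
warranty window and the 1.6T capacity point used for both drives are 
assumptions for illustration, not quoted specs):

# Rated lifetime writes = capacity * DWPD * days in the warranty window.
def petabytes_written(capacity_tb, dwpd, warranty_years=5):
    return capacity_tb * dwpd * 365 * warranty_years / 1000.0

print(petabytes_written(1.6, 5))   # 5 DWPD class (PM1725a-like) -> ~14.6 PBW
print(petabytes_written(1.6, 3))   # 3 DWPD class (9100-like)    -> ~8.8 PBW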

And while, from what I read, the PM1725a isn’t as performant as, say, the P3700 
or some other enterprise NVMe drives like the HGST SN100, it’s still NVMe, with 
leaps-and-bounds lower latency and deeper queuing than SATA SSDs.

Reed

> On Jun 8, 2017, at 2:43 AM, Luis Periquito  wrote:
> 
> Looking at that anandtech comparison it seems the Micron usually is
> worse than the P3700.
> 
> This week I asked for a few nodes with P3700 400G and got an answer that
> they're end of sale, and the supplier wouldn't be able to get them
> anywhere in the world. Has anyone got a good replacement for these?
> 
> The official replacement is the P4600, but those start at 2T and have
> the appropriate price rise (it's slightly cheaper per GB than the
> P3700), and they haven't been officially released yet.
> 
> The P4800X (Optane) costs about the same as the P4600 and is small...
> 
> Not really sure about the Micron 9100, and couldn't find anything
> interesting/comparable in the Samsung range...
> 
> 
> On Wed, May 17, 2017 at 5:03 PM, Reed Dier  wrote:
>> Agreed, the issue I have seen is that the P4800X (Optane) is demonstrably
>> more expensive than the P3700 for a roughly equivalent amount of storage
>> space (400G v 375G).
>> 
>> However, the P4800X is perfectly suited to a Ceph environment, with 30 DWPD,
>> or 12.3 PBW. And on top of that, it seems to generally outperform the P3700
>> in terms of latency, iops, and raw throughput, especially at greater queue
>> depths. The biggest thing I took away was performance consistency.
>> 
>> Anandtech did a good comparison against the P3700 and the Micron 9100 MAX,
>> ironically the 9100 MAX has been the model I have been looking at to replace
>> P3700’s in future OSD nodes.
>> 
>> http://www.anandtech.com/show/11209/intel-optane-ssd-dc-p4800x-review-a-deep-dive-into-3d-xpoint-enterprise-performance/
>> 
>> There are also the DC P4500 and P4600 models in the pipeline from Intel,
>> also utilizing 3D NAND; however, I have been told that they will not be
>> shipping in volume until mid to late Q3.
>> And as was stated earlier, these all start at much larger capacities,
>> 1-4T, with respective endurance ratings of 1.79 PBW and 10.49 PBW on the
>> 2TB versions, which should work out to about 0.5 and ~3 DWPD for most
>> workloads.
>> 
>> At least the Micron 5100 MAX are finally shipping in volume to offer a
>> replacement to Intel S3610, though no good replacement for the S3710 yet
>> that I’ve seen on the endurance part.
>> 
>> Reed
>> 
>> On May 17, 2017, at 5:44 AM, Luis Periquito  wrote:
>> 
>> Anyway, in a couple months we'll start testing the Optane drives. They
>> are small and perhaps ideal journals, or?
>> 
>> The problem with optanes is price: from what I've seen they cost 2x or
>> 3x as much as the P3700...
>> But at least from what I've read they do look really great...
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing SSD Landscape

2017-06-08 Thread Luis Periquito
Looking at that anandtech comparison it seems the Micron usually is
worse than the P3700.

This week I asked for a few nodes with P3700 400G and got an answer that
they're end of sale, and the supplier wouldn't be able to get them
anywhere in the world. Has anyone got a good replacement for these?

The official replacement is the P4600, but those start at 2T and have
the appropriate price rise (it's slightly cheaper per GB than the
P3700), and they haven't been officially released yet.

The P4800X (Optane) costs about the same as the P4600 and is small...

Not really sure about the Micron 9100, and couldn't find anything
interesting/comparable in the Samsung range...


On Wed, May 17, 2017 at 5:03 PM, Reed Dier  wrote:
> Agreed, the issue I have seen is that the P4800X (Optane) is demonstrably
> more expensive than the P3700 for a roughly equivalent amount of storage
> space (400G v 375G).
>
> However, the P4800X is perfectly suited to a Ceph environment, with 30 DWPD,
> or 12.3 PBW. And on top of that, it seems to generally outperform the P3700
> in terms of latency, iops, and raw throughput, especially at greater queue
> depths. The biggest thing I took away was performance consistency.
>
> Anandtech did a good comparison against the P3700 and the Micron 9100 MAX,
> ironically the 9100 MAX has been the model I have been looking at to replace
> P3700’s in future OSD nodes.
>
> http://www.anandtech.com/show/11209/intel-optane-ssd-dc-p4800x-review-a-deep-dive-into-3d-xpoint-enterprise-performance/
>
> There are also the DC P4500 and P4600 models in the pipeline from Intel,
> also utilizing 3D NAND; however, I have been told that they will not be
> shipping in volume until mid to late Q3.
> And as was stated earlier, these all start at much larger capacities,
> 1-4T, with respective endurance ratings of 1.79 PBW and 10.49 PBW on the
> 2TB versions, which should work out to about 0.5 and ~3 DWPD for most
> workloads.
>
> At least the Micron 5100 MAX are finally shipping in volume to offer a
> replacement to Intel S3610, though no good replacement for the S3710 yet
> that I’ve seen on the endurance part.
>
> Reed
>
> On May 17, 2017, at 5:44 AM, Luis Periquito  wrote:
>
> Anyway, in a couple months we'll start testing the Optane drives. They
> are small and perhaps ideal journals, or?
>
> The problem with optanes is price: from what I've seen they cost 2x or
> 3x as much as the P3700...
> But at least from what I've read they do look really great...
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing SSD Landscape

2017-05-18 Thread Reed Dier
> BTW, you asked about Samsung parts earlier. We are running these
> SM863's in a block storage cluster:
> 
> Model Family: Samsung based SSDs
> Device Model: SAMSUNG MZ7KM240HAGR-0E005
> Firmware Version: GXM1003Q
> 
>  
> 177 Wear_Leveling_Count 0x0013   094   094   005Pre-fail
> Always   -   2195
> 
> The problem is that I don't know how to see how many writes have gone
> through these drives.
> 
> But maybe they're EOL anyway?
> 
> Cheers, Dan

I have SM863a 1.9T’s in an all SSD pool.

Model Family: Samsung based SSDs
Device Model: SAMSUNG MZ7KM1T9HMJP-5

The easiest way to read the number of ‘drive writes’ is the WLC/177 attribute, 
where ‘VALUE’ is the normalized percentage of rated life remaining (it counts 
down from 100), and ‘RAW_VALUE’ is the actual average Program/Erase cycle 
count, i.e. your drive writes.

> ID# ATTRIBUTE_NAME  FLAG VALUE WORST THRESH TYPE  UPDATED  
> WHEN_FAILED RAW_VALUE
>   9 Power_On_Hours  0x0032   099   099   000Old_age   Always  
>  -   1758
> 177 Wear_Leveling_Count 0x0013   099   099   005Pre-fail  Always  
>  -   7


So for the drive in question, the NAND has, on average, been fully written 7 
times.

The 1.9T SM863 is rated at 12.32 PBW, with a warranty period of 5 years, so 
~3.6 DWPD, or ~6,500 drive writes for the total life of the drive.

Now your drive shows 2,195 P/E cycles, which would be about 33% of the total 
P/E cycles it’s rated for. I’m guessing that some of the NAND may have higher 
P/E cycles than others, and that the raw value reported may be the max value 
rather than the average.
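For reference, a minimal sketch of that arithmetic (it assumes the raw WLC/177 
value really is the average P/E cycle count and derives the rated cycle count 
from the published PBW figure and warranty; the 1.58 PBW used for the 240G 
SM863 below is just the 3.6 DWPD spec converted, not a quoted number):

# Rough drive-write math from SMART attribute 177 and a PBW rating.
def drive_write_stats(raw_pe_cycles, capacity_tb, rated_pbw, warranty_years=5):
    rated_cycles = rated_pbw * 1000.0 / capacity_tb   # rated full-drive writes
    dwpd = rated_cycles / (365 * warranty_years)      # implied drive writes per day
    pct_used = 100.0 * raw_pe_cycles / rated_cycles   # rough life consumed so far
    return round(rated_cycles), round(dwpd, 2), round(pct_used, 1)

print(drive_write_stats(7, 1.9, 12.32))     # SM863a 1.9T above -> ~6500 writes, ~3.6 DWPD, ~0.1% used
print(drive_write_stats(2195, 0.24, 1.58))  # SM863 240G at 2195 cycles -> ~33% used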

Intel reports the min/avg/max on their drives using isdct.

> $ sudo isdct show -smart ad -intelssd 0
> 
> - SMART Attributes PHMD_400AGN -
> - AD -
> AverageEraseCycles : 256
> Description : Wear Leveling Count
> ID : AD
> MaximumEraseCycles : 327
> MinimumEraseCycles : 188
> Normalized : 98
> Raw : 1099533058236

This is a P3700, one of the oldest in use. So this one has seen ~2% of its 
rated life, and the most-worn NAND has seen ~75% more P/E cycles than the 
least-worn.

Would be curious what raw value the Samsung is reporting, but that’s an easy 
way to gauge drive writes.

Reed

> On May 18, 2017, at 3:30 AM, Dan van der Ster  wrote:
> 
> On Thu, May 18, 2017 at 3:11 AM, Christian Balzer wrote:
>> On Wed, 17 May 2017 18:02:06 -0700 Ben Hines wrote:
>> 
>>> Well, ceph journals are of course going away with the imminent bluestore.
>> Not really, in many senses.
>> 
> 
> But we should expect far fewer writes to pass through the RocksDB and
> its WAL, right? So perhaps lower endurance flash will be usable.
> 
> BTW, you asked about Samsung parts earlier. We are running these
> SM863's in a block storage cluster:
> 
> Model Family: Samsung based SSDs
> Device Model: SAMSUNG MZ7KM240HAGR-0E005
> Firmware Version: GXM1003Q
> 
>  9 Power_On_Hours  0x0032   098   098   000Old_age
> Always   -   9971
> 177 Wear_Leveling_Count 0x0013   094   094   005Pre-fail
> Always   -   2195
> 241 Total_LBAs_Written  0x0032   099   099   000Old_age
> Always   -   701300549904
> 242 Total_LBAs_Read 0x0032   099   099   000Old_age
> Always   -   20421265
> 251 NAND_Writes 0x0032   100   100   000Old_age
> Always   -   1148921417736
> 
> The problem is that I don't know how to see how many writes have gone
> through these drives.
> Total_LBAs_Written appears to be bogus -- it's based on time. It
> matches exactly the 3.6 DWPD spec'd for that model:
>  3.6 DWPD * 240 GB * (9971 hours / 24) = 358.95 TB
>  701300549904 LBAs * 512 bytes/LBA = 359.06 TB
> 
> If we trust Wear_Leveling_Count then we're only dropping 6% in a year
> -- these should be good.
> 
> But maybe they're EOL anyway?
> 
> Cheers, Dan
> 
>>> Are small SSDs still useful for something with Bluestore?
>>> 
>> Of course, the WAL and other bits for the rocksdb, read up on it.
>> 
>> On top of that is the potential to improve things further with things
>> like bcache.
>> 
>>> For speccing out a cluster today that is many (6+) months away from being
>>> required, which I am going to be doing, I was thinking all-SSD would be the
>>> way to go. (Or is all-spinner performant with Bluestore?) Too early to make
>>> that call?
>>> 
>> Your call and funeral with regards to all spinners (depending on your
>> needs).
>> Bluestore at the very best of circumstances could double your IOPS, but
>> there are other factors at play and most people who NEED SSD journals now
>> would want something with SSDs in Bluestore as well.
>> 
>> If you're planning to actually deploy a (entirely) Bluestore cluster in
>> production with mission critical data before next year, you're a lot
>> braver than me.
>> An early adoption scheme with Bluestore nodes being in their own failure
>> domain (rack) would 

Re: [ceph-users] Changing SSD Landscape

2017-05-18 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Dan 
> van der Ster
> Sent: 18 May 2017 09:30
> To: Christian Balzer <ch...@gol.com>
> Cc: ceph-users <ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] Changing SSD Landscape
> 
> On Thu, May 18, 2017 at 3:11 AM, Christian Balzer <ch...@gol.com> wrote:
> > On Wed, 17 May 2017 18:02:06 -0700 Ben Hines wrote:
> >
> >> Well, ceph journals are of course going away with the imminent bluestore.
> > Not really, in many senses.
> >
> 
> But we should expect far fewer writes to pass through the RocksDB and its 
> WAL, right? So perhaps lower endurance flash will be
> usable.

Depends. I flagged up an issue in Bluestore where client latency when writing 
to spinners was tied to the underlying disk's latency. Sage has introduced a 
new deferred-write feature which does a similar double-write strategy to 
Filestore: the write goes first into the WAL, where it gets coalesced, and is 
then written out to the disk. The deferred writes are tuneable, as in you can 
say only defer writes up to 128KB, etc. But if you want the same write latency 
you see in Filestore, then you will encounter increased SSD wear to match it.
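As a concrete illustration of that tunable, something like the following in 
ceph.conf sets the cutoff below which BlueStore defers writes through the WAL 
(option names and sane values should be checked against your release's 
documentation; the numbers here are purely illustrative, not recommendations):

[osd]
# Writes below this size on HDD-backed OSDs are deferred: they land in the
# WAL first, get coalesced, and are flushed to the data device later.
bluestore_prefer_deferred_size_hdd = 131072
# Flash-backed OSDs can use a smaller (or zero) cutoff to avoid the double write.
bluestore_prefer_deferred_size_ssd = 0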

> 
> BTW, you asked about Samsung parts earlier. We are running these SM863's in a 
> block storage cluster:
> 
> Model Family: Samsung based SSDs
> Device Model: SAMSUNG MZ7KM240HAGR-0E005
> Firmware Version: GXM1003Q
> 
>   9 Power_On_Hours  0x0032   098   098   000Old_age
> Always   -   9971
> 177 Wear_Leveling_Count 0x0013   094   094   005Pre-fail
> Always   -   2195
> 241 Total_LBAs_Written  0x0032   099   099   000Old_age
> Always   -   701300549904
> 242 Total_LBAs_Read 0x0032   099   099   000Old_age
> Always   -   20421265
> 251 NAND_Writes 0x0032   100   100   000Old_age
> Always   -   1148921417736
> 
> The problem is that I don't know how to see how many writes have gone through 
> these drives.
> Total_LBAs_Written appears to be bogus -- it's based on time. It matches 
> exactly the 3.6 DWPD spec'd for that model:
>   3.6 DWPD * 240 GB * (9971 hours / 24) = 358.95 TB
>   701300549904 LBAs * 512 bytes/LBA = 359.06 TB
> 
> If we trust Wear_Leveling_Count then we're only dropping 6% in a year
> -- these should be good.
> 
> But maybe they're EOL anyway?
> 
> Cheers, Dan
> 
> >> Are small SSDs still useful for something with Bluestore?
> >>
> > Of course, the WAL and other bits for the rocksdb, read up on it.
> >
> > On top of that is the potential to improve things further with things
> > like bcache.
> >
> >> For speccing out a cluster today that is many (6+) months away from
> >> being required, which I am going to be doing, I was thinking all-SSD
> >> would be the way to go. (Or is all-spinner performant with
> >> Bluestore?) Too early to make that call?
> >>
> > Your call and funeral with regards to all spinners (depending on your
> > needs).
> > Bluestore at the very best of circumstances could double your IOPS,
> > but there are other factors at play and most people who NEED SSD
> > journals now would want something with SSDs in Bluestore as well.
> >
> > If you're planning to actually deploy a (entirely) Bluestore cluster
> > in production with mission critical data before next year, you're a
> > lot braver than me.
> > An early adoption scheme with Bluestore nodes being in their own
> > failure domain (rack) would be the best I could see myself doing in my
> > generic cluster.
> > For the 2 mission critical production clusters, they are (will be)
> > frozen most likely.
> >
> > Christian
> >
> >> -Ben
> >>
> >> On Wed, May 17, 2017 at 5:30 PM, Christian Balzer <ch...@gol.com> wrote:
> >>
> >> >
> >> > Hello,
> >> >
> >> > On Wed, 17 May 2017 11:28:17 +0200 Eneko Lacunza wrote:
> >> >
> >> > > Hi Nick,
> >> > >
> >> > > El 17/05/17 a las 11:12, Nick Fisk escribió:
> >> > > > There seems to be a shift in enterprise SSD products to larger
> >> > > > less
> >> > write intensive products and generally costing more than what
> >> > > > the existing P/S 3600/3700 ranges were. For example the new
> >> > > > Intel NVME
> >> > P4600 range seems to start at 2TB. Although I mention Intel
> >> > > > products, this seems to be the general outlook across all
> >> > manufacturers

Re: [ceph-users] Changing SSD Landscape

2017-05-18 Thread Dan van der Ster
On Thu, May 18, 2017 at 3:11 AM, Christian Balzer  wrote:
> On Wed, 17 May 2017 18:02:06 -0700 Ben Hines wrote:
>
>> Well, ceph journals are of course going away with the imminent bluestore.
> Not really, in many senses.
>

But we should expect far fewer writes to pass through the RocksDB and
its WAL, right? So perhaps lower endurance flash will be usable.

BTW, you asked about Samsung parts earlier. We are running these
SM863's in a block storage cluster:

Model Family: Samsung based SSDs
Device Model: SAMSUNG MZ7KM240HAGR-0E005
Firmware Version: GXM1003Q

  9 Power_On_Hours  0x0032   098   098   000Old_age
Always   -   9971
177 Wear_Leveling_Count 0x0013   094   094   005Pre-fail
Always   -   2195
241 Total_LBAs_Written  0x0032   099   099   000Old_age
Always   -   701300549904
242 Total_LBAs_Read 0x0032   099   099   000Old_age
Always   -   20421265
251 NAND_Writes 0x0032   100   100   000Old_age
Always   -   1148921417736

The problem is that I don't know how to see how many writes have gone
through these drives.
Total_LBAs_Written appears to be bogus -- it's based on time. It
matches exactly the 3.6 DWPD spec'd for that model:
  3.6 DWPD * 240 GB * (9971 hours / 24) = 358.95 TB
  701300549904 LBAs * 512 bytes/LBA = 359.06 TB
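Spelled out as a quick sketch (the only assumption beyond the SMART output 
above is that the spec'd DWPD is applied over power-on time):

power_on_hours = 9971
lbas_written = 701300549904
spec_tb = 3.6 * 0.240 * (power_on_hours / 24.0)   # DWPD * capacity (TB) * days powered on
real_tb = lbas_written * 512 / 1e12               # LBAs -> bytes -> TB
print(spec_tb, real_tb)                           # ~359 TB either way -- suspiciously close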

If we trust Wear_Leveling_Count then we're only dropping 6% in a year
-- these should be good.

But maybe they're EOL anyway?

Cheers, Dan

>> Are small SSDs still useful for something with Bluestore?
>>
> Of course, the WAL and other bits for the rocksdb, read up on it.
>
> On top of that is the potential to improve things further with things
> like bcache.
>
>> For speccing out a cluster today that is many (6+) months away from being
>> required, which I am going to be doing, I was thinking all-SSD would be the
>> way to go. (Or is all-spinner performant with Bluestore?) Too early to make
>> that call?
>>
> Your call and funeral with regards to all spinners (depending on your
> needs).
> Bluestore at the very best of circumstances could double your IOPS, but
> there are other factors at play and most people who NEED SSD journals now
> would want something with SSDs in Bluestore as well.
>
> If you're planning to actually deploy a (entirely) Bluestore cluster in
> production with mission critical data before next year, you're a lot
> braver than me.
> An early adoption scheme with Bluestore nodes being in their own failure
> domain (rack) would be the best I could see myself doing in my generic
> cluster.
> For the 2 mission critical production clusters, they are (will be) frozen
> most likely.
>
> Christian
>
>> -Ben
>>
>> On Wed, May 17, 2017 at 5:30 PM, Christian Balzer  wrote:
>>
>> >
>> > Hello,
>> >
>> > On Wed, 17 May 2017 11:28:17 +0200 Eneko Lacunza wrote:
>> >
>> > > Hi Nick,
>> > >
>> > > El 17/05/17 a las 11:12, Nick Fisk escribió:
>> > > > There seems to be a shift in enterprise SSD products to larger less
>> > write intensive products and generally costing more than what
>> > > > the existing P/S 3600/3700 ranges were. For example the new Intel NVME
>> > P4600 range seems to start at 2TB. Although I mention Intel
>> > > > products, this seems to be the general outlook across all
>> > manufacturers. This presents some problems for acquiring SSD's for Ceph
>> > > > journal/WAL use if your cluster is largely write only and wouldn't
>> > benefit from using the extra capacity brought by these SSD's to
>> > > > use as cache.
>> > > >
>> > > > Is anybody in the same situation and is struggling to find good P3700
>> > 400G replacements?
>> > > >
>> > > We usually build tiny ceph clusters, with 1 gbit network and S3610/S3710
>> > > 200GB SSDs for journals. We have been experiencing supply problems for
>> > > those disks lately, although it seems that 400GB disks are available, at
>> > > least for now.
>> > >
>> > This. Very much THIS.
>> >
>> > We're trying to get 200 or 400 or even 800GB DC S3710 or S3610s here
>> > recently with zero success.
>> > And I'm believing our vendor for a change that it's not their fault.
>> >
>> > What seems to be happening (no official confirmation, but it makes all the
>> > sense in the world to me) is this:
>> >
>> > Intel is trying to switch to 3DNAND (like they did with the 3520s), but
>> > while not having officially EOL'ed the 3(6/7)10s also allowed the supply
>> > to run dry.
>> >
>> > Which of course is not a smart move, because now people are massively
>> > forced to look for alternatives, and if those work they are unlikely to come back.
>> >
>> > I'm looking at oversized Samsungs (base model equivalent to 3610s) and am
>> > following this thread for other alternatives.
>> >
>> > Christian
>> > --
>> > Christian Balzer        Network/Systems Engineer
>> > ch...@gol.com   Global OnLine Japan/Rakuten Communications
>> > http://www.gol.com/
>> > ___
>> > ceph-users mailing list
>> > 

Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Christian Balzer
On Wed, 17 May 2017 18:02:06 -0700 Ben Hines wrote:

> Well, ceph journals are of course going away with the imminent bluestore.
Not really, in many senses.

> Are small SSDs still useful for something with Bluestore?
>
Of course, the WAL and other bits for the rocksdb, read up on it.

On top of that is the potential to improve things further with things
like bcache.
 
> For speccing out a cluster today that is many (6+) months away from being
> required, which I am going to be doing, I was thinking all-SSD would be the
> way to go. (Or is all-spinner performant with Bluestore?) Too early to make
> that call?
> 
Your call and funeral with regards to all spinners (depending on your
needs). 
Bluestore at the very best of circumstances could double your IOPS, but
there are other factors at play and most people who NEED SSD journals now
would want something with SSDs in Bluestore as well.

If you're planning to actually deploy a (entirely) Bluestore cluster in
production with mission critical data before next year, you're a lot
braver than me.
An early adoption scheme with Bluestore nodes being in their own failure
domain (rack) would be the best I could see myself doing in my generic
cluster.
For the 2 mission critical production clusters, they are (will be) frozen
most likely.

Christian

> -Ben
> 
> On Wed, May 17, 2017 at 5:30 PM, Christian Balzer  wrote:
> 
> >
> > Hello,
> >
> > On Wed, 17 May 2017 11:28:17 +0200 Eneko Lacunza wrote:
> >  
> > > Hi Nick,
> > >
> > > El 17/05/17 a las 11:12, Nick Fisk escribió:  
> > > > There seems to be a shift in enterprise SSD products to larger less  
> > write intensive products and generally costing more than what  
> > > > the existing P/S 3600/3700 ranges were. For example the new Intel NVME  
> > P4600 range seems to start at 2TB. Although I mention Intel  
> > > > products, this seems to be the general outlook across all  
> > manufacturers. This presents some problems for acquiring SSD's for Ceph  
> > > > journal/WAL use if your cluster is largely write only and wouldn't  
> > benefit from using the extra capacity brought by these SSD's to  
> > > > use as cache.
> > > >
> > > > Is anybody in the same situation and is struggling to find good P3700  
> > 400G replacements?  
> > > >  
> > > We usually build tiny ceph clusters, with 1 gbit network and S3610/S3710
> > > 200GB SSDs for journals. We have been experiencing supply problems for
> > > those disks lately, although it seems that 400GB disks are available, at
> > > least for now.
> > >  
> > This. Very much THIS.
> >
> > We're trying to get 200 or 400 or even 800GB DC S3710 or S3610s here
> > recently with zero success.
> > And I'm believing our vendor for a change that it's not their fault.
> >
> > What seems to be happening (no official confirmation, but it makes all the
> > sense in the world to me) is this:
> >
> > Intel is trying to switch to 3DNAND (like they did with the 3520s), but
> > while not having officially EOL'ed the 3(6/7)10s also allowed the supply
> > to run dry.
> >
> > Which of course is not a smart move, because now people are massively
> > forced to look for alternatives, and if those work they are unlikely to come back.
> >
> > I'm looking at oversized Samsungs (base model equivalent to 3610s) and am
> > following this thread for other alternatives.
> >
> > Christian
> > --
> > Christian Balzer        Network/Systems Engineer
> > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >  


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Ben Hines
Well, ceph journals are of course going away with the imminent bluestore.
Are small SSDs still useful for something with Bluestore?

For speccing out a cluster today that is many (6+) months away from being
required, which I am going to be doing, I was thinking all-SSD would be the
way to go. (Or is all-spinner performant with Bluestore?) Too early to make
that call?

-Ben

On Wed, May 17, 2017 at 5:30 PM, Christian Balzer  wrote:

>
> Hello,
>
> On Wed, 17 May 2017 11:28:17 +0200 Eneko Lacunza wrote:
>
> > Hi Nick,
> >
> > El 17/05/17 a las 11:12, Nick Fisk escribió:
> > > There seems to be a shift in enterprise SSD products to larger less
> write intensive products and generally costing more than what
> > > the existing P/S 3600/3700 ranges were. For example the new Intel NVME
> P4600 range seems to start at 2TB. Although I mention Intel
> > > products, this seems to be the general outlook across all
> manufacturers. This presents some problems for acquiring SSD's for Ceph
> > > journal/WAL use if your cluster is largely write only and wouldn't
> benefit from using the extra capacity brought by these SSD's to
> > > use as cache.
> > >
> > > Is anybody in the same situation and is struggling to find good P3700
> 400G replacements?
> > >
> > We usually build tiny ceph clusters, with 1 gbit network and S3610/S3710
> > 200GB SSDs for journals. We have been experiencing supply problems for
> > those disks lately, although it seems that 400GB disks are available, at
> > least for now.
> >
> This. Very much THIS.
>
> We're trying to get 200 or 400 or even 800GB DC S3710 or S3610s here
> recently with zero success.
> And I'm believing our vendor for a change that it's not their fault.
>
> What seems to be happening (no official confirmation, but it makes all the
> sense in the world to me) is this:
>
> Intel is trying to switch to 3DNAND (like they did with the 3520s), but
> while not having officially EOL'ed the 3(6/7)10s also allowed the supply
> to run dry.
>
> Which of course is not a smart move, because now people are massively
> forced to look for alternatives, and if those work they are unlikely to come back.
>
> I'm looking at oversized Samsungs (base model equivalent to 3610s) and am
> following this thread for other alternatives.
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Christian Balzer

Hello,

On Wed, 17 May 2017 11:28:17 +0200 Eneko Lacunza wrote:

> Hi Nick,
> 
> El 17/05/17 a las 11:12, Nick Fisk escribió:
> > There seems to be a shift in enterprise SSD products to larger less write 
> > intensive products and generally costing more than what
> > the existing P/S 3600/3700 ranges were. For example the new Intel NVME 
> > P4600 range seems to start at 2TB. Although I mention Intel
> > products, this seems to be the general outlook across all manufacturers. 
> > This presents some problems for acquiring SSD's for Ceph
> > journal/WAL use if your cluster is largely write only and wouldn't benefit 
> > from using the extra capacity brought by these SSD's to
> > use as cache.
> >
> > Is anybody in the same situation and is struggling to find good P3700 400G 
> > replacements?
> >  
> We usually build tiny ceph clusters, with 1 gbit network and S3610/S3710 
> 200GB SSDs for journals. We have been experiencing supply problems for 
> those disks lately, although it seems that 400GB disks are available, at 
> least for now.
> 
This. Very much THIS.

We're trying to get 200 or 400 or even 800GB DC S3710 or S3610s here
recently with zero success.
And I'm believing our vendor for a change that it's not their fault. 

What seems to be happening (no official confirmation, but it makes all the
sense in the world to me) is this:

Intel is trying to switch to 3DNAND (like they did with the 3520s), but
while not having officially EOL'ed the 3(6/7)10s also allowed the supply
to run dry.

Which of course is not a smart move, because now people are massively
forced to look for alternatives, and if those work they are unlikely to come back.

I'm looking at oversized Samsungs (base model equivalent to 3610s) and am
following this thread for other alternatives.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Reed Dier
Agreed, the issue I have seen is that the P4800X (Optane) is demonstrably more 
expensive than the P3700 for a roughly equivalent amount of storage space (400G 
v 375G).

However, the P4800X is perfectly suited to a Ceph environment, with 30 DWPD, or 
12.3 PBW. And on top of that, it seems to generally outperform the P3700 in 
terms of latency, iops, and raw throughput, especially at greater queue depths. 
The biggest thing I took away was performance consistency.

Anandtech did a good comparison against the P3700 and the Micron 9100 MAX, 
ironically the 9100 MAX has been the model I have been looking at to replace 
P3700’s in future OSD nodes.

http://www.anandtech.com/show/11209/intel-optane-ssd-dc-p4800x-review-a-deep-dive-into-3d-xpoint-enterprise-performance/
 


There are also the DC P4500 and P4600 models in the pipeline from Intel, also 
utilizing 3D NAND; however, I have been told that they will not be shipping in 
volume until mid to late Q3.
And as was stated earlier, these all start at much larger capacities, 1-4T, 
with respective endurance ratings of 1.79 PBW and 10.49 PBW on the 2TB 
versions, which should work out to about 0.5 and ~3 DWPD for most workloads.
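That DWPD estimate is just the PBW rating spread over the warranty window; as 
a quick sketch (assuming the usual 5-year warranty and the 2TB capacity point):

# Convert a PBW endurance rating into drive writes per day.
def dwpd(rated_pbw, capacity_tb, warranty_years=5):
    return rated_pbw * 1000.0 / (capacity_tb * 365 * warranty_years)

print(round(dwpd(1.79, 2.0), 2))    # DC P4500 2TB -> ~0.49 DWPD
print(round(dwpd(10.49, 2.0), 2))   # DC P4600 2TB -> ~2.87 DWPD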

At least the Micron 5100 MAX are finally shipping in volume to offer a 
replacement to Intel S3610, though no good replacement for the S3710 yet that 
I’ve seen on the endurance part.

Reed

> On May 17, 2017, at 5:44 AM, Luis Periquito  wrote:
> 
>>> Anyway, in a couple months we'll start testing the Optane drives. They
>>> are small and perhaps ideal journals, or?
>>> 
> The problem with optanes is price: from what I've seen they cost 2x or
> 3x as much as the P3700...
> But at least from what I've read they do look really great...
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Luis Periquito
>> Anyway, in a couple months we'll start testing the Optane drives. They
>> are small and perhaps ideal journals, or?
>>
The problem with optanes is price: from what I've seen they cost 2x or
3x as much as the P3700...
But at least from what I've read they do look really great...
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Nick Fisk
Hi Dan,

> -Original Message-
> From: Dan van der Ster [mailto:d...@vanderster.com]
> Sent: 17 May 2017 10:29
> To: Nick Fisk <n...@fisk.me.uk>
> Cc: ceph-users <ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] Changing SSD Landscape
> 
> I am currently pricing out some DCS3520's, for OSDs. Word is that the price 
> is going up, but I don't have specifics, yet.
> 
> I'm curious, does your real usage show that the 3500 series don't offer 
> enough endurance?

We've written about 700-800TB to each P3700 in about 10 months, and their 
official specs show that they should be good for about 7 PBW. We plan to try 
and keep these nodes running for about 5 years, so that looks roughly right, I 
would imagine.
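Projected out, that write rate sits comfortably inside the rating (a rough 
sketch; it assumes the write rate so far stays constant for the full 5 years):

written_tb, months = 750, 10                   # midpoint of the 700-800TB figure
projected_pb = written_tb / months * 12 * 5 / 1000.0
print(projected_pb)                            # ~4.5 PB over 5 years vs ~7 PBW rated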

Looking at the 3520's, I think by the time we have enough endurance, we would 
be talking about ones at the high end of the capacity scale.

> 
> Here's one of our DCS3700's after 2.5 years of RBD + a bit of S3:
> 
> Model Family: Intel 730 and DC S35x0/3610/3700 Series SSDs
> Device Model: INTEL SSDSC2BA200G3
> Firmware Version: 5DV10270
> User Capacity:200,049,647,616 bytes [200 GB]
> 
>   9 Power_On_Hours  0x0032   100   100   000Old_age
> Always   -   22580
> 226 Workld_Media_Wear_Indic 0x0032   100   100   000Old_age
> Always   -   3471
> 228 Workload_Minutes0x0032   100   100   000Old_age
> Always   -   1354810
> 232 Available_Reservd_Space 0x0033   099   099   010Pre-fail
> Always   -   0
> 233 Media_Wearout_Indicator 0x0032   097   097   000Old_age
> Always   -   0
> 241 Host_Writes_32MiB   0x0032   100   100   000Old_age
> Always   -   8236969
> 242 Host_Reads_32MiB0x0032   100   100   000Old_age
> Always   -   7400
> 
> Still loads of endurance left.
> 
> Anyway, in a couple months we'll start testing the Optane drives. They are 
> small and perhaps ideal journals, or?

Ok, interesting. Is this the P4800 model?

> 
> -- Dan
> 
> 
> 
> On Wed, May 17, 2017 at 11:12 AM, Nick Fisk <n...@fisk.me.uk> wrote:
> > Hi All,
> >
> > There seems to be a shift in enterprise SSD products to larger less
> > write intensive products and generally costing more than what the
> > existing P/S 3600/3700 ranges were. For example the new Intel NVME
> > P4600 range seems to start at 2TB. Although I mention Intel products,
> > this seems to be the general outlook across all manufacturers. This 
> > presents some problems for acquiring SSD's for Ceph
> journal/WAL use if your cluster is largely write only and wouldn't benefit 
> from using the extra capacity brought by these SSD's to use
> as cache.
> >
> > Is anybody in the same situation and is struggling to find good P3700 400G 
> > replacements?
> >
> > Nick
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Dan van der Ster
On Wed, May 17, 2017 at 11:29 AM, Dan van der Ster  wrote:
> I am currently pricing out some DCS3520's, for OSDs. Word is that the
> price is going up, but I don't have specifics, yet.
>
> I'm curious, does your real usage show that the 3500 series don't
> offer enough endurance?
>
> Here's one of our DCS3700's after 2.5 years of RBD + a bit of S3:
>
> Model Family: Intel 730 and DC S35x0/3610/3700 Series SSDs
> Device Model: INTEL SSDSC2BA200G3
> Firmware Version: 5DV10270
> User Capacity:200,049,647,616 bytes [200 GB]
>
>   9 Power_On_Hours  0x0032   100   100   000Old_age
> Always   -   22580
> 226 Workld_Media_Wear_Indic 0x0032   100   100   000Old_age
> Always   -   3471
> 228 Workload_Minutes0x0032   100   100   000Old_age
> Always   -   1354810
> 232 Available_Reservd_Space 0x0033   099   099   010Pre-fail
> Always   -   0
> 233 Media_Wearout_Indicator 0x0032   097   097   000Old_age
> Always   -   0
> 241 Host_Writes_32MiB   0x0032   100   100   000Old_age
> Always   -   8236969
> 242 Host_Reads_32MiB0x0032   100   100   000Old_age
> Always   -   7400
>
> Still loads of endurance left.

Err... scratch that, math fail. Yes, there is enough endurance left on this
S3700. But 8236969 * 32MiB = ~276TB, which is already getting to be too much
for an S35x0 series.
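Spelled out (Host_Writes_32MiB counts 32 MiB units; result in decimal TB):

host_writes_32mib = 8236969
print(host_writes_32mib * 32 * 2**20 / 1e12)   # ~276 TB written so far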

Cheers, Dan


>
> Anyway, in a couple months we'll start testing the Optane drives. They
> are small and perhaps ideal journals, or?
>
> -- Dan
>
>
>
> On Wed, May 17, 2017 at 11:12 AM, Nick Fisk  wrote:
>> Hi All,
>>
>> There seems to be a shift in enterprise SSD products to larger less write 
>> intensive products and generally costing more than what
>> the existing P/S 3600/3700 ranges were. For example the new Intel NVME P4600 
>> range seems to start at 2TB. Although I mention Intel
>> products, this seems to be the general outlook across all manufacturers. 
>> This presents some problems for acquiring SSD's for Ceph
>> journal/WAL use if your cluster is largely write only and wouldn't benefit 
>> from using the extra capacity brought by these SSD's to
>> use as cache.
>>
>> Is anybody in the same situation and is struggling to find good P3700 400G 
>> replacements?
>>
>> Nick
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Dan van der Ster
I am currently pricing out some DCS3520's, for OSDs. Word is that the
price is going up, but I don't have specifics, yet.

I'm curious, does your real usage show that the 3500 series don't
offer enough endurance?

Here's one of our DCS3700's after 2.5 years of RBD + a bit of S3:

Model Family: Intel 730 and DC S35x0/3610/3700 Series SSDs
Device Model: INTEL SSDSC2BA200G3
Firmware Version: 5DV10270
User Capacity:200,049,647,616 bytes [200 GB]

  9 Power_On_Hours  0x0032   100   100   000Old_age
Always   -   22580
226 Workld_Media_Wear_Indic 0x0032   100   100   000Old_age
Always   -   3471
228 Workload_Minutes0x0032   100   100   000Old_age
Always   -   1354810
232 Available_Reservd_Space 0x0033   099   099   010Pre-fail
Always   -   0
233 Media_Wearout_Indicator 0x0032   097   097   000Old_age
Always   -   0
241 Host_Writes_32MiB   0x0032   100   100   000Old_age
Always   -   8236969
242 Host_Reads_32MiB0x0032   100   100   000Old_age
Always   -   7400

Still loads of endurance left.

Anyway, in a couple months we'll start testing the Optane drives. They
are small and perhaps ideal journals, or?

-- Dan



On Wed, May 17, 2017 at 11:12 AM, Nick Fisk  wrote:
> Hi All,
>
> There seems to be a shift in enterprise SSD products to larger less write 
> intensive products and generally costing more than what
> the existing P/S 3600/3700 ranges were. For example the new Intel NVME P4600 
> range seems to start at 2TB. Although I mention Intel
> products, this seems to be the general outlook across all manufacturers. This 
> presents some problems for acquiring SSD's for Ceph
> journal/WAL use if your cluster is largely write only and wouldn't benefit 
> from using the extra capacity brought by these SSD's to
> use as cache.
>
> Is anybody in the same situation and is struggling to find good P3700 400G 
> replacements?
>
> Nick
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com