Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-26 Thread Tyler Bishop
http://www.seagate.com/files/www-content/product-content/ssd-fam/1200-ssd/en-us/docs/1200-2-sas-ssd-ds1858-2-1509us.pdf
 

Which of these have you tested? I didn't even know Seagate had good flash. 







Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 






From: "Tyler Bishop" <tyler.bis...@beyondhosting.net> 
To: "Frederic BRET" <frederic.b...@univ-lr.fr> 
Cc: "Andrei Mikhailovsky" <and...@arhont.com>, "ceph-users" 
<ceph-users@lists.ceph.com> 
Sent: Saturday, December 26, 2015 11:23:46 AM 
Subject: Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results 

Wow, what are the Seagate part numbers? I need to test those as well. 

What SAS controller are you utilizing? 

I did some tests with fio on some of the stuff we use. 

http://sudomakeinstall.com/servers/high-end-consumer-ssd-benchmarks 







Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 







Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-26 Thread Andrei Mikhailovsky
Yes, indeed. It seems not to matter much if you do not have a write-intensive
cluster.

We have Intel 520s which were in production for over 2 years and only used 5%
of their life according to SMART. I've also used Samsung 840 Pros, which showed
the same/similar figures over a year of usage. So I guess for my purposes the
endurance is not such a big deal. However, the SSDs that I have absolutely suck
performance-wise as Ceph journals, especially the Samsung drives. That's the
main reason for wanting the 3700/3500 or their equivalent.

Andrei

- Original Message -
> From: "Tyler Bishop" <tyler.bis...@beyondhosting.net>
> To: "Lionel Bouton" <lionel+c...@bouton.name>
> Cc: "Andrei Mikhailovsky" <and...@arhont.com>, "ceph-users" 
> <ceph-users@lists.ceph.com>
> Sent: Tuesday, 22 December, 2015 16:36:21
> Subject: Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio 
> results

> Write endurance is kinda bullshit.
> 
> We have crucial 960gb drives storing data and we've only managed to take 2% 
> off
> the drives life in the period of a year and hundreds of tb written weekly.
> 
> 
> Stuff is way more durable than anyone gives it credit.
> 
> 
> - Original Message -
> From: "Lionel Bouton" <lionel+c...@bouton.name>
> To: "Andrei Mikhailovsky" <and...@arhont.com>, "ceph-users"
> <ceph-users@lists.ceph.com>
> Sent: Tuesday, December 22, 2015 11:04:26 AM
> Subject: Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio 
> results
> 
> On 22/12/2015 13:43, Andrei Mikhailovsky wrote:
>> Hello guys,
>>
>> Was wondering if anyone has done testing on Samsung PM863 120 GB version to 
>> see
>> how it performs? IMHO the 480GB version seems like a waste for the journal as
>> you only need to have a small disk size to fit 3-4 osd journals. Unless you 
>> get
>> a far greater durability.
> 
> The problem is endurance. If we use the 480GB for 3 OSDs each on the
> cluster we might build we expect 3 years (with some margin for error but
> not including any write amplification at the SSD level) before the SSDs
> will fail.
> In our context a 120GB model might not even last a year (endurance is
> 1/4th of the 480GB model). This is why SM863 models will probably be
> more suitable if you have access to them: you can use smaller ones which
> cost less and get more endurance (you'll have to check the performance
> though, usually smaller models have lower IOPS and bandwidth).
> 
>> I am planning to replace my current journal ssds over the next month or so 
>> and
>> would like to find out if there is an a good alternative to the Intel's
>> 3700/3500 series.
> 
> 3700 are a safe bet (the 100GB model is rated for ~1.8PBW). 3500 models
> probably don't have enough endurance for many Ceph clusters to be cost
> effective. The 120GB model is only rated for 70TBW and you have to
> consider both client writes and rebalance events.
> I'm uneasy with SSDs expected to fail within the life of the system they
> are in: you can have a cascade effect where an SSD failure brings down
> several OSDs triggering a rebalance which might make SSDs installed at
> the same time fail too. In this case in the best scenario you will reach
> your min_size (>=2) and block any writes which would prevent more SSD
> failures until you move journals to fresh SSDs. If min_size = 1 you
> might actually lose data.
> 
> If you expect to replace your current journal SSDs if I were you I would
> make a staggered deployment over several months/a year to avoid them
> failing at the same time in case of an unforeseen problem. In addition
> this would allow to evaluate the performance and behavior of a new SSD
> model with your hardware (there have been reports of performance
> problems with some combinations of RAID controllers and SSD
> models/firmware versions) without impacting your cluster's overall
> performance too much.
> 
> When using SSDs for journals you have to monitor both :
> * the SSD wear leveling or something equivalent (SMART data may not be
> available if you use a RAID controller but usually you can get the total
> amount data written) of each SSD,
> * the client writes on the whole cluster.
> And check periodically what the expected lifespan left there is for each
> of your SSD based on their current state, average write speed, estimated
> write amplification (both due to pool's size parameter and the SSD
> model's inherent write amplification) and the amount of data moved by
> rebalance events you expect to happen.
> Ideally you should make this computation before choosing the SSD models,
> but several variables are not always easy to predict and probably will
> change during the life of your cluster.

Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-24 Thread Mart van Santen

Hello,

I've checked with a MegaRAID SAS 2208; with that one I get ~40 MB/s, both with a
1.9TB and a 240GB SM863 model.
So it seems the LSI MegaRAID HBAs are not optimized for a lot of single-job
IOPS...
Which raises the question: do we know more about HBAs suitable for
high-performance Ceph? In a previous post on this list I saw people
praising the SAS3008 chip, but does anybody know if there are any
tests/metrics out there?
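(As a quick way to check which SAS chip a given box actually has, something
like this works on most Linux systems; the grep pattern below is only an
illustration, adjust it to taste:)

# lspci -nn | grep -iE 'sas|raid'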

Regards,

Mart


On 12/23/2015 10:20 PM, Alex Moore wrote:
> As another data point, I recently bought a few 240GB SM863s, and found
> I was getting 79 MB/s on the single job test.
>
> In my case the SSDs are running off the onboard Intel C204 chipset's
> SATA controllers on a couple of systems with single Xeon E3-1240v2 CPUs.
>
> Alex
>
> On 23/12/2015 6:39 PM, Lionel Bouton wrote:
>> On 23/12/2015 18:37, Mart van Santen wrote:
>>> So, maybe you are right and is the HBA the bottleneck (LSI Logic /
>>> Symbios Logic MegaRAID SAS 2108). Under all cirumstances, I do not get
>>> close to the numbers of the PM863 quoted by Sebastien. But his site
>>> does not state what kind of HBA he is using..
>> In fact I was the one doing those tests and I added the relevant
>> information in the comments (Disqus user Gyver): the PM863 tested is
>> connected to the Intel C612 chipset SATA ports (configured as AHCI) of a
>> dual Xeon E5v3 board, so this is a purely SATA configuration.
>>
>> Best regards,
>>
>> Lionel
>

-- 
Mart van Santen
Greenhost
E: m...@greenhost.nl
T: +31 20 4890444
W: https://greenhost.nl

A PGP signature can be attached to this e-mail,
you need PGP software to verify it. 
My public key is available in keyserver(s)
see: http://tinyurl.com/openpgp-manual

PGP Fingerprint: CA85 EB11 2B70 042D AF66  B29A 6437 01A1 10A3 D3A5






Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-23 Thread Loris Cuoghi

On 22/12/2015 20:03, koukou73gr wrote:

Even the cheapest stuff nowadays has some more or less decent wear
leveling algorithm built into their controller so this won't be a
problem. Wear leveling algorithms cycle the blocks internally so wear
evens out on the whole disk.


But it would wear out faster, as wear leveling aims to use *all* pages
the same number of times, even the ones that already hold data: if the host
writes 1 TB while the controller also rewrites 1 TB of static data to level
the wear, the NAND sees 2 TB, i.e. a write amplification factor of 2. That's
another cause of write amplification for you. [0]


[0] https://en.wikipedia.org/wiki/Write_amplification#Wear_leveling

Even more so, on cheaper SSDs, where the wear leveling algorithm may be 
somewhat less intelligent than on so-called "data center" SSDs.




-K.

On 12/22/2015 06:57 PM, Alan Johnson wrote:

I would also add that the journal activity is write intensive so a small part 
of the drive would get excessive writes if the journal and data are co-located 
on an SSD. This would also be the case where an SSD has multiple journals 
associated with many HDDs.








Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-23 Thread Mart van Santen

Hi all,


On 12/22/2015 01:55 PM, Wido den Hollander wrote:
> On 22-12-15 13:43, Andrei Mikhailovsky wrote:
>> Hello guys,
>>
>> Was wondering if anyone has done testing on Samsung PM863 120 GB version to 
>> see how it performs? IMHO the 480GB version seems like a waste for the 
>> journal as you only need to have a small disk size to fit 3-4 osd journals. 
>> Unless you get a far greater durability.
>>
> In that case I would look at the SM836 from Samsung. They are sold as
> write-intensive SSDs.
>
> Wido
>


Today I received a small batch of SM863 (1.9TB) disks, so maybe these
test results are helpful for making a decision.
This is on an IBM x3550 M4 with a MegaRAID SAS card (so not in JBOD
mode). Unfortunately I have no suitable JBOD card available at my test
server, so I'm stuck with the "RAID" layer in the HBA.
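For reference, the WRITE lines below come from the usual journal-SSD test from
Sébastien Han's page linked elsewhere in this thread: 60-second runs of 4k
direct, synchronous sequential writes with fio, with numjobs raised to 5 and 10
for the multi-job runs. A minimal sketch of that kind of invocation, assuming
/dev/sdd is the SSD under test (double-check the exact flags against the blog
post):

# fio --filename=/dev/sdd --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting --name=journal-test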



disabled drive cache, disabled controller cache
---


1 job
---
Run status group 0 (all jobs):
  WRITE: io=906536KB, aggrb=15108KB/s, minb=15108KB/s, maxb=15108KB/s,
mint=60001msec, maxt=60001msec

Disk stats (read/write):
  sdd: ios=91/452978, merge=0/0, ticks=12/39032, in_queue=39016, util=65.04%


5 Jobs
---
Run status group 0 (all jobs):
  WRITE: io=6078.2MB, aggrb=103731KB/s, minb=103731KB/s,
maxb=103731KB/s, mint=60001msec, maxt=60001msec

Disk stats (read/write):
  sdd: ios=179/3108541, merge=0/61, ticks=24/202796, in_queue=200900,
util=99.81%

10 Jobs
---
Run status group 0 (all jobs):
  WRITE: io=9437.5MB, aggrb=161057KB/s, minb=161057KB/s,
maxb=161057KB/s, mint=60003msec, maxt=60003msec

Disk stats (read/write):
  sdd: ios=175/4827612, merge=0/228, ticks=24/452648, in_queue=451548,
util=100.00%



Enabled drive cache, disabled controller cache:
---


1 job
---
Run status group 0 (all jobs):
  WRITE: io=1837.5MB, aggrb=31358KB/s, minb=31358KB/s, maxb=31358KB/s,
mint=60001msec, maxt=60001msec

Disk stats (read/write):
  sdd: ios=91/940283, merge=0/0, ticks=4/40200, in_queue=40188, util=66.99%


5 jobs
---
Run status group 0 (all jobs):
  WRITE: io=6024.3MB, aggrb=102812KB/s, minb=102812KB/s,
maxb=102812KB/s, mint=60001msec, maxt=60001msec

Disk stats (read/write):
  sdd: ios=179/3080690, merge=0/65, ticks=24/202100, in_queue=200364,
util=99.81%



10 jobs
---
Run status group 0 (all jobs):
  WRITE: io=9524.2MB, aggrb=162536KB/s, minb=162536KB/s,
maxb=162536KB/s, mint=60003msec, maxt=60003msec

Disk stats (read/write):
  sdd: ios=164/4869333, merge=0/381, ticks=16/446660, in_queue=446080,
util=100.00%


Enabled drive cache, enabled controller cache:
---


1 job
---
Run status group 0 (all jobs):
  WRITE: io=1739.9MB, aggrb=29693KB/s, minb=29693KB/s, maxb=29693KB/s,
mint=6msec, maxt=6msec

Disk stats (read/write):
  sdd: ios=91/890287, merge=0/0, ticks=8/40096, in_queue=40044, util=66.75%


10 jobs
---
Run status group 0 (all jobs):
  WRITE: io=9056.1MB, aggrb=154554KB/s, minb=154554KB/s,
maxb=154554KB/s, mint=60001msec, maxt=60001msec

Disk stats (read/write):
  sdd: ios=176/4630400, merge=0/312, ticks=24/454824, in_queue=453900,
util=100.00%


The dd way (with caches enabled)
---
# dd if=randfile of=/dev/sdd bs=4k count=100 oflag=direct,dsync
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 36.7559 s, 29.2 MB/s
so that's about ~7K IOPS (single job of course)


So this drive in this configuration is maxing out at about ~160 MB/s @
39K IOPS; raising the block size from 4K to 32K raises throughput but
lowers IOPS.
The amount of IOPS sounds reasonable for the quoted specs. Please note
this is a brand-new disk, so the IOPS will probably drop a bit over time.
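For anyone trying to reproduce the cache permutations above behind a MegaRAID
controller, a rough sketch of the usual MegaCli toggles (the binary may be
installed as MegaCli, MegaCli64 or replaced by storcli on your system; treat
the exact flags as assumptions to verify against LSI's documentation rather
than the literal commands used for these runs):

Drive (disk) cache per logical drive:
# MegaCli64 -LDSetProp -EnDskCache -LAll -aAll
# MegaCli64 -LDSetProp -DisDskCache -LAll -aAll

Controller write cache (WB = write-back, i.e. controller cache on;
WT = write-through, i.e. controller cache off):
# MegaCli64 -LDSetProp WB -LAll -aAll
# MegaCli64 -LDSetProp WT -LAll -aAll

Check the current settings:
# MegaCli64 -LDGetProp -DskCache -LAll -aAll
# MegaCli64 -LDGetProp -Cache -LAll -aAll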


Regards,


Mart


>> I am planning to replace my current journal ssds over the next month or so 
>> and would like to find out if there is an a good alternative to the Intel's 
>> 3700/3500 series. 
>>
>> Thanks
>>
>> Andrei
>>
>> - Original Message -
>>> From: "Wido den Hollander" <w...@42on.com>
>>> To: "ceph-users" <ceph-users@lists.ceph.com>
>>> Sent: Monday, 21 December, 2015 19:12:33
>>> Subject: Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio 
>>> results
>>> On 12/21/2015 05:30 PM, Lionel Bouton wrote:
>>>> Hi,
>>>>
>>>> Sébastien Han just added the test results I reported for these SSDs on
>>>> the following page :
>>>>
>>>> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>>>>
>>>> The table in the original post has the most important numbers and more

Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-23 Thread Lionel Bouton
On 23/12/2015 16:18, Mart van Santen wrote:
> Hi all,
>
>
> On 12/22/2015 01:55 PM, Wido den Hollander wrote:
>> On 22-12-15 13:43, Andrei Mikhailovsky wrote:
>>> Hello guys,
>>>
>>> Was wondering if anyone has done testing on Samsung PM863 120 GB version to 
>>> see how it performs? IMHO the 480GB version seems like a waste for the 
>>> journal as you only need to have a small disk size to fit 3-4 osd journals. 
>>> Unless you get a far greater durability.
>>>
>> In that case I would look at the SM836 from Samsung. They are sold as
>> write-intensive SSDs.
>>
>> Wido
>>
>
> Today I received a small batch of SM863 (1.9TBs) disks. So maybe these
> testresults are helpfull for making a decision
> This is on an IBM X3550M4 with a MegaRaid SAS card (so not in jbod
> mode). Unfortunally I have no suitable JBOD card available at my test
> server so I'm stuck with the "RAID" layer in the HBA
>
>
>
> disabled drive cache, disabled controller cache
> ---
>
>
> 1 job
> ---
> Run status group 0 (all jobs):
>   WRITE: io=906536KB, aggrb=15108KB/s, minb=15108KB/s, maxb=15108KB/s,
> mint=60001msec, maxt=60001msec
>
> Disk stats (read/write):
>   sdd: ios=91/452978, merge=0/0, ticks=12/39032, in_queue=39016, util=65.04%

Either the MegaRaid SAS card is the bottleneck or SM863 1.9TB are 8x
slower than PM863 480GB on this particular test which is a bit
surprising: it would make the SM863 one of the slowest (or even the
slowest) DC SSD usable as Ceph journals.
Do you have any other SSD (if possible one of the models or one similar
to the ones listed on
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
which give more than 15MB/s with one job) connected to the same card
model you could test for comparison?

Lionel


Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-23 Thread Mart van Santen
Hello,


On 12/23/2015 04:38 PM, Lionel Bouton wrote:
> On 23/12/2015 16:18, Mart van Santen wrote:
>> Hi all,
>>
>>
>> On 12/22/2015 01:55 PM, Wido den Hollander wrote:
>>> On 22-12-15 13:43, Andrei Mikhailovsky wrote:
 Hello guys,

 Was wondering if anyone has done testing on Samsung PM863 120 GB version 
 to see how it performs? IMHO the 480GB version seems like a waste for the 
 journal as you only need to have a small disk size to fit 3-4 osd 
 journals. Unless you get a far greater durability.

>>> In that case I would look at the SM836 from Samsung. They are sold as
>>> write-intensive SSDs.
>>>
>>> Wido
>>>
>> Today I received a small batch of SM863 (1.9TBs) disks. So maybe these
>> testresults are helpfull for making a decision
>> This is on an IBM X3550M4 with a MegaRaid SAS card (so not in jbod
>> mode). Unfortunally I have no suitable JBOD card available at my test
>> server so I'm stuck with the "RAID" layer in the HBA
>>
>>
>>
>> disabled drive cache, disabled controller cache
>> ---
>>
>>
>> 1 job
>> ---
>> Run status group 0 (all jobs):
>>   WRITE: io=906536KB, aggrb=15108KB/s, minb=15108KB/s, maxb=15108KB/s,
>> mint=60001msec, maxt=60001msec
>>
>> Disk stats (read/write):
>>   sdd: ios=91/452978, merge=0/0, ticks=12/39032, in_queue=39016, util=65.04%
> Either the MegaRaid SAS card is the bottleneck or SM863 1.9TB are 8x
> slower than PM863 480GB on this particular test which is a bit
> surprising: it would make the SM863 one of the slowest (or even the
> slowest) DC SSD usable as Ceph journals.
> Do you have any other SSD (if possible one of the models or one similar
> to the ones listed on
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> which give more than 15MB/s with one job) connected to the same card
> model you could test for comparison?

The performance is strange. I've done some more tests, and it fluctuates
a bit (which is odd, as the system is basically idle),
but I currently get between 15 MB/s and 30 MB/s (1 job, 4k). Notably,
I get about the same results (with the same fluctuation) with an S3700 (100GB).

I've plugged the SM863 into a different system, with another SAS card,
which gave a better result:

(Symbios Logic SAS2308, but the enclosure is a 3Gbps system)

Run status group 0 (all jobs):
  WRITE: io=2713.7MB, aggrb=46311KB/s, minb=46311KB/s, maxb=46311KB/s,
mint=60001msec, maxt=60001msec

Disk stats (read/write):
  sdb: ios=9/694054, merge=0/0, ticks=0/49284, in_queue=49252, util=82.09%

~ 46 MB/s


So maybe you are right and the HBA is the bottleneck (LSI Logic /
Symbios Logic MegaRAID SAS 2108). In any case, I do not get
close to the numbers for the PM863 quoted by Sébastien. But his site does
not state what kind of HBA he is using...


Regards,

Mart




>
> Lionel

-- 
Mart van Santen
Greenhost
E: m...@greenhost.nl
T: +31 20 4890444
W: https://greenhost.nl

A PGP signature can be attached to this e-mail,
you need PGP software to verify it. 
My public key is available in keyserver(s)
see: http://tinyurl.com/openpgp-manual

PGP Fingerprint: CA85 EB11 2B70 042D AF66  B29A 6437 01A1 10A3 D3A5






Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-23 Thread Lionel Bouton
On 23/12/2015 18:37, Mart van Santen wrote:
> So, maybe you are right and is the HBA the bottleneck (LSI Logic /
> Symbios Logic MegaRAID SAS 2108). Under all cirumstances, I do not get
> close to the numbers of the PM863 quoted by Sebastien. But his site
> does not state what kind of HBA he is using..

In fact I was the one doing those tests and I added the relevant
information in the comments (Disqus user Gyver): the PM863 tested is
connected to the Intel C612 chipset SATA ports (configured as AHCI) of a
dual Xeon E5v3 board, so this is a purely SATA configuration.

Best regards,

Lionel


Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-23 Thread Alex Moore
As another data point, I recently bought a few 240GB SM863s, and found I 
was getting 79 MB/s on the single job test.


In my case the SSDs are running off the onboard Intel C204 chipset's 
SATA controllers on a couple of systems with single Xeon E3-1240v2 CPUs.


Alex

On 23/12/2015 6:39 PM, Lionel Bouton wrote:

On 23/12/2015 18:37, Mart van Santen wrote:

So, maybe you are right and is the HBA the bottleneck (LSI Logic /
Symbios Logic MegaRAID SAS 2108). Under all cirumstances, I do not get
close to the numbers of the PM863 quoted by Sebastien. But his site
does not state what kind of HBA he is using..

In fact I was the one doing those tests and I added the relevant
information in the comments (Disqus user Gyver): the PM863 tested is
connected to the Intel C612 chipset SATA ports (configured as AHCI) of a
dual Xeon E5v3 board, so this is a purely SATA configuration.

Best regards,

Lionel




Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-22 Thread koukou73gr
Even the cheapest stuff nowadays has a more or less decent wear-leveling
algorithm built into its controller, so this won't be a problem. Wear-leveling
algorithms cycle the blocks internally so wear evens out across the whole
disk.

-K.

On 12/22/2015 06:57 PM, Alan Johnson wrote:
> I would also add that the journal activity is write intensive so a small part 
> of the drive would get excessive writes if the journal and data are 
> co-located on an SSD. This would also be the case where an SSD has multiple 
> journals associated with many HDDs.
> 
> 


Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-22 Thread Andrei Mikhailovsky
Hello guys,

I was wondering if anyone has done testing on the Samsung PM863 120 GB version to see 
how it performs? IMHO the 480GB version seems like a waste for the journal, as 
you only need a small disk to fit 3-4 OSD journals, unless you get 
far greater durability.

I am planning to replace my current journal SSDs over the next month or so and 
would like to find out if there is a good alternative to Intel's 
3700/3500 series. 

Thanks

Andrei

- Original Message -
> From: "Wido den Hollander" <w...@42on.com>
> To: "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Monday, 21 December, 2015 19:12:33
> Subject: Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio 
> results

> On 12/21/2015 05:30 PM, Lionel Bouton wrote:
>> Hi,
>> 
>> Sébastien Han just added the test results I reported for these SSDs on
>> the following page :
>> 
>> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>> 
>> The table in the original post has the most important numbers and more
>> details can be found in the comments.
>> 
>> To sum things up, both have good performance (this isn't surprising for
>> the S3710 but AFAIK this had to be confirmed for the PM863 and my
>> company just purchased 2 of them just for these tests because they are
>> the only "DC" SSDs available at one of our hosting providers).
>> PM863 models are not designed for write-intensive applications and we
>> have yet to see how they behave in the long run (in our case where PM863
>> endurance is a bit short, if I had a choice we would test SM863 models
>> if they were available to us).
>> 
>> So at least for the PM863 please remember that this report is just about
>> the performance side (on fresh SSDs) which arguably is excellent for the
>> price but this doesn't address other conditions to check (performance
>> consistency over the long run, real-world write endurance including
>> write amplification, large scale testing to detect potential firmware
>> bugs, ...).
>> 
> 
> Interesting! I might be able to gain access to some PM836 3,84TB SSDs
> later this week.
> 
> I'll run the same tests if I can. Interesting to see how they perform.
> 
>> Best regards,
>> 
>> Lionel
>> 
> 
> 
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on


Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-22 Thread Wido den Hollander


On 22-12-15 13:43, Andrei Mikhailovsky wrote:
> Hello guys,
> 
> Was wondering if anyone has done testing on Samsung PM863 120 GB version to 
> see how it performs? IMHO the 480GB version seems like a waste for the 
> journal as you only need to have a small disk size to fit 3-4 osd journals. 
> Unless you get a far greater durability.
> 

In that case I would look at the SM863 from Samsung. They are sold as
write-intensive SSDs.

Wido

> I am planning to replace my current journal ssds over the next month or so 
> and would like to find out if there is an a good alternative to the Intel's 
> 3700/3500 series. 
> 
> Thanks
> 
> Andrei
> 
> - Original Message -
>> From: "Wido den Hollander" <w...@42on.com>
>> To: "ceph-users" <ceph-users@lists.ceph.com>
>> Sent: Monday, 21 December, 2015 19:12:33
>> Subject: Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio 
>> results
> 
>> On 12/21/2015 05:30 PM, Lionel Bouton wrote:
>>> Hi,
>>>
>>> Sébastien Han just added the test results I reported for these SSDs on
>>> the following page :
>>>
>>> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>>>
>>> The table in the original post has the most important numbers and more
>>> details can be found in the comments.
>>>
>>> To sum things up, both have good performance (this isn't surprising for
>>> the S3710 but AFAIK this had to be confirmed for the PM863 and my
>>> company just purchased 2 of them just for these tests because they are
>>> the only "DC" SSDs available at one of our hosting providers).
>>> PM863 models are not designed for write-intensive applications and we
>>> have yet to see how they behave in the long run (in our case where PM863
>>> endurance is a bit short, if I had a choice we would test SM863 models
>>> if they were available to us).
>>>
>>> So at least for the PM863 please remember that this report is just about
>>> the performance side (on fresh SSDs) which arguably is excellent for the
>>> price but this doesn't address other conditions to check (performance
>>> consistency over the long run, real-world write endurance including
>>> write amplification, large scale testing to detect potential firmware
>>> bugs, ...).
>>>
>>
>> Interesting! I might be able to gain access to some PM836 3,84TB SSDs
>> later this week.
>>
>> I'll run the same tests if I can. Interesting to see how they perform.
>>
>>> Best regards,
>>>
>>> Lionel
>>>
>>
>>
>> --
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>>
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on
> 


Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-22 Thread Lionel Bouton
On 22/12/2015 13:43, Andrei Mikhailovsky wrote:
> Hello guys,
>
> Was wondering if anyone has done testing on Samsung PM863 120 GB version to 
> see how it performs? IMHO the 480GB version seems like a waste for the 
> journal as you only need to have a small disk size to fit 3-4 osd journals. 
> Unless you get a far greater durability.

The problem is endurance. If we use the 480GB model for 3 OSDs each on the
cluster we might build, we expect 3 years (with some margin for error, but
not including any write amplification at the SSD level) before the SSDs
fail.
In our context a 120GB model might not even last a year (its endurance is
a quarter of the 480GB model's). This is why SM863 models will probably be
more suitable if you have access to them: you can use smaller ones which
cost less and get more endurance (you'll have to check the performance,
though; usually smaller models have lower IOPS and bandwidth).

> I am planning to replace my current journal ssds over the next month or so 
> and would like to find out if there is an a good alternative to the Intel's 
> 3700/3500 series. 

3700 are a safe bet (the 100GB model is rated for ~1.8PBW). 3500 models
probably don't have enough endurance for many Ceph clusters to be cost
effective. The 120GB model is only rated for 70TBW and you have to
consider both client writes and rebalance events.
I'm uneasy with SSDs expected to fail within the life of the system they
are in: you can have a cascade effect where an SSD failure brings down
several OSDs triggering a rebalance which might make SSDs installed at
the same time fail too. In this case in the best scenario you will reach
your min_size (>=2) and block any writes which would prevent more SSD
failures until you move journals to fresh SSDs. If min_size = 1 you
might actually lose data.

If you expect to replace your current journal SSDs, then if I were you I would
make a staggered deployment over several months to a year to avoid them all
failing at the same time in case of an unforeseen problem. In addition,
this would allow you to evaluate the performance and behavior of a new SSD
model with your hardware (there have been reports of performance
problems with some combinations of RAID controllers and SSD
models/firmware versions) without impacting your cluster's overall
performance too much.

When using SSDs for journals you have to monitor both:
* the SSD wear leveling or something equivalent (SMART data may not be
available if you use a RAID controller, but usually you can get the total
amount of data written) for each SSD,
* the client writes on the whole cluster.
And periodically check the expected remaining lifespan of each of your
SSDs based on their current state, average write speed, estimated
write amplification (both due to the pool's size parameter and the SSD
model's inherent write amplification) and the amount of data moved by the
rebalance events you expect to happen.
Ideally you should make this computation before choosing the SSD models,
but several variables are not always easy to predict and will probably
change during the life of your cluster.
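
To make that concrete, a minimal sketch of both steps (the device name, the
megaraid device number and every figure below are illustrative assumptions,
not measurements from this thread, and the relevant SMART attribute names
differ per vendor):

Wear/written-data counters straight from the drive (e.g.
Media_Wearout_Indicator on Intel, Wear_Leveling_Count / Total_LBAs_Written
on Samsung):
# smartctl -A /dev/sdd
(or, when the SSD sits behind a MegaRAID controller, via pass-through:)
# smartctl -A -d megaraid,4 /dev/sdd

And a toy version of the lifespan arithmetic, here for a 70 TBW drive, 20 MB/s
of client writes on the whole cluster, pool size 3, 12 journal SSDs and an
assumed SSD-level write amplification of 2:
# awk 'BEGIN { tb_day = 20 * 86400 / 1e6 * 3 / 12 * 2;
               printf "%.2f TB/day per SSD -> ~%d days to 70 TBW\n", tb_day, 70 / tb_day }'
0.86 TB/day per SSD -> ~81 days to 70 TBW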

Lionel


Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-22 Thread Tyler Bishop
Write endurance is kinda bullshit.

We have Crucial 960GB drives storing data, and we've only managed to take 2% off 
the drives' life over the period of a year, with hundreds of TB written weekly.


Stuff is way more durable than anyone gives it credit for.


- Original Message -
From: "Lionel Bouton" <lionel+c...@bouton.name>
To: "Andrei Mikhailovsky" <and...@arhont.com>, "ceph-users" 
<ceph-users@lists.ceph.com>
Sent: Tuesday, December 22, 2015 11:04:26 AM
Subject: Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

On 22/12/2015 13:43, Andrei Mikhailovsky wrote:
> Hello guys,
>
> Was wondering if anyone has done testing on Samsung PM863 120 GB version to 
> see how it performs? IMHO the 480GB version seems like a waste for the 
> journal as you only need to have a small disk size to fit 3-4 osd journals. 
> Unless you get a far greater durability.

The problem is endurance. If we use the 480GB for 3 OSDs each on the
cluster we might build we expect 3 years (with some margin for error but
not including any write amplification at the SSD level) before the SSDs
will fail.
In our context a 120GB model might not even last a year (endurance is
1/4th of the 480GB model). This is why SM863 models will probably be
more suitable if you have access to them: you can use smaller ones which
cost less and get more endurance (you'll have to check the performance
though, usually smaller models have lower IOPS and bandwidth).

> I am planning to replace my current journal ssds over the next month or so 
> and would like to find out if there is an a good alternative to the Intel's 
> 3700/3500 series. 

3700 are a safe bet (the 100GB model is rated for ~1.8PBW). 3500 models
probably don't have enough endurance for many Ceph clusters to be cost
effective. The 120GB model is only rated for 70TBW and you have to
consider both client writes and rebalance events.
I'm uneasy with SSDs expected to fail within the life of the system they
are in: you can have a cascade effect where an SSD failure brings down
several OSDs triggering a rebalance which might make SSDs installed at
the same time fail too. In this case in the best scenario you will reach
your min_size (>=2) and block any writes which would prevent more SSD
failures until you move journals to fresh SSDs. If min_size = 1 you
might actually lose data.

If you expect to replace your current journal SSDs if I were you I would
make a staggered deployment over several months/a year to avoid them
failing at the same time in case of an unforeseen problem. In addition
this would allow to evaluate the performance and behavior of a new SSD
model with your hardware (there have been reports of performance
problems with some combinations of RAID controllers and SSD
models/firmware versions) without impacting your cluster's overall
performance too much.

When using SSDs for journals you have to monitor both :
* the SSD wear leveling or something equivalent (SMART data may not be
available if you use a RAID controller but usually you can get the total
amount data written) of each SSD,
* the client writes on the whole cluster.
And check periodically what the expected lifespan left there is for each
of your SSD based on their current state, average write speed, estimated
write amplification (both due to pool's size parameter and the SSD
model's inherent write amplification) and the amount of data moved by
rebalance events you expect to happen.
Ideally you should make this computation before choosing the SSD models,
but several variables are not always easy to predict and probably will
change during the life of your cluster.

Lionel


Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-22 Thread Wido den Hollander
On 12/22/2015 05:36 PM, Tyler Bishop wrote:
> Write endurance is kinda bullshit.
> 
> We have crucial 960gb drives storing data and we've only managed to take 2% 
> off the drives life in the period of a year and hundreds of tb written weekly.
> 
> 
> Stuff is way more durable than anyone gives it credit.
> 
> 

No, that is absolutely not true. I've seen multiple SSDs fail in Ceph
clusters. Small Samsung 850 Pro SSDs wore out within 4 months in heavy
write-intensive Ceph clusters.

> - Original Message -
> From: "Lionel Bouton" <lionel+c...@bouton.name>
> To: "Andrei Mikhailovsky" <and...@arhont.com>, "ceph-users" 
> <ceph-users@lists.ceph.com>
> Sent: Tuesday, December 22, 2015 11:04:26 AM
> Subject: Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio 
> results
> 
> On 22/12/2015 13:43, Andrei Mikhailovsky wrote:
>> Hello guys,
>>
>> Was wondering if anyone has done testing on Samsung PM863 120 GB version to 
>> see how it performs? IMHO the 480GB version seems like a waste for the 
>> journal as you only need to have a small disk size to fit 3-4 osd journals. 
>> Unless you get a far greater durability.
> 
> The problem is endurance. If we use the 480GB for 3 OSDs each on the
> cluster we might build we expect 3 years (with some margin for error but
> not including any write amplification at the SSD level) before the SSDs
> will fail.
> In our context a 120GB model might not even last a year (endurance is
> 1/4th of the 480GB model). This is why SM863 models will probably be
> more suitable if you have access to them: you can use smaller ones which
> cost less and get more endurance (you'll have to check the performance
> though, usually smaller models have lower IOPS and bandwidth).
> 
>> I am planning to replace my current journal ssds over the next month or so 
>> and would like to find out if there is an a good alternative to the Intel's 
>> 3700/3500 series. 
> 
> 3700 are a safe bet (the 100GB model is rated for ~1.8PBW). 3500 models
> probably don't have enough endurance for many Ceph clusters to be cost
> effective. The 120GB model is only rated for 70TBW and you have to
> consider both client writes and rebalance events.
> I'm uneasy with SSDs expected to fail within the life of the system they
> are in: you can have a cascade effect where an SSD failure brings down
> several OSDs triggering a rebalance which might make SSDs installed at
> the same time fail too. In this case in the best scenario you will reach
> your min_size (>=2) and block any writes which would prevent more SSD
> failures until you move journals to fresh SSDs. If min_size = 1 you
> might actually lose data.
> 
> If you expect to replace your current journal SSDs if I were you I would
> make a staggered deployment over several months/a year to avoid them
> failing at the same time in case of an unforeseen problem. In addition
> this would allow to evaluate the performance and behavior of a new SSD
> model with your hardware (there have been reports of performance
> problems with some combinations of RAID controllers and SSD
> models/firmware versions) without impacting your cluster's overall
> performance too much.
> 
> When using SSDs for journals you have to monitor both :
> * the SSD wear leveling or something equivalent (SMART data may not be
> available if you use a RAID controller but usually you can get the total
> amount data written) of each SSD,
> * the client writes on the whole cluster.
> And check periodically what the expected lifespan left there is for each
> of your SSD based on their current state, average write speed, estimated
> write amplification (both due to pool's size parameter and the SSD
> model's inherent write amplification) and the amount of data moved by
> rebalance events you expect to happen.
> Ideally you should make this computation before choosing the SSD models,
> but several variables are not always easy to predict and probably will
> change during the life of your cluster.
> 
> Lionel
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-22 Thread Alan Johnson
I would also add that the journal activity is write intensive so a small part 
of the drive would get excessive writes if the journal and data are co-located 
on an SSD. This would also be the case where an SSD has multiple journals 
associated with many HDDs.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido 
den Hollander
Sent: Tuesday, December 22, 2015 11:46 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

On 12/22/2015 05:36 PM, Tyler Bishop wrote:
> Write endurance is kinda bullshit.
> 
> We have crucial 960gb drives storing data and we've only managed to take 2% 
> off the drives life in the period of a year and hundreds of tb written weekly.
> 
> 
> Stuff is way more durable than anyone gives it credit.
> 
> 

No, that is absolutely not true. I've seen multiple SSDs fail in Ceph clusters. 
Small Samsung 850 Pro SSDs worn out within 4 months in heavy write-intensive 
Ceph clusters.

> - Original Message -
> From: "Lionel Bouton" <lionel+c...@bouton.name>
> To: "Andrei Mikhailovsky" <and...@arhont.com>, "ceph-users" 
> <ceph-users@lists.ceph.com>
> Sent: Tuesday, December 22, 2015 11:04:26 AM
> Subject: Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB 
> fio results
> 
> On 22/12/2015 13:43, Andrei Mikhailovsky wrote:
>> Hello guys,
>>
>> Was wondering if anyone has done testing on Samsung PM863 120 GB version to 
>> see how it performs? IMHO the 480GB version seems like a waste for the 
>> journal as you only need to have a small disk size to fit 3-4 osd journals. 
>> Unless you get a far greater durability.
> 
> The problem is endurance. If we use the 480GB for 3 OSDs each on the 
> cluster we might build we expect 3 years (with some margin for error 
> but not including any write amplification at the SSD level) before the 
> SSDs will fail.
> In our context a 120GB model might not even last a year (endurance is 
> 1/4th of the 480GB model). This is why SM863 models will probably be 
> more suitable if you have access to them: you can use smaller ones 
> which cost less and get more endurance (you'll have to check the 
> performance though, usually smaller models have lower IOPS and bandwidth).
> 
>> I am planning to replace my current journal ssds over the next month or so 
>> and would like to find out if there is an a good alternative to the Intel's 
>> 3700/3500 series. 
> 
> 3700 are a safe bet (the 100GB model is rated for ~1.8PBW). 3500 
> models probably don't have enough endurance for many Ceph clusters to 
> be cost effective. The 120GB model is only rated for 70TBW and you 
> have to consider both client writes and rebalance events.
> I'm uneasy with SSDs expected to fail within the life of the system 
> they are in: you can have a cascade effect where an SSD failure brings 
> down several OSDs triggering a rebalance which might make SSDs 
> installed at the same time fail too. In this case in the best scenario 
> you will reach your min_size (>=2) and block any writes which would 
> prevent more SSD failures until you move journals to fresh SSDs. If 
> min_size = 1 you might actually lose data.
> 
> If you expect to replace your current journal SSDs if I were you I 
> would make a staggered deployment over several months/a year to avoid 
> them failing at the same time in case of an unforeseen problem. In 
> addition this would allow to evaluate the performance and behavior of 
> a new SSD model with your hardware (there have been reports of 
> performance problems with some combinations of RAID controllers and 
> SSD models/firmware versions) without impacting your cluster's overall 
> performance too much.
> 
> When using SSDs for journals you have to monitor both :
> * the SSD wear leveling or something equivalent (SMART data may not be 
> available if you use a RAID controller but usually you can get the 
> total amount data written) of each SSD,
> * the client writes on the whole cluster.
> And check periodically what the expected lifespan left there is for 
> each of your SSD based on their current state, average write speed, 
> estimated write amplification (both due to pool's size parameter and 
> the SSD model's inherent write amplification) and the amount of data 
> moved by rebalance events you expect to happen.
> Ideally you should make this computation before choosing the SSD 
> models, but several variables are not always easy to predict and 
> probably will change during the life of your cluster.
> 
> Lionel

Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-21 Thread Wido den Hollander
On 12/21/2015 05:30 PM, Lionel Bouton wrote:
> Hi,
> 
> Sébastien Han just added the test results I reported for these SSDs on
> the following page :
> 
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> 
> The table in the original post has the most important numbers and more
> details can be found in the comments.
> 
> To sum things up, both have good performance (this isn't surprising for
> the S3710 but AFAIK this had to be confirmed for the PM863 and my
> company just purchased 2 of them just for these tests because they are
> the only "DC" SSDs available at one of our hosting providers).
> PM863 models are not designed for write-intensive applications and we
> have yet to see how they behave in the long run (in our case where PM863
> endurance is a bit short, if I had a choice we would test SM863 models
> if they were available to us).
> 
> So at least for the PM863 please remember that this report is just about
> the performance side (on fresh SSDs) which arguably is excellent for the
> price but this doesn't address other conditions to check (performance
> consistency over the long run, real-world write endurance including
> write amplification, large scale testing to detect potential firmware
> bugs, ...).
> 

Interesting! I might be able to gain access to some PM863 3.84TB SSDs
later this week.

I'll run the same tests if I can. Interesting to see how they perform.

> Best regards,
> 
> Lionel
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on