Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-03-09 Thread mad Engineer
Thank you Nick for explaining the problem with 4k writes. The queue depth used in this setup is 256, the maximum supported. Can you clarify why adding more nodes will not increase iops? In general, how do we increase the iops of a ceph cluster? Thanks for your help. On Sat, Mar 7, 2015 at 5:57 PM, Nick

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-03-09 Thread Nick Fisk
Can you run the fio test again but with a queue depth of 32? This will probably show what your cluster is capable of. Adding more nodes with SSDs will probably help it scale, but only at higher IO depths. At low queue depths you are probably already at the limit, as per my earlier email. From:
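For reference, a fio run along these lines would exercise the cluster at a queue depth of 32 - a minimal sketch only; the libaio engine and the mapped RBD device /dev/rbd0 are assumptions, not the exact command used in this thread:

  # hypothetical 4k random-write test at queue depth 32 against a mapped RBD device
  fio --name=qd32test --filename=/dev/rbd0 --ioengine=libaio --direct=1 \
      --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 --runtime=60 \
      --time_based --group_reporting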

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-03-07 Thread Nick Fisk
You are hitting serial latency limits. For a 4kb sync write to happen it has to: 1. travel across the network from the client to the primary OSD; 2. be processed by Ceph; 3. get written to the primary OSD; 4. the ack travels back across the network to the client. At 4kb these 4 steps take up a very high percentage of the actual
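To put rough numbers on that point (the per-step latencies below are assumptions for illustration, not measurements from this cluster): if one round trip through those four steps costs about 0.7 ms, a single outstanding 4k write can never exceed roughly 1400 iops, no matter how fast the SSDs are.

  # illustrative only: assumed ~0.2 ms network, ~0.3 ms ceph processing, ~0.1 ms SSD write, ~0.1 ms ack
  echo "1000 / 0.7" | bc   # ~1428 iops ceiling at queue depth 1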

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-03-07 Thread mad Engineer
*Update:* *Hardware:* Upgraded the RAID controller to an LSI MegaRAID 9341 (12 Gbps). 3x Samsung 840 EVO - showing 45K iops in a fio test with 7 threads and 4k block size in *JBOD* mode. CPU: 16 cores @ 2.27 GHz. RAM: 24 GB. NIC: 10 Gbit with *under 1 ms latency*; iperf shows 9.18 Gbps between host and client
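The 45K iops figure presumably came from something like the following raw-device fio run - a sketch under assumptions; the device path /dev/sdb, the iodepth and the direct flag are guesses, not the command actually used:

  # hypothetical raw-SSD test: 4k random writes, 7 jobs, direct IO, bypassing ceph entirely
  fio --name=raw4k --filename=/dev/sdb --ioengine=libaio --direct=1 \
      --rw=randwrite --bs=4k --numjobs=7 --iodepth=1 --runtime=60 \
      --time_based --group_reporting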

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Philippe Schwarz
On 28/02/2015 12:19, mad Engineer wrote: Hello All, I am trying ceph-firefly 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSDs, all Samsung SSD 850 EVO, on 3 servers with 24 GB RAM, 16 cores @ 2.27 GHz, Ubuntu 14.04 LTS with 3.16-3

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Stefan Priebe - Profihost AG
On 28.02.2015 at 12:43, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, first, test if your SSD can write fast with O_DSYNC; check this blog: http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ Then try with ceph Giant (or

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Alexandre DERUMIER
But this was replication 1? I never was able to do more than 30 000 with replication 3. Oh, sorry, that was about reads. For writes, I think I was around 30 000 iops with 3 nodes (2x 4 cores @ 2.1 GHz each), cpu bound, with replication x1. With replication x3, around 9000 iops. Going to test on
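The drop from the x1 figure to around 9000 iops at x3 is roughly what replication alone predicts, since every client write becomes three OSD writes plus an extra replication round trip - rough arithmetic, using 30 000 as the assumed x1 number:

  # naive upper bound: x1 write iops divided by the replica count
  echo "30000 / 3" | bc   # ~10000 iops; the observed ~9000 also pays the replication hop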

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
thanks for that link Alexandre, as per that link tried these: *850 EVO* *without dsync*: dd if=randfile of=/dev/sdb1 bs=4k count=100000 oflag=direct 100000+0 records in 100000+0 records out 409600000 bytes (410 MB) copied, 4.42913 s, 92.5 MB/s with *dsync*: dd if=randfile of=/dev/sdb1 bs=4k

[ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
Hello All, I am trying ceph-firefly 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSDs, all Samsung SSD 850 EVO, on 3 servers with 24 GB RAM, 16 cores @ 2.27 GHz, Ubuntu 14.04 LTS with the 3.16-3 kernel. All are connected to 10G ports with maximum MTU. There are no extra disks for

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Alexandre DERUMIER
As an optimisation, try to set the io scheduler to noop, and also enable rbd_cache=true (it really helps for sequential writes). But your results seem quite low: 926 KB/s with 4k is only ~200 io/s. Check that you don't have any big network latencies or MTU fragmentation problems. Maybe also
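A minimal sketch of those checks and tweaks - the device sdb, the 9000-byte MTU and the peer address 10.0.0.2 below are placeholders, not values from this cluster:

  # switch the io scheduler to noop on an SSD-backed OSD disk (repeat per disk)
  echo noop > /sys/block/sdb/queue/scheduler
  cat /sys/block/sdb/queue/scheduler   # active scheduler is shown in brackets, e.g. [noop] deadline cfq

  # verify jumbo frames really pass end to end: 8972 = 9000 MTU minus 28 bytes of ICMP/IP headers
  ping -M do -s 8972 -c 3 10.0.0.2

rbd caching is enabled on the client side in ceph.conf, e.g.:

  [client]
  rbd cache = true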

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Alexandre DERUMIER
Hi, first, test if your SSD can write fast with O_DSYNC; check this blog: http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ Then try with ceph Giant (or maybe wait for Hammer), because there are a lot of optimisations for SSDs for threads
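The test from that blog post boils down to a direct, dsync dd against the journal device - a sketch, assuming /dev/sdX is a disposable device or partition (this overwrites its contents):

  # 4k synchronous writes straight to the device; consumer SSDs that fake-sync collapse badly here
  dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync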

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Andrei Mikhailovsky
Martin, I have been using Samsung 840 Pro for journals for about 2 years now and have just replaced all my Samsung drives with Intel. We have found a lot of performance issues with the 840 Pro (we are using the 128GB version). In particular, a very strange behaviour with using 4 partitions (with 50%

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Martin B Nielsen
Hi Andrei, if there is one thing I've come to understand by now, it is that ceph configs, performance, hw and, well, everything seem to vary on an almost per-person basis. I do not recognize that latency issue either; this is from one of our nodes (4x 500GB Samsung 840 Pro - sd[c-f]) which has been

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Somnath Roy
I would say check with a rados tool like ceph_smalliobench/rados bench first to see how much performance these tools are reporting. This will help you isolate any upstream issues. Also, check with 'iostat -xk 1' for the resource utilization. Hope you are running with powerful enough cpu
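For example, something along these lines - the pool name rbd, the 60-second runtime and the 32 concurrent ops are assumptions, adjust to taste:

  # 4k object writes straight against the cluster, bypassing RBD and the client filesystem
  rados bench -p rbd 60 write -b 4096 -t 32

  # meanwhile, on each OSD node, watch per-device utilization and await times
  iostat -xk 1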

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
Thanks for the reply Philippe, we were using these disks in our NAS; now it looks like I am in big trouble :-( On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz p...@schwarz-fr.net wrote: On 28/02/2015 12:19, mad Engineer wrote: Hello All, I

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Somnath Roy
Sorry, I saw you have already tried 'rados bench'. So, some points here. 1. If you are considering a write workload, I think with a total of 2 copies and a 4K workload you should be able to get ~4K iops (considering it hitting the disk, not with memstore). 2. You are having 9 OSDs and if
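A back-of-the-envelope version of that estimate, with the per-SSD sync-write figure purely assumed for illustration (see the O_DSYNC dd tests elsewhere in the thread for real numbers):

  # assumed ~1000 sustained O_DSYNC 4k write iops per 850 EVO, 9 OSDs, 2 copies per client write
  echo "1000 * 9 / 2" | bc   # ~4500 iops rough ceiling, before latency limits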

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Stefan Priebe
On 28.02.2015 at 19:41, Kevin Walker wrote: What about the Samsung 845DC Pro SSDs? These have fantastic enterprise performance characteristics. http://www.thessdreview.com/our-reviews/samsung-845dc-pro-review-800gb-class-leading-speed-endurance/ Or use the SV843 from Samsung Semiconductor

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Kevin Walker
What about the Samsung 845DC Pro SSDs? These have fantastic enterprise performance characteristics. http://www.thessdreview.com/our-reviews/samsung-845dc-pro-review-800gb-class-leading-speed-endurance/ Kind regards Kevin On 28 February 2015 at 15:32, Philippe Schwarz p...@schwarz-fr.net

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
tried changing the scheduler from deadline to noop, also upgraded to Giant and the btrfs filesystem, and downgraded the kernel to 3.16 from 3.16-3 - not much difference. dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct 25000+0 records in 25000+0 records out 102400000 bytes (102 MB) copied, 94.691 s, 1.1 MB/s

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
reinstalled the ceph packages and now with the memstore backend [osd objectstore = memstore] it's giving 400 Kbps. No idea where the problem is. On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer themadengin...@gmail.com wrote: tried changing scheduler from deadline to noop also upgraded to Giant and btrfs
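For reference, the memstore backend is selected with a ceph.conf stanza like the one below (with the OSDs recreated afterwards). Since memstore keeps objects in RAM, a cluster that is still this slow with it is unlikely to be bottlenecked on the SSDs themselves:

  [osd]
  osd objectstore = memstore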

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
I am re-installing ceph with the Giant release and will soon update results with the above configuration changes. My servers are Cisco UCS C200 M1 with the integrated Intel ICH10R SATA controller. Before installing ceph I changed it to use Software RAID, quoting from the link below [When using the integrated RAID,