Re: RBD fio Performance concerns

2012-11-24 Thread Gregory Farnum
...@vger.kernel.org] On Behalf Of Sébastien Han Sent: 22 November 2012 5:47 To: Mark Nelson Cc: Alexandre DERUMIER; ceph-devel; Mark Kampe Subject: Re: RBD fio Performance concerns Hi Mark, Well the most concerning thing is that I have 2 Ceph clusters and both of them show better rand than seq

Re: RBD fio Performance concerns

2012-11-23 Thread Stefan Priebe - Profihost AG
Hi, when I switch the journal to a separate partition on each OSD disk (/dev/sdX1 for a 1 GB journal and /dev/sdX2 for the OSD), I go down from 23,000 IOPS to 200 IOPS for random 4K. Greets, Stefan On 22.11.2012 13:50, Sébastien Han wrote: journal is running on tmpfs to me but that changes

Re: RBD fio Performance concerns

2012-11-23 Thread Alexandre DERUMIER
...@inktank.com Sent: Friday 23 November 2012 11:31:15 Subject: Re: RBD fio Performance concerns Hi, when I switch the journal to a separate partition on each OSD disk (/dev/sdX1 for a 1 GB journal and /dev/sdX2 for the OSD), I go down from 23,000 IOPS to 200 IOPS for random 4K. Greets, Stefan On

Re: RBD fio Performance concerns

2012-11-23 Thread Stefan Priebe - Profihost AG
On 23.11.2012 11:47, Alexandre DERUMIER wrote: when I switch the journal to a separate partition on each OSD disk (/dev/sdX1 for a 1 GB journal and /dev/sdX2 for the OSD), I go down from 23,000 IOPS to 200 IOPS for random 4K. O_o, that seems crazy... Are you sure that your partitions are

Re: RBD fio Performance concerns

2012-11-23 Thread Alexandre DERUMIER
@vger.kernel.org, Mark Kampe mark.ka...@inktank.com, Sébastien Han han.sebast...@gmail.com Sent: Friday 23 November 2012 11:49:10 Subject: Re: RBD fio Performance concerns On 23.11.2012 11:47, Alexandre DERUMIER wrote: when I switch the journal to a separate partition on each OSD disk (/dev

Re: RBD fio Performance concerns

2012-11-23 Thread Stefan Priebe - Profihost AG
On 23.11.2012 12:03, Alexandre DERUMIER wrote: so correctly aligned... Maybe try to use the journal directly on the full partition, without XFS? The same - just 200 IOPS for random 4K. Stefan

Re: RBD fio Performance concerns

2012-11-23 Thread Mark Nelson
:49:10 Subject: Re: RBD fio Performance concerns On 23.11.2012 11:47, Alexandre DERUMIER wrote: when I switch the journal to a separate partition on each OSD disk (/dev/sdX1 for a 1 GB journal and /dev/sdX2 for the OSD), I go down from 23,000 IOPS to 200 IOPS for random 4K. O_o, that seems crazy

Re: RBD fio Performance concerns

2012-11-23 Thread Stefan Priebe - Profihost AG
...@profihost.ag To: Alexandre DERUMIER aderum...@odiso.com Cc: Mark Nelson mark.nel...@inktank.com, ceph-devel ceph-devel@vger.kernel.org, Mark Kampe mark.ka...@inktank.com, Sébastien Han han.sebast...@gmail.com Sent: Friday 23 November 2012 11:49:10 Subject: Re: RBD fio Performance concerns On

Re: RBD fio Performance concerns

2012-11-23 Thread Alexandre DERUMIER
Kampe mark.ka...@inktank.com, Sébastien Han han.sebast...@gmail.com Sent: Friday 23 November 2012 14:24:26 Subject: Re: RBD fio Performance concerns On 23.11.2012 14:18, Mark Nelson wrote: Agreed with Alexandre, try putting the journal on a raw partition. That's pretty insane! What

Re: RBD fio Performance concerns

2012-11-23 Thread Stefan Priebe - Profihost AG
, ceph-devel ceph-devel@vger.kernel.org, Mark Kampe mark.ka...@inktank.com, Sébastien Han han.sebast...@gmail.com Sent: Friday 23 November 2012 11:49:10 Subject: Re: RBD fio Performance concerns On 23.11.2012 11:47, Alexandre DERUMIER wrote: when I switch the journal to a separate

RE: RBD fio Performance concerns

2012-11-23 Thread Chen, Xiaoxi
-ow...@vger.kernel.org] On Behalf Of Sébastien Han Sent: 22 November 2012 5:47 To: Mark Nelson Cc: Alexandre DERUMIER; ceph-devel; Mark Kampe Subject: Re: RBD fio Performance concerns Hi Mark, Well the most concerning thing is that I have 2 Ceph clusters and both of them show better rand than seq... I

Re: RBD fio Performance concerns

2012-11-23 Thread Stefan Priebe - Profihost AG
Priebe - Profihost AG s.pri...@profihost.ag To: Alexandre DERUMIER aderum...@odiso.com Cc: Mark Nelson mark.nel...@inktank.com, ceph-devel ceph-devel@vger.kernel.org, Mark Kampe mark.ka...@inktank.com, Sébastien Han han.sebast...@gmail.com Sent: Friday 23 November 2012 11:49:10 Subject: Re: RBD fio

Re: RBD fio Performance concerns

2012-11-22 Thread Sébastien Han
Hmm, sorry, you're right. Forget about what I said :) On Thu, Nov 22, 2012 at 4:54 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: I thought the client would then write to the 2nd. Is this wrong? Stefan On 22.11.2012 at 16:49, Sébastien Han han.sebast...@gmail.com wrote: But

Re: RBD fio Performance concerns

2012-11-22 Thread Alexandre DERUMIER
Han han.sebast...@gmail.com Cc: Mark Nelson mark.nel...@inktank.com, Alexandre DERUMIER aderum...@odiso.com, ceph-devel ceph-devel@vger.kernel.org, Mark Kampe mark.ka...@inktank.com Sent: Thursday 22 November 2012 14:29:03 Subject: Re: RBD fio Performance concerns On 22.11.2012 14:22,

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
I thought the client would then write to the 2nd. Is this wrong? Stefan On 22.11.2012 at 16:49, Sébastien Han han.sebast...@gmail.com wrote: But who cares? It's also on the 2nd node, or even on the 3rd if you have 3 replicas. Yes but you could also suffer a crash while writing the first

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
On 22.11.2012 13:50, Sébastien Han wrote: journal is running on tmpfs to me but that changes nothing. I don't think it works then. According to the doc: Enables using libaio for asynchronous writes to the journal. Requires journal dio set to true. Ah might be but as the SSDs are pretty

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
mark.ka...@inktank.com Sent: Thursday 22 November 2012 14:29:03 Subject: Re: RBD fio Performance concerns On 22.11.2012 14:22, Sébastien Han wrote: And RAMDISK devices are too expensive. It would make sense in your infra, but yes they are really expensive. We need something like tmpfs - running

Re: RBD fio Performance concerns

2012-11-22 Thread Mark Nelson
...@gmail.com To: Mark Kampe mark.ka...@inktank.com Cc: Alexandre DERUMIER aderum...@odiso.com, ceph-devel ceph-devel@vger.kernel.org Sent: Monday 19 November 2012 19:03:40 Subject: Re: RBD fio Performance

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
Otherwise you would have the same problem with disk crashes. On 22.11.2012 at 16:55, Sébastien Han han.sebast...@gmail.com wrote: Hmm, sorry, you're right. Forget about what I said :) On Thu, Nov 22, 2012 at 4:54 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: I thought

Re: RBD fio Performance concerns

2012-11-22 Thread Mark Nelson
14:29:03 Subject: Re: RBD fio Performance concerns On 22.11.2012 14:22, Sébastien Han wrote: And RAMDISK devices are too expensive. It would make sense in your infra, but yes they are really expensive. We need something like tmpfs - running in local memory but supporting dio. Stefan -- Mark

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
On 22.11.2012 14:22, Sébastien Han wrote: And RAMDISK devices are too expensive. It would make sense in your infra, but yes they are really expensive. We need something like tmpfs - running in local memory but supporting dio. Stefan

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
@vger.kernel.org, Mark Kampe mark.ka...@inktank.com, Sébastien Han han.sebast...@gmail.com Sent: Thursday 22 November 2012 15:42:14 Subject: Re: RBD fio Performance concerns On 22.11.2012 15:37, Mark Nelson wrote: I don't think we recommend tmpfs at all for anything other than playing around. :) I

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
s.pri...@profihost.ag To: Sébastien Han han.sebast...@gmail.com Cc: Mark Nelson mark.nel...@inktank.com, Alexandre DERUMIER aderum...@odiso.com, ceph-devel ceph-devel@vger.kernel.org, Mark Kampe mark.ka...@inktank.com Sent: Thursday 22 November 2012 14:29:03 Subject: Re: RBD fio Performance concerns On

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
On 22.11.2012 11:49, Sébastien Han wrote: @Alexandre: cool! @Stefan: Full SSD cluster and 10G switches? Yes. A couple of weeks ago I saw that you use journal aio, did you notice a performance improvement with it? journal is running on tmpfs to me but that changes nothing. Stefan

Re: RBD fio Performance concerns

2012-11-22 Thread Alexandre DERUMIER
To: Mark Nelson mark.nel...@inktank.com Cc: Alexandre DERUMIER aderum...@odiso.com, ceph-devel ceph-devel@vger.kernel.org, Mark Kampe mark.ka...@inktank.com, Sébastien Han han.sebast...@gmail.com Sent: Thursday 22 November 2012 15:42:14 Subject: Re: RBD fio Performance concerns On 22.11.2012 15

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
: Monday 19 November 2012 19:03:40 Subject: Re: RBD fio Performance concerns @Sage, thanks for the info :) @Mark: If you want to do sequential I/O, you should do it buffered (so that the writes can be aggregated) or with a 4M block size (very efficient and avoiding object serialization). The original

Re: RBD fio Performance concerns

2012-11-22 Thread Alexandre DERUMIER
...@inktank.com Sent: Wednesday 21 November 2012 22:47:08 Subject: Re: RBD fio Performance concerns Hi Mark, Well the most concerning thing is that I have 2 Ceph clusters and both of them show better rand than seq... I don't have enough background to argue on your assumptions but I could try

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
Subject: Re: RBD fio Performance concerns On 22.11.2012 15:46, Mark Nelson wrote: I haven't played a whole lot with SSD-only OSDs yet (other than noting last summer that IOPS performance wasn't as high as I wanted it). Is a second partition on the SSD for the journal not an option for you

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
...@inktank.com, Alexandre DERUMIER aderum...@odiso.com, ceph-devel ceph-devel@vger.kernel.org, Mark Kampe mark.ka...@inktank.com Sent: Thursday 22 November 2012 14:29:03 Subject: Re: RBD fio Performance concerns On 22.11.2012 14:22, Sébastien Han wrote: And RAMDISK devices are too expensive

Re: RBD fio Performance concerns

2012-11-22 Thread Mark Kampe
Sequential is faster than random on a disk, but we are not doing I/O to a disk, but a distributed storage cluster: small random operations are striped over multiple objects and servers, and so can proceed in parallel and take advantage of more nodes and disks. This parallelism can
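A toy model of the parallelism argument in the message above, offered only as a sketch. It assumes that writes to the same object are handled one at a time (as stated elsewhere in this thread), that distinct objects proceed fully in parallel, and that every write takes the same made-up service time. The function name and the 1 ms figure are illustrative and do not come from the thread.

```python
from collections import Counter

def completion_time_ms(object_ids, service_ms=1.0):
    """Time to drain a batch of writes under a toy model: writes to the
    same object are serialized, distinct objects run in parallel."""
    if not object_ids:
        return 0.0
    per_object = Counter(object_ids)
    # The busiest object decides when the whole batch is finished.
    return max(per_object.values()) * service_ms

# 256 writes that all target one object vs. 256 writes on 256 objects.
same_object = [0] * 256
spread_out = list(range(256))
print(completion_time_ms(same_object))
print(completion_time_ms(spread_out))
```

Under these assumptions the batch that is spread over many objects drains far sooner than the batch queued on a single object, which is the effect the message describes. Real clusters add network, journal and replication costs that this model ignores.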

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
...@gmail.com, Mark Nelson mark.nel...@inktank.com Sent: Thursday 22 November 2012 16:28:57 Subject: Re: RBD fio Performance concerns On 22.11.2012 16:26, Alexandre DERUMIER wrote: Haven't tested that. But does this make sense? I mean data goes to disk journal - same disk then has to copy

Re: RBD fio Performance concerns

2012-11-22 Thread Sébastien Han
original - From: Sébastien Han han.sebast...@gmail.com To: Mark Kampe mark.ka...@inktank.com Cc: Alexandre DERUMIER aderum...@odiso.com, ceph-devel ceph-devel@vger.kernel.org Sent: Monday 19 November 2012 19:03:40 Subject: Re: RBD fio Performance concerns @Sage, thanks for the info :) @Mark

Re: RBD fio Performance concerns

2012-11-22 Thread Mark Nelson
: Thursday 22 November 2012 14:29:03 Subject: Re: RBD fio Performance concerns On 22.11.2012 14:22, Sébastien Han wrote: And RAMDISK devices are too expensive. It would make sense in your infra, but yes they are really expensive. We need something like tmpfs - running in local memory but supporting dio

Re: RBD fio Performance concerns

2012-11-22 Thread Alexandre DERUMIER
: Alexandre DERUMIER aderum...@odiso.com Cc: ceph-devel ceph-devel@vger.kernel.org, Mark Kampe mark.ka...@inktank.com, Sébastien Han han.sebast...@gmail.com, Mark Nelson mark.nel...@inktank.com Sent: Thursday 22 November 2012 16:28:57 Subject: Re: RBD fio Performance concerns On 22.11.2012 16:26

Re: RBD fio Performance concerns

2012-11-22 Thread Alexandre DERUMIER
...@odiso.com, ceph-devel ceph-devel@vger.kernel.org, Mark Kampe mark.ka...@inktank.com, Sébastien Han han.sebast...@gmail.com Sent: Thursday 22 November 2012 16:01:56 Subject: Re: RBD fio Performance concerns On 22.11.2012 15:46, Mark Nelson wrote: I haven't played a whole lot with SSD-only OSDs yet

Re: RBD fio Performance concerns

2012-11-22 Thread Sébastien Han
But who cares? It's also on the 2nd node, or even on the 3rd if you have 3 replicas. Yes but you could also suffer a crash while writing the first replica. If the journal is in tmpfs, there is nothing to replay. On Thu, Nov 22, 2012 at 4:35 PM, Alexandre DERUMIER aderum...@odiso.com wrote:

Re: RBD fio Performance concerns

2012-11-22 Thread Sébastien Han
journal is running on tmpfs to me but that changes nothing. I don't think it works then. According to the doc: Enables using libaio for asynchronous writes to the journal. Requires journal dio set to true. On Thu, Nov 22, 2012 at 12:48 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag

Re: RBD fio Performance concerns

2012-11-21 Thread Mark Nelson
: RBD fio Performance concerns @Sage, thanks for the info :) @Mark: If you want to do sequential I/O, you should do it buffered (so that the writes can be aggregated) or with a 4M block size (very efficient and avoiding object serialization). The original benchmark has been performed with 4M

Re: RBD fio Performance concerns

2012-11-21 Thread Mark Nelson
Kampe mark.ka...@inktank.com Cc: Alexandre DERUMIER aderum...@odiso.com, ceph-devel ceph-devel@vger.kernel.org Sent: Monday 19 November 2012 19:03:40 Subject: Re: RBD fio Performance concerns @Sage, thanks for the info :) @Mark: If you want to do sequential I/O, you should do it buffered (so

Re: RBD fio Performance concerns

2012-11-20 Thread Sébastien Han
: Sébastien Han han.sebast...@gmail.com To: Alexandre DERUMIER aderum...@odiso.com Cc: ceph-devel ceph-devel@vger.kernel.org, Mark Kampe mark.ka...@inktank.com Sent: Monday 19 November 2012 21:57:59 Subject: Re: RBD fio Performance concerns Which iodepth did you use for those benchmarks? I really

Re: RBD fio Performance concerns

2012-11-19 Thread Sébastien Han
Hello Mark, First of all, thank you again for another accurate answer :-). I would have expected write aggregation and cylinder affinity to have eliminated some seeks and improved rotational latency resulting in better than theoretical random write throughput. Against those expectations

Re: RBD fio Performance concerns

2012-11-19 Thread Sébastien Han
mark.ka...@inktank.com Cc: ceph-devel ceph-devel@vger.kernel.org Sent: Monday 19 November 2012 15:56:35 Subject: Re: RBD fio Performance concerns Hello Mark, First of all, thank you again for another accurate answer :-). I would have expected write aggregation and cylinder affinity to have

Re: RBD fio Performance concerns

2012-11-19 Thread Sage Weil
-devel@vger.kernel.org Sent: Monday 19 November 2012 15:56:35 Subject: Re: RBD fio Performance concerns Hello Mark, First of all, thank you again for another accurate answer :-). I would have expected write aggregation and cylinder affinity to have eliminated some seeks and improved

Re: RBD fio Performance concerns

2012-11-19 Thread Mark Kampe
Recall: 1. RBD volumes are striped (4M wide) across RADOS objects 2. distinct writes to a single RADOS object are serialized Your sequential 4K writes are direct, depth=256, so there are (at all times) 256 writes queued to the same object. All of your writes are waiting through a very
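The two points in the message above can be checked with a little arithmetic. The sketch below assumes the simplest possible layout, where the object index is just the byte offset divided by the 4M object size; the constant and function names are hypothetical and chosen for illustration only.

```python
from collections import Counter

OBJECT_SIZE = 4 * 1024 * 1024  # 4M stripe width, as stated in the message
BLOCK_SIZE = 4 * 1024          # 4K writes
QUEUE_DEPTH = 256              # depth=256

def object_index(offset, object_size=OBJECT_SIZE):
    """Index of the object holding a byte offset (assumed simple layout)."""
    return offset // object_size

def queued_per_object(start_offset=0):
    """Count how many of the in-flight sequential writes hit each object."""
    offsets = (start_offset + i * BLOCK_SIZE for i in range(QUEUE_DEPTH))
    return Counter(object_index(off) for off in offsets)

# 256 outstanding 4K writes cover only 1 MiB, so they fit inside one object.
print(queued_per_object(0))  # Counter({0: 256})
```

A window of 256 sequential 4K writes spans 1 MiB, a quarter of one object, so the whole queue normally sits on a single object and is serialized there. Only when the window crosses an object boundary does it touch two objects.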

Re: RBD fio Performance concerns

2012-11-19 Thread Sébastien Han
@Sage, thanks for the info :) @Mark: If you want to do sequential I/O, you should do it buffered (so that the writes can be aggregated) or with a 4M block size (very efficient and avoiding object serialization). The original benchmark has been performed with 4M block size. And as you can see

Re: RBD fio Performance concerns

2012-11-19 Thread Sébastien Han
during read bench) - Original message - From: Sébastien Han han.sebast...@gmail.com To: Mark Kampe mark.ka...@inktank.com Cc: Alexandre DERUMIER aderum...@odiso.com, ceph-devel ceph-devel@vger.kernel.org Sent: Monday 19 November 2012 19:03:40 Subject: Re: RBD fio Performance concerns @Sage

Re: RBD fio Performance concerns

2012-11-19 Thread Sébastien Han
Hello Mark, See my benchmark results below: -RADOS Bench with 4M block size write: # rados -p bench bench 300 write -t 32 --no-cleanup Maintaining 32 concurrent writes of 4194304 bytes for at least 300 seconds. 2012-11-19 21:35:01.722143 min lat: 0.255396 max lat: 8.40212 avg lat: 1.14076
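As a rough cross-check of the figures quoted above, throughput can be estimated from concurrency and average latency using Little's law. This is only a sketch: it treats the average latency shown (a running value at one timestamp) as representative of the whole run, and the bandwidth that rados bench itself reported is not visible in this excerpt. The function name is hypothetical.

```python
def estimated_bandwidth_mib_s(concurrency, avg_latency_s, op_bytes):
    """Estimate bandwidth via Little's law: ops/s = concurrency / latency."""
    ops_per_s = concurrency / avg_latency_s
    return ops_per_s * op_bytes / (1024 * 1024)

# 32 concurrent writes of 4194304 bytes, avg lat 1.14076 s (from the snippet).
estimate = estimated_bandwidth_mib_s(32, 1.14076, 4194304)
print(round(estimate, 1))  # about 112.2 MiB/s
```

About 28 operations per second at 4 MiB each gives an estimate near 112 MiB/s, which should be read as an order-of-magnitude figure, not a measured result.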

Re: RBD fio Performance concerns

2012-11-19 Thread Alexandre DERUMIER
Subject: Re: RBD fio Performance concerns Which iodepth did you use for those benchmarks? I really don't understand why I can't get more rand read iops with 4K block ... Me neither, hope to get some clarification from the Inktank guys. It doesn't make any sense to me... -- Best regards

Re: RBD fio Performance concerns

2012-11-16 Thread Mark Kampe
On 11/15/2012 12:23 PM, Sébastien Han wrote: First of all, I would like to thank you for this well explained, structured and clear answer. I guess I got better IOPS thanks to the 10K disks. 10K RPM would bring your per-drive throughput (for 4K random writes) up to 142 IOPS and your aggregate
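One common way to arrive at a per-drive figure like the 142 IOPS quoted above is to add the average seek time to the average rotational latency (half a revolution). The sketch below uses that textbook estimate; the 4 ms seek time is an assumption picked because it reproduces 142, since the excerpt does not say which seek time or method was actually used.

```python
def random_iops(rpm, avg_seek_ms):
    """Rough per-drive random IOPS from seek time plus rotational latency."""
    # On average the platter must turn half a revolution to reach the sector.
    rotational_latency_ms = 60_000 / rpm / 2
    return 1000 / (avg_seek_ms + rotational_latency_ms)

# 10K RPM drive with an assumed 4 ms average seek (not from the thread).
print(int(random_iops(10_000, 4.0)))  # 142
```

At 10K RPM the rotational latency is 3 ms, so an assumed 4 ms seek gives about 7 ms per operation, or roughly 142 operations per second.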