Re: [ceph-users] KVM/QEMU rbd read latency
On Fri, Feb 17, 2017 at 3:35 PM, Phil Lacroute wrote:
> I have a followup question about the debug logging. Is there any way to
> dump the in-memory logs from the QEMU RBD client? If not (and I couldn’t
> find a way to do this), then nothing is lost by disabling the logging on
> client machines.

If you have the admin socket properly configured for client applications
(via the "admin socket" config option), you can run
"ceph --admin-daemon /path/to/asok log dump".

-- 
Jason

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
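As a sketch of what Jason describes, the admin socket can be enabled in a client-side ceph.conf section along these lines (the path pattern and log file location are illustrative; $cluster, $type, $id, $pid and $cctid are Ceph metavariables):

```ini
; Client-side ceph.conf sketch -- paths are examples, adjust to your deployment.
[client]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
log file = /var/log/ceph/qemu-client-$pid.log
```

With the socket in place, the in-memory log of a running QEMU process can then be dumped on demand with "ceph --admin-daemon /var/run/ceph/<actual-name>.asok log dump"; the exact .asok filename depends on the running process.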
Re: [ceph-users] KVM/QEMU rbd read latency
Thanks everyone for the suggestions. Disabling the RBD cache, disabling the
debug logging, and building qemu with jemalloc each had a significant impact.
Performance is up from 25K IOPS to 63K IOPS. Hopefully the ongoing work to
reduce the number of buffer copies will yield further improvements.

I have a followup question about the debug logging. Is there any way to dump
the in-memory logs from the QEMU RBD client? If not (and I couldn’t find a
way to do this), then nothing is lost by disabling the logging on client
machines.

Thanks,
Phil

> On Feb 16, 2017, at 1:20 PM, Jason Dillaman wrote:
>
> Few additional suggestions:
>
> 1) For high IOPS random read workloads, the librbd cache is most likely
> going to be a bottleneck and is providing zero benefit. Recommend setting
> "cache=none" on your librbd QEMU disk to disable it.
>
> 2) Disable logging via your ceph.conf. Example settings:
>
> debug_auth = 0/0
> debug_buffer = 0/0
> debug_context = 0/0
> debug_crypto = 0/0
> debug_finisher = 0/0
> debug_ms = 0/0
> debug_objectcacher = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_striper = 0/0
> debug_tp = 0/0
>
> The above two config changes on my small development cluster take my
> librbd 4K random read IOPS from ~9.5K to ~12.5K (+32%).
>
> 3) librbd / librados is very heavy with small memory allocations on the IO
> path, and previous reports have indicated that using jemalloc w/ QEMU
> shows large improvements.
>
> LD_PRELOADing jemalloc within fio using the optimized config takes me from
> ~12.5K IOPS to ~13.5K IOPS (+8%).
>
> On Thu, Feb 16, 2017 at 3:38 PM, Steve Taylor
> <mailto:steve.tay...@storagecraft.com> wrote:
>
> You might try running fio directly on the host using the rbd ioengine
> (direct librbd) and see how that compares. The major difference between
> that and the krbd test will be the page cache readahead, which will be
> present in the krbd stack but not with the rbd ioengine. I would have
> expected the guest OS to normalize that some due to its own page cache in
> the librbd test, but that might at least give you some more clues about
> where to look further.
>
> Steve Taylor | Senior Software Engineer | StorageCraft Technology
> Corporation <https://storagecraft.com/>
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799
>
> If you are not the intended recipient of this message or received it
> erroneously, please notify the sender and delete it, together with any
> attachments, and be advised that any dissemination or copying of this
> message is prohibited.
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Phil Lacroute
> Sent: Thursday, February 16, 2017 11:54 AM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] KVM/QEMU rbd read latency
>
> Hi,
>
> I am doing some performance characterization experiments for ceph with KVM
> guests, and I’m observing significantly higher read latency when using the
> QEMU rbd client compared to krbd. Is that expected or have I missed some
> tuning knobs to improve this?
>
> Cluster details:
> Note that this cluster was built for evaluation purposes, not production,
> hence the choice of small SSDs with low endurance specs.
> Client host OS: Debian, 4.7.0 kernel
> QEMU version 2.7.0
> Ceph version Jewel 10.2.3
> Client and OSD CPU: Xeon D-1541 2.1 GHz
> OSDs: 5 nodes, 3 SSDs each, one journal partition and one data partition
> per SSD, XFS data file system (15 OSDs total)
> Disks: DC S3510 240GB
> Network: 10 GbE, dedicated switch for storage traffic
> Guest OS: Debian, virtio drivers
>
> Performance testing was done with fio on raw disk devices using this
> config:
> ioengine=libaio
> iodepth=128
> direct=1
> size=100%
> rw=randread
> bs=4k
>
> Case 1: krbd, fio running on the raw rbd device on the client host (no
> guest)
> IOPS: 142k
> Average latency: 0.9 msec
>
> Case 2: krbd, fio running in a guest (libvirt config below)
> IOPS: 119k
> Average Latency: 1.1 msec
>
> Case 3: QEMU RBD client, fio running in a guest (libvirt config below)
> IOPS: 25k
> Average Latency: 5.2 msec
>
> The question is why the test with the QEMU RBD client (case 3) shows 4
> msec of additional latency compared to the guest using the krbd-mapped
> image (case 2).
Re: [ceph-users] KVM/QEMU rbd read latency
>> We also need to support >1 librbd/librados-internal IO
>> thread for outbound/inbound paths.

Could be wonderful! Multiple iothreads per disk are coming for qemu too.
(I have seen Paolo Bonzini sending a lot of patches this month.)

----- Original Message -----
From: "Jason Dillaman"
To: "aderumier"
Cc: "Phil Lacroute", "ceph-users"
Sent: Friday, February 17, 2017 15:16:39
Subject: Re: [ceph-users] KVM/QEMU rbd read latency

On Fri, Feb 17, 2017 at 2:14 AM, Alexandre DERUMIER wrote:
> and I have good hope that this new feature
> "RBD: Add support readv,writev for rbd"
> http://marc.info/?l=ceph-devel&m=148726026914033&w=2

It definitely will eliminate one unnecessary data copy -- but sadly it will
still make a single copy within librbd, since librados *might* touch the IO
memory after it has ACKed the op. Once that issue is addressed, librbd can
eliminate that copy if the librbd cache is disabled.

We also need to support >1 librbd/librados-internal IO thread for
outbound/inbound paths.

-- 
Jason
Re: [ceph-users] KVM/QEMU rbd read latency
On Fri, Feb 17, 2017 at 2:14 AM, Alexandre DERUMIER wrote:
> and I have good hope that this new feature
> "RBD: Add support readv,writev for rbd"
> http://marc.info/?l=ceph-devel&m=148726026914033&w=2

It definitely will eliminate one unnecessary data copy -- but sadly it will
still make a single copy within librbd, since librados *might* touch the IO
memory after it has ACKed the op. Once that issue is addressed, librbd can
eliminate that copy if the librbd cache is disabled.

We also need to support >1 librbd/librados-internal IO thread for
outbound/inbound paths.

-- 
Jason
Re: [ceph-users] KVM/QEMU rbd read latency
Hi,

Currently I can reduce the latency with:

- compiling qemu to use jemalloc
- disabling rbd_cache (or qemu cache=none)
- disabling debug in /etc/ceph.conf on the client node:

[global]
debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0

With this, I can reach around 50-60k 4k IOPS with 1 disk and iothread
enabled.

And I have good hope that this new feature
"RBD: Add support readv,writev for rbd"
http://marc.info/?l=ceph-devel&m=148726026914033&w=2
will help too by reducing copies (that's why I'm using jemalloc too).

----- Original Message -----
From: "Phil Lacroute"
To: "ceph-users"
Sent: Thursday, February 16, 2017 19:53:47
Subject: [ceph-users] KVM/QEMU rbd read latency

Hi,

I am doing some performance characterization experiments for ceph with KVM
guests, and I’m observing significantly higher read latency when using the
QEMU rbd client compared to krbd. Is that expected or have I missed some
tuning knobs to improve this?

Cluster details:
Note that this cluster was built for evaluation purposes, not production,
hence the choice of small SSDs with low endurance specs.
Client host OS: Debian, 4.7.0 kernel
QEMU version 2.7.0
Ceph version Jewel 10.2.3
Client and OSD CPU: Xeon D-1541 2.1 GHz
OSDs: 5 nodes, 3 SSDs each, one journal partition and one data partition per
SSD, XFS data file system (15 OSDs total)
Disks: DC S3510 240GB
Network: 10 GbE, dedicated switch for storage traffic
Guest OS: Debian, virtio drivers

Performance testing was done with fio on raw disk devices using this config:
ioengine=libaio
iodepth=128
direct=1
size=100%
rw=randread
bs=4k

Case 1: krbd, fio running on the raw rbd device on the client host (no guest)
IOPS: 142k
Average latency: 0.9 msec

Case 2: krbd, fio running in a guest (libvirt config below)
IOPS: 119k
Average Latency: 1.1 msec

Case 3: QEMU RBD client, fio running in a guest (libvirt config below)
IOPS: 25k
Average Latency: 5.2 msec

The question is why the test with the QEMU RBD client (case 3) shows 4 msec
of additional latency compared to the guest using the krbd-mapped image
(case 2).

Note that the IOPS bottleneck for all of these cases is the rate at which the
client issues requests, which is limited by the average latency and the
maximum number of outstanding requests (128). Since the latency is the
dominant factor in average read throughput for these small accesses, we would
really like to understand the source of the additional latency.

Thanks,
Phil
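Besides compiling qemu against jemalloc, as Alexandre does, the allocator can also be tried via LD_PRELOAD without a rebuild. A minimal sketch, assuming a Debian-style layout (the library lookup is distro-dependent, and the qemu invocation at the end is illustrative only and therefore commented out):

```shell
#!/bin/sh
# Locate the jemalloc shared library; package name and path differ per distro.
JEMALLOC=$(ldconfig -p 2>/dev/null | awk '/libjemalloc/ {print $NF; exit}')
echo "jemalloc: ${JEMALLOC:-not found}"

# Illustrative only (assumed pool/image names): preload jemalloc for a
# hand-started QEMU guest backed by an RBD image.
# LD_PRELOAD="$JEMALLOC" qemu-system-x86_64 \
#     -drive file=rbd:rbd/vm1,format=raw,cache=none ...
```

When qemu is started by libvirt rather than by hand, the preload would instead have to be injected into the libvirt-spawned process environment, which is more involved.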
Re: [ceph-users] KVM/QEMU rbd read latency
Few additional suggestions:

1) For high IOPS random read workloads, the librbd cache is most likely going
to be a bottleneck and is providing zero benefit. Recommend setting
"cache=none" on your librbd QEMU disk to disable it.

2) Disable logging via your ceph.conf. Example settings:

debug_auth = 0/0
debug_buffer = 0/0
debug_context = 0/0
debug_crypto = 0/0
debug_finisher = 0/0
debug_ms = 0/0
debug_objectcacher = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_striper = 0/0
debug_tp = 0/0

The above two config changes on my small development cluster take my librbd
4K random read IOPS from ~9.5K to ~12.5K (+32%).

3) librbd / librados is very heavy with small memory allocations on the IO
path, and previous reports have indicated that using jemalloc w/ QEMU shows
large improvements.

LD_PRELOADing jemalloc within fio using the optimized config takes me from
~12.5K IOPS to ~13.5K IOPS (+8%).

On Thu, Feb 16, 2017 at 3:38 PM, Steve Taylor wrote:
> You might try running fio directly on the host using the rbd ioengine
> (direct librbd) and see how that compares. The major difference between
> that and the krbd test will be the page cache readahead, which will be
> present in the krbd stack but not with the rbd ioengine. I would have
> expected the guest OS to normalize that some due to its own page cache in
> the librbd test, but that might at least give you some more clues about
> where to look further.
>
> Steve Taylor | Senior Software Engineer | StorageCraft Technology
> Corporation <https://storagecraft.com>
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Phil Lacroute
> Sent: Thursday, February 16, 2017 11:54 AM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] KVM/QEMU rbd read latency
>
> Hi,
>
> I am doing some performance characterization experiments for ceph with KVM
> guests, and I’m observing significantly higher read latency when using the
> QEMU rbd client compared to krbd. Is that expected or have I missed some
> tuning knobs to improve this?
>
> Cluster details:
> Note that this cluster was built for evaluation purposes, not production,
> hence the choice of small SSDs with low endurance specs.
> Client host OS: Debian, 4.7.0 kernel
> QEMU version 2.7.0
> Ceph version Jewel 10.2.3
> Client and OSD CPU: Xeon D-1541 2.1 GHz
> OSDs: 5 nodes, 3 SSDs each, one journal partition and one data partition
> per SSD, XFS data file system (15 OSDs total)
> Disks: DC S3510 240GB
> Network: 10 GbE, dedicated switch for storage traffic
> Guest OS: Debian, virtio drivers
>
> Performance testing was done with fio on raw disk devices using this
> config:
> ioengine=libaio
> iodepth=128
> direct=1
> size=100%
> rw=randread
> bs=4k
>
> Case 1: krbd, fio running on the raw rbd device on the client host (no
> guest)
> IOPS: 142k
> Average latency: 0.9 msec
>
> Case 2: krbd, fio running in a guest (libvirt config below)
> IOPS: 119k
> Average Latency: 1.1 msec
>
> Case 3: QEMU RBD client, fio running in a guest (libvirt config below)
> IOPS: 25k
> Average Latency: 5.2 msec
>
> The question is why the test with the QEMU RBD client (case 3) shows 4
> msec of additional latency compared to the guest using the krbd-mapped
> image (case 2).
>
> Note that the IOPS bottleneck for all of these cases is the rate at which
> the client issues requests, which is limited by the average latency and
> the maximum number of outstanding requests (128). Since the latency is the
> dominant factor in average read throughput for these small accesses, we
> would really like to understand the source of the additional latency.
>
> Thanks,
> Phil

-- 
Jason
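For reference, Jason's point (1) corresponds to a libvirt disk definition roughly like the following, which is a sketch rather than the poster's actual config (pool, image, monitor host, auth user and secret UUID are all placeholders):

```xml
<!-- RBD disk via librbd with the cache disabled; names are placeholders. -->
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='rbd' name='rbd/vm1-disk'>
    <host name='mon1.example.com' port='6789'/>
  </source>
  <auth username='libvirt'>
    <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
  </auth>
  <target dev='vdb' bus='virtio'/>
</disk>
```

The cache='none' attribute on the driver element is what maps to rbd_cache being disabled for that disk.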
Re: [ceph-users] KVM/QEMU rbd read latency
> On 16 February 2017 at 21:38, Steve Taylor wrote:
>
> You might try running fio directly on the host using the rbd ioengine
> (direct librbd) and see how that compares. The major difference between
> that and the krbd test will be the page cache readahead, which will be
> present in the krbd stack but not with the rbd ioengine. I would have
> expected the guest OS to normalize that some due to its own page cache in
> the librbd test, but that might at least give you some more clues about
> where to look further.

In addition, you might want to try VirtIO SCSI as well. Although you are
getting good results with VirtIO, it is worth a try.

The page cache suggestion from Steve is very valid. How much memory is there
in the VM in which you are running the tests, and how much does the host
have?

Wido

> Steve Taylor | Senior Software Engineer | StorageCraft Technology
> Corporation <https://storagecraft.com>
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Phil Lacroute
> Sent: Thursday, February 16, 2017 11:54 AM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] KVM/QEMU rbd read latency
>
> Hi,
>
> I am doing some performance characterization experiments for ceph with KVM
> guests, and I’m observing significantly higher read latency when using the
> QEMU rbd client compared to krbd. Is that expected or have I missed some
> tuning knobs to improve this?
>
> Cluster details:
> Note that this cluster was built for evaluation purposes, not production,
> hence the choice of small SSDs with low endurance specs.
> Client host OS: Debian, 4.7.0 kernel
> QEMU version 2.7.0
> Ceph version Jewel 10.2.3
> Client and OSD CPU: Xeon D-1541 2.1 GHz
> OSDs: 5 nodes, 3 SSDs each, one journal partition and one data partition
> per SSD, XFS data file system (15 OSDs total)
> Disks: DC S3510 240GB
> Network: 10 GbE, dedicated switch for storage traffic
> Guest OS: Debian, virtio drivers
>
> Performance testing was done with fio on raw disk devices using this
> config:
> ioengine=libaio
> iodepth=128
> direct=1
> size=100%
> rw=randread
> bs=4k
>
> Case 1: krbd, fio running on the raw rbd device on the client host (no
> guest)
> IOPS: 142k
> Average latency: 0.9 msec
>
> Case 2: krbd, fio running in a guest (libvirt config below)
> IOPS: 119k
> Average Latency: 1.1 msec
>
> Case 3: QEMU RBD client, fio running in a guest (libvirt config below)
> IOPS: 25k
> Average Latency: 5.2 msec
>
> The question is why the test with the QEMU RBD client (case 3) shows 4
> msec of additional latency compared to the guest using the krbd-mapped
> image (case 2).
>
> Note that the IOPS bottleneck for all of these cases is the rate at which
> the client issues requests, which is limited by the average latency and
> the maximum number of outstanding requests (128). Since the latency is the
> dominant factor in average read throughput for these small accesses, we
> would really like to understand the source of the additional latency.
>
> Thanks,
> Phil
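Wido's VirtIO SCSI suggestion would look roughly like this in the domain XML, shown here only as a sketch (the controller line and placeholder names are assumptions, not the poster's config); the disk's target bus changes from virtio to scsi:

```xml
<!-- virtio-scsi controller plus an RBD disk attached to it; names are placeholders. -->
<controller type='scsi' model='virtio-scsi' index='0'/>
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='rbd' name='rbd/vm1-disk'/>
  <target dev='sda' bus='scsi'/>
</disk>
```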
Re: [ceph-users] KVM/QEMU rbd read latency
You might try running fio directly on the host using the rbd ioengine (direct
librbd) and see how that compares. The major difference between that and the
krbd test will be the page cache readahead, which will be present in the krbd
stack but not with the rbd ioengine. I would have expected the guest OS to
normalize that some due to its own page cache in the librbd test, but that
might at least give you some more clues about where to look further.

Steve Taylor | Senior Software Engineer | StorageCraft Technology
Corporation <https://storagecraft.com>

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Phil Lacroute
Sent: Thursday, February 16, 2017 11:54 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] KVM/QEMU rbd read latency

Hi,

I am doing some performance characterization experiments for ceph with KVM
guests, and I’m observing significantly higher read latency when using the
QEMU rbd client compared to krbd. Is that expected or have I missed some
tuning knobs to improve this?

Cluster details:
Note that this cluster was built for evaluation purposes, not production,
hence the choice of small SSDs with low endurance specs.
Client host OS: Debian, 4.7.0 kernel
QEMU version 2.7.0
Ceph version Jewel 10.2.3
Client and OSD CPU: Xeon D-1541 2.1 GHz
OSDs: 5 nodes, 3 SSDs each, one journal partition and one data partition per
SSD, XFS data file system (15 OSDs total)
Disks: DC S3510 240GB
Network: 10 GbE, dedicated switch for storage traffic
Guest OS: Debian, virtio drivers

Performance testing was done with fio on raw disk devices using this config:
ioengine=libaio
iodepth=128
direct=1
size=100%
rw=randread
bs=4k

Case 1: krbd, fio running on the raw rbd device on the client host (no guest)
IOPS: 142k
Average latency: 0.9 msec

Case 2: krbd, fio running in a guest (libvirt config below)
IOPS: 119k
Average Latency: 1.1 msec

Case 3: QEMU RBD client, fio running in a guest (libvirt config below)
IOPS: 25k
Average Latency: 5.2 msec

The question is why the test with the QEMU RBD client (case 3) shows 4 msec
of additional latency compared to the guest using the krbd-mapped image
(case 2).

Note that the IOPS bottleneck for all of these cases is the rate at which the
client issues requests, which is limited by the average latency and the
maximum number of outstanding requests (128). Since the latency is the
dominant factor in average read throughput for these small accesses, we would
really like to understand the source of the additional latency.

Thanks,
Phil
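The direct-librbd test Steve suggests can be driven by a fio job file along these lines, mirroring the thread's 4k randread parameters (pool, image and client names are placeholders, and fio must be built with rbd support):

```ini
; 4k random read against librbd directly, bypassing guest and host page cache.
; clientname/pool/rbdname below are example values.
[rbd-randread]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=test-image
iodepth=128
rw=randread
bs=4k
runtime=60
time_based
```

Comparing this result against the krbd-on-host number (case 1) would isolate how much of the gap comes from librbd itself versus the guest I/O path.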
[ceph-users] KVM/QEMU rbd read latency
Hi,

I am doing some performance characterization experiments for ceph with KVM
guests, and I’m observing significantly higher read latency when using the
QEMU rbd client compared to krbd. Is that expected or have I missed some
tuning knobs to improve this?

Cluster details:
Note that this cluster was built for evaluation purposes, not production,
hence the choice of small SSDs with low endurance specs.
Client host OS: Debian, 4.7.0 kernel
QEMU version 2.7.0
Ceph version Jewel 10.2.3
Client and OSD CPU: Xeon D-1541 2.1 GHz
OSDs: 5 nodes, 3 SSDs each, one journal partition and one data partition per
SSD, XFS data file system (15 OSDs total)
Disks: DC S3510 240GB
Network: 10 GbE, dedicated switch for storage traffic
Guest OS: Debian, virtio drivers

Performance testing was done with fio on raw disk devices using this config:
ioengine=libaio
iodepth=128
direct=1
size=100%
rw=randread
bs=4k

Case 1: krbd, fio running on the raw rbd device on the client host (no guest)
IOPS: 142k
Average latency: 0.9 msec

Case 2: krbd, fio running in a guest (libvirt config below)
IOPS: 119k
Average Latency: 1.1 msec

Case 3: QEMU RBD client, fio running in a guest (libvirt config below)
IOPS: 25k
Average Latency: 5.2 msec

The question is why the test with the QEMU RBD client (case 3) shows 4 msec
of additional latency compared to the guest using the krbd-mapped image
(case 2).

Note that the IOPS bottleneck for all of these cases is the rate at which the
client issues requests, which is limited by the average latency and the
maximum number of outstanding requests (128). Since the latency is the
dominant factor in average read throughput for these small accesses, we would
really like to understand the source of the additional latency.

Thanks,
Phil