Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-30 Thread pushpesh sharma
Just an update: there seems to be no proper way to pass the iothread
parameter from OpenStack Nova (at least not in the Juno release), so a
single default iothread per VM is all we have. In conclusion, a nova
instance's max iops on a ceph rbd volume will be limited to around 30-40K.
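A quick way to confirm what a given instance actually got is to look at the
live libvirt XML on the compute node (the instance name below is only an
example):

# count iothread definitions in the running guest's domain XML
virsh dumpxml instance-00c5 | grep -c -i iothread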

On Tue, Jun 16, 2015 at 10:08 PM, Alexandre DERUMIER
 wrote:
> Hi,
>
> some news about qemu with tcmalloc vs jemalloc.
>
> I'm testing with multiple disks (with iothreads) in 1 qemu guest.
>
> And while tcmalloc is a little faster than jemalloc,
>
> I have hit the tcmalloc::ThreadCache::ReleaseToCentralCache bug a lot of times.
>
> Increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES doesn't help.
>
>
> With multiple disks, I'm around 200k iops with tcmalloc (before hitting the
> bug) and 350k iops with jemalloc.
>
> The problem is that when I hit the malloc bug, I'm around 4000-1 iops, and the
> only way to fix it is to restart qemu ...
>
>
>
> - Mail original -
> De: "pushpesh sharma" 
> À: "aderumier" 
> Cc: "Somnath Roy" , "Irek Fasikhov" 
> , "ceph-devel" , "ceph-users" 
> 
> Envoyé: Vendredi 12 Juin 2015 08:58:21
> Objet: Re: rbd_cache, limiting read on high iops around 40k
>
> Thanks, posted the question in openstack list. Hopefully will get some
> expert opinion.
>
> On Fri, Jun 12, 2015 at 11:33 AM, Alexandre DERUMIER
>  wrote:
>> Hi,
>>
>> here a libvirt xml sample from libvirt src
>>
>> (you need to define the iothreads count, then assign them to the disks).
>>
>> I don't use openstack, so I really don't know how it works with it.
>>
>>
>> 
>> QEMUGuest1
>> c7a5fdbd-edaf-9455-926a-d65c16db1809
>> 219136
>> 219136
>> 2
>> 2
>> 
>> hvm
>> 
>> 
>> 
>> destroy
>> restart
>> destroy
>> 
>> /usr/bin/qemu
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
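
The archive has stripped the XML tags out of the sample above. As a rough
sketch only (element names follow libvirt's documented iothreads syntax; this
is not the exact file that was posted), the pieces that matter look like this:

<domain type='kvm'>
  <name>QEMUGuest1</name>
  <!-- ... memory, os, devices, etc. ... -->
  <iothreads>2</iothreads>
  <devices>
    <disk type='file' device='disk'>
      <!-- iothread='1' pins this disk to the first defined iothread -->
      <driver name='qemu' type='raw' cache='none' iothread='1'/>
      <source file='/var/lib/libvirt/images/disk1.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>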
>>
>>
>> - Mail original -
>> De: "pushpesh sharma" 
>> À: "aderumier" 
>> Cc: "Somnath Roy" , "Irek Fasikhov" 
>> , "ceph-devel" , "ceph-users" 
>> 
>> Envoyé: Vendredi 12 Juin 2015 07:52:41
>> Objet: Re: rbd_cache, limiting read on high iops around 40k
>>
>> Hi Alexandre,
>>
>> I agree with your rationale of one iothread per disk; CPU consumed in
>> IOwait is pretty high in each VM. But I am not finding a way to set
>> this on a nova instance. I am using OpenStack Juno with QEMU+KVM.
>> As per the libvirt documentation for setting iothreads, I can edit
>> domain.xml directly and achieve the same effect. However, in an
>> openstack env the domain xml is created by nova with some additional
>> metadata, so editing it with 'virsh edit' does not seem to work
>> (I agree it is not a very cloud way of doing things, but a hack).
>> Changes made there vanish after saving them, because libvirt
>> validation fails on the edited XML:
>>
>> #virsh dumpxml instance-00c5 > vm.xml
>> #virt-xml-validate vm.xml
>> Relax-NG validity error : Extra element cpu in interleave
>> vm.xml:1: element domain: Relax-NG validity error : Element domain
>> failed to validate content
>> vm.xml fails to validate
>>
>> The second approach I took was setting QoS in volume types. But there
>> is no option to set iothreads per volume; the only parameters are for
>> max read/write ops/bytes.
>>
>> Thirdly, editing the Nova flavor and providing extra specs like
>> hw:cpu_sockets/threads/cores can change the guest CPU topology, but
>> again there is no way to set iothreads. Nova does accept
>> hw_disk_iothreads (no type check in place, I believe), but cannot pass
>> it through to domain.xml; see the sketch below.
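
For reference, the topology extra specs mentioned above are set on the flavor
roughly like this (flavor name illustrative; as noted, hw_disk_iothreads is
accepted as a key but never reaches domain.xml):

# guest CPU topology via flavor extra specs (Juno-era novaclient)
nova flavor-key m1.large set hw:cpu_sockets=1 hw:cpu_cores=4 hw:cpu_threads=1
# accepted but ignored when the domain XML is generated
nova flavor-key m1.large set hw_disk_iothreads=2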
>>
>> Could you suggest a way to set this?
>>
>> -Pushpesh
>>
>> On Wed, Jun 10, 2015 at 12:59 PM, Alexandre DERUMIER
>>  wrote:
>>>>>I need to try out the performance on qemu soon and may come back to you if 
>>>>>I need some qemu setting trick :-)
>>>
>>> Sure no problem.
>>>
>>> (BTW, I can reach around 200k iops in 1 qemu vm with 5 virtio disks with 1 
>>> iothread by disk)
>>>
>>>
>>> - Mail original -
>>> De: "Somnath Roy" 
>>> À: "aderumier" , "Irek Fasikhov" 
>>> Cc: "ceph-devel" , "pushpesh sharma" 
>>> , "ceph-users" 


Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-22 Thread Alexandre DERUMIER
 like 
>> hw:cpu_socket/thread/core, can change guest CPU topology however again 
>> no way to set iothread. It does accept hw_disk_iothreads(no type check 
>> in place, i believe ), but can not pass the same in domain.xml. 
>> 
>> Could you suggest me a way to set the same. 
>> 
>> -Pushpesh 
>> 
>> On Wed, Jun 10, 2015 at 12:59 PM, Alexandre DERUMIER 
>>  wrote: 
>>>>>I need to try out the performance on qemu soon and may come back to you if 
>>>>>I need some qemu setting trick :-) 
>>> 
>>> Sure no problem. 
>>> 
>>> (BTW, I can reach around 200k iops in 1 qemu vm with 5 virtio disks with 1 
>>> iothread by disk) 
>>> 
>>> 
>>> - Mail original - 
>>> De: "Somnath Roy"  
>>> À: "aderumier" , "Irek Fasikhov"  
>>> Cc: "ceph-devel" , "pushpesh sharma" 
>>> , "ceph-users"  
>>> Envoyé: Mercredi 10 Juin 2015 09:06:32 
>>> Objet: RE: rbd_cache, limiting read on high iops around 40k 
>>> 
>>> Hi Alexandre, 
>>> Thanks for sharing the data. 
>>> I need to try out the performance on qemu soon and may come back to you if 
>>> I need some qemu setting trick :-) 
>>> 
>>> Regards 
>>> Somnath 
>>> 
>>> -Original Message- 
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
>>> Alexandre DERUMIER 
>>> Sent: Tuesday, June 09, 2015 10:42 PM 
>>> To: Irek Fasikhov 
>>> Cc: ceph-devel; pushpesh sharma; ceph-users 
>>> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 
>>> 
>>>>>Very good work! 
>>>>>Do you have a rpm-file? 
>>>>>Thanks. 
>>> no sorry, I'm have compiled it manually (and I'm using debian jessie as 
>>> client) 
>>> 
>>> 
>>> 
>>> - Mail original - 
>>> De: "Irek Fasikhov"  
>>> À: "aderumier"  
>>> Cc: "Robert LeBlanc" , "ceph-devel" 
>>> , "pushpesh sharma" , 
>>> "ceph-users"  
>>> Envoyé: Mercredi 10 Juin 2015 07:21:42 
>>> Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 
>>> 
>>> Hi, Alexandre. 
>>> 
>>> Very good work! 
>>> Do you have a rpm-file? 
>>> Thanks. 
>>> 
>>> 2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER < aderum...@odiso.com > : 
>>> 
>>> 
>>> Hi, 
>>> 
>>> I have tested qemu with last tcmalloc 2.4, and the improvement is huge with 
>>> iothread: 50k iops (+45%) ! 
>>> 
>>> 
>>> 
>>> qemu : no-iothread : glibc : iops=33395 
>>> qemu : no-iothread : tcmalloc (2.2.1) : iops=34516 (+3%) 
>>> qemu : no-iothread : jemmaloc : iops=42226 (+26%) 
>>> qemu : no-iothread : tcmalloc (2.4) : iops=35974 (+7%) 
>>> 
>>> qemu : iothread : glibc : iops=34516 
>>> qemu : iothread : tcmalloc : iops=38676 (+12%) 
>>> qemu : iothread : jemmaloc : iops=28023 (-19%) 
>>> qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%) 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%) 
>>> -- 
>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 
>>> fio-2.1.11 
>>> Starting 1 process 
>>> Jobs: 1 (f=1): [r(1)] [100.0% done] [214.7MB/0KB/0KB /s] [54.1K/0/0 iops] [eta 00m:00s] 
>>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=894: Wed Jun 10 05:54:24 2015 
>>> read : io=5120.0MB, bw=201108KB/s, iops=50276, runt= 26070msec 
>>> slat (usec): min=1, max=1136, avg= 3.54, stdev= 3.58 
>>> clat (usec): min=128, max=6262, avg=631.41, stdev=197.71 
>>> lat (usec): min=149, max=6265, avg=635.27, stdev=197.40 
>>> clat percentiles (usec): 
>>> | 1.00th=[ 318], 5.00th=[ 378], 10.00th=[ 418], 20.00th=[ 474], 
>>> | 30.00th=[ 516], 40.00th=[ 564], 50.00th=[ 612], 60.00th=[ 652], 
>>> | 70.00th=[ 700], 80.00th=[ 756], 90.00th=[ 860], 95.00th=[ 980], 
>>> | 99.00th=[ 1272], 99.50th=[ 1384], 99.90th=[ 1688], 99.95th=[ 1896], 
>>> | 99.99th=[ 3760] 
>>> bw (KB /s): min=145608, max=249688, per=100.00%, avg=201108.00, stdev=21718.87 
>>> lat (usec) : 250=0.04%, 500=25.84%, 750=53.00%, 1000=16.63% 
>>> la
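
For context, the fio job quoted above corresponds to an invocation along these
lines (device, size and runtime are illustrative; the original job file is not
in the thread):

# 4k random-read run against a virtio disk inside the guest
fio --name=rbd_iodepth32-test --filename=/dev/vdb --direct=1 \
    --rw=randread --bs=4k --ioengine=libaio --iodepth=32 \
    --size=5G --runtime=60 --group_reporting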

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-16 Thread Irek Fasikhov
If necessary, there are RPM files for centos 7:

 gperftools.spec
<https://drive.google.com/file/d/0BxoNLVWxzOJWaVVmWTA3Z18zbUE/edit?usp=drive_web>
 pprof-2.4-1.el7.centos.noarch.rpm
<https://drive.google.com/file/d/0BxoNLVWxzOJWRmQ2ZEt6a1pnSVk/edit?usp=drive_web>
 gperftools-libs-2.4-1.el7.centos.x86_64.rpm
<https://drive.google.com/file/d/0BxoNLVWxzOJWcVByNUZHWWJqRXc/edit?usp=drive_web>
 gperftools-devel-2.4-1.el7.centos.x86_64.rpm
<https://drive.google.com/file/d/0BxoNLVWxzOJWYTUzQTNha3J3NEU/edit?usp=drive_web>
 gperftools-debuginfo-2.4-1.el7.centos.x86_64.rpm
<https://drive.google.com/file/d/0BxoNLVWxzOJWVzBic043YUk2LWM/edit?usp=drive_web>
 gperftools-2.4-1.el7.centos.x86_64.rpm
<https://drive.google.com/file/d/0BxoNLVWxzOJWNm81QWdQYU9ZaG8/edit?usp=drive_web>

2015-06-17 8:01 GMT+03:00 Alexandre DERUMIER :

> Hi,
> I finally fix it with tcmalloc with
>
> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456 LD_PRELOAD="/usr/lib/libtcmalloc_minimal.so.4" qemu
>
> I got almost the same result as jemalloc in this case, maybe a little bit
> faster
>
>
> Here the iops results for 1qemu vm with iothread by disk (iodepth=32,
> 4krandread, nocache)
>
>
> qemu randread 4k nocache libc6  iops
>
>
> 1 disk  29052
> 2 disks 55878
> 4 disks 127899
> 8 disks 240566
> 15 disks    269976
>
> qemu randread 4k nocache jemmaloc   iops
>
> 1 disk   41278
> 2 disks  75781
> 4 disks  195351
> 8 disks  294241
> 15 disks 298199
>
>
>
> qemu randread 4k nocache tcmalloc 16M cache iops
>
>
> 1 disk   37911
> 2 disks  67698
> 4 disks  41076
> 8 disks  43312
> 15 disks 37569
>
>
> qemu randread 4k nocache tcmalloc patched 256M  iops
>
> 1 disk no-iothread
> 1 disk   42160
> 2 disks  83135
> 4 disks  194591
> 8 disks  306038
> 15 disks 302278
>
>
> - Mail original -
> De: "aderumier" 
> À: "Mark Nelson" 
> Cc: "ceph-users" 
> Envoyé: Mardi 16 Juin 2015 20:27:54
> Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
>
> >>I forgot to ask, is this with the patched version of tcmalloc that
> >>theoretically fixes the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES issue?
>
> Yes, the patched version of tcmalloc, but also the last version from
> gperftools git.
> (I'm talking about qemu here, not osds).
>
> I have tried to increased TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, but it
> doesn't help.
>
>
>
> For osd, increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is helping.
> (Benchs are still running, I try to overload them as much as possible)
>
>
>
> - Mail original -
> De: "Mark Nelson" 
> À: "ceph-users" 
> Envoyé: Mardi 16 Juin 2015 19:04:27
> Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
>
> I forgot to ask, is this with the patched version of tcmalloc that
> theoretically fixes the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES issue?
>
> Mark
>
> On 06/16/2015 11:46 AM, Mark Nelson wrote:
> > Hi Alexandre,
> >
> > Excellent find! Have you also informed the QEMU developers of your
> > discovery?
> >
> > Mark
> >
> > On 06/16/2015 11:38 AM, Alexandre DERUMIER wrote:
> >> Hi,
> >>
> >> some news about qemu with tcmalloc vs jemmaloc.
> >>
> >> I'm testing with multiple disks (with iothreads) in 1 qemu guest.
> >>
> >> And if tcmalloc is a little faster than jemmaloc,
> >>
> >> I have hit a lot of time the
> >> tcmalloc::ThreadCache::ReleaseToCentralCache bug.
> >>
> >> increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, don't help.
> >>
> >>
> >> with multiple disk, I'm around 200k iops with tcmalloc (before hitting
> >> the bug) and 350kiops with jemmaloc.
> >>
> >> The problem is that when I hit malloc bug, I'm around 4000-1 iops,
> >> and only way to fix is is to restart qemu ...
> >>
> >>
> >>
> >> - Mail original -
> >> De: "pushpesh sharma" 
> >> À: "aderumier" 
> >> Cc: "Somnath Roy" , "Irek Fasikhov"
> >> , "ceph-devel" ,
> >> "ceph-users" 
> >> Envoyé: Vendredi 12 Juin 2015 08:58:21
> >> Objet: Re: rbd_cache, limiting read on high iops around 40k
> >>
> >> Thanks, posted the question in openstack list. Hopefully will get some
> >> expert opinion.
> >>
> >> On Fri, Jun 12, 2015 at 11:33 AM, Ale

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-16 Thread Alexandre DERUMIER
Hi,
I finally fixed it with tcmalloc, using:

TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456 LD_PRELOAD="/usr/lib/libtcmalloc_minimal.so.4" qemu

I got almost the same result as jemalloc in this case, maybe a little bit faster.
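
Spelled out a bit more, the idea is simply to wrap the qemu process with the
preload; everything past the LD_PRELOAD in the sketch below (machine size,
pool/image name, device ids) is illustrative, not the exact command line used
here:

# patched tcmalloc with a 256MB thread cache, one iothread per virtio disk
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456 \
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 \
qemu-system-x86_64 -enable-kvm -m 4096 -smp 2 \
    -object iothread,id=iothread1 \
    -drive file=rbd:rbd/vm-disk-1,if=none,id=drive1,format=raw,cache=none \
    -device virtio-blk-pci,drive=drive1,iothread=iothread1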


Here are the iops results for 1 qemu vm with one iothread per disk (iodepth=32, 
4k randread, nocache):


qemu randread 4k nocache libc6  iops


1 disk  29052
2 disks 55878
4 disks 127899
8 disks 240566
15 disks    269976

qemu randread 4k nocache jemmaloc   iops

1 disk   41278
2 disks  75781
4 disks  195351
8 disks  294241
15 disks 298199



qemu randread 4k nocache tcmalloc 16M cache iops


1 disk   37911
2 disks  67698
4 disks  41076
8 disks  43312
15 disks 37569


qemu randread 4k nocache tcmalloc patched 256M  iops

1 disk no-iothread  
1 disk   42160
2 disks  83135
4 disks  194591
8 disks  306038
15 disks 302278


- Mail original -
De: "aderumier" 
À: "Mark Nelson" 
Cc: "ceph-users" 
Envoyé: Mardi 16 Juin 2015 20:27:54
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

>>I forgot to ask, is this with the patched version of tcmalloc that 
>>theoretically fixes the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES issue? 

Yes, the patched version of tcmalloc, but also the last version from gperftools 
git. 
(I'm talking about qemu here, not osds). 

I have tried to increased TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, but it doesn't 
help. 



For osd, increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is helping. 
(Benchs are still running, I try to overload them as much as possible) 



- Mail original - 
De: "Mark Nelson"  
À: "ceph-users"  
Envoyé: Mardi 16 Juin 2015 19:04:27 
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 

I forgot to ask, is this with the patched version of tcmalloc that 
theoretically fixes the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES issue? 

Mark 

On 06/16/2015 11:46 AM, Mark Nelson wrote: 
> Hi Alexandre, 
> 
> Excellent find! Have you also informed the QEMU developers of your 
> discovery? 
> 
> Mark 
> 
> On 06/16/2015 11:38 AM, Alexandre DERUMIER wrote: 
>> Hi, 
>> 
>> some news about qemu with tcmalloc vs jemmaloc. 
>> 
>> I'm testing with multiple disks (with iothreads) in 1 qemu guest. 
>> 
>> And if tcmalloc is a little faster than jemmaloc, 
>> 
>> I have hit a lot of time the 
>> tcmalloc::ThreadCache::ReleaseToCentralCache bug. 
>> 
>> increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, don't help. 
>> 
>> 
>> with multiple disk, I'm around 200k iops with tcmalloc (before hitting 
>> the bug) and 350kiops with jemmaloc. 
>> 
>> The problem is that when I hit malloc bug, I'm around 4000-1 iops, 
>> and only way to fix is is to restart qemu ... 
>> 
>> 
>> 
>> - Mail original - 
>> De: "pushpesh sharma"  
>> À: "aderumier"  
>> Cc: "Somnath Roy" , "Irek Fasikhov" 
>> , "ceph-devel" , 
>> "ceph-users"  
>> Envoyé: Vendredi 12 Juin 2015 08:58:21 
>> Objet: Re: rbd_cache, limiting read on high iops around 40k 
>> 
>> Thanks, posted the question in openstack list. Hopefully will get some 
>> expert opinion. 
>> 
>> On Fri, Jun 12, 2015 at 11:33 AM, Alexandre DERUMIER 
>>  wrote: 
>>> Hi, 
>>> 
>>> here a libvirt xml sample from libvirt src 
>>> 
>>> (you need to define  number, then assign then in disks). 
>>> 
>>> I don't use openstack, so I really don't known how it's working with it. 
>>> 
>>> 
>>>  
>>> QEMUGuest1 
>>> c7a5fdbd-edaf-9455-926a-d65c16db1809 
>>> 219136 
>>> 219136 
>>> 2 
>>> 2 
>>>  
>>> hvm 
>>>  
>>>  
>>>  
>>> destroy 
>>> restart 
>>> destroy 
>>>  
>>> /usr/bin/qemu 
>>>  
>>>  
>>>  
>>>  
>>> >> function='0x0'/> 
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>> 
>>> 
>>> - Mail original - 
>>> De: "pushpesh sharma"  
>>> À: "aderumier"  
>>> Cc: "Somnath Roy" , "Irek Fasikhov" 
>>> , "ceph-devel" , 
>>> "ceph-users"  
>>> Envoyé: Vendredi 12 J

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-16 Thread Alexandre DERUMIER
>>I forgot to ask, is this with the patched version of tcmalloc that 
>>theoretically fixes the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES issue?

Yes, the patched version of tcmalloc, and also the latest version from gperftools 
git.
(I'm talking about qemu here, not osds).
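
One way to check which allocator a running qemu actually picked up (standard
tooling, not from the original mail):

# list allocator libraries mapped into the running qemu process
grep -E 'tcmalloc|jemalloc' /proc/$(pidof qemu-system-x86_64)/maps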

I have tried increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, but it doesn't 
help.



For the OSDs, increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES does help.
(Benchmarks are still running; I'm trying to overload them as much as possible.)
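
On the OSD side the variable just has to end up in the daemon's environment; a
minimal sketch with a manually started OSD (wiring it into the distro's init
scripts is not covered in this thread):

# give ceph-osd a 128MB tcmalloc thread cache (value illustrative)
export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
ceph-osd -i 0 -f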



- Mail original -
De: "Mark Nelson" 
À: "ceph-users" 
Envoyé: Mardi 16 Juin 2015 19:04:27
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

I forgot to ask, is this with the patched version of tcmalloc that 
theoretically fixes the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES issue? 

Mark 

On 06/16/2015 11:46 AM, Mark Nelson wrote: 
> Hi Alexandre, 
> 
> Excellent find! Have you also informed the QEMU developers of your 
> discovery? 
> 
> Mark 
> 
> On 06/16/2015 11:38 AM, Alexandre DERUMIER wrote: 
>> Hi, 
>> 
>> some news about qemu with tcmalloc vs jemmaloc. 
>> 
>> I'm testing with multiple disks (with iothreads) in 1 qemu guest. 
>> 
>> And if tcmalloc is a little faster than jemmaloc, 
>> 
>> I have hit a lot of time the 
>> tcmalloc::ThreadCache::ReleaseToCentralCache bug. 
>> 
>> increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, don't help. 
>> 
>> 
>> with multiple disk, I'm around 200k iops with tcmalloc (before hitting 
>> the bug) and 350kiops with jemmaloc. 
>> 
>> The problem is that when I hit malloc bug, I'm around 4000-1 iops, 
>> and only way to fix is is to restart qemu ... 
>> 
>> 
>> 
>> - Mail original - 
>> De: "pushpesh sharma"  
>> À: "aderumier"  
>> Cc: "Somnath Roy" , "Irek Fasikhov" 
>> , "ceph-devel" , 
>> "ceph-users"  
>> Envoyé: Vendredi 12 Juin 2015 08:58:21 
>> Objet: Re: rbd_cache, limiting read on high iops around 40k 
>> 
>> Thanks, posted the question in openstack list. Hopefully will get some 
>> expert opinion. 
>> 
>> On Fri, Jun 12, 2015 at 11:33 AM, Alexandre DERUMIER 
>>  wrote: 
>>> Hi, 
>>> 
>>> here a libvirt xml sample from libvirt src 
>>> 
>>> (you need to define  number, then assign then in disks). 
>>> 
>>> I don't use openstack, so I really don't known how it's working with it. 
>>> 
>>> 
>>>  
>>> QEMUGuest1 
>>> c7a5fdbd-edaf-9455-926a-d65c16db1809 
>>> 219136 
>>> 219136 
>>> 2 
>>> 2 
>>>  
>>> hvm 
>>>  
>>>  
>>>  
>>> destroy 
>>> restart 
>>> destroy 
>>>  
>>> /usr/bin/qemu 
>>>  
>>>  
>>>  
>>>  
>>> >> function='0x0'/> 
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>> 
>>> 
>>> - Mail original - 
>>> De: "pushpesh sharma"  
>>> À: "aderumier"  
>>> Cc: "Somnath Roy" , "Irek Fasikhov" 
>>> , "ceph-devel" , 
>>> "ceph-users"  
>>> Envoyé: Vendredi 12 Juin 2015 07:52:41 
>>> Objet: Re: rbd_cache, limiting read on high iops around 40k 
>>> 
>>> Hi Alexandre, 
>>> 
>>> I agree with your rational, of one iothread per disk. CPU consumed in 
>>> IOwait is pretty high in each VM. But I am not finding a way to set 
>>> the same on a nova instance. I am using openstack Juno with QEMU+KVM. 
>>> As per libvirt documentation for setting iothreads, I can edit 
>>> domain.xml directly and achieve the same effect. However in as in 
>>> openstack env domain xml is created by nova with some additional 
>>> metadata, so editing the domain xml using 'virsh edit' does not seems 
>>> to work(I agree, it is not a very cloud way of doing things, but a 
>>> hack). Changes made there vanish after saving them, due to reason 
>>> libvirt validation fails on the same. 
>>> 
>>> #virsh dumpxml instance-00c5 > vm.xml 
>>> #virt-xml-validate vm.xml 
>>> Relax-NG validity error : Extra element cpu in interleave 
>>> vm.xml:1: element domain: Relax-NG validity error : Element domain 
>>> failed to va

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-16 Thread Mark Nelson
I forgot to ask, is this with the patched version of tcmalloc that 
theoretically fixes the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES issue?


Mark

On 06/16/2015 11:46 AM, Mark Nelson wrote:

Hi Alexandre,

Excellent find!  Have you also informed the QEMU developers of your
discovery?

Mark

On 06/16/2015 11:38 AM, Alexandre DERUMIER wrote:

Hi,

some news about qemu with tcmalloc vs jemmaloc.

I'm testing with multiple disks (with iothreads) in 1 qemu guest.

And if tcmalloc is a little faster than jemmaloc,

I have hit a lot of time the
tcmalloc::ThreadCache::ReleaseToCentralCache bug.

increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, don't help.


with multiple disk, I'm around 200k iops with tcmalloc (before hitting
the bug) and 350kiops with jemmaloc.

The problem is that when I hit malloc bug, I'm around 4000-1 iops,
and only way to fix is is to restart qemu ...



- Mail original -
De: "pushpesh sharma" 
À: "aderumier" 
Cc: "Somnath Roy" , "Irek Fasikhov"
, "ceph-devel" ,
"ceph-users" 
Envoyé: Vendredi 12 Juin 2015 08:58:21
Objet: Re: rbd_cache, limiting read on high iops around 40k

Thanks, posted the question in openstack list. Hopefully will get some
expert opinion.

On Fri, Jun 12, 2015 at 11:33 AM, Alexandre DERUMIER
 wrote:

Hi,

here a libvirt xml sample from libvirt src

(you need to define  number, then assign then in disks).

I don't use openstack, so I really don't known how it's working with it.



QEMUGuest1
c7a5fdbd-edaf-9455-926a-d65c16db1809
219136
219136
2
2

hvm



destroy
restart
destroy

/usr/bin/qemu



















- Mail original -
De: "pushpesh sharma" 
À: "aderumier" 
Cc: "Somnath Roy" , "Irek Fasikhov"
, "ceph-devel" ,
"ceph-users" 
Envoyé: Vendredi 12 Juin 2015 07:52:41
Objet: Re: rbd_cache, limiting read on high iops around 40k

Hi Alexandre,

I agree with your rational, of one iothread per disk. CPU consumed in
IOwait is pretty high in each VM. But I am not finding a way to set
the same on a nova instance. I am using openstack Juno with QEMU+KVM.
As per libvirt documentation for setting iothreads, I can edit
domain.xml directly and achieve the same effect. However in as in
openstack env domain xml is created by nova with some additional
metadata, so editing the domain xml using 'virsh edit' does not seems
to work(I agree, it is not a very cloud way of doing things, but a
hack). Changes made there vanish after saving them, due to reason
libvirt validation fails on the same.

#virsh dumpxml instance-00c5 > vm.xml
#virt-xml-validate vm.xml
Relax-NG validity error : Extra element cpu in interleave
vm.xml:1: element domain: Relax-NG validity error : Element domain
failed to validate content
vm.xml fails to validate

Second approach I took was to setting QoS in volumes types. But there
is no option to set iothreads per volume, there are parameter realted
to max_read/wrirte ops/bytes.

Thirdly, editing Nova flavor and proving extra specs like
hw:cpu_socket/thread/core, can change guest CPU topology however again
no way to set iothread. It does accept hw_disk_iothreads(no type check
in place, i believe ), but can not pass the same in domain.xml.

Could you suggest me a way to set the same.

-Pushpesh

On Wed, Jun 10, 2015 at 12:59 PM, Alexandre DERUMIER
 wrote:

I need to try out the performance on qemu soon and may come back
to you if I need some qemu setting trick :-)


Sure no problem.

(BTW, I can reach around 200k iops in 1 qemu vm with 5 virtio disks
with 1 iothread by disk)


- Mail original -
De: "Somnath Roy" 
À: "aderumier" , "Irek Fasikhov"

Cc: "ceph-devel" , "pushpesh sharma"
, "ceph-users" 
Envoyé: Mercredi 10 Juin 2015 09:06:32
Objet: RE: rbd_cache, limiting read on high iops around 40k

Hi Alexandre,
Thanks for sharing the data.
I need to try out the performance on qemu soon and may come back to
you if I need some qemu setting trick :-)

Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
Behalf Of Alexandre DERUMIER
Sent: Tuesday, June 09, 2015 10:42 PM
To: Irek Fasikhov
Cc: ceph-devel; pushpesh sharma; ceph-users
Subject: Re: [ceph-users] rbd_cache, limiting read on high iops
around 40k


Very good work!
Do you have a rpm-file?
Thanks.

no sorry, I'm have compiled it manually (and I'm using debian jessie
as client)



- Mail original -
De: "Irek Fasikhov" 
À: "aderumier" 
Cc: "Robert LeBlanc" , "ceph-devel"
, "pushpesh sharma"
, "ceph-users" 
Envoyé: Mercredi 10 Juin 2015 07:21:42
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around
40k

Hi, Alexandre.

Very good work!
Do you have a rpm-file?
Thanks.

2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER < ad

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-16 Thread Mark Nelson

Hi Alexandre,

Excellent find!  Have you also informed the QEMU developers of your 
discovery?


Mark

On 06/16/2015 11:38 AM, Alexandre DERUMIER wrote:

Hi,

some news about qemu with tcmalloc vs jemmaloc.

I'm testing with multiple disks (with iothreads) in 1 qemu guest.

And if tcmalloc is a little faster than jemmaloc,

I have hit a lot of time the tcmalloc::ThreadCache::ReleaseToCentralCache bug.

increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, don't help.


with multiple disk, I'm around 200k iops with tcmalloc (before hitting the bug) 
and 350kiops with jemmaloc.

The problem is that when I hit malloc bug, I'm around 4000-1 iops, and only 
way to fix is is to restart qemu ...



- Mail original -
De: "pushpesh sharma" 
À: "aderumier" 
Cc: "Somnath Roy" , "Irek Fasikhov" , "ceph-devel" 
, "ceph-users" 
Envoyé: Vendredi 12 Juin 2015 08:58:21
Objet: Re: rbd_cache, limiting read on high iops around 40k

Thanks, posted the question in openstack list. Hopefully will get some
expert opinion.

On Fri, Jun 12, 2015 at 11:33 AM, Alexandre DERUMIER
 wrote:

Hi,

here a libvirt xml sample from libvirt src

(you need to define  number, then assign then in disks).

I don't use openstack, so I really don't known how it's working with it.



QEMUGuest1
c7a5fdbd-edaf-9455-926a-d65c16db1809
219136
219136
2
2

hvm



destroy
restart
destroy

/usr/bin/qemu



















- Mail original -
De: "pushpesh sharma" 
À: "aderumier" 
Cc: "Somnath Roy" , "Irek Fasikhov" , "ceph-devel" 
, "ceph-users" 
Envoyé: Vendredi 12 Juin 2015 07:52:41
Objet: Re: rbd_cache, limiting read on high iops around 40k

Hi Alexandre,

I agree with your rational, of one iothread per disk. CPU consumed in
IOwait is pretty high in each VM. But I am not finding a way to set
the same on a nova instance. I am using openstack Juno with QEMU+KVM.
As per libvirt documentation for setting iothreads, I can edit
domain.xml directly and achieve the same effect. However in as in
openstack env domain xml is created by nova with some additional
metadata, so editing the domain xml using 'virsh edit' does not seems
to work(I agree, it is not a very cloud way of doing things, but a
hack). Changes made there vanish after saving them, due to reason
libvirt validation fails on the same.

#virsh dumpxml instance-00c5 > vm.xml
#virt-xml-validate vm.xml
Relax-NG validity error : Extra element cpu in interleave
vm.xml:1: element domain: Relax-NG validity error : Element domain
failed to validate content
vm.xml fails to validate

Second approach I took was to setting QoS in volumes types. But there
is no option to set iothreads per volume, there are parameter realted
to max_read/wrirte ops/bytes.

Thirdly, editing Nova flavor and proving extra specs like
hw:cpu_socket/thread/core, can change guest CPU topology however again
no way to set iothread. It does accept hw_disk_iothreads(no type check
in place, i believe ), but can not pass the same in domain.xml.

Could you suggest me a way to set the same.

-Pushpesh

On Wed, Jun 10, 2015 at 12:59 PM, Alexandre DERUMIER
 wrote:

I need to try out the performance on qemu soon and may come back to you if I 
need some qemu setting trick :-)


Sure no problem.

(BTW, I can reach around 200k iops in 1 qemu vm with 5 virtio disks with 1 
iothread by disk)


- Mail original -
De: "Somnath Roy" 
À: "aderumier" , "Irek Fasikhov" 
Cc: "ceph-devel" , "pushpesh sharma" , 
"ceph-users" 
Envoyé: Mercredi 10 Juin 2015 09:06:32
Objet: RE: rbd_cache, limiting read on high iops around 40k

Hi Alexandre,
Thanks for sharing the data.
I need to try out the performance on qemu soon and may come back to you if I 
need some qemu setting trick :-)

Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Alexandre DERUMIER
Sent: Tuesday, June 09, 2015 10:42 PM
To: Irek Fasikhov
Cc: ceph-devel; pushpesh sharma; ceph-users
Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k


Very good work!
Do you have a rpm-file?
Thanks.

no sorry, I'm have compiled it manually (and I'm using debian jessie as client)



- Mail original -
De: "Irek Fasikhov" 
À: "aderumier" 
Cc: "Robert LeBlanc" , "ceph-devel" , "pushpesh 
sharma" , "ceph-users" 
Envoyé: Mercredi 10 Juin 2015 07:21:42
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

Hi, Alexandre.

Very good work!
Do you have a rpm-file?
Thanks.

2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER < aderum...@odiso.com > :


Hi,

I have tested qemu with last tcmalloc 2.4, and the improvement is huge with 
iothread: 50k iops (+45%) !



qemu : no iothread : glibc : iops=3339

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-16 Thread Alexandre DERUMIER
Hi,

some news about qemu with tcmalloc vs jemalloc.

I'm testing with multiple disks (with iothreads) in 1 qemu guest.

And while tcmalloc is a little faster than jemalloc,

I have hit the tcmalloc::ThreadCache::ReleaseToCentralCache bug a lot of times.

Increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES doesn't help.


With multiple disks, I'm around 200k iops with tcmalloc (before hitting the bug) 
and 350k iops with jemalloc.

The problem is that when I hit the malloc bug, I'm around 4000-1 iops, and the 
only way to fix it is to restart qemu ...
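
When the guest drops to a few thousand iops, the stall can be confirmed from
the host by sampling qemu's stacks and looking for the symbol named above
(standard gdb, not part of the original mail):

# sample all qemu threads once and count frames in the tcmalloc release path
gdb -p $(pidof qemu-system-x86_64) -batch -ex 'thread apply all bt' 2>/dev/null \
    | grep -c ReleaseToCentralCache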



- Mail original -
De: "pushpesh sharma" 
À: "aderumier" 
Cc: "Somnath Roy" , "Irek Fasikhov" 
, "ceph-devel" , "ceph-users" 

Envoyé: Vendredi 12 Juin 2015 08:58:21
Objet: Re: rbd_cache, limiting read on high iops around 40k

Thanks, posted the question in openstack list. Hopefully will get some 
expert opinion. 

On Fri, Jun 12, 2015 at 11:33 AM, Alexandre DERUMIER 
 wrote: 
> Hi, 
> 
> here a libvirt xml sample from libvirt src 
> 
> (you need to define  number, then assign then in disks). 
> 
> I don't use openstack, so I really don't known how it's working with it. 
> 
> 
>  
> QEMUGuest1 
> c7a5fdbd-edaf-9455-926a-d65c16db1809 
> 219136 
> 219136 
> 2 
> 2 
>  
> hvm 
>  
>  
>  
> destroy 
> restart 
> destroy 
>  
> /usr/bin/qemu 
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
> 
> 
> - Mail original - 
> De: "pushpesh sharma"  
> À: "aderumier"  
> Cc: "Somnath Roy" , "Irek Fasikhov" 
> , "ceph-devel" , "ceph-users" 
>  
> Envoyé: Vendredi 12 Juin 2015 07:52:41 
> Objet: Re: rbd_cache, limiting read on high iops around 40k 
> 
> Hi Alexandre, 
> 
> I agree with your rational, of one iothread per disk. CPU consumed in 
> IOwait is pretty high in each VM. But I am not finding a way to set 
> the same on a nova instance. I am using openstack Juno with QEMU+KVM. 
> As per libvirt documentation for setting iothreads, I can edit 
> domain.xml directly and achieve the same effect. However in as in 
> openstack env domain xml is created by nova with some additional 
> metadata, so editing the domain xml using 'virsh edit' does not seems 
> to work(I agree, it is not a very cloud way of doing things, but a 
> hack). Changes made there vanish after saving them, due to reason 
> libvirt validation fails on the same. 
> 
> #virsh dumpxml instance-00c5 > vm.xml 
> #virt-xml-validate vm.xml 
> Relax-NG validity error : Extra element cpu in interleave 
> vm.xml:1: element domain: Relax-NG validity error : Element domain 
> failed to validate content 
> vm.xml fails to validate 
> 
> Second approach I took was to setting QoS in volumes types. But there 
> is no option to set iothreads per volume, there are parameter realted 
> to max_read/wrirte ops/bytes. 
> 
> Thirdly, editing Nova flavor and proving extra specs like 
> hw:cpu_socket/thread/core, can change guest CPU topology however again 
> no way to set iothread. It does accept hw_disk_iothreads(no type check 
> in place, i believe ), but can not pass the same in domain.xml. 
> 
> Could you suggest me a way to set the same. 
> 
> -Pushpesh 
> 
> On Wed, Jun 10, 2015 at 12:59 PM, Alexandre DERUMIER 
>  wrote: 
>>>>I need to try out the performance on qemu soon and may come back to you if 
>>>>I need some qemu setting trick :-) 
>> 
>> Sure no problem. 
>> 
>> (BTW, I can reach around 200k iops in 1 qemu vm with 5 virtio disks with 1 
>> iothread by disk) 
>> 
>> 
>> - Mail original - 
>> De: "Somnath Roy"  
>> À: "aderumier" , "Irek Fasikhov"  
>> Cc: "ceph-devel" , "pushpesh sharma" 
>> , "ceph-users"  
>> Envoyé: Mercredi 10 Juin 2015 09:06:32 
>> Objet: RE: rbd_cache, limiting read on high iops around 40k 
>> 
>> Hi Alexandre, 
>> Thanks for sharing the data. 
>> I need to try out the performance on qemu soon and may come back to you if I 
>> need some qemu setting trick :-) 
>> 
>> Regards 
>> Somnath 
>> 
>> -Original Message- 
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
>> Alexandre DERUMIER 
>> Sent: Tuesday, June 09, 2015 10:42 PM 
>> To: Irek Fasikhov 
>> Cc: ceph-devel; pushpesh sharma; ceph-users 
>> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 
>> 
>>>>Very good work! 
>>>>Do yo

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-11 Thread pushpesh sharma
Thanks, I posted the question on the openstack list. Hopefully I will get some
expert opinion.

On Fri, Jun 12, 2015 at 11:33 AM, Alexandre DERUMIER
 wrote:
> Hi,
>
> here is a libvirt xml sample from the libvirt src
>
> (you need to define the <iothreads> number, then assign them in the disks).
>
> I don't use openstack, so I really don't know how it works with it.
>
>
> 
>   QEMUGuest1
>   c7a5fdbd-edaf-9455-926a-d65c16db1809
>   219136
>   219136
>   2
>   2
>   
> hvm
> 
>   
>   
>   destroy
>   restart
>   destroy
>   
> /usr/bin/qemu
> 
>   
>   
>   
>function='0x0'/>
> 
> 
>   
>   
>   
> 
> 
> 
> 
> 
>   
> 
>
>
> - Mail original -
> De: "pushpesh sharma" 
> À: "aderumier" 
> Cc: "Somnath Roy" , "Irek Fasikhov" 
> , "ceph-devel" , "ceph-users" 
> 
> Envoyé: Vendredi 12 Juin 2015 07:52:41
> Objet: Re: rbd_cache, limiting read on high iops around 40k
>
> Hi Alexandre,
>
> I agree with your rationale of one iothread per disk. CPU consumed in
> IOwait is pretty high in each VM. But I am not finding a way to set
> the same on a nova instance. I am using OpenStack Juno with QEMU+KVM.
> As per the libvirt documentation for setting iothreads, I can edit
> domain.xml directly and achieve the same effect. However, as the
> openstack-env domain xml is created by nova with some additional
> metadata, editing the domain xml using 'virsh edit' does not seem
> to work (I agree, it is not a very cloud way of doing things, but a
> hack). Changes made there vanish after saving them, because
> libvirt validation fails on the same.
>
> #virsh dumpxml instance-00c5 > vm.xml
> #virt-xml-validate vm.xml
> Relax-NG validity error : Extra element cpu in interleave
> vm.xml:1: element domain: Relax-NG validity error : Element domain
> failed to validate content
> vm.xml fails to validate
>
> The second approach I took was setting QoS in volume types. But there
> is no option to set iothreads per volume; the parameters there are related
> to max_read/write ops/bytes.
>
> Thirdly, editing the Nova flavor and providing extra specs like
> hw:cpu_socket/thread/core can change the guest CPU topology, however again
> there is no way to set iothreads. It does accept hw_disk_iothreads (no type check
> in place, I believe), but cannot pass the same into domain.xml.
>
> Could you suggest a way to set the same?
>
> -Pushpesh
>
> On Wed, Jun 10, 2015 at 12:59 PM, Alexandre DERUMIER
>  wrote:
>>>>I need to try out the performance on qemu soon and may come back to you if 
>>>>I need some qemu setting trick :-)
>>
>> Sure no problem.
>>
>> (BTW, I can reach around 200k iops in 1 qemu vm with 5 virtio disks with 1 
>> iothread by disk)
>>
>>
>> - Mail original -
>> De: "Somnath Roy" 
>> À: "aderumier" , "Irek Fasikhov" 
>> Cc: "ceph-devel" , "pushpesh sharma" 
>> , "ceph-users" 
>> Envoyé: Mercredi 10 Juin 2015 09:06:32
>> Objet: RE: rbd_cache, limiting read on high iops around 40k
>>
>> Hi Alexandre,
>> Thanks for sharing the data.
>> I need to try out the performance on qemu soon and may come back to you if I 
>> need some qemu setting trick :-)
>>
>> Regards
>> Somnath
>>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
>> Alexandre DERUMIER
>> Sent: Tuesday, June 09, 2015 10:42 PM
>> To: Irek Fasikhov
>> Cc: ceph-devel; pushpesh sharma; ceph-users
>> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
>>
>>>>Very good work!
>>>>Do you have a rpm-file?
>>>>Thanks.
>> no sorry, I'm have compiled it manually (and I'm using debian jessie as 
>> client)
>>
>>
>>
>> - Mail original -
>> De: "Irek Fasikhov" 
>> À: "aderumier" 
>> Cc: "Robert LeBlanc" , "ceph-devel" 
>> , "pushpesh sharma" , 
>> "ceph-users" 
>> Envoyé: Mercredi 10 Juin 2015 07:21:42
>> Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
>>
>> Hi, Alexandre.
>>
>> Very good work!
>> Do you have a rpm-file?
>> Thanks.
>>
>> 2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER < aderum...@odiso.com > :
>>
>>
>> Hi,
>>
>> I have tested 

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-11 Thread Alexandre DERUMIER
Hi,

here is a libvirt xml sample from the libvirt src

(you need to define the <iothreads> number, then assign them in the disks).

I don't use openstack, so I really don't know how it works with it.



  QEMUGuest1
  c7a5fdbd-edaf-9455-926a-d65c16db1809
  219136
  219136
  2
  2
  
hvm

  
  
  destroy
  restart
  destroy
  
/usr/bin/qemu
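
A rough, hand-made sketch of the parts that matter in such a domain definition --
the <iothreads> element at the domain level plus an iothread attribute on each
virtio disk's <driver> -- with invented names and paths (this is not the original
sample file):

cat > iothreads-demo.xml <<'EOF'
<domain type='kvm'>
  <name>iothreads-demo</name>
  <memory unit='KiB'>219136</memory>
  <vcpu>2</vcpu>
  <iothreads>2</iothreads>
  <os>
    <type arch='x86_64' machine='pc'>hvm</type>
  </os>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' iothread='1'/>
      <source file='/var/lib/libvirt/images/disk1.img'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' iothread='2'/>
      <source file='/var/lib/libvirt/images/disk2.img'/>
      <target dev='vdc' bus='virtio'/>
    </disk>
  </devices>
</domain>
EOF
virsh define iothreads-demo.xml   # illustrative only; a usable guest needs more than this

Everything except <iothreads> and the per-disk iothread='N' attribute is boilerplate.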

  
  
  
  


  
  
  





  



- Mail original -
De: "pushpesh sharma" 
À: "aderumier" 
Cc: "Somnath Roy" , "Irek Fasikhov" 
, "ceph-devel" , "ceph-users" 

Envoyé: Vendredi 12 Juin 2015 07:52:41
Objet: Re: rbd_cache, limiting read on high iops around 40k

Hi Alexandre, 

I agree with your rationale of one iothread per disk. CPU consumed in 
IOwait is pretty high in each VM. But I am not finding a way to set 
the same on a nova instance. I am using OpenStack Juno with QEMU+KVM. 
As per the libvirt documentation for setting iothreads, I can edit 
domain.xml directly and achieve the same effect. However, as the 
openstack-env domain xml is created by nova with some additional 
metadata, editing the domain xml using 'virsh edit' does not seem 
to work (I agree, it is not a very cloud way of doing things, but a 
hack). Changes made there vanish after saving them, because 
libvirt validation fails on the same. 

#virsh dumpxml instance-00c5 > vm.xml 
#virt-xml-validate vm.xml 
Relax-NG validity error : Extra element cpu in interleave 
vm.xml:1: element domain: Relax-NG validity error : Element domain 
failed to validate content 
vm.xml fails to validate 

The second approach I took was setting QoS in volume types. But there 
is no option to set iothreads per volume; the parameters there are related 
to max_read/write ops/bytes. 

Thirdly, editing the Nova flavor and providing extra specs like 
hw:cpu_socket/thread/core can change the guest CPU topology, however again 
there is no way to set iothreads. It does accept hw_disk_iothreads (no type check 
in place, I believe), but cannot pass the same into domain.xml. 

Could you suggest a way to set the same? 

-Pushpesh 

On Wed, Jun 10, 2015 at 12:59 PM, Alexandre DERUMIER 
 wrote: 
>>>I need to try out the performance on qemu soon and may come back to you if I 
>>>need some qemu setting trick :-) 
> 
> Sure no problem. 
> 
> (BTW, I can reach around 200k iops in 1 qemu vm with 5 virtio disks with 1 
> iothread by disk) 
> 
> 
> - Mail original - 
> De: "Somnath Roy"  
> À: "aderumier" , "Irek Fasikhov"  
> Cc: "ceph-devel" , "pushpesh sharma" 
> , "ceph-users"  
> Envoyé: Mercredi 10 Juin 2015 09:06:32 
> Objet: RE: rbd_cache, limiting read on high iops around 40k 
> 
> Hi Alexandre, 
> Thanks for sharing the data. 
> I need to try out the performance on qemu soon and may come back to you if I 
> need some qemu setting trick :-) 
> 
> Regards 
> Somnath 
> 
> -Original Message----- 
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Alexandre DERUMIER 
> Sent: Tuesday, June 09, 2015 10:42 PM 
> To: Irek Fasikhov 
> Cc: ceph-devel; pushpesh sharma; ceph-users 
> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 
> 
>>>Very good work! 
>>>Do you have a rpm-file? 
>>>Thanks. 
> no sorry, I'm have compiled it manually (and I'm using debian jessie as 
> client) 
> 
> 
> 
> - Mail original - 
> De: "Irek Fasikhov"  
> À: "aderumier"  
> Cc: "Robert LeBlanc" , "ceph-devel" 
> , "pushpesh sharma" , 
> "ceph-users"  
> Envoyé: Mercredi 10 Juin 2015 07:21:42 
> Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 
> 
> Hi, Alexandre. 
> 
> Very good work! 
> Do you have a rpm-file? 
> Thanks. 
> 
> 2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER < aderum...@odiso.com > : 
> 
> 
> Hi, 
> 
> I have tested qemu with last tcmalloc 2.4, and the improvement is huge with 
> iothread: 50k iops (+45%) ! 
> 
> 
> 
> qemu : no iothread : glibc : iops=33395 
> qemu : no-iothread : tcmalloc (2.2.1) : iops=34516 (+3%) 
> qemu : no-iothread : jemmaloc : iops=42226 (+26%) 
> qemu : no-iothread : tcmalloc (2.4) : iops=35974 (+7%) 
> 
> 
> qemu : iothread : glibc : iops=34516 
> qemu : iothread : tcmalloc : iops=38676 (+12%) 
> qemu : iothread : jemmaloc : iops=28023 (-19%) 
> qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%) 
> 
> 
> 
> 
> 
> qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%) 
> -- 
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
> ioen

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-11 Thread pushpesh sharma
Hi Alexandre,

I agree with your rationale of one iothread per disk. CPU consumed in
IOwait is pretty high in each VM. But I am not finding a way to set
the same on a nova instance. I am using OpenStack Juno with QEMU+KVM.
As per the libvirt documentation for setting iothreads, I can edit
domain.xml directly and achieve the same effect. However, as the
openstack-env domain xml is created by nova with some additional
metadata, editing the domain xml using 'virsh edit' does not seem
to work (I agree, it is not a very cloud way of doing things, but a
hack). Changes made there vanish after saving them, because
libvirt validation fails on the same.

#virsh dumpxml instance-00c5 > vm.xml
#virt-xml-validate vm.xml
Relax-NG validity error : Extra element cpu in interleave
vm.xml:1: element domain: Relax-NG validity error : Element domain
failed to validate content
vm.xml fails to validate

The second approach I took was setting QoS in volume types. But there
is no option to set iothreads per volume; the parameters there are related
to max_read/write ops/bytes.

Thirdly, editing the Nova flavor and providing extra specs like
hw:cpu_socket/thread/core can change the guest CPU topology, however again
there is no way to set iothreads. It does accept hw_disk_iothreads (no type check
in place, I believe), but cannot pass the same into domain.xml.

Could you suggest a way to set the same?

-Pushpesh
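
For concreteness, the second and third attempts above correspond roughly to commands
like the ones below; the names, ids and values are placeholders, and, as described,
neither knob translates into a libvirt iothread:

# volume-type QoS: only front-end rate limits, nothing iothread-related
cinder qos-create rbd-limits consumer=front-end read_iops_sec=20000 write_iops_sec=20000
cinder qos-associate <qos-spec-id> <volume-type-id>

# flavor extra specs: guest CPU topology only (the actual keys are hw:cpu_sockets,
# hw:cpu_cores and hw:cpu_threads); still no way to request per-disk iothreads
nova flavor-key m1.rbd-test set hw:cpu_sockets=1 hw:cpu_cores=4 hw:cpu_threads=2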

On Wed, Jun 10, 2015 at 12:59 PM, Alexandre DERUMIER
 wrote:
>>>I need to try out the performance on qemu soon and may come back to you if I 
>>>need some qemu setting trick :-)
>
> Sure no problem.
>
> (BTW, I can reach around 200k iops in 1 qemu vm with 5 virtio disks with 1 
> iothread by disk)
>
>
> - Mail original -
> De: "Somnath Roy" 
> À: "aderumier" , "Irek Fasikhov" 
> Cc: "ceph-devel" , "pushpesh sharma" 
> , "ceph-users" 
> Envoyé: Mercredi 10 Juin 2015 09:06:32
> Objet: RE: rbd_cache, limiting read on high iops around 40k
>
> Hi Alexandre,
> Thanks for sharing the data.
> I need to try out the performance on qemu soon and may come back to you if I 
> need some qemu setting trick :-)
>
> Regards
> Somnath
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Alexandre DERUMIER
> Sent: Tuesday, June 09, 2015 10:42 PM
> To: Irek Fasikhov
> Cc: ceph-devel; pushpesh sharma; ceph-users
> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
>
>>>Very good work!
>>>Do you have a rpm-file?
>>>Thanks.
> no sorry, I'm have compiled it manually (and I'm using debian jessie as 
> client)
>
>
>
> ----- Mail original -----
> De: "Irek Fasikhov" 
> À: "aderumier" 
> Cc: "Robert LeBlanc" , "ceph-devel" 
> , "pushpesh sharma" , 
> "ceph-users" 
> Envoyé: Mercredi 10 Juin 2015 07:21:42
> Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
>
> Hi, Alexandre.
>
> Very good work!
> Do you have a rpm-file?
> Thanks.
>
> 2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER < aderum...@odiso.com > :
>
>
> Hi,
>
> I have tested qemu with last tcmalloc 2.4, and the improvement is huge with 
> iothread: 50k iops (+45%) !
>
>
>
> qemu : no iothread : glibc : iops=33395
> qemu : no-iothread : tcmalloc (2.2.1) : iops=34516 (+3%)
> qemu : no-iothread : jemmaloc : iops=42226 (+26%)
> qemu : no-iothread : tcmalloc (2.4) : iops=35974 (+7%)
>
>
> qemu : iothread : glibc : iops=34516
> qemu : iothread : tcmalloc : iops=38676 (+12%)
> qemu : iothread : jemmaloc : iops=28023 (-19%)
> qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%)
>
>
>
>
>
> qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%)
> --
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
> ioengine=libaio, iodepth=32
> fio-2.1.11
> Starting 1 process
> Jobs: 1 (f=1): [r(1)] [100.0% done] [214.7MB/0KB/0KB /s] [54.1K/0/0 iops] 
> [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=894: Wed Jun 10 05:54:24 
> 2015 read : io=5120.0MB, bw=201108KB/s, iops=50276, runt= 26070msec slat 
> (usec): min=1, max=1136, avg= 3.54, stdev= 3.58 clat (usec): min=128, 
> max=6262, avg=631.41, stdev=197.71 lat (usec): min=149, max=6265, avg=635.27, 
> stdev=197.40 clat percentiles (usec):
> | 1.00th=[ 318], 5.00th=[ 378], 10.00th=[ 418], 20.00th=[ 474],
> | 30.00th=[ 516], 40.00th=[ 564], 50.00th=[ 612], 60.00th=[ 652],
> | 70.00th=[ 700], 80.00th=[ 756], 90.00th=[ 860], 95.00th=[ 980],
> | 99.00th=[ 1272], 99.50th=[ 1384], 99.90th=[ 1688], 99.95th=[ 1896],
> | 99.99th=[ 3760]

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-10 Thread Alexandre DERUMIER
>>I need to try out the performance on qemu soon and may come back to you if I 
>>need some qemu setting trick :-)

Sure no problem.

(BTW, I can reach around 200k iops in 1 qemu vm with 5 virtio disks with 1 
iothread by disk)
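
For reference, the qemu command line for that kind of guest looks roughly like the
sketch below (a hand-written illustration with made-up image names, not the exact
setup): one -object iothread per disk, and each virtio-blk device pinned to one of them:

qemu-system-x86_64 -enable-kvm -m 4096 -smp 4 \
  -object iothread,id=iothread1 \
  -object iothread,id=iothread2 \
  -drive file=rbd:rbd/vm1-disk1,format=raw,if=none,id=drive1 \
  -device virtio-blk-pci,drive=drive1,iothread=iothread1 \
  -drive file=rbd:rbd/vm1-disk2,format=raw,if=none,id=drive2 \
  -device virtio-blk-pci,drive=drive2,iothread=iothread2
  # ... repeat -object/-drive/-device for the remaining disks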


- Mail original -
De: "Somnath Roy" 
À: "aderumier" , "Irek Fasikhov" 
Cc: "ceph-devel" , "pushpesh sharma" 
, "ceph-users" 
Envoyé: Mercredi 10 Juin 2015 09:06:32
Objet: RE: rbd_cache, limiting read on high iops around 40k

Hi Alexandre, 
Thanks for sharing the data. 
I need to try out the performance on qemu soon and may come back to you if I 
need some qemu setting trick :-) 

Regards 
Somnath 

-Original Message- 
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Alexandre DERUMIER 
Sent: Tuesday, June 09, 2015 10:42 PM 
To: Irek Fasikhov 
Cc: ceph-devel; pushpesh sharma; ceph-users 
Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 

>>Very good work! 
>>Do you have a rpm-file? 
>>Thanks. 
no sorry, I have compiled it manually (and I'm using debian jessie as client) 



- Mail original - 
De: "Irek Fasikhov"  
À: "aderumier"  
Cc: "Robert LeBlanc" , "ceph-devel" 
, "pushpesh sharma" , 
"ceph-users"  
Envoyé: Mercredi 10 Juin 2015 07:21:42 
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 

Hi, Alexandre. 

Very good work! 
Do you have a rpm-file? 
Thanks. 

2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER < aderum...@odiso.com > : 


Hi, 

I have tested qemu with last tcmalloc 2.4, and the improvement is huge with 
iothread: 50k iops (+45%) ! 



qemu : no iothread : glibc : iops=33395 
qemu : no-iothread : tcmalloc (2.2.1) : iops=34516 (+3%) 
qemu : no-iothread : jemmaloc : iops=42226 (+26%) 
qemu : no-iothread : tcmalloc (2.4) : iops=35974 (+7%) 


qemu : iothread : glibc : iops=34516 
qemu : iothread : tcmalloc : iops=38676 (+12%) 
qemu : iothread : jemmaloc : iops=28023 (-19%) 
qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%) 





qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%) 
-- 
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, 
iodepth=32 
fio-2.1.11 
Starting 1 process 
Jobs: 1 (f=1): [r(1)] [100.0% done] [214.7MB/0KB/0KB /s] [54.1K/0/0 iops] [eta 
00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=894: Wed Jun 10 05:54:24 
2015 read : io=5120.0MB, bw=201108KB/s, iops=50276, runt= 26070msec slat 
(usec): min=1, max=1136, avg= 3.54, stdev= 3.58 clat (usec): min=128, max=6262, 
avg=631.41, stdev=197.71 lat (usec): min=149, max=6265, avg=635.27, 
stdev=197.40 clat percentiles (usec): 
| 1.00th=[ 318], 5.00th=[ 378], 10.00th=[ 418], 20.00th=[ 474], 
| 30.00th=[ 516], 40.00th=[ 564], 50.00th=[ 612], 60.00th=[ 652], 
| 70.00th=[ 700], 80.00th=[ 756], 90.00th=[ 860], 95.00th=[ 980], 
| 99.00th=[ 1272], 99.50th=[ 1384], 99.90th=[ 1688], 99.95th=[ 1896], 
| 99.99th=[ 3760] 
bw (KB /s): min=145608, max=249688, per=100.00%, avg=201108.00, stdev=21718.87 
lat (usec) : 250=0.04%, 500=25.84%, 750=53.00%, 1000=16.63% lat (msec) : 
2=4.46%, 4=0.03%, 10=0.01% cpu : usr=9.73%, sys=24.93%, ctx=66417, majf=0, 
minf=38 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, 
>=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, 
>=64=0.0% issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 latency : 
target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=5120.0MB, aggrb=201107KB/s, minb=201107KB/s, maxb=201107KB/s, 
mint=26070msec, maxt=26070msec 

Disk stats (read/write): 
vdb: ios=1302555/0, merge=0/0, ticks=715176/0, in_queue=714840, util=99.73% 






rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, 
iodepth=32 
fio-2.1.11 
Starting 1 process 
Jobs: 1 (f=1): [r(1)] [100.0% done] [158.7MB/0KB/0KB /s] [40.6K/0/0 iops] [eta 
00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=889: Wed Jun 10 06:05:06 
2015 read : io=5120.0MB, bw=143897KB/s, iops=35974, runt= 36435msec slat 
(usec): min=1, max=710, avg= 3.31, stdev= 3.35 clat (usec): min=191, max=4740, 
avg=884.66, stdev=315.65 lat (usec): min=289, max=4743, avg=888.31, 
stdev=315.51 clat percentiles (usec): 
| 1.00th=[ 462], 5.00th=[ 516], 10.00th=[ 548], 20.00th=[ 596], 
| 30.00th=[ 652], 40.00th=[ 764], 50.00th=[ 868], 60.00th=[ 940], 
| 70.00th=[ 1004], 80.00th=[ 1096], 90.00th=[ 1256], 95.00th=[ 1416], 
| 99.00th=[ 2024], 99.50th=[ 2224], 99.90th=[ 2544], 99.95th=[ 2640], 
| 99.99th=[ 3632] 
bw (KB /s): min=98352, max=177328, per=99.91%, avg=143772.11, stdev=21782.39 
lat (usec) : 250=0.01%, 500=3.48%, 750=35.69%, 1000=30.01% lat (msec) : 
2=29.74%, 4=1.07%, 10=0.01% cpu : usr=7.10%, sys=16.

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-10 Thread Somnath Roy
Hi Alexandre,
Thanks for sharing the data.
I need to try out the performance on qemu soon and may come back to you if I 
need some qemu setting trick :-)

Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Alexandre DERUMIER
Sent: Tuesday, June 09, 2015 10:42 PM
To: Irek Fasikhov
Cc: ceph-devel; pushpesh sharma; ceph-users
Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

>>Very good work!
>>Do you have a rpm-file?
>>Thanks.
no sorry, I have compiled it manually (and I'm using debian jessie as client)



- Mail original -
De: "Irek Fasikhov" 
À: "aderumier" 
Cc: "Robert LeBlanc" , "ceph-devel" 
, "pushpesh sharma" , 
"ceph-users" 
Envoyé: Mercredi 10 Juin 2015 07:21:42
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

Hi, Alexandre.

Very good work!
Do you have a rpm-file?
Thanks.

2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER < aderum...@odiso.com > :


Hi,

I have tested qemu with last tcmalloc 2.4, and the improvement is huge with 
iothread: 50k iops (+45%) !



qemu : no iothread : glibc : iops=33395
qemu : no-iothread : tcmalloc (2.2.1) : iops=34516 (+3%)
qemu : no-iothread : jemmaloc : iops=42226 (+26%)
qemu : no-iothread : tcmalloc (2.4) : iops=35974 (+7%)


qemu : iothread : glibc : iops=34516
qemu : iothread : tcmalloc : iops=38676 (+12%)
qemu : iothread : jemmaloc : iops=28023 (-19%)
qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%)





qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%)
--
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, 
iodepth=32
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [214.7MB/0KB/0KB /s] [54.1K/0/0 iops] [eta 
00m:00s]
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=894: Wed Jun 10 05:54:24 
2015 read : io=5120.0MB, bw=201108KB/s, iops=50276, runt= 26070msec slat 
(usec): min=1, max=1136, avg= 3.54, stdev= 3.58 clat (usec): min=128, max=6262, 
avg=631.41, stdev=197.71 lat (usec): min=149, max=6265, avg=635.27, 
stdev=197.40 clat percentiles (usec):
| 1.00th=[ 318], 5.00th=[ 378], 10.00th=[ 418], 20.00th=[ 474],
| 30.00th=[ 516], 40.00th=[ 564], 50.00th=[ 612], 60.00th=[ 652],
| 70.00th=[ 700], 80.00th=[ 756], 90.00th=[ 860], 95.00th=[ 980],
| 99.00th=[ 1272], 99.50th=[ 1384], 99.90th=[ 1688], 99.95th=[ 1896],
| 99.99th=[ 3760]
bw (KB /s): min=145608, max=249688, per=100.00%, avg=201108.00, stdev=21718.87 
lat (usec) : 250=0.04%, 500=25.84%, 750=53.00%, 1000=16.63% lat (msec) : 
2=4.46%, 4=0.03%, 10=0.01% cpu : usr=9.73%, sys=24.93%, ctx=66417, majf=0, 
minf=38 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, 
>=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, 
>=64=0.0% issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 latency : 
target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
READ: io=5120.0MB, aggrb=201107KB/s, minb=201107KB/s, maxb=201107KB/s, 
mint=26070msec, maxt=26070msec

Disk stats (read/write):
vdb: ios=1302555/0, merge=0/0, ticks=715176/0, in_queue=714840, util=99.73%






rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, 
iodepth=32
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [158.7MB/0KB/0KB /s] [40.6K/0/0 iops] [eta 
00m:00s]
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=889: Wed Jun 10 06:05:06 
2015 read : io=5120.0MB, bw=143897KB/s, iops=35974, runt= 36435msec slat 
(usec): min=1, max=710, avg= 3.31, stdev= 3.35 clat (usec): min=191, max=4740, 
avg=884.66, stdev=315.65 lat (usec): min=289, max=4743, avg=888.31, 
stdev=315.51 clat percentiles (usec):
| 1.00th=[ 462], 5.00th=[ 516], 10.00th=[ 548], 20.00th=[ 596],
| 30.00th=[ 652], 40.00th=[ 764], 50.00th=[ 868], 60.00th=[ 940],
| 70.00th=[ 1004], 80.00th=[ 1096], 90.00th=[ 1256], 95.00th=[ 1416],
| 99.00th=[ 2024], 99.50th=[ 2224], 99.90th=[ 2544], 99.95th=[ 2640],
| 99.99th=[ 3632]
bw (KB /s): min=98352, max=177328, per=99.91%, avg=143772.11, stdev=21782.39 
lat (usec) : 250=0.01%, 500=3.48%, 750=35.69%, 1000=30.01% lat (msec) : 
2=29.74%, 4=1.07%, 10=0.01% cpu : usr=7.10%, sys=16.90%, ctx=54855, majf=0, 
minf=38 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, 
>=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, 
>=64=0.0% issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 latency : 
target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
READ: io=5120.0MB, aggrb=143896KB/s, minb=143896KB/s, maxb=143896KB/s, 
mint=36435msec, maxt=36435msec

Disk stats (read/write):
vdb: ios=1301357/0, merge=0/0, ticks=1033036/0, in_q

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
>>Very good work! 
>>Do you have a rpm-file? 
>>Thanks. 
no sorry, I have compiled it manually (and I'm using debian jessie as client)



- Mail original -
De: "Irek Fasikhov" 
À: "aderumier" 
Cc: "Robert LeBlanc" , "ceph-devel" 
, "pushpesh sharma" , 
"ceph-users" 
Envoyé: Mercredi 10 Juin 2015 07:21:42
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

Hi, Alexandre. 

Very good work! 
Do you have a rpm-file? 
Thanks. 

2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER < aderum...@odiso.com > : 


Hi, 

I have tested qemu with last tcmalloc 2.4, and the improvement is huge with 
iothread: 50k iops (+45%) ! 



qemu : no iothread : glibc : iops=33395 
qemu : no-iothread : tcmalloc (2.2.1) : iops=34516 (+3%) 
qemu : no-iothread : jemmaloc : iops=42226 (+26%) 
qemu : no-iothread : tcmalloc (2.4) : iops=35974 (+7%) 


qemu : iothread : glibc : iops=34516 
qemu : iothread : tcmalloc : iops=38676 (+12%) 
qemu : iothread : jemmaloc : iops=28023 (-19%) 
qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%) 





qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%) 
-- 
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, 
iodepth=32 
fio-2.1.11 
Starting 1 process 
Jobs: 1 (f=1): [r(1)] [100.0% done] [214.7MB/0KB/0KB /s] [54.1K/0/0 iops] [eta 
00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=894: Wed Jun 10 05:54:24 
2015 
read : io=5120.0MB, bw=201108KB/s, iops=50276, runt= 26070msec 
slat (usec): min=1, max=1136, avg= 3.54, stdev= 3.58 
clat (usec): min=128, max=6262, avg=631.41, stdev=197.71 
lat (usec): min=149, max=6265, avg=635.27, stdev=197.40 
clat percentiles (usec): 
| 1.00th=[ 318], 5.00th=[ 378], 10.00th=[ 418], 20.00th=[ 474], 
| 30.00th=[ 516], 40.00th=[ 564], 50.00th=[ 612], 60.00th=[ 652], 
| 70.00th=[ 700], 80.00th=[ 756], 90.00th=[ 860], 95.00th=[ 980], 
| 99.00th=[ 1272], 99.50th=[ 1384], 99.90th=[ 1688], 99.95th=[ 1896], 
| 99.99th=[ 3760] 
bw (KB /s): min=145608, max=249688, per=100.00%, avg=201108.00, stdev=21718.87 
lat (usec) : 250=0.04%, 500=25.84%, 750=53.00%, 1000=16.63% 
lat (msec) : 2=4.46%, 4=0.03%, 10=0.01% 
cpu : usr=9.73%, sys=24.93%, ctx=66417, majf=0, minf=38 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=5120.0MB, aggrb=201107KB/s, minb=201107KB/s, maxb=201107KB/s, 
mint=26070msec, maxt=26070msec 

Disk stats (read/write): 
vdb: ios=1302555/0, merge=0/0, ticks=715176/0, in_queue=714840, util=99.73% 






rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, 
iodepth=32 
fio-2.1.11 
Starting 1 process 
Jobs: 1 (f=1): [r(1)] [100.0% done] [158.7MB/0KB/0KB /s] [40.6K/0/0 iops] [eta 
00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=889: Wed Jun 10 06:05:06 
2015 
read : io=5120.0MB, bw=143897KB/s, iops=35974, runt= 36435msec 
slat (usec): min=1, max=710, avg= 3.31, stdev= 3.35 
clat (usec): min=191, max=4740, avg=884.66, stdev=315.65 
lat (usec): min=289, max=4743, avg=888.31, stdev=315.51 
clat percentiles (usec): 
| 1.00th=[ 462], 5.00th=[ 516], 10.00th=[ 548], 20.00th=[ 596], 
| 30.00th=[ 652], 40.00th=[ 764], 50.00th=[ 868], 60.00th=[ 940], 
| 70.00th=[ 1004], 80.00th=[ 1096], 90.00th=[ 1256], 95.00th=[ 1416], 
| 99.00th=[ 2024], 99.50th=[ 2224], 99.90th=[ 2544], 99.95th=[ 2640], 
| 99.99th=[ 3632] 
bw (KB /s): min=98352, max=177328, per=99.91%, avg=143772.11, stdev=21782.39 
lat (usec) : 250=0.01%, 500=3.48%, 750=35.69%, 1000=30.01% 
lat (msec) : 2=29.74%, 4=1.07%, 10=0.01% 
cpu : usr=7.10%, sys=16.90%, ctx=54855, majf=0, minf=38 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=5120.0MB, aggrb=143896KB/s, minb=143896KB/s, maxb=143896KB/s, 
mint=36435msec, maxt=36435msec 

Disk stats (read/write): 
vdb: ios=1301357/0, merge=0/0, ticks=1033036/0, in_queue=1032716, util=99.85% 


- Mail original - 
De: "aderumier" < aderum...@odiso.com > 
À: "Robert LeBlanc" < rob...@leblancnet.us > 
Cc: "Mark Nelson" < mnel...@redhat.com >, "ceph-devel" < 
ceph-de...@vger.kernel.org >, "pushpesh sharma" < pushpesh@gmail.com >, 
"ceph-users" < ceph-users@lists.ceph.com > 

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Irek Fasikhov
Hi, Alexandre.

Very good work!
Do you have a rpm-file?
Thanks.

2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER :

> Hi,
>
> I have tested qemu with last tcmalloc 2.4, and the improvement is huge
> with iothread: 50k iops (+45%) !
>
>
>
> qemu : no iothread : glibc : iops=33395
> qemu : no-iothread : tcmalloc (2.2.1) : iops=34516 (+3%)
> qemu : no-iothread : jemmaloc : iops=42226 (+26%)
> qemu : no-iothread : tcmalloc (2.4) : iops=35974 (+7%)
>
>
> qemu : iothread : glibc : iops=34516
> qemu : iothread : tcmalloc : iops=38676 (+12%)
> qemu : iothread : jemmaloc : iops=28023 (-19%)
> qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%)
>
>
>
>
>
> qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%)
> --
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=libaio, iodepth=32
> fio-2.1.11
> Starting 1 process
> Jobs: 1 (f=1): [r(1)] [100.0% done] [214.7MB/0KB/0KB /s] [54.1K/0/0 iops]
> [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=894: Wed Jun 10
> 05:54:24 2015
>   read : io=5120.0MB, bw=201108KB/s, iops=50276, runt= 26070msec
> slat (usec): min=1, max=1136, avg= 3.54, stdev= 3.58
> clat (usec): min=128, max=6262, avg=631.41, stdev=197.71
>  lat (usec): min=149, max=6265, avg=635.27, stdev=197.40
> clat percentiles (usec):
>  |  1.00th=[  318],  5.00th=[  378], 10.00th=[  418], 20.00th=[  474],
>  | 30.00th=[  516], 40.00th=[  564], 50.00th=[  612], 60.00th=[  652],
>  | 70.00th=[  700], 80.00th=[  756], 90.00th=[  860], 95.00th=[  980],
>  | 99.00th=[ 1272], 99.50th=[ 1384], 99.90th=[ 1688], 99.95th=[ 1896],
>  | 99.99th=[ 3760]
> bw (KB  /s): min=145608, max=249688, per=100.00%, avg=201108.00,
> stdev=21718.87
> lat (usec) : 250=0.04%, 500=25.84%, 750=53.00%, 1000=16.63%
> lat (msec) : 2=4.46%, 4=0.03%, 10=0.01%
>   cpu  : usr=9.73%, sys=24.93%, ctx=66417, majf=0, minf=38
>   IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%,
> >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
> >=64=0.0%
>  issued: total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
>  latency   : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
>READ: io=5120.0MB, aggrb=201107KB/s, minb=201107KB/s, maxb=201107KB/s,
> mint=26070msec, maxt=26070msec
>
> Disk stats (read/write):
>   vdb: ios=1302555/0, merge=0/0, ticks=715176/0, in_queue=714840,
> util=99.73%
>
>
>
>
>
>
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=libaio, iodepth=32
> fio-2.1.11
> Starting 1 process
> Jobs: 1 (f=1): [r(1)] [100.0% done] [158.7MB/0KB/0KB /s] [40.6K/0/0 iops]
> [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=889: Wed Jun 10
> 06:05:06 2015
>   read : io=5120.0MB, bw=143897KB/s, iops=35974, runt= 36435msec
> slat (usec): min=1, max=710, avg= 3.31, stdev= 3.35
> clat (usec): min=191, max=4740, avg=884.66, stdev=315.65
>  lat (usec): min=289, max=4743, avg=888.31, stdev=315.51
> clat percentiles (usec):
>  |  1.00th=[  462],  5.00th=[  516], 10.00th=[  548], 20.00th=[  596],
>  | 30.00th=[  652], 40.00th=[  764], 50.00th=[  868], 60.00th=[  940],
>  | 70.00th=[ 1004], 80.00th=[ 1096], 90.00th=[ 1256], 95.00th=[ 1416],
>  | 99.00th=[ 2024], 99.50th=[ 2224], 99.90th=[ 2544], 99.95th=[ 2640],
>  | 99.99th=[ 3632]
> bw (KB  /s): min=98352, max=177328, per=99.91%, avg=143772.11,
> stdev=21782.39
> lat (usec) : 250=0.01%, 500=3.48%, 750=35.69%, 1000=30.01%
> lat (msec) : 2=29.74%, 4=1.07%, 10=0.01%
>   cpu  : usr=7.10%, sys=16.90%, ctx=54855, majf=0, minf=38
>   IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%,
> >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
> >=64=0.0%
>  issued: total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
>  latency   : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
>READ: io=5120.0MB, aggrb=143896KB/s, minb=143896KB/s, maxb=143896KB/s,
> mint=36435msec, maxt=36435msec
>
> Disk stats (read/write):
>   vdb: ios=1301357/0, merge=0/0, ticks=1033036/0, in_queue=1032716,
> util=99.85%
>
>
> - Mail original -
> De: "aderumier" 
> À: "Robert LeBlanc" 
> Cc: "Mark Nelson" , "ceph-devel" <
> ceph-de...@vger.kernel.org>, "pushpesh s

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
Hi,

I have tested qemu with the latest tcmalloc 2.4, and the improvement with 
iothread is huge: 50k iops (+45%)!



qemu : no iothread : glibc : iops=33395 
qemu : no-iothread : tcmalloc (2.2.1) : iops=34516 (+3%) 
qemu : no-iothread : jemmaloc : iops=42226 (+26%) 
qemu : no-iothread : tcmalloc (2.4) : iops=35974 (+7%)


qemu : iothread : glibc : iops=34516 
qemu : iothread : tcmalloc : iops=38676 (+12%) 
qemu : iothread : jemmaloc : iops=28023 (-19%) 
qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%) 
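
As an aside on how qemu can be put onto a given allocator (a general sketch, not
necessarily how the binaries above were built): qemu >= 2.3 can be linked against
tcmalloc at configure time, or the library can simply be preloaded into an existing
binary:

# link at build time
./configure --target-list=x86_64-softmmu --enable-tcmalloc && make
# or preload into a stock binary (path depends on where gperftools 2.4 was installed)
LD_PRELOAD=/usr/local/lib/libtcmalloc_minimal.so.4 qemu-system-x86_64 -enable-kvm ...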





qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%) 
--
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, 
iodepth=32
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [214.7MB/0KB/0KB /s] [54.1K/0/0 iops] [eta 
00m:00s]
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=894: Wed Jun 10 05:54:24 
2015
  read : io=5120.0MB, bw=201108KB/s, iops=50276, runt= 26070msec
slat (usec): min=1, max=1136, avg= 3.54, stdev= 3.58
clat (usec): min=128, max=6262, avg=631.41, stdev=197.71
 lat (usec): min=149, max=6265, avg=635.27, stdev=197.40
clat percentiles (usec):
 |  1.00th=[  318],  5.00th=[  378], 10.00th=[  418], 20.00th=[  474],
 | 30.00th=[  516], 40.00th=[  564], 50.00th=[  612], 60.00th=[  652],
 | 70.00th=[  700], 80.00th=[  756], 90.00th=[  860], 95.00th=[  980],
 | 99.00th=[ 1272], 99.50th=[ 1384], 99.90th=[ 1688], 99.95th=[ 1896],
 | 99.99th=[ 3760]
bw (KB  /s): min=145608, max=249688, per=100.00%, avg=201108.00, 
stdev=21718.87
lat (usec) : 250=0.04%, 500=25.84%, 750=53.00%, 1000=16.63%
lat (msec) : 2=4.46%, 4=0.03%, 10=0.01%
  cpu  : usr=9.73%, sys=24.93%, ctx=66417, majf=0, minf=38
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
 issued: total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=5120.0MB, aggrb=201107KB/s, minb=201107KB/s, maxb=201107KB/s, 
mint=26070msec, maxt=26070msec

Disk stats (read/write):
  vdb: ios=1302555/0, merge=0/0, ticks=715176/0, in_queue=714840, util=99.73%






rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, 
iodepth=32
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [158.7MB/0KB/0KB /s] [40.6K/0/0 iops] [eta 
00m:00s]
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=889: Wed Jun 10 06:05:06 
2015
  read : io=5120.0MB, bw=143897KB/s, iops=35974, runt= 36435msec
slat (usec): min=1, max=710, avg= 3.31, stdev= 3.35
clat (usec): min=191, max=4740, avg=884.66, stdev=315.65
 lat (usec): min=289, max=4743, avg=888.31, stdev=315.51
clat percentiles (usec):
 |  1.00th=[  462],  5.00th=[  516], 10.00th=[  548], 20.00th=[  596],
 | 30.00th=[  652], 40.00th=[  764], 50.00th=[  868], 60.00th=[  940],
 | 70.00th=[ 1004], 80.00th=[ 1096], 90.00th=[ 1256], 95.00th=[ 1416],
 | 99.00th=[ 2024], 99.50th=[ 2224], 99.90th=[ 2544], 99.95th=[ 2640],
 | 99.99th=[ 3632]
bw (KB  /s): min=98352, max=177328, per=99.91%, avg=143772.11, 
stdev=21782.39
lat (usec) : 250=0.01%, 500=3.48%, 750=35.69%, 1000=30.01%
lat (msec) : 2=29.74%, 4=1.07%, 10=0.01%
  cpu  : usr=7.10%, sys=16.90%, ctx=54855, majf=0, minf=38
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
 issued: total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=5120.0MB, aggrb=143896KB/s, minb=143896KB/s, maxb=143896KB/s, 
mint=36435msec, maxt=36435msec

Disk stats (read/write):
  vdb: ios=1301357/0, merge=0/0, ticks=1033036/0, in_queue=1032716, util=99.85%


- Mail original -
De: "aderumier" 
À: "Robert LeBlanc" 
Cc: "Mark Nelson" , "ceph-devel" 
, "pushpesh sharma" , 
"ceph-users" 
Envoyé: Mardi 9 Juin 2015 18:47:27
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

Hi Robert, 

>>What I found was that Ceph OSDs performed well with either 
>>tcmalloc or jemalloc (except when RocksDB was built with jemalloc 
>>instead of tcmalloc, I'm still working to dig into why that might be 
>>the case). 
Yes, from my test, for the osd tcmalloc is a little faster (but very little) than 
jemalloc. 



>>However, I found that tcmalloc with QEMU/KVM was very detrimental to 
>>small I/O, but provided huge gains in I/O >=1MB. Jemalloc was much 
>>better for QEMU

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
>>At high queue-depths and high IOPS, I would suspect that the bottleneck is 
>>the single, coarse-grained mutex protecting the cache data structures. It's 
>>been a back burner item to refactor the current cache mutex into 
>>finer->>grained locks. 
>>
>>Jason 

Thanks for the explain Jason.

Anyway, inside qemu, I'm around 35-40k with or without rbd_cache, so it doesn't make 
too much difference currently.
(maybe some other qemu bottleneck).
 

- Mail original -
De: "Jason Dillaman" 
À: "Mark Nelson" 
Cc: "aderumier" , "pushpesh sharma" 
, "ceph-devel" , 
"ceph-users" 
Envoyé: Mardi 9 Juin 2015 15:39:50
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

> In the past we've hit some performance issues with RBD cache that we've 
> fixed, but we've never really tried pushing a single VM beyond 40+K read 
> IOPS in testing (or at least I never have). I suspect there's a couple 
> of possibilities as to why it might be slower, but perhaps joshd can 
> chime in as he's more familiar with what that code looks like. 
> 

At high queue-depths and high IOPS, I would suspect that the bottleneck is the 
single, coarse-grained mutex protecting the cache data structures. It's been a 
back burner item to refactor the current cache mutex into finer-grained locks. 

Jason 



Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
mplete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
 issued: total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=5120.0MB, aggrb=112094KB/s, minb=112094KB/s, maxb=112094KB/s, 
mint=46772msec, maxt=46772msec

Disk stats (read/write):
  vdb: ios=1309169/0, merge=0/0, ticks=1305796/0, in_queue=1305376, util=98.68%



qemu : non-iothread : jemmaloc : iops=42226

rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, 
iodepth=32
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [171.2MB/0KB/0KB /s] [43.9K/0/0 iops] [eta 
00m:00s]
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=892: Tue Jun  9 18:34:11 
2015
  read : io=5120.0MB, bw=177130KB/s, iops=44282, runt= 29599msec
slat (usec): min=1, max=527, avg= 3.80, stdev= 3.74
clat (usec): min=174, max=3841, avg=717.08, stdev=237.53
 lat (usec): min=210, max=3844, avg=721.23, stdev=237.22
clat percentiles (usec):
 |  1.00th=[  354],  5.00th=[  422], 10.00th=[  462], 20.00th=[  516],
 | 30.00th=[  572], 40.00th=[  628], 50.00th=[  684], 60.00th=[  740],
 | 70.00th=[  804], 80.00th=[  884], 90.00th=[ 1004], 95.00th=[ 1128],
 | 99.00th=[ 1544], 99.50th=[ 1672], 99.90th=[ 1928], 99.95th=[ 2064],
 | 99.99th=[ 2608]
bw (KB  /s): min=138120, max=230816, per=100.00%, avg=177192.14, 
stdev=23440.79
lat (usec) : 250=0.01%, 500=16.24%, 750=45.93%, 1000=27.46%
lat (msec) : 2=10.30%, 4=0.07%
  cpu  : usr=10.14%, sys=23.84%, ctx=60938, majf=0, minf=39
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
 issued: total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=5120.0MB, aggrb=177130KB/s, minb=177130KB/s, maxb=177130KB/s, 
mint=29599msec, maxt=29599msec

Disk stats (read/write):
  vdb: ios=1303992/0, merge=0/0, ticks=798008/0, in_queue=797636, util=99.80%



- Mail original -
De: "Robert LeBlanc" 
À: "aderumier" 
Cc: "Mark Nelson" , "ceph-devel" 
, "pushpesh sharma" , 
"ceph-users" 
Envoyé: Mardi 9 Juin 2015 18:00:29
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k


I also saw a similar performance increase by using alternative memory 
allocators. What I found was that Ceph OSDs performed well with either 
tcmalloc or jemalloc (except when RocksDB was built with jemalloc 
instead of tcmalloc, I'm still working to dig into why that might be 
the case). 

However, I found that tcmalloc with QEMU/KVM was very detrimental to 
small I/O, but provided huge gains in I/O >=1MB. Jemalloc was much 
better for QEMU/KVM in the tests that we ran. [1] 

I'm currently looking into I/O bottlenecks around the 16KB range and 
I'm seeing a lot of time in thread creation and destruction, the 
memory allocators are quite a bit down the list (both fio with 
ioengine rbd and on the OSDs). I wonder what the difference can be. 
I've tried using the async messenger but there wasn't a huge 
difference. [2] 

Further down the rabbit hole 

[1] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg20197.html 
[2] https://www.mail-archive.com/ceph-devel@vger.kernel.org/msg23982.html 
 
Robert LeBlanc 
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 


On Tue, Jun 9, 2015 at 6:02 AM, Alexandre DERUMIER  wrote: 
>>>Frankly, I'm a little impressed that without RBD cache we can hit 80K 
>>>IOPS from 1 VM! 
> 
> Note that theses result are not in a vm (fio-rbd on host), so in a vm we'll 
> have overhead. 
> (I'm planning to 

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Robert LeBlanc

I also saw a similar performance increase by using alternative memory
allocators. What I found was that Ceph OSDs performed well with either
tcmalloc or jemalloc (except when RocksDB was built with jemalloc
instead of tcmalloc, I'm still working to dig into why that might be
the case).

However, I found that tcmalloc with QEMU/KVM was very detrimental to
small I/O, but provided huge gains in I/O >=1MB. Jemalloc was much
better for QEMU/KVM in the tests that we ran. [1]

I'm currently looking into I/O bottlenecks around the 16KB range and
I'm seeing a lot of time in thread creation and destruction, the
memory allocators are quite a bit down the list (both fio with
ioengine rbd and on the OSDs). I wonder what the difference can be.
I've tried using the async messenger but there wasn't a huge
difference. [2]

Further down the rabbit hole

[1] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg20197.html
[2] https://www.mail-archive.com/ceph-devel@vger.kernel.org/msg23982.html
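
For anyone who wants to repeat the async messenger comparison, it is selected via the
ms type option, roughly as sketched below; treat the exact spelling and default as
assumptions and check the release notes for the build in use, since around Hammer it
was still considered experimental:

cat >> /etc/ceph/ceph.conf <<'EOF'
[global]
    ms type = async        # default is the "simple" messenger
EOF
# then restart the daemons / clients that should use it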

Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, Jun 9, 2015 at 6:02 AM, Alexandre DERUMIER  wrote:
>>>Frankly, I'm a little impressed that without RBD cache we can hit 80K
>>>IOPS from 1 VM!
>
> Note that theses result are not in a vm (fio-rbd on host), so in a vm we'll 
> have overhead.
> (I'm planning to send results in qemu soon)
>
>>>How fast are the SSDs in those 3 OSDs?
>
> Theses results are with datas in buffer memory of osd nodes.
>
> When reading fulling on ssd (intel s3500),
>
> For 1 client,
>
> I'm around 33k iops without cache and 32k iops with cache, with 1 osd.
> I'm around 55k iops without cache and 38k iops with cache, with 3 osd.
>
> with multiple clients jobs, I can reach around 70kiops by osd , and 250k iops 
> by osd when datas are in buffer.
>
> (cpus servers/clients are 2x 10 cores 3,1ghz e5 xeon)
>
>
>
> small tip :
> I'm using tcmalloc for fio-rbd or rados bench to improve latencies by around 
> 20%
>
> LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 fio ...
> LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 rados bench ...
>
> as a lot of time is spent in malloc/free
>
>
> (qemu support also tcmalloc since some months , I'll bench it too
>   https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg05372.html)
>
>
>
> I'll try to send full bench results soon, from 1 to 18 ssd osd.
>
>
>
>
> ----- Mail original -
> De: "Mark Nelson" 
> À: "aderumier" , "pushpesh sharma" 
> 
> Cc: "ceph-devel" , "ceph-users" 
> 
> Envoyé: Mardi 9 Juin 2015 13:36:31
> Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
>
> Hi All,
>
> In the past we've hit some performance issues with RBD cache that we've
> fixed, but we've never really tried pushing a single VM beyond 40+K read
> IOPS in testing (or at least I never have). I suspect there's a couple
> of possibilities as to why it might be slower, but perhaps joshd can
> chime in as he's more familiar with what that code looks like.
>
> Frankly, I'm a little impressed that without RBD cache we can hit 80K
> IOPS from 1 VM! How fast are the SSDs in those 3 OSDs?
>
> Mark
>
> On 06/09/2015 03:36 AM, Alexandre DERUMIER wrote:
>> It seems that the limit mainly appears at high queue depth (roughly > 16)
>>
>> Here are the results in iops with 1 client - 4k randread - 3 osd - at different 
>> queue depth sizes.
>> rbd_cache is almost the same as without cache with queue depth <16
>>
>>
>> cache
>> -
>> qd1: 1651
>> qd2: 3482
>> qd4: 7958
>> qd8: 17912
>> qd16: 36020
>> qd32: 42765
>> qd64: 46169
>>
>> no cache
>> -

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Jason Dillaman
> In the past we've hit some performance issues with RBD cache that we've
> fixed, but we've never really tried pushing a single VM beyond 40+K read
> IOPS in testing (or at least I never have).  I suspect there's a couple
> of possibilities as to why it might be slower, but perhaps joshd can
> chime in as he's more familiar with what that code looks like.
> 

At high queue-depths and high IOPS, I would suspect that the bottleneck is the 
single, coarse-grained mutex protecting the cache data structures.  It's been a 
back burner item to refactor the current cache mutex into finer-grained locks.

Jason


Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
>>Frankly, I'm a little impressed that without RBD cache we can hit 80K 
>>IOPS from 1 VM!

Note that these results are not in a VM (fio-rbd on the host), so in a VM we'll 
have some overhead.
(I'm planning to send results from qemu soon)

>>How fast are the SSDs in those 3 OSDs? 

These results are with data in the buffer memory of the OSD nodes.

When reading fully from SSD (Intel S3500),

for 1 client, 

I'm around 33k iops without cache and 32k iops with cache, with 1 osd.
I'm around 55k iops without cache and 38k iops with cache, with 3 osd.

With multiple client jobs, I can reach around 70k iops per osd, and 250k iops 
per osd when data is in the buffer.

(server/client cpus are 2x 10-core 3.1GHz E5 Xeon)



Small tip: 
I'm using tcmalloc for fio-rbd or rados bench to improve latencies by around 20%:

LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 fio ...
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 rados bench ...

since a lot of time is spent in malloc/free.


(qemu has also supported tcmalloc for some months now; I'll bench it too:
  https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg05372.html)



I'll try to send full bench results soon, from 1 to 18 ssd osd.
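
Related to the time spent in malloc/free mentioned above: a quick, generic way to see
where a client or qemu process is burning CPU is perf (the usual approach, not the
exact methodology behind the numbers here):

perf top -p $(pidof fio)          # live view; look for malloc/free/tcmalloc symbols
perf record -g -p $(pidof qemu-system-x86_64) -- sleep 10 && perf report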




- Mail original -
De: "Mark Nelson" 
À: "aderumier" , "pushpesh sharma" 
Cc: "ceph-devel" , "ceph-users" 

Envoyé: Mardi 9 Juin 2015 13:36:31
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

Hi All, 

In the past we've hit some performance issues with RBD cache that we've 
fixed, but we've never really tried pushing a single VM beyond 40+K read 
IOPS in testing (or at least I never have). I suspect there's a couple 
of possibilities as to why it might be slower, but perhaps joshd can 
chime in as he's more familiar with what that code looks like. 

Frankly, I'm a little impressed that without RBD cache we can hit 80K 
IOPS from 1 VM! How fast are the SSDs in those 3 OSDs? 

Mark 

On 06/09/2015 03:36 AM, Alexandre DERUMIER wrote: 
> It's seem that the limit is mainly going in high queue depth (+- > 16) 
> 
> Here the result in iops with 1client- 4krandread- 3osd - with differents 
> queue depth size. 
> rbd_cache is almost the same than without cache with queue depth <16 
> 
> 
> cache 
> - 
> qd1: 1651 
> qd2: 3482 
> qd4: 7958 
> qd8: 17912 
> qd16: 36020 
> qd32: 42765 
> qd64: 46169 
> 
> no cache 
>  
> qd1: 1748 
> qd2: 3570 
> qd4: 8356 
> qd8: 17732 
> qd16: 41396 
> qd32: 78633 
> qd64: 79063 
> qd128: 79550 
> 
> 
> - Mail original - 
> De: "aderumier"  
> À: "pushpesh sharma"  
> Cc: "ceph-devel" , "ceph-users" 
>  
> Envoyé: Mardi 9 Juin 2015 09:28:21 
> Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 
> 
> Hi, 
> 
>>> We tried adding more RBDs to single VM, but no luck. 
> 
> If you want to scale with more disks in a single qemu vm, you need to use 
> iothread feature from qemu and assign 1 iothread by disk (works with 
> virtio-blk). 
> It's working for me, I can scale with adding more disks. 
> 
> 
> My bench here are done with fio-rbd on host. 
> I can scale up to 400k iops with 10clients-rbd_cache=off on a single host and 
> around 250kiops 10clients-rbdcache=on. 
> 
> 
> I just wonder why I don't have performance decrease around 30k iops with 
> 1osd. 
> 
> I'm going to see if this tracker 
> http://tracker.ceph.com/issues/11056 
> 
> could be the cause. 
> 
> (My master build was done some week ago) 
> 
> 
> 
> - Mail original - 
> De: "pushpesh sharma"  
> À: "aderumier"  
> Cc: "ceph-devel" , "ceph-users" 
>  
> Envoyé: Mardi 9 Juin 2015 09:21:04 
> Objet: Re: rbd_cache, limiting read on high iops around 40k 
> 
> Hi Alexandre, 
> 
> We have also seen something very similar on Hammer(0.94-1). We were doing 
> some benchmarking for VMs hosted on hypervisor (QEMU-KVM, openstack-juno). 
> Each Ubuntu-VM has a RBD as root disk, and 1 RBD as additional storage. For 
> some strange reason it was not able to scale 4K- RR iops on each VM beyond 
> 35-40k. We tried adding more RBDs to single VM, but no luck. However 
> increasing number of VMs to 4 on a single hypervisor did scale to some 
> extent. After this there was no much benefit we got from adding more VMs. 
> 
> Here is the trend we have seen, x-axis is number of hypervisor, each 
> hypervisor has 4 VM, each VM has 1 RBD:- 
> 
> 
> 
> 
> VDbench is used as benchmarking tool. We were not saturating network and CPUs 
> at OSD nodes. We were not able to saturate CPUs at hypervisors, and that is 

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Mark Nelson

Hi All,

In the past we've hit some performance issues with RBD cache that we've 
fixed, but we've never really tried pushing a single VM beyond 40+K read 
IOPS in testing (or at least I never have).  I suspect there's a couple 
of possibilities as to why it might be slower, but perhaps joshd can 
chime in as he's more familiar with what that code looks like.


Frankly, I'm a little impressed that without RBD cache we can hit 80K 
IOPS from 1 VM!  How fast are the SSDs in those 3 OSDs?


Mark

On 06/09/2015 03:36 AM, Alexandre DERUMIER wrote:

It seems that the limit mainly appears at high queue depth (roughly > 16)

Here are the results in iops with 1 client - 4k randread - 3 osd - at different queue 
depth sizes.
rbd_cache is almost the same as without cache with queue depth <16


cache
-
qd1: 1651
qd2: 3482
qd4: 7958
qd8: 17912
qd16: 36020
qd32: 42765
qd64: 46169

no cache

qd1: 1748
qd2: 3570
qd4: 8356
qd8: 17732
qd16: 41396
qd32: 78633
qd64: 79063
qd128: 79550


- Mail original -
De: "aderumier" 
À: "pushpesh sharma" 
Cc: "ceph-devel" , "ceph-users" 

Envoyé: Mardi 9 Juin 2015 09:28:21
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

Hi,


We tried adding more RBDs to single VM, but no luck.


If you want to scale with more disks in a single qemu vm, you need to use the 
iothread feature from qemu and assign one iothread per disk (works with 
virtio-blk).
It's working for me; I can scale by adding more disks.


My benchmarks here are done with fio-rbd on the host.
I can scale up to 400k iops with 10 clients - rbd_cache=off on a single host, and 
around 250k iops with 10 clients - rbd_cache=on.


I just wonder why I don't see a performance decrease around 30k iops with 1 osd.

I'm going to see if this tracker
http://tracker.ceph.com/issues/11056

could be the cause.

(My master build was done some week ago)



- Mail original -
De: "pushpesh sharma" 
À: "aderumier" 
Cc: "ceph-devel" , "ceph-users" 

Envoyé: Mardi 9 Juin 2015 09:21:04
Objet: Re: rbd_cache, limiting read on high iops around 40k

Hi Alexandre,

We have also seen something very similar on Hammer (0.94-1). We were doing some 
benchmarking for VMs hosted on a hypervisor (QEMU-KVM, openstack-juno). Each 
Ubuntu VM has an RBD as root disk, and 1 RBD as additional storage. For some 
strange reason it was not able to scale 4K RR iops on each VM beyond 35-40k. 
We tried adding more RBDs to a single VM, but no luck. However, increasing the 
number of VMs to 4 on a single hypervisor did scale to some extent. After this 
there was not much benefit to be had from adding more VMs.

Here is the trend we have seen; the x-axis is the number of hypervisors, each 
hypervisor has 4 VMs, each VM has 1 RBD:-




VDbench is used as the benchmarking tool. We were not saturating network and CPUs 
at the OSD nodes. We were not able to saturate CPUs at the hypervisors, and that is 
where we were suspecting some throttling effect. However, we haven't set 
any such limits from the nova or kvm end. We tried some CPU pinning and other KVM 
related tuning as well, but no luck.

We tried the same experiment on bare metal. There, 4K RR IOPS were scaling 
from 40K (1 RBD) to 180K (4 RBDs). But after that, rather than scaling beyond that 
point, the numbers were actually degrading. (Single pipe, more congestion effect.)

We never suspected that enabling rbd cache could be detrimental to performance. 
It would be nice to root-cause the problem if that is the case.

On Tue, Jun 9, 2015 at 11:21 AM, Alexandre DERUMIER < aderum...@odiso.com > 
wrote:


Hi,

I'm running benchmarks (ceph master branch) with randread 4k qdepth=32,
and rbd_cache=true seems to limit the IOPS to around 40k


no cache

1 client - rbd_cache=false - 1osd : 38300 iops
1 client - rbd_cache=false - 2osd : 69073 iops
1 client - rbd_cache=false - 3osd : 78292 iops


cache
-
1 client - rbd_cache=true - 1osd : 38100 iops
1 client - rbd_cache=true - 2osd : 42457 iops
1 client - rbd_cache=true - 3osd : 45823 iops



Is it expected?



fio result rbd_cache=false 3 osd

rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, 
iodepth=32
fio-2.1.11
Starting 1 process
rbd engine: RBD version: 0.1.9
Jobs: 1 (f=1): [r(1)] [100.0% done] [307.5MB/0KB/0KB /s] [78.8K/0/0 iops] [eta 
00m:00s]
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113548: Tue Jun 9 07:48:42 
2015
read : io=1MB, bw=313169KB/s, iops=78292, runt= 32698msec
slat (usec): min=5, max=530, avg=11.77, stdev= 6.77
clat (usec): min=70, max=2240, avg=336.08, stdev=94.82
lat (usec): min=101, max=2247, avg=347.84, stdev=95.49
clat percentiles (usec):
| 1.00th=[ 173], 5.00th=[ 209], 10.00th=[ 231], 20.00th=[ 262],
| 30.00th=[ 282], 40.00th=[ 302], 50.00th=[ 322], 60.00th=[ 346],
| 70.00th=[ 370], 80.00th=[ 402], 90.00th=[ 454], 95.00th=[ 506],
| 99.00th=[ 628], 99.50th=[

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
It seems that the limit mainly shows up at high queue depths (roughly > 16).

Here are the results in IOPS with 1 client - 4k randread - 3 OSDs - at different 
queue depths.
rbd_cache is almost the same as no cache at queue depths < 16.


cache
-
qd1: 1651
qd2: 3482
qd4: 7958
qd8: 17912
qd16: 36020
qd32: 42765
qd64: 46169

no cache

qd1: 1748
qd2: 3570
qd4: 8356
qd8: 17732
qd16: 41396
qd32: 78633
qd64: 79063
qd128: 79550
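
A rough sketch of how a queue-depth sweep like the one above can be run with
fio's rbd engine (the pool "rbd", image "test" and client "admin" below are
illustrative placeholders, not the exact setup used in this thread):

for qd in 1 2 4 8 16 32 64 128; do
    # one randread run per queue depth, results written to a log per run
    fio --name=randread-qd${qd} --ioengine=rbd \
        --clientname=admin --pool=rbd --rbdname=test \
        --rw=randread --bs=4k --iodepth=${qd} \
        --runtime=60 --time_based \
        --output=randread-qd${qd}.log
done

rbd_cache itself is a librbd option, so it is toggled on the client side
(for example "rbd cache = true/false" in the [client] section of ceph.conf)
rather than in the fio job.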


- Mail original -
De: "aderumier" 
À: "pushpesh sharma" 
Cc: "ceph-devel" , "ceph-users" 

Envoyé: Mardi 9 Juin 2015 09:28:21
Objet: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

Hi, 

>> We tried adding more RBDs to a single VM, but no luck.

If you want to scale with more disks in a single qemu VM, you need to use the 
iothread feature from qemu and assign one iothread per disk (works with 
virtio-blk).
It's working for me; I can scale by adding more disks.


My benchmarks here are done with fio-rbd on the host.
I can scale up to 400k IOPS with 10 clients and rbd_cache=off on a single host, 
and around 250k IOPS with 10 clients and rbd_cache=on.


I just wonder why I don't see a performance decrease around 30k IOPS with 1 OSD.

I'm going to see if this tracker 
http://tracker.ceph.com/issues/11056 

could be the cause. 

(My master build was done some weeks ago)



- Mail original - 
De: "pushpesh sharma"  
À: "aderumier"  
Cc: "ceph-devel" , "ceph-users" 
 
Envoyé: Mardi 9 Juin 2015 09:21:04 
Objet: Re: rbd_cache, limiting read on high iops around 40k 

Hi Alexandre, 

We have also seen something very similar on Hammer (0.94-1). We were doing some 
benchmarking for VMs hosted on a hypervisor (QEMU-KVM, openstack-juno). Each 
Ubuntu VM has an RBD as root disk and 1 RBD as additional storage. For some 
strange reason we were not able to scale 4K RR IOPS on each VM beyond 35-40k. 
We tried adding more RBDs to a single VM, but no luck. However, increasing the 
number of VMs to 4 on a single hypervisor did scale to some extent; beyond that 
there was not much benefit from adding more VMs.

Here is the trend we have seen; the x-axis is the number of hypervisors, each 
hypervisor has 4 VMs, each VM has 1 RBD:




VDbench is used as the benchmarking tool. We were not saturating the network or 
CPUs at the OSD nodes. We were also not able to saturate the CPUs at the 
hypervisors, which is why we suspected some throttling effect. However, we 
haven't set any such limits from the nova or kvm end. We tried some CPU pinning 
and other KVM-related tuning as well, but no luck.

We tried the same experiment on bare metal. There, 4K RR IOPS scaled from 40K 
(1 RBD) to 180K (4 RBDs), but beyond that point the numbers were actually 
degrading rather than scaling. (Single pipe, more congestion effect.)

We never suspected that enabling the rbd cache could be detrimental to 
performance. It would be nice to root-cause the problem if that is the case.

On Tue, Jun 9, 2015 at 11:21 AM, Alexandre DERUMIER < aderum...@odiso.com > 
wrote: 


Hi, 

I'm running benchmarks (ceph master branch) with randread 4k qdepth=32, 
and rbd_cache=true seems to limit the IOPS to around 40k 


no cache 
 
1 client - rbd_cache=false - 1osd : 38300 iops 
1 client - rbd_cache=false - 2osd : 69073 iops 
1 client - rbd_cache=false - 3osd : 78292 iops 


cache 
- 
1 client - rbd_cache=true - 1osd : 38100 iops 
1 client - rbd_cache=true - 2osd : 42457 iops 
1 client - rbd_cache=true - 3osd : 45823 iops 



Is it expected? 



fio result rbd_cache=false 3 osd 
 
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, 
iodepth=32 
fio-2.1.11 
Starting 1 process 
rbd engine: RBD version: 0.1.9 
Jobs: 1 (f=1): [r(1)] [100.0% done] [307.5MB/0KB/0KB /s] [78.8K/0/0 iops] [eta 
00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113548: Tue Jun 9 07:48:42 
2015 
read : io=1MB, bw=313169KB/s, iops=78292, runt= 32698msec 
slat (usec): min=5, max=530, avg=11.77, stdev= 6.77 
clat (usec): min=70, max=2240, avg=336.08, stdev=94.82 
lat (usec): min=101, max=2247, avg=347.84, stdev=95.49 
clat percentiles (usec): 
| 1.00th=[ 173], 5.00th=[ 209], 10.00th=[ 231], 20.00th=[ 262], 
| 30.00th=[ 282], 40.00th=[ 302], 50.00th=[ 322], 60.00th=[ 346], 
| 70.00th=[ 370], 80.00th=[ 402], 90.00th=[ 454], 95.00th=[ 506], 
| 99.00th=[ 628], 99.50th=[ 692], 99.90th=[ 860], 99.95th=[ 948], 
| 99.99th=[ 1176] 
bw (KB /s): min=238856, max=360448, per=100.00%, avg=313402.34, stdev=25196.21 
lat (usec) : 100=0.01%, 250=15.94%, 500=78.60%, 750=5.19%, 1000=0.23% 
lat (msec) : 2=0.03%, 4=0.01% 
cpu : usr=74.48%, sys=13.25%, ctx=703225, majf=0, minf=12452 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.8%, 16=87.0%, 32=12.1%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=91.6%, 8=3.4%, 16=4.5%, 32=0.4%, 64=0.0%, >=64=0.0% 

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
Hi,

>> We tried adding more RBDs to a single VM, but no luck.

If you want to scale with more disks in a single qemu VM, you need to use the 
iothread feature from qemu and assign one iothread per disk (works with 
virtio-blk).
It's working for me; I can scale by adding more disks.
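
As a rough illustration (not the exact command line used here), one iothread
per virtio-blk disk can be wired up directly on the qemu command line; the
pool, image and id names below are placeholders:

qemu-system-x86_64 ... \
    -object iothread,id=iothread1 \
    -object iothread,id=iothread2 \
    -drive file=rbd:rbd/disk1,format=raw,if=none,id=drive1,cache=writeback \
    -device virtio-blk-pci,drive=drive1,iothread=iothread1 \
    -drive file=rbd:rbd/disk2,format=raw,if=none,id=drive2,cache=writeback \
    -device virtio-blk-pci,drive=drive2,iothread=iothread2

With libvirt, the equivalent is an <iothreads>N</iothreads> element in the
domain XML plus an iothread='N' attribute on each disk's <driver> element.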


My benchmarks here are done with fio-rbd on the host.
I can scale up to 400k IOPS with 10 clients and rbd_cache=off on a single host, 
and around 250k IOPS with 10 clients and rbd_cache=on.


I just wonder why I don't see a performance decrease around 30k IOPS with 1 OSD.

I'm going to see if this tracker
http://tracker.ceph.com/issues/11056

could be the cause.

(My master build was done some weeks ago)



- Mail original -
De: "pushpesh sharma" 
À: "aderumier" 
Cc: "ceph-devel" , "ceph-users" 

Envoyé: Mardi 9 Juin 2015 09:21:04
Objet: Re: rbd_cache, limiting read on high iops around 40k

Hi Alexandre, 

We have also seen something very similar on Hammer (0.94-1). We were doing some 
benchmarking for VMs hosted on a hypervisor (QEMU-KVM, openstack-juno). Each 
Ubuntu VM has an RBD as root disk and 1 RBD as additional storage. For some 
strange reason we were not able to scale 4K RR IOPS on each VM beyond 35-40k. 
We tried adding more RBDs to a single VM, but no luck. However, increasing the 
number of VMs to 4 on a single hypervisor did scale to some extent; beyond that 
there was not much benefit from adding more VMs.

Here is the trend we have seen; the x-axis is the number of hypervisors, each 
hypervisor has 4 VMs, each VM has 1 RBD:




VDbench is used as the benchmarking tool. We were not saturating the network or 
CPUs at the OSD nodes. We were also not able to saturate the CPUs at the 
hypervisors, which is why we suspected some throttling effect. However, we 
haven't set any such limits from the nova or kvm end. We tried some CPU pinning 
and other KVM-related tuning as well, but no luck.

We tried the same experiment on bare metal. There, 4K RR IOPS scaled from 40K 
(1 RBD) to 180K (4 RBDs), but beyond that point the numbers were actually 
degrading rather than scaling. (Single pipe, more congestion effect.)

We never suspected that enabling the rbd cache could be detrimental to 
performance. It would be nice to root-cause the problem if that is the case.

On Tue, Jun 9, 2015 at 11:21 AM, Alexandre DERUMIER < aderum...@odiso.com > 
wrote: 


Hi, 

I'm running benchmarks (ceph master branch) with randread 4k qdepth=32, 
and rbd_cache=true seems to limit the IOPS to around 40k 


no cache 
 
1 client - rbd_cache=false - 1osd : 38300 iops 
1 client - rbd_cache=false - 2osd : 69073 iops 
1 client - rbd_cache=false - 3osd : 78292 iops 


cache 
- 
1 client - rbd_cache=true - 1osd : 38100 iops 
1 client - rbd_cache=true - 2osd : 42457 iops 
1 client - rbd_cache=true - 3osd : 45823 iops 



Is it expected? 



fio result rbd_cache=false 3 osd 
 
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, 
iodepth=32 
fio-2.1.11 
Starting 1 process 
rbd engine: RBD version: 0.1.9 
Jobs: 1 (f=1): [r(1)] [100.0% done] [307.5MB/0KB/0KB /s] [78.8K/0/0 iops] [eta 
00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113548: Tue Jun 9 07:48:42 
2015 
read : io=1MB, bw=313169KB/s, iops=78292, runt= 32698msec 
slat (usec): min=5, max=530, avg=11.77, stdev= 6.77 
clat (usec): min=70, max=2240, avg=336.08, stdev=94.82 
lat (usec): min=101, max=2247, avg=347.84, stdev=95.49 
clat percentiles (usec): 
| 1.00th=[ 173], 5.00th=[ 209], 10.00th=[ 231], 20.00th=[ 262], 
| 30.00th=[ 282], 40.00th=[ 302], 50.00th=[ 322], 60.00th=[ 346], 
| 70.00th=[ 370], 80.00th=[ 402], 90.00th=[ 454], 95.00th=[ 506], 
| 99.00th=[ 628], 99.50th=[ 692], 99.90th=[ 860], 99.95th=[ 948], 
| 99.99th=[ 1176] 
bw (KB /s): min=238856, max=360448, per=100.00%, avg=313402.34, stdev=25196.21 
lat (usec) : 100=0.01%, 250=15.94%, 500=78.60%, 750=5.19%, 1000=0.23% 
lat (msec) : 2=0.03%, 4=0.01% 
cpu : usr=74.48%, sys=13.25%, ctx=703225, majf=0, minf=12452 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.8%, 16=87.0%, 32=12.1%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=91.6%, 8=3.4%, 16=4.5%, 32=0.4%, 64=0.0%, >=64=0.0% 
issued : total=r=256/w=0/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=1MB, aggrb=313169KB/s, minb=313169KB/s, maxb=313169KB/s, 
mint=32698msec, maxt=32698msec 

Disk stats (read/write): 
dm-0: ios=0/45, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/24, 
aggrmerge=0/21, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% 
sda: ios=0/24, merge=0/21, ticks=0/0, in_queue=0, util=0.00% 




fio result rbd_cache=true 3osd 
-- 

rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, 
iodepth=32 
fio-2.1.11 
Starting 1 process 
rbd engine: RBD version: 0.1.9 
Jobs: 1 (f=1): [r(1)] [100.0% done] [171.6MB/0KB/0KB /s] [43.1K/0/0 iops] [eta 
00m:

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread pushpesh sharma
Hi Alexandre,

We have also seen something very similar on Hammer (0.94-1). We were doing
some benchmarking for VMs hosted on a hypervisor (QEMU-KVM, openstack-juno).
Each Ubuntu VM has an RBD as root disk and 1 RBD as additional storage. For
some strange reason we were not able to scale 4K RR IOPS on each VM beyond
35-40k. We tried adding more RBDs to a single VM, but no luck. However,
increasing the number of VMs to 4 on a single hypervisor did scale to some
extent; beyond that there was not much benefit from adding more VMs.

Here is the trend we have seen; the x-axis is the number of hypervisors, each
hypervisor has 4 VMs, each VM has 1 RBD:



VDbench is used as the benchmarking tool. We were not saturating the network
or CPUs at the OSD nodes. We were also not able to saturate the CPUs at the
hypervisors, which is why we suspected some throttling effect. However, we
haven't set any such limits from the nova or kvm end. We tried some CPU
pinning and other KVM-related tuning as well, but no luck.

We tried the same experiment on bare metal. There, 4K RR IOPS scaled from
40K (1 RBD) to 180K (4 RBDs), but beyond that point the numbers were actually
degrading rather than scaling. (Single pipe, more congestion effect.)

We never suspected that enabling the rbd cache could be detrimental to
performance. It would be nice to root-cause the problem if that is the case.
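
A quick way to double-check that no accidental QoS caps are in place on the
libvirt side (just a sketch; the domain name is an example) is to look for
<iotune> or <cputune> limits in the nova-generated domain XML:

virsh dumpxml instance-00c5 | grep -E -A4 'iotune|cputune'

If no read_iops_sec/write_iops_sec values or CPU quota show up there, the
throttling is not being imposed by nova/libvirt.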


On Tue, Jun 9, 2015 at 11:21 AM, Alexandre DERUMIER 
wrote:

> Hi,
>
> I'm running benchmarks (ceph master branch) with randread 4k qdepth=32,
> and rbd_cache=true seems to limit the IOPS to around 40k
>
>
> no cache
> 
> 1 client - rbd_cache=false - 1osd : 38300 iops
> 1 client - rbd_cache=false - 2osd : 69073 iops
> 1 client - rbd_cache=false - 3osd : 78292 iops
>
>
> cache
> -
> 1 client - rbd_cache=true - 1osd : 38100 iops
> 1 client - rbd_cache=true - 2osd : 42457 iops
> 1 client - rbd_cache=true - 3osd : 45823 iops
>
>
>
> Is it expected?
>
>
>
> fio result rbd_cache=false 3 osd
> 
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=rbd, iodepth=32
> fio-2.1.11
> Starting 1 process
> rbd engine: RBD version: 0.1.9
> Jobs: 1 (f=1): [r(1)] [100.0% done] [307.5MB/0KB/0KB /s] [78.8K/0/0 iops]
> [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113548: Tue Jun  9
> 07:48:42 2015
>   read : io=1MB, bw=313169KB/s, iops=78292, runt= 32698msec
> slat (usec): min=5, max=530, avg=11.77, stdev= 6.77
> clat (usec): min=70, max=2240, avg=336.08, stdev=94.82
>  lat (usec): min=101, max=2247, avg=347.84, stdev=95.49
> clat percentiles (usec):
>  |  1.00th=[  173],  5.00th=[  209], 10.00th=[  231], 20.00th=[  262],
>  | 30.00th=[  282], 40.00th=[  302], 50.00th=[  322], 60.00th=[  346],
>  | 70.00th=[  370], 80.00th=[  402], 90.00th=[  454], 95.00th=[  506],
>  | 99.00th=[  628], 99.50th=[  692], 99.90th=[  860], 99.95th=[  948],
>  | 99.99th=[ 1176]
> bw (KB  /s): min=238856, max=360448, per=100.00%, avg=313402.34,
> stdev=25196.21
> lat (usec) : 100=0.01%, 250=15.94%, 500=78.60%, 750=5.19%, 1000=0.23%
> lat (msec) : 2=0.03%, 4=0.01%
>   cpu  : usr=74.48%, sys=13.25%, ctx=703225, majf=0, minf=12452
>   IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.8%, 16=87.0%, 32=12.1%,
> >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>  complete  : 0=0.0%, 4=91.6%, 8=3.4%, 16=4.5%, 32=0.4%, 64=0.0%,
> >=64=0.0%
>  issued: total=r=256/w=0/d=0, short=r=0/w=0/d=0
>  latency   : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
>READ: io=1MB, aggrb=313169KB/s, minb=313169KB/s, maxb=313169KB/s,
> mint=32698msec, maxt=32698msec
>
> Disk stats (read/write):
> dm-0: ios=0/45, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
> aggrios=0/24, aggrmerge=0/21, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
>   sda: ios=0/24, merge=0/21, ticks=0/0, in_queue=0, util=0.00%
>
>
>
>
> fio result rbd_cache=true 3osd
> --
>
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=rbd, iodepth=32
> fio-2.1.11
> Starting 1 process
> rbd engine: RBD version: 0.1.9
> Jobs: 1 (f=1): [r(1)] [100.0% done] [171.6MB/0KB/0KB /s] [43.1K/0/0 iops]
> [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113389: Tue Jun  9
> 07:47:30 2015
>   read : io=1MB, bw=183296KB/s, iops=45823, runt= 55866msec
> slat (usec): min=7, max=805, avg=21.26, stdev=15.84
> clat (usec): min=101, max=4602, avg=478.55, stdev=143.73
>  lat (usec): min=123, max=4669, avg=499.80, stdev=146.03
> clat percentiles (usec):
>  |  1.00th=[  227],  5.00th=[  274], 10.00th=[  306], 20.00th=[  350],
>  | 30.00th=[  390], 40.00th=[  430], 50.00th=[  470], 60.00th=[  506],
>  | 70.00th=[  548], 80.00th=[  596], 90.00th=[  660], 95.00th=[  724],
>  | 99.00th=[  844], 99.50th=[  908], 99.90th=[ 1112], 99.95th=[ 1288

[ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-08 Thread Alexandre DERUMIER
Hi,

I'm running benchmarks (ceph master branch) with randread 4k qdepth=32,
and rbd_cache=true seems to limit the IOPS to around 40k


no cache

1 client - rbd_cache=false - 1osd : 38300 iops
1 client - rbd_cache=false - 2osd : 69073 iops
1 client - rbd_cache=false - 3osd : 78292 iops


cache
-
1 client - rbd_cache=true - 1osd : 38100 iops
1 client - rbd_cache=true - 2osd : 42457 iops
1 client - rbd_cache=true - 3osd : 45823 iops



Is it expected? 
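
For reference, a sketch of how the two configurations above are typically
switched and exercised with fio's rbd engine (the pool, image and client
names are illustrative; the exact job file used here is not shown):

# librbd reads its cache setting from the [client] section of ceph.conf
cat >> /etc/ceph/ceph.conf <<'EOF'
[client]
rbd cache = true    # set to false for the "no cache" runs
EOF

fio --name=rbd_iodepth32-test --ioengine=rbd \
    --clientname=admin --pool=rbd --rbdname=test \
    --rw=randread --bs=4k --iodepth=32 --numjobs=1

rbd_cache only affects librbd on the client, so nothing has to be restarted
on the OSD side between runs; only the fio process is restarted.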



fio result rbd_cache=false 3 osd

rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, 
iodepth=32
fio-2.1.11
Starting 1 process
rbd engine: RBD version: 0.1.9
Jobs: 1 (f=1): [r(1)] [100.0% done] [307.5MB/0KB/0KB /s] [78.8K/0/0 iops] [eta 
00m:00s]
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113548: Tue Jun  9 
07:48:42 2015
  read : io=1MB, bw=313169KB/s, iops=78292, runt= 32698msec
slat (usec): min=5, max=530, avg=11.77, stdev= 6.77
clat (usec): min=70, max=2240, avg=336.08, stdev=94.82
 lat (usec): min=101, max=2247, avg=347.84, stdev=95.49
clat percentiles (usec):
 |  1.00th=[  173],  5.00th=[  209], 10.00th=[  231], 20.00th=[  262],
 | 30.00th=[  282], 40.00th=[  302], 50.00th=[  322], 60.00th=[  346],
 | 70.00th=[  370], 80.00th=[  402], 90.00th=[  454], 95.00th=[  506],
 | 99.00th=[  628], 99.50th=[  692], 99.90th=[  860], 99.95th=[  948],
 | 99.99th=[ 1176]
bw (KB  /s): min=238856, max=360448, per=100.00%, avg=313402.34, 
stdev=25196.21
lat (usec) : 100=0.01%, 250=15.94%, 500=78.60%, 750=5.19%, 1000=0.23%
lat (msec) : 2=0.03%, 4=0.01%
  cpu  : usr=74.48%, sys=13.25%, ctx=703225, majf=0, minf=12452
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.8%, 16=87.0%, 32=12.1%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=91.6%, 8=3.4%, 16=4.5%, 32=0.4%, 64=0.0%, >=64=0.0%
 issued: total=r=256/w=0/d=0, short=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=1MB, aggrb=313169KB/s, minb=313169KB/s, maxb=313169KB/s, 
mint=32698msec, maxt=32698msec

Disk stats (read/write):
dm-0: ios=0/45, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/24, 
aggrmerge=0/21, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
  sda: ios=0/24, merge=0/21, ticks=0/0, in_queue=0, util=0.00%




fio result rbd_cache=true 3osd
--

rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, 
iodepth=32
fio-2.1.11
Starting 1 process
rbd engine: RBD version: 0.1.9
Jobs: 1 (f=1): [r(1)] [100.0% done] [171.6MB/0KB/0KB /s] [43.1K/0/0 iops] [eta 
00m:00s]
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113389: Tue Jun  9 
07:47:30 2015
  read : io=1MB, bw=183296KB/s, iops=45823, runt= 55866msec
slat (usec): min=7, max=805, avg=21.26, stdev=15.84
clat (usec): min=101, max=4602, avg=478.55, stdev=143.73
 lat (usec): min=123, max=4669, avg=499.80, stdev=146.03
clat percentiles (usec):
 |  1.00th=[  227],  5.00th=[  274], 10.00th=[  306], 20.00th=[  350],
 | 30.00th=[  390], 40.00th=[  430], 50.00th=[  470], 60.00th=[  506],
 | 70.00th=[  548], 80.00th=[  596], 90.00th=[  660], 95.00th=[  724],
 | 99.00th=[  844], 99.50th=[  908], 99.90th=[ 1112], 99.95th=[ 1288],
 | 99.99th=[ 2192]
bw (KB  /s): min=115280, max=204416, per=100.00%, avg=183315.10, 
stdev=15079.93
lat (usec) : 250=2.42%, 500=55.61%, 750=38.48%, 1000=3.28%
lat (msec) : 2=0.19%, 4=0.01%, 10=0.01%
  cpu  : usr=60.27%, sys=12.01%, ctx=2995393, majf=0, minf=14100
  IO depths: 1=0.1%, 2=0.1%, 4=0.2%, 8=13.5%, 16=81.0%, 32=5.3%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=95.0%, 8=0.1%, 16=1.0%, 32=4.0%, 64=0.0%, >=64=0.0%
 issued: total=r=256/w=0/d=0, short=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=1MB, aggrb=183295KB/s, minb=183295KB/s, maxb=183295KB/s, 
mint=55866msec, maxt=55866msec

Disk stats (read/write):
dm-0: ios=0/61, merge=0/0, ticks=0/8, in_queue=8, util=0.01%, aggrios=0/29, 
aggrmerge=0/32, aggrticks=0/8, aggrin_queue=8, aggrutil=0.01%
  sda: ios=0/29, merge=0/32, ticks=0/8, in_queue=8, util=0.01%

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com