Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
Hallo Jason, thanks again for your time, and apologies for the long silence: I was busy upgrading to Luminous and converting Filestore to Bluestore. In the meantime, the staging cluster where I was running my tests was upgraded both to Ceph Luminous and to OpenStack Pike: the good news is that fstrim now works as expected, so I think it is not worth it (and difficult, if not impossible) to investigate further. I may post some more info once I have a maintenance window to upgrade the production cluster (I have to touch nova.conf, and I want to do that during a maintenance window).

By the way, I am unable to configure Ceph such that the admin socket is made available on a (pure) client node; I am going to open a separate issue for this.

  Thanks!

  Fulvio

-------- Original Message --------
Subject: Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
From: Jason Dillaman
To: Fulvio Galeazzi
CC: Ceph Users
Date: 03/15/2018 01:35 PM

OK, last suggestion just to narrow the issue down: ensure you have a functional admin socket and librbd log file as documented here [1]. With the VM running, before you execute "fstrim", run "ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 20" on the hypervisor host, execute "fstrim" within the VM, and then restore the log settings via "ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 0/5". Grep the log file for "aio_discard" to verify whether QEMU is passing the discard down to librbd.

[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/

On Thu, Mar 15, 2018 at 6:53 AM, Fulvio Galeazzi wrote:

Hallo Jason, I am really thankful for your time!

Changed the volume features:

rbd image 'volume-80838a69-e544-47eb-b981-a4786be89736':
    .
    features: layering, exclusive-lock, deep-flatten

I had to create several dummy files before seeing an increase with "rbd du": to me, this is some indication that dirty blocks are, at least, reused if not properly released.

Then I did "rm * ; sync ; fstrim / ; sync" but the size did not go down.
Is there a way to instruct Ceph to perform what is not currently happening automatically, namely to scan the object map of a volume and force cleanup of released blocks? Or is the problem exactly that such blocks are not seen by Ceph as reusable?

By the way, I think I forgot to mention that the underlying OSD disks are taken from a FibreChannel storage array (Dell MD3860, which cannot present JBOD, so I present single disks as RAID0) and are XFS-formatted.

  Thanks!

  Fulvio

-------- Original Message --------
Subject: Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
From: Jason Dillaman
To: Fulvio Galeazzi
CC: Ceph Users
Date: 03/14/2018 02:10 PM

Hmm -- perhaps as an experiment, can you disable the object-map and fast-diff features to see if they are incorrectly reporting the object as in-use after a discard?

$ rbd --cluster cephpa1 -p cinder-ceph feature disable volume-80838a69-e544-47eb-b981-a4786be89736 object-map,fast-diff

On Wed, Mar 14, 2018 at 3:29 AM, Fulvio Galeazzi wrote:

Hallo Jason, sure here it is!

rbd --cluster cephpa1 -p cinder-ceph info volume-80838a69-e544-47eb-b981-a4786be89736
rbd image 'volume-80838a69-e544-47eb-b981-a4786be89736':
    size 15360 MB in 3840 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.9e7ffe238e1f29
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    flags:

  Thanks

  Fulvio

-------- Original Message --------
Subject: Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
From: Jason Dillaman
To: Fulvio Galeazzi
CC: Ceph Users
Date: 03/13/2018 06:33 PM

Can you provide the output from "rbd info <pool name>/volume-80838a69-e544-47eb-b981-a4786be89736"?

On Tue, Mar 13, 2018 at 12:30 PM, Fulvio Galeazzi wrote:

Hallo!

> Discards appear like they are being sent to the device. How big of a temporary file did you create and then delete? Did you sync the file to disk before deleting it? What version of qemu-kvm are you running?
I made several tests with commands like the following (issuing sync after each operation):

dd if=/dev/zero of=/tmp/fileTest bs=1M count=200 oflag=direct

What I see is that if I repeat the command with count<=200 the size does not increase. Let's try now with count>200:

NAME                                        PROVISIONED USED
volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2284M

dd if=/dev/zero of=/tmp/fileTest bs=1M count=750 oflag=direct
dd if=/dev/zero of=/tmp/fileTest2 bs=1M count=750 oflag=direct
sync

NAME                                        PROVISIONED USED
volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2528M

rm /tmp/fileTest*
sync
sudo fstrim -v /
/: 14.1 GiB (15145271296 bytes) trimmed

NAME                                        PROVISIONED USED
volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2528M
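Jason's suggestion to grep the librbd client log for "aio_discard" can be rehearsed offline. A minimal sketch, with invented log lines (the exact librbd log format is an assumption here, not taken from the thread):

```shell
# Illustrative librbd log lines (invented for this sketch): with debug_rbd
# raised to 20 via the admin socket, a successful fstrim should leave
# aio_discard entries in the client log file.
cat > /tmp/librbd-sample.log <<'EOF'
librbd: aio_discard: off=986016, len=2097152
librbd: aio_write: off=0, len=4096
librbd: aio_discard: off=3083168, len=1112672
EOF
# A count of zero right after an fstrim would mean QEMU never passed the
# discard down to librbd.
grep -c aio_discard /tmp/librbd-sample.log
```

On a real hypervisor the grep target would of course be the librbd log file configured in ceph.conf, not this sample.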
Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
OK, last suggestion just to narrow the issue down: ensure you have a functional admin socket and librbd log file as documented here [1]. With the VM running, before you execute "fstrim", run "ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 20" on the hypervisor host, execute "fstrim" within the VM, and then restore the log settings via "ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 0/5". Grep the log file for "aio_discard" to verify whether QEMU is passing the discard down to librbd.

[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/

On Thu, Mar 15, 2018 at 6:53 AM, Fulvio Galeazzi wrote:
> Hallo Jason, I am really thankful for your time!
>
> Changed the volume features:
>
> rbd image 'volume-80838a69-e544-47eb-b981-a4786be89736':
> .
> features: layering, exclusive-lock, deep-flatten
>
> I had to create several dummy files before seeing an increase with "rbd
> du": to me, this is some indication that dirty blocks are, at least,
> reused if not properly released.
>
> Then I did "rm * ; sync ; fstrim / ; sync" but the size did not go down.
> Is there a way to instruct Ceph to perform what is not currently happening
> automatically (namely, scan the object-map of a volume and force cleanup of
> released blocks)? Or is the problem exactly that such blocks are not seen by
> Ceph as reusable?
>
> By the way, I think I forgot to mention that the underlying OSD disks are
> taken from a FibreChannel storage array (Dell MD3860, which cannot present
> JBOD, so I present single disks as RAID0) and are XFS-formatted.
>
> Thanks!
>
> Fulvio
>
> -------- Original Message --------
> Subject: Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
> From: Jason Dillaman
> To: Fulvio Galeazzi
> CC: Ceph Users
> Date: 03/14/2018 02:10 PM
>
>> Hmm -- perhaps as an experiment, can you disable the object-map and
>> fast-diff features to see if they are incorrectly reporting the object
>> as in-use after a discard?
>> $ rbd --cluster cephpa1 -p cinder-ceph feature disable
>> volume-80838a69-e544-47eb-b981-a4786be89736 object-map,fast-diff
>>
>> On Wed, Mar 14, 2018 at 3:29 AM, Fulvio Galeazzi wrote:
>>>
>>> Hallo Jason, sure here it is!
>>>
>>> rbd --cluster cephpa1 -p cinder-ceph info
>>> volume-80838a69-e544-47eb-b981-a4786be89736
>>> rbd image 'volume-80838a69-e544-47eb-b981-a4786be89736':
>>>     size 15360 MB in 3840 objects
>>>     order 22 (4096 kB objects)
>>>     block_name_prefix: rbd_data.9e7ffe238e1f29
>>>     format: 2
>>>     features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>>>     flags:
>>>
>>> Thanks
>>>
>>> Fulvio
>>>
>>> -------- Original Message --------
>>> Subject: Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
>>> From: Jason Dillaman
>>> To: Fulvio Galeazzi
>>> CC: Ceph Users
>>> Date: 03/13/2018 06:33 PM
>>>
>>>> Can you provide the output from "rbd info <pool
>>>> name>/volume-80838a69-e544-47eb-b981-a4786be89736"?
>>>>
>>>> On Tue, Mar 13, 2018 at 12:30 PM, Fulvio Galeazzi wrote:
>>>>>
>>>>> Hallo!
>>>>>
>>>>>> Discards appear like they are being sent to the device. How big of a
>>>>>> temporary file did you create and then delete? Did you sync the file
>>>>>> to disk before deleting it? What version of qemu-kvm are you running?
>>>>>
>>>>> I made several tests with commands like (issuing sync after each
>>>>> operation):
>>>>>
>>>>> dd if=/dev/zero of=/tmp/fileTest bs=1M count=200 oflag=direct
>>>>>
>>>>> What I see is that if I repeat the command with count<=200 the size
>>>>> does not increase.
>>>>>
>>>>> Let's try now with count>200:
>>>>>
>>>>> NAME                                        PROVISIONED USED
>>>>> volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2284M
>>>>>
>>>>> dd if=/dev/zero of=/tmp/fileTest bs=1M count=750 oflag=direct
>>>>> dd if=/dev/zero of=/tmp/fileTest2 bs=1M count=750 oflag=direct
Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
Hallo Jason, I am really thankful for your time!

Changed the volume features:

rbd image 'volume-80838a69-e544-47eb-b981-a4786be89736':
    .
    features: layering, exclusive-lock, deep-flatten

I had to create several dummy files before seeing an increase with "rbd du": to me, this is some indication that dirty blocks are, at least, reused if not properly released.

Then I did "rm * ; sync ; fstrim / ; sync" but the size did not go down. Is there a way to instruct Ceph to perform what is not currently happening automatically, namely to scan the object map of a volume and force cleanup of released blocks? Or is the problem exactly that such blocks are not seen by Ceph as reusable?

By the way, I think I forgot to mention that the underlying OSD disks are taken from a FibreChannel storage array (Dell MD3860, which cannot present JBOD, so I present single disks as RAID0) and are XFS-formatted.

  Thanks!

  Fulvio

-------- Original Message --------
Subject: Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
From: Jason Dillaman
To: Fulvio Galeazzi
CC: Ceph Users
Date: 03/14/2018 02:10 PM

Hmm -- perhaps as an experiment, can you disable the object-map and fast-diff features to see if they are incorrectly reporting the object as in-use after a discard?

$ rbd --cluster cephpa1 -p cinder-ceph feature disable volume-80838a69-e544-47eb-b981-a4786be89736 object-map,fast-diff

On Wed, Mar 14, 2018 at 3:29 AM, Fulvio Galeazzi wrote:

Hallo Jason, sure here it is!
rbd --cluster cephpa1 -p cinder-ceph info volume-80838a69-e544-47eb-b981-a4786be89736
rbd image 'volume-80838a69-e544-47eb-b981-a4786be89736':
    size 15360 MB in 3840 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.9e7ffe238e1f29
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    flags:

  Thanks

  Fulvio

-------- Original Message --------
Subject: Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
From: Jason Dillaman
To: Fulvio Galeazzi
CC: Ceph Users
Date: 03/13/2018 06:33 PM

Can you provide the output from "rbd info <pool name>/volume-80838a69-e544-47eb-b981-a4786be89736"?

On Tue, Mar 13, 2018 at 12:30 PM, Fulvio Galeazzi wrote:

Hallo!

> Discards appear like they are being sent to the device. How big of a temporary file did you create and then delete? Did you sync the file to disk before deleting it? What version of qemu-kvm are you running?

I made several tests with commands like (issuing sync after each operation):

dd if=/dev/zero of=/tmp/fileTest bs=1M count=200 oflag=direct

What I see is that if I repeat the command with count<=200 the size does not increase.
Let's try now with count>200:

NAME                                        PROVISIONED USED
volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2284M

dd if=/dev/zero of=/tmp/fileTest bs=1M count=750 oflag=direct
dd if=/dev/zero of=/tmp/fileTest2 bs=1M count=750 oflag=direct
sync

NAME                                        PROVISIONED USED
volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2528M

rm /tmp/fileTest*
sync
sudo fstrim -v /
/: 14.1 GiB (15145271296 bytes) trimmed

NAME                                        PROVISIONED USED
volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2528M

As for qemu-kvm, the guest OS is CentOS7, with:

[centos@testcentos-deco3 tmp]$ rpm -qa | grep qemu
qemu-guest-agent-2.8.0-2.el7.x86_64

while the host is Ubuntu 16 with:

root@pa1-r2-s10:/home/ubuntu# dpkg -l | grep qemu
ii qemu-block-extra:amd64  1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  extra block backend modules for qemu-system and qemu-utils
ii qemu-kvm                1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU Full virtualization
ii qemu-system-common      1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (common files)
ii qemu-system-x86         1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (x86)
ii qemu-utils              1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU utilities

  Thanks!

  Fulvio

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
Hmm -- perhaps as an experiment, can you disable the object-map and fast-diff features to see if they are incorrectly reporting the object as in-use after a discard?

$ rbd --cluster cephpa1 -p cinder-ceph feature disable volume-80838a69-e544-47eb-b981-a4786be89736 object-map,fast-diff

On Wed, Mar 14, 2018 at 3:29 AM, Fulvio Galeazzi wrote:
> Hallo Jason, sure here it is!
>
> rbd --cluster cephpa1 -p cinder-ceph info
> volume-80838a69-e544-47eb-b981-a4786be89736
> rbd image 'volume-80838a69-e544-47eb-b981-a4786be89736':
>     size 15360 MB in 3840 objects
>     order 22 (4096 kB objects)
>     block_name_prefix: rbd_data.9e7ffe238e1f29
>     format: 2
>     features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>     flags:
>
> Thanks
>
> Fulvio
>
> -------- Original Message --------
> Subject: Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
> From: Jason Dillaman
> To: Fulvio Galeazzi
> CC: Ceph Users
> Date: 03/13/2018 06:33 PM
>
>> Can you provide the output from "rbd info <pool
>> name>/volume-80838a69-e544-47eb-b981-a4786be89736"?
>>
>> On Tue, Mar 13, 2018 at 12:30 PM, Fulvio Galeazzi wrote:
>>>
>>> Hallo!
>>>
>>>> Discards appear like they are being sent to the device. How big of a
>>>> temporary file did you create and then delete? Did you sync the file
>>>> to disk before deleting it? What version of qemu-kvm are you running?
>>>
>>> I made several tests with commands like (issuing sync after each
>>> operation):
>>>
>>> dd if=/dev/zero of=/tmp/fileTest bs=1M count=200 oflag=direct
>>>
>>> What I see is that if I repeat the command with count<=200 the size
>>> does not increase.
>>> Let's try now with count>200:
>>>
>>> NAME                                        PROVISIONED USED
>>> volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2284M
>>>
>>> dd if=/dev/zero of=/tmp/fileTest bs=1M count=750 oflag=direct
>>> dd if=/dev/zero of=/tmp/fileTest2 bs=1M count=750 oflag=direct
>>> sync
>>>
>>> NAME                                        PROVISIONED USED
>>> volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2528M
>>>
>>> rm /tmp/fileTest*
>>> sync
>>> sudo fstrim -v /
>>> /: 14.1 GiB (15145271296 bytes) trimmed
>>>
>>> NAME                                        PROVISIONED USED
>>> volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2528M
>>>
>>> As for qemu-kvm, the guest OS is CentOS7, with:
>>>
>>> [centos@testcentos-deco3 tmp]$ rpm -qa | grep qemu
>>> qemu-guest-agent-2.8.0-2.el7.x86_64
>>>
>>> while the host is Ubuntu 16 with:
>>>
>>> root@pa1-r2-s10:/home/ubuntu# dpkg -l | grep qemu
>>> ii qemu-block-extra:amd64  1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  extra block backend modules for qemu-system and qemu-utils
>>> ii qemu-kvm                1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU Full virtualization
>>> ii qemu-system-common      1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (common files)
>>> ii qemu-system-x86         1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (x86)
>>> ii qemu-utils              1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU utilities
>>>
>>> Thanks!
>>>
>>> Fulvio

-- 
Jason
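For completeness, the experiment suggested above is reversible. A sketch of the reverse commands, using the same cluster, pool, and volume names as in the thread (note that rebuilding the object map only refreshes the usage accounting used by "rbd du"/fast-diff; it does not itself free blocks). These need a live cluster, so they are shown as a sketch only:

```shell
# Re-enable the features disabled for the experiment
rbd --cluster cephpa1 -p cinder-ceph feature enable \
    volume-80838a69-e544-47eb-b981-a4786be89736 object-map,fast-diff

# Rebuild the object map so its statistics are consistent again
rbd --cluster cephpa1 -p cinder-ceph object-map rebuild \
    volume-80838a69-e544-47eb-b981-a4786be89736
```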
Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
Hallo Jason, sure here it is!

rbd --cluster cephpa1 -p cinder-ceph info volume-80838a69-e544-47eb-b981-a4786be89736
rbd image 'volume-80838a69-e544-47eb-b981-a4786be89736':
    size 15360 MB in 3840 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.9e7ffe238e1f29
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    flags:

  Thanks

  Fulvio

-------- Original Message --------
Subject: Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
From: Jason Dillaman
To: Fulvio Galeazzi
CC: Ceph Users
Date: 03/13/2018 06:33 PM

Can you provide the output from "rbd info <pool name>/volume-80838a69-e544-47eb-b981-a4786be89736"?

On Tue, Mar 13, 2018 at 12:30 PM, Fulvio Galeazzi wrote:

Hallo!

> Discards appear like they are being sent to the device. How big of a temporary file did you create and then delete? Did you sync the file to disk before deleting it? What version of qemu-kvm are you running?

I made several tests with commands like (issuing sync after each operation):

dd if=/dev/zero of=/tmp/fileTest bs=1M count=200 oflag=direct

What I see is that if I repeat the command with count<=200 the size does not increase.
Let's try now with count>200:

NAME                                        PROVISIONED USED
volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2284M

dd if=/dev/zero of=/tmp/fileTest bs=1M count=750 oflag=direct
dd if=/dev/zero of=/tmp/fileTest2 bs=1M count=750 oflag=direct
sync

NAME                                        PROVISIONED USED
volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2528M

rm /tmp/fileTest*
sync
sudo fstrim -v /
/: 14.1 GiB (15145271296 bytes) trimmed

NAME                                        PROVISIONED USED
volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2528M

As for qemu-kvm, the guest OS is CentOS7, with:

[centos@testcentos-deco3 tmp]$ rpm -qa | grep qemu
qemu-guest-agent-2.8.0-2.el7.x86_64

while the host is Ubuntu 16 with:

root@pa1-r2-s10:/home/ubuntu# dpkg -l | grep qemu
ii qemu-block-extra:amd64  1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  extra block backend modules for qemu-system and qemu-utils
ii qemu-kvm                1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU Full virtualization
ii qemu-system-common      1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (common files)
ii qemu-system-x86         1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (x86)
ii qemu-utils              1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU utilities

  Thanks!

  Fulvio
Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
Can you provide the output from "rbd info <pool name>/volume-80838a69-e544-47eb-b981-a4786be89736"?

On Tue, Mar 13, 2018 at 12:30 PM, Fulvio Galeazzi wrote:
> Hallo!
>
>> Discards appear like they are being sent to the device. How big of a
>> temporary file did you create and then delete? Did you sync the file
>> to disk before deleting it? What version of qemu-kvm are you running?
>
> I made several tests with commands like (issuing sync after each operation):
>
> dd if=/dev/zero of=/tmp/fileTest bs=1M count=200 oflag=direct
>
> What I see is that if I repeat the command with count<=200 the size does not
> increase.
>
> Let's try now with count>200:
>
> NAME                                        PROVISIONED USED
> volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2284M
>
> dd if=/dev/zero of=/tmp/fileTest bs=1M count=750 oflag=direct
> dd if=/dev/zero of=/tmp/fileTest2 bs=1M count=750 oflag=direct
> sync
>
> NAME                                        PROVISIONED USED
> volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2528M
>
> rm /tmp/fileTest*
> sync
> sudo fstrim -v /
> /: 14.1 GiB (15145271296 bytes) trimmed
>
> NAME                                        PROVISIONED USED
> volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2528M
>
> As for qemu-kvm, the guest OS is CentOS7, with:
>
> [centos@testcentos-deco3 tmp]$ rpm -qa | grep qemu
> qemu-guest-agent-2.8.0-2.el7.x86_64
>
> while the host is Ubuntu 16 with:
>
> root@pa1-r2-s10:/home/ubuntu# dpkg -l | grep qemu
> ii qemu-block-extra:amd64  1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  extra block backend modules for qemu-system and qemu-utils
> ii qemu-kvm                1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU Full virtualization
> ii qemu-system-common      1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (common files)
> ii qemu-system-x86         1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (x86)
> ii qemu-utils              1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU utilities
>
> Thanks!
>
> Fulvio

-- 
Jason
Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
Discards appear like they are being sent to the device. How big of a temporary file did you create and then delete? Did you sync the file to disk before deleting it? What version of qemu-kvm are you running?

On Tue, Mar 13, 2018 at 11:09 AM, Fulvio Galeazzi wrote:
> Hallo Jason,
> thanks for your feedback!
>
> -------- Original Message --------
>>> * decorated a CentOS image with hw_scsi_model=virtio--scsi,hw_disk_bus=scsi
>> Is that just a typo for "hw_scsi_model"?
> Yes, it was a typo when I wrote my message. The image has virtio-scsi as it
> should.
>
>>> I see that the commands:
>>> rbd --cluster cephpa1 diff cinder-ceph/${theVol} | awk '{ SUM += $2 } END {
>>> print SUM/1024/1024 " MB" }' ; rados --cluster cephpa1 -p cinder-ceph ls |
>>> grep rbd_data.{whatever} | wc -l
>>
>> That's pretty old-school -- you can just use "rbd du" now to calculate
>> the disk usage.
>
> Good to know, thanks!
>
>>> show that the size increases but does not decrease when I delete the
>>> temporary file and execute
>>> sudo fstrim -v /
>>
>> Have you verified that your VM is indeed using virtio-scsi? Does
>> blktrace show SCSI UNMAP operations being issued to the block device
>> when you execute "fstrim"?
>
> Thanks for the tip, I think I need some more help, please.
>
> Disk on my VM is indeed /dev/sda rather than /dev/vda. The XML shows:
> .
> .
> name='cinder-ceph/volume-80838a69-e544-47eb-b981-a4786be89736'>
> .
> 80838a69-e544-47eb-b981-a4786be89736
> function='0x0'/>
>
> As for blktrace, blkparse shows me tons of lines; please find below the
> first ones, and one of the many groups of lines which I see:
>
> 8,0  0  11  4.333917112 24677 Q FWFSM 8406583 + 4 [fstrim]
> 8,0  0  12  4.333919649 24677 G FWFSM 8406583 + 4 [fstrim]
> 8,0  0  13  4.333920695 24677 P N [fstrim]
> 8,0  0  14  4.333922965 24677 I FWFSM 8406583 + 4 [fstrim]
> 8,0  0  15  4.333924575 24677 U N [fstrim] 1
> 8,0  0  20  4.340140041 24677 Q D 986016 + 2097152 [fstrim]
> 8,0  0  21  4.340144908 24677 G D 986016 + 2097152 [fstrim]
> 8,0  0  22  4.340145561 24677 P N [fstrim]
> 8,0  0  24  4.340147495 24677 Q D 3083168 + 1112672 [fstrim]
> 8,0  0  25  4.340149772 24677 G D 3083168 + 1112672 [fstrim]
> .
> 8,0  0  50  4.340556955 24677 Q D 665880 + 20008 [fstrim]
> 8,0  0  51  4.340558481 24677 G D 665880 + 20008 [fstrim]
> 8,0  0  52  4.340558728 24677 P N [fstrim]
> 8,0  0  53  4.340559725 24677 I D 665880 + 20008 [fstrim]
> 8,0  0  54  4.340560292 24677 U N [fstrim] 1
> 8,0  0  55  4.340560801 24677 D D 665880 + 20008 [fstrim]
> .
>
> Apologies for my ignorance: is the above enough to understand whether SCSI
> UNMAP operations are being issued?
>
> Thanks a lot!
>
> Fulvio

-- 
Jason
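The blkparse lines quoted in this exchange do show discards: in blkparse's default output the RWBS column "D" marks a discard request, so the "+ 2097152"-style lines are the fstrim UNMAPs. A small sketch that filters and counts them, using sample lines copied from the trace above (spacing normalized; the column layout assumes blkparse's default format):

```shell
# blkparse default columns: dev cpu seq timestamp pid action RWBS sector + len
# Action "Q" = request queued, "D" = dispatched to driver; RWBS "D" = discard.
cat > /tmp/blkparse-sample.txt <<'EOF'
8,0 0 11 4.333917112 24677 Q FWFSM 8406583 + 4 [fstrim]
8,0 0 20 4.340140041 24677 Q D 986016 + 2097152 [fstrim]
8,0 0 24 4.340147495 24677 Q D 3083168 + 1112672 [fstrim]
8,0 0 50 4.340556955 24677 Q D 665880 + 20008 [fstrim]
8,0 0 55 4.340560801 24677 D D 665880 + 20008 [fstrim]
EOF
# Count queued discards: a non-zero count means the guest kernel is issuing
# discards (SCSI UNMAP on virtio-scsi) for the fstrim run.
awk '$6 == "Q" && $7 == "D" { n++ } END { print n }' /tmp/blkparse-sample.txt
```

On a real host the input would be the live blkparse output for the guest's disk rather than this sample file.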
Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
Hallo!

> Discards appear like they are being sent to the device. How big of a temporary file did you create and then delete? Did you sync the file to disk before deleting it? What version of qemu-kvm are you running?

I made several tests with commands like (issuing sync after each operation):

dd if=/dev/zero of=/tmp/fileTest bs=1M count=200 oflag=direct

What I see is that if I repeat the command with count<=200 the size does not increase. Let's try now with count>200:

NAME                                        PROVISIONED USED
volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2284M

dd if=/dev/zero of=/tmp/fileTest bs=1M count=750 oflag=direct
dd if=/dev/zero of=/tmp/fileTest2 bs=1M count=750 oflag=direct
sync

NAME                                        PROVISIONED USED
volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2528M

rm /tmp/fileTest*
sync
sudo fstrim -v /
/: 14.1 GiB (15145271296 bytes) trimmed

NAME                                        PROVISIONED USED
volume-80838a69-e544-47eb-b981-a4786be89736      15360M 2528M

As for qemu-kvm, the guest OS is CentOS7, with:

[centos@testcentos-deco3 tmp]$ rpm -qa | grep qemu
qemu-guest-agent-2.8.0-2.el7.x86_64

while the host is Ubuntu 16 with:

root@pa1-r2-s10:/home/ubuntu# dpkg -l | grep qemu
ii qemu-block-extra:amd64  1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  extra block backend modules for qemu-system and qemu-utils
ii qemu-kvm                1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU Full virtualization
ii qemu-system-common      1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (common files)
ii qemu-system-x86         1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (x86)
ii qemu-utils              1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU utilities

  Thanks!

  Fulvio
Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
Hallo Jason,
thanks for your feedback!

-------- Original Message --------
>> * decorated a CentOS image with hw_scsi_model=virtio--scsi,hw_disk_bus=scsi
> Is that just a typo for "hw_scsi_model"?

Yes, it was a typo when I wrote my message. The image has virtio-scsi as it should.

>> I see that the commands:
>> rbd --cluster cephpa1 diff cinder-ceph/${theVol} | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }' ; rados --cluster cephpa1 -p cinder-ceph ls | grep rbd_data.{whatever} | wc -l
>
> That's pretty old-school -- you can just use "rbd du" now to calculate the disk usage.

Good to know, thanks!

>> show that the size increases but does not decrease when I delete the temporary file and execute
>> sudo fstrim -v /
>
> Have you verified that your VM is indeed using virtio-scsi? Does blktrace show SCSI UNMAP operations being issued to the block device when you execute "fstrim"?

Thanks for the tip, I think I need some more help, please.

Disk on my VM is indeed /dev/sda rather than /dev/vda. The XML shows:
.
.
name='cinder-ceph/volume-80838a69-e544-47eb-b981-a4786be89736'>
.
80838a69-e544-47eb-b981-a4786be89736
function='0x0'/>

As for blktrace, blkparse shows me tons of lines; please find below the first ones, and one of the many groups of lines which I see:

8,0  0  11  4.333917112 24677 Q FWFSM 8406583 + 4 [fstrim]
8,0  0  12  4.333919649 24677 G FWFSM 8406583 + 4 [fstrim]
8,0  0  13  4.333920695 24677 P N [fstrim]
8,0  0  14  4.333922965 24677 I FWFSM 8406583 + 4 [fstrim]
8,0  0  15  4.333924575 24677 U N [fstrim] 1
8,0  0  20  4.340140041 24677 Q D 986016 + 2097152 [fstrim]
8,0  0  21  4.340144908 24677 G D 986016 + 2097152 [fstrim]
8,0  0  22  4.340145561 24677 P N [fstrim]
8,0  0  24  4.340147495 24677 Q D 3083168 + 1112672 [fstrim]
8,0  0  25  4.340149772 24677 G D 3083168 + 1112672 [fstrim]
.
8,0  0  50  4.340556955 24677 Q D 665880 + 20008 [fstrim]
8,0  0  51  4.340558481 24677 G D 665880 + 20008 [fstrim]
8,0  0  52  4.340558728 24677 P N [fstrim]
8,0  0  53  4.340559725 24677 I D 665880 + 20008 [fstrim]
8,0  0  54  4.340560292 24677 U N [fstrim] 1
8,0  0  55  4.340560801 24677 D D 665880 + 20008 [fstrim]
.

Apologies for my ignorance: is the above enough to understand whether SCSI UNMAP operations are being issued?

Thanks a lot!

Fulvio
Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
On Mon, Mar 12, 2018 at 9:54 AM, Fulvio Galeazzi wrote:
> Hallo all,
> I am not sure RBD discard is working in my setup, and I am asking for
> your help.
> (I searched this mailing list for related messages and found one by
> Nathan Harper from 29th Jan 2018, "Debugging fstrim issues", which
> however mentions that trimming was masked by logging... so I am not 100%
> sure of what the expected result is.)
>
> I am on Ocata, and Ceph 10.2.10. Followed the recipe:
> https://www.sebastien-han.fr/blog/2015/02/02/openstack-and-ceph-rbd-discard/
> * setup Nova adding to /etc/nova/nova.conf
> ...
> [libvirt]
> hw_disk_discard = unmap
> ...
> * decorated a CentOS image with hw_scsi_model=virtio--scsi,hw_disk_bus=scsi

Is that just a typo for "hw_scsi_model"?

> * created a VM with boot disk on Ceph (my default is ephemeral, though),
> verified the XML shows my disk is scsi
>
> I see that the commands:
> rbd --cluster cephpa1 diff cinder-ceph/${theVol} | awk '{ SUM += $2 } END {
> print SUM/1024/1024 " MB" }' ; rados --cluster cephpa1 -p cinder-ceph ls |
> grep rbd_data.{whatever} | wc -l

That's pretty old-school -- you can just use "rbd du" now to calculate the disk usage.

> show that the size increases but does not decrease when I delete the
> temporary file and execute
> sudo fstrim -v /
>
> Am I missing something?

Have you verified that your VM is indeed using virtio-scsi? Does blktrace show SCSI UNMAP operations being issued to the block device when you execute "fstrim"?

> I do see that adding/removing files created with dd does not always result
> in a global size increase; it is as if the dirty blocks are kept around and
> reused. Is this the way discard is supposed to work?
>
> Thanks for your help!
>
> Fulvio

-- 
Jason
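For reference, the two settings discussed in this exchange, as they would normally be applied. The image property names follow the standard Glance conventions; the image name is a placeholder, and note the single dash in "virtio-scsi" (unlike the double-dash typo quoted above):

```shell
# Hypervisor side, /etc/nova/nova.conf:
#   [libvirt]
#   hw_disk_discard = unmap
#
# Image side: the disk must be presented on virtio-scsi for the guest
# to be able to issue SCSI UNMAP.
openstack image set \
    --property hw_scsi_model=virtio-scsi \
    --property hw_disk_bus=scsi \
    <image-name>
```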
[ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap
Hallo all,
I am not sure RBD discard is working in my setup, and I am asking for your help. (I searched this mailing list for related messages and found one by Nathan Harper from 29th Jan 2018, "Debugging fstrim issues", which however mentions that trimming was masked by logging... so I am not 100% sure of what the expected result is.)

I am on Ocata, and Ceph 10.2.10. Followed the recipe:
https://www.sebastien-han.fr/blog/2015/02/02/openstack-and-ceph-rbd-discard/
* setup Nova adding to /etc/nova/nova.conf
...
[libvirt]
hw_disk_discard = unmap
...
* decorated a CentOS image with hw_scsi_model=virtio--scsi,hw_disk_bus=scsi
* created a VM with boot disk on Ceph (my default is ephemeral, though), verified the XML shows my disk is scsi

I see that the commands:

rbd --cluster cephpa1 diff cinder-ceph/${theVol} | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }' ; rados --cluster cephpa1 -p cinder-ceph ls | grep rbd_data.{whatever} | wc -l

show that the size increases but does not decrease when I delete the temporary file and execute
sudo fstrim -v /

Am I missing something?

I do see that adding/removing files created with dd does not always result in a global size increase; it is as if the dirty blocks are kept around and reused. Is this the way discard is supposed to work?

Thanks for your help!

Fulvio
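The size-accounting one-liner above can be exercised offline. A minimal sketch with fabricated "rbd diff" output (offset, length, type columns); in the thread the real input comes from "rbd --cluster cephpa1 diff cinder-ceph/${theVol}":

```shell
# Fabricated "rbd diff" output: three allocated extents totalling 10 MiB.
cat > /tmp/rbd-diff-sample.txt <<'EOF'
0        4194304 data
4194304  4194304 data
12582912 2097152 data
EOF
# Same awk aggregation as in the message above: sum the extent lengths
# (column 2) and print the total in MB.
awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }' /tmp/rbd-diff-sample.txt
```

As Jason notes later in the thread, "rbd du" reports the same usage figure directly on recent Ceph releases, so the pipeline is mainly useful on older clusters.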