Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On Wed, Apr 18, 2018 at 4:44 PM, Fam Zheng wrote:
>
> qemu-img hangs because the convert_iteration_sectors loop cannot make any
> progress when it reaches the end of the base image. It is a bug (implicitly?)
> fixed by Eric Blake (Cc'ed) 's BDRV_BLOCK_EOF patches on upstream, backporting
> them to the above downstream version fixes the problem for me:
>
>     commit c61e684e44272f2acb2bef34cf2aa234582a73a9
>     Author: Eric Blake
>
>         block: Exploit BDRV_BLOCK_EOF for larger zero blocks
>
>     commit fb0d8654ffc3ea1494067327fc4c4da5d0872724
>     Author: Eric Blake
>
>         block: Add BDRV_BLOCK_EOF to bdrv_get_block_status()
>
> Fam

Fam, Thanks for the info.

--
Thanks,
Li Qun
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On Wed, 04/18 15:58, Fam Zheng wrote:
> On Wed, 04/18 15:42, David Lee wrote:
> > On Thu, Apr 12, 2018 at 11:57 PM, David Lee wrote:
> > >>>
> > >>> We tested qemu-kvm-ev-2.9.0-16.el7_4.14.1 - where from the source RPM we
> > >>> verified it does contain ef6dada8b44e1e7c4bec5c1115903af9af415b50
> > >>>
> > >>> But the issue still exists. The convert got stuck if one of the old
> > >>> active overlay had been 'vol-resize'd with qemu monitor command to a
> > >>> larger size. This looks like a prerequisite but not sufficient
> > >>> condition to trigger this badness.
> > >>
> > >> So it is a separate issue. Did you try upstream master as well?
> > >>
> > >> Fam
> > >
> > > Not yet.
> >
> > Stefan & FAM,
> >
> > Here are the steps to reproduce this issue reliably:
> >
> > # qemu-img create -f qcow2 test.qcow2 100m
> > ... omitted
> > # qemu-img create -F qcow2 -f qcow2 -b test.qcow2 overlay.qcow2
> > ... omitted
> > # qemu-img resize overlay.qcow2 +20m
> > Image resized.
> > # qemu-img create -F qcow2 -f qcow2 -b overlay.qcow2 overlay2.qcow2
> > ... omitted
> > # qemu-img convert overlay2.qcow2 -f qcow2 -O qcow2 combined.qcow2
> > [hang]
> >
> >
> > # qemu-img --version
> > qemu-img version 2.9.0(qemu-kvm-ev-2.9.0-16.el7_4.14.1)
>
> Thanks, I can reproduce this but not on master. I will take a look.

qemu-img hangs because the convert_iteration_sectors loop cannot make any
progress when it reaches the end of the base image. It is a bug (implicitly?)
fixed by Eric Blake (Cc'ed) 's BDRV_BLOCK_EOF patches on upstream, backporting
them to the above downstream version fixes the problem for me:

    commit c61e684e44272f2acb2bef34cf2aa234582a73a9
    Author: Eric Blake

        block: Exploit BDRV_BLOCK_EOF for larger zero blocks

    commit fb0d8654ffc3ea1494067327fc4c4da5d0872724
    Author: Eric Blake

        block: Add BDRV_BLOCK_EOF to bdrv_get_block_status()

Fam
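For anyone checking whether a downstream source tree already carries the BDRV_BLOCK_EOF backports named above, a minimal sketch follows. It assumes the unpacked and patched source tree is available locally and that the presence of the identifier BDRV_BLOCK_EOF anywhere in the tree is a good enough hint; the function name has_block_eof is made up for this example and is not part of QEMU.

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU): report whether a source tree
# mentions BDRV_BLOCK_EOF at all. This is only a hint that the two commits
# quoted above were applied; it does not prove the backport is complete.
has_block_eof() {
    # $1: path to an unpacked (and patched) QEMU source tree
    grep -rq 'BDRV_BLOCK_EOF' "$1"
}
```

Usage would be something like `has_block_eof ./qemu-2.9.0 && echo "BDRV_BLOCK_EOF present"`, where the directory name is a placeholder. In an upstream git checkout, `git merge-base --is-ancestor fb0d8654ffc3ea1494067327fc4c4da5d0872724 HEAD` answers the same question by commit id, but downstream backports normally get new commit ids, so that check is only meaningful against upstream history.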
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On Wed, 04/18 15:42, David Lee wrote:
> On Thu, Apr 12, 2018 at 11:57 PM, David Lee wrote:
> >>>
> >>> We tested qemu-kvm-ev-2.9.0-16.el7_4.14.1 - where from the source RPM we
> >>> verified it does contain ef6dada8b44e1e7c4bec5c1115903af9af415b50
> >>>
> >>> But the issue still exists. The convert got stuck if one of the old
> >>> active overlay had been 'vol-resize'd with qemu monitor command to a
> >>> larger size. This looks like a prerequisite but not sufficient
> >>> condition to trigger this badness.
> >>
> >> So it is a separate issue. Did you try upstream master as well?
> >>
> >> Fam
> >
> > Not yet.
>
> Stefan & FAM,
>
> Here are the steps to reproduce this issue reliably:
>
> # qemu-img create -f qcow2 test.qcow2 100m
> ... omitted
> # qemu-img create -F qcow2 -f qcow2 -b test.qcow2 overlay.qcow2
> ... omitted
> # qemu-img resize overlay.qcow2 +20m
> Image resized.
> # qemu-img create -F qcow2 -f qcow2 -b overlay.qcow2 overlay2.qcow2
> ... omitted
> # qemu-img convert overlay2.qcow2 -f qcow2 -O qcow2 combined.qcow2
> [hang]
>
>
> # qemu-img --version
> qemu-img version 2.9.0(qemu-kvm-ev-2.9.0-16.el7_4.14.1)

Thanks, I can reproduce this but not on master. I will take a look.

Fam
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On Thu, Apr 12, 2018 at 11:57 PM, David Lee wrote:
>>>
>>> We tested qemu-kvm-ev-2.9.0-16.el7_4.14.1 - where from the source RPM we
>>> verified it does contain ef6dada8b44e1e7c4bec5c1115903af9af415b50
>>>
>>> But the issue still exists. The convert got stuck if one of the old
>>> active overlay had been 'vol-resize'd with qemu monitor command to a
>>> larger size. This looks like a prerequisite but not sufficient
>>> condition to trigger this badness.
>>
>> So it is a separate issue. Did you try upstream master as well?
>>
>> Fam
>
> Not yet.

Stefan & FAM,

Here are the steps to reproduce this issue reliably:

# qemu-img create -f qcow2 test.qcow2 100m
... omitted
# qemu-img create -F qcow2 -f qcow2 -b test.qcow2 overlay.qcow2
... omitted
# qemu-img resize overlay.qcow2 +20m
Image resized.
# qemu-img create -F qcow2 -f qcow2 -b overlay.qcow2 overlay2.qcow2
... omitted
# qemu-img convert overlay2.qcow2 -f qcow2 -O qcow2 combined.qcow2
[hang]


# qemu-img --version
qemu-img version 2.9.0(qemu-kvm-ev-2.9.0-16.el7_4.14.1)

--
Thanks,
Li Qun
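The reproduction steps above could be wrapped in a script so that a hang is reported instead of blocking the terminal. This is a minimal sketch, not something posted in the thread: the function name reproduce_convert_hang and the variables QEMU_IMG and CONVERT_TIMEOUT are names chosen for this example, and it assumes GNU coreutils `timeout`, which exits with status 124 when the time limit is reached.

```shell
#!/bin/sh
# Hypothetical wrapper around the reproduction steps quoted above.
# QEMU_IMG and CONVERT_TIMEOUT are names invented for this sketch.
QEMU_IMG="${QEMU_IMG:-qemu-img}"
CONVERT_TIMEOUT="${CONVERT_TIMEOUT:-60}"

reproduce_convert_hang() {
    workdir=$(mktemp -d) || return 2
    (
        cd "$workdir" || exit 2
        # Same commands as in the report: base image, overlay, resize of
        # the overlay, a second overlay on top, then the convert.
        "$QEMU_IMG" create -f qcow2 test.qcow2 100m >/dev/null || exit 2
        "$QEMU_IMG" create -F qcow2 -f qcow2 -b test.qcow2 overlay.qcow2 >/dev/null || exit 2
        "$QEMU_IMG" resize overlay.qcow2 +20m >/dev/null || exit 2
        "$QEMU_IMG" create -F qcow2 -f qcow2 -b overlay.qcow2 overlay2.qcow2 >/dev/null || exit 2
        # GNU timeout returns 124 if the command had to be terminated.
        timeout "$CONVERT_TIMEOUT" "$QEMU_IMG" convert overlay2.qcow2 -f qcow2 -O qcow2 combined.qcow2
    )
    status=$?
    rm -rf "$workdir"
    case $status in
        0)   echo "convert completed"; return 0 ;;
        124) echo "convert still running after ${CONVERT_TIMEOUT}s (possible hang)"; return 1 ;;
        *)   echo "setup or convert failed with status $status"; return 2 ;;
    esac
}
```

Calling `reproduce_convert_hang` with an affected qemu-img on the PATH should report a possible hang; pointing QEMU_IMG at a fixed build should report completion. The 60 second default is arbitrary, since converting these small empty images normally takes well under a second.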
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On Mon, Apr 09, 2018 at 10:38:54AM +0300, Benny Zlotnik wrote:
> source: qcow2 on NFS
> target: raw on NFS

Have you tried on a local file system with the same source file contents?

Which NFS protocol version is being used?

Stefan
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On Thu, Apr 12, 2018 at 10:23 PM, Fam Zheng wrote:
> On Thu, 04/12 21:45, David Lee wrote:
>> On Thu, Apr 12, 2018 at 10:16 AM, David Lee wrote:
>> >> > My team caught this issue too after switching to CentOS 7.4 with
>> >> > qemu-img 2.9.0
>> >> > gdb shows exactly the same backtrace when the convert stuck, and we
>> >> > are on NFS.
>> >> >
>> >> > Later we found the following:
>> >> > 1. The stuck can happen on local storage, too.
>> >> > 2. Replace qemu-img 2.9.0 with 2.6.0 and everything works smoothly
>> >> > again.
>> >> >
>> >> > BTW, we use "qemu-img convert" to convert qcow2 and its backing files
>> >> > into a single qcow2 image.
>> >>
>> >> Maybe it is RHBZ 1508886?
>> >>
>> >> Fam
>> >
>> > Thanks, Fam. We just tracked down to this BZ too and are about to trying
>> > the commit ef6dada8b44e1e7c4bec5c1115903af9af415b50
>>
>> We tested qemu-kvm-ev-2.9.0-16.el7_4.14.1 - where from the source RPM we
>> verified it does contain ef6dada8b44e1e7c4bec5c1115903af9af415b50
>>
>> But the issue still exists. The convert got stuck if one of the old
>> active overlay had been 'vol-resize'd with qemu monitor command to a
>> larger size. This looks like a prerequisite but not sufficient
>> condition to trigger this badness.
>
> So it is a separate issue. Did you try upstream master as well?
>
> Fam

Not yet.

--
Thanks,
Li Qun
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On Thu, 04/12 21:45, David Lee wrote:
> On Thu, Apr 12, 2018 at 10:16 AM, David Lee wrote:
> >> > My team caught this issue too after switching to CentOS 7.4 with
> >> > qemu-img 2.9.0
> >> > gdb shows exactly the same backtrace when the convert stuck, and we
> >> > are on NFS.
> >> >
> >> > Later we found the following:
> >> > 1. The stuck can happen on local storage, too.
> >> > 2. Replace qemu-img 2.9.0 with 2.6.0 and everything works smoothly again.
> >> >
> >> > BTW, we use "qemu-img convert" to convert qcow2 and its backing files
> >> > into a single qcow2 image.
> >>
> >> Maybe it is RHBZ 1508886?
> >>
> >> Fam
> >
> > Thanks, Fam. We just tracked down to this BZ too and are about to trying
> > the commit ef6dada8b44e1e7c4bec5c1115903af9af415b50
>
> We tested qemu-kvm-ev-2.9.0-16.el7_4.14.1 - where from the source RPM we
> verified it does contain ef6dada8b44e1e7c4bec5c1115903af9af415b50
>
> But the issue still exists. The convert got stuck if one of the old
> active overlay had been 'vol-resize'd with qemu monitor command to a
> larger size. This looks like a prerequisite but not sufficient
> condition to trigger this badness.

So it is a separate issue. Did you try upstream master as well?

Fam
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On Thu, Apr 12, 2018 at 10:16 AM, David Lee wrote:
>> > My team caught this issue too after switching to CentOS 7.4 with
>> > qemu-img 2.9.0
>> > gdb shows exactly the same backtrace when the convert stuck, and we are
>> > on NFS.
>> >
>> > Later we found the following:
>> > 1. The stuck can happen on local storage, too.
>> > 2. Replace qemu-img 2.9.0 with 2.6.0 and everything works smoothly again.
>> >
>> > BTW, we use "qemu-img convert" to convert qcow2 and its backing files
>> > into a single qcow2 image.
>>
>> Maybe it is RHBZ 1508886?
>>
>> Fam
>
> Thanks, Fam. We just tracked down to this BZ too and are about to trying
> the commit ef6dada8b44e1e7c4bec5c1115903af9af415b50

We tested qemu-kvm-ev-2.9.0-16.el7_4.14.1 - where from the source RPM we
verified it does contain ef6dada8b44e1e7c4bec5c1115903af9af415b50

But the issue still exists. The convert got stuck if one of the old
active overlay had been 'vol-resize'd with qemu monitor command to a
larger size. This looks like a prerequisite but not sufficient condition
to trigger this badness.

--
Thanks,
Li Qun
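To find out whether a given backing chain meets the precondition described above (an overlay that was grown beyond the size of its backing file), something along these lines might help. It is a sketch under assumptions: it relies on `qemu-img info --backing-chain --output=json` printing one "virtual-size" key per line, top image first, and the function names chain_virtual_sizes and overlay_larger_than_backing are invented for this example.

```shell
#!/bin/sh
# Hypothetical helpers, not part of qemu-img.

# Print the virtual size in bytes of every image in the backing chain of $1,
# one per line, top image first. Assumes the pretty-printed JSON layout with
# one "virtual-size" key per line.
chain_virtual_sizes() {
    "${QEMU_IMG:-qemu-img}" info --backing-chain --output=json "$1" |
        sed -n 's/^ *"virtual-size": *\([0-9]*\).*/\1/p'
}

# Read sizes from stdin (top image first) and succeed if any image is larger
# than the image directly below it, i.e. than its backing file.
overlay_larger_than_backing() {
    prev=""
    while read -r size; do
        if [ -n "$prev" ] && [ "$prev" -gt "$size" ]; then
            return 0
        fi
        prev=$size
    done
    return 1
}
```

A possible invocation is `chain_virtual_sizes overlay2.qcow2 | overlay_larger_than_backing && echo "chain contains a grown overlay"`. As noted in the message above, a grown overlay looks like a prerequisite rather than a sufficient condition, so a positive result only means the chain is a candidate for the hang.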
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On Thu, Apr 12, 2018 at 10:03 AM, Fam Zheng wrote:
> On Thu, 04/12 09:51, David Lee wrote:
> > On Mon, Apr 9, 2018 at 3:35 AM, Benny Zlotnik wrote:
> >
> > > $ gdb -p 13024 -batch -ex "thread apply all bt"
> > > [Thread debugging using libthread_db enabled]
> > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > 0x7f98275cfaff in ppoll () from /lib64/libc.so.6
> > >
> > > Thread 1 (Thread 0x7f983e30ab00 (LWP 13024)):
> > > #0 0x7f98275cfaff in ppoll () from /lib64/libc.so.6
> > > #1 0x55b55cf59d69 in qemu_poll_ns ()
> > > #2 0x55b55cf5ba45 in aio_poll ()
> > > #3 0x55b55ceedc0f in bdrv_get_block_status_above ()
> > > #4 0x55b55cea3611 in convert_iteration_sectors ()
> > > #5 0x55b55cea4352 in img_convert ()
> > > #6 0x55b55ce9d819 in main ()
> >
> >
> > My team caught this issue too after switching to CentOS 7.4 with
> > qemu-img 2.9.0
> > gdb shows exactly the same backtrace when the convert stuck, and we are
> > on NFS.
> >
> > Later we found the following:
> > 1. The stuck can happen on local storage, too.
> > 2. Replace qemu-img 2.9.0 with 2.6.0 and everything works smoothly again.
> >
> > BTW, we use "qemu-img convert" to convert qcow2 and its backing files
> > into a single qcow2 image.
>
> Maybe it is RHBZ 1508886?
>
> Fam

Thanks, Fam. We just tracked down to this BZ too and are about to trying
the commit ef6dada8b44e1e7c4bec5c1115903af9af415b50

--
Thanks,
Li Qun
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On Thu, 04/12 09:51, David Lee wrote:
> On Mon, Apr 9, 2018 at 3:35 AM, Benny Zlotnik wrote:
>
> > $ gdb -p 13024 -batch -ex "thread apply all bt"
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > 0x7f98275cfaff in ppoll () from /lib64/libc.so.6
> >
> > Thread 1 (Thread 0x7f983e30ab00 (LWP 13024)):
> > #0 0x7f98275cfaff in ppoll () from /lib64/libc.so.6
> > #1 0x55b55cf59d69 in qemu_poll_ns ()
> > #2 0x55b55cf5ba45 in aio_poll ()
> > #3 0x55b55ceedc0f in bdrv_get_block_status_above ()
> > #4 0x55b55cea3611 in convert_iteration_sectors ()
> > #5 0x55b55cea4352 in img_convert ()
> > #6 0x55b55ce9d819 in main ()
>
>
> My team caught this issue too after switching to CentOS 7.4 with
> qemu-img 2.9.0
> gdb shows exactly the same backtrace when the convert stuck, and we are
> on NFS.
>
> Later we found the following:
> 1. The stuck can happen on local storage, too.
> 2. Replace qemu-img 2.9.0 with 2.6.0 and everything works smoothly again.
>
> BTW, we use "qemu-img convert" to convert qcow2 and its backing files into
> a single qcow2 image.

Maybe it is RHBZ 1508886?

Fam
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On Mon, Apr 9, 2018 at 3:35 AM, Benny Zlotnik wrote:
> $ gdb -p 13024 -batch -ex "thread apply all bt"
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> 0x7f98275cfaff in ppoll () from /lib64/libc.so.6
>
> Thread 1 (Thread 0x7f983e30ab00 (LWP 13024)):
> #0 0x7f98275cfaff in ppoll () from /lib64/libc.so.6
> #1 0x55b55cf59d69 in qemu_poll_ns ()
> #2 0x55b55cf5ba45 in aio_poll ()
> #3 0x55b55ceedc0f in bdrv_get_block_status_above ()
> #4 0x55b55cea3611 in convert_iteration_sectors ()
> #5 0x55b55cea4352 in img_convert ()
> #6 0x55b55ce9d819 in main ()

My team caught this issue too after switching to CentOS 7.4 with qemu-img 2.9.0.
gdb shows exactly the same backtrace when the convert stuck, and we are on NFS.

Later we found the following:
1. The stuck can happen on local storage, too.
2. Replace qemu-img 2.9.0 with 2.6.0 and everything works smoothly again.

BTW, we use "qemu-img convert" to convert qcow2 and its backing files into
a single qcow2 image.

> On Sun, Apr 8, 2018 at 10:28 PM, Nir Soffer wrote:
>
> > On Sun, Apr 8, 2018 at 9:27 PM Benny Zlotnik wrote:
> >
> >> Hi,
> >>
> >> As part of copy operation initiated by rhev got stuck for more than a day
> >> and consumes plenty of CPU
> >> vdsm 13024 3117 99 Apr07 ? 1-06:58:43 /usr/bin/qemu-img convert
> >> -p -t none -T none -f qcow2
> >> /rhev/data-center/bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/597e12b6-19f5-45bd-868f-767600c7115e/62a5492e-e120-4c25-898e-9f5f5629853e
> >> -O raw /rhev/data-center/mnt/mantis-nfs-lif1.lab.eng.tlv2.redhat.com:_vol__service/26989331-2c39-4b34-a7ed-d7dd7703646c/images/9ece9408-9ca6-48cd-992a-6f590c710672/06d6d3c0-beb8-4b6b-ab00-56523df185da
> >>
> >> The target image appears to have no data yet:
> >> qemu-img info 06d6d3c0-beb8-4b6b-ab00-56523df185da"
> >> image: 06d6d3c0-beb8-4b6b-ab00-56523df185da
> >> file format: raw
> >> virtual size: 120G (128849018880 bytes)
> >> disk size: 0
> >>
> >> strace -p 13024 -tt -T -f shows only:
> >> ...
> >> 21:13:01.309382 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.10>
> >> 21:13:01.309411 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.09>
> >> 21:13:01.309440 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.09>
> >> 21:13:01.309468 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.10>
> >>
> >> version: qemu-img-rhev-2.9.0-16.el7_4.13.x86_64
> >>
> >> What could cause this? I'll provide any additional information needed
> >>
> >
> > A backtrace may help, try:
> >
> > gdb -p 13024 -batch -ex "thread apply all bt"
> >
> > Also adding Kevin and qemu-block.
> >
> > Nir
> >

--
Thanks,
Li Qun
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On 2018-04-09 08:04, Stefan Hajnoczi wrote:
> On Sun, Apr 08, 2018 at 10:35:16PM +0300, Benny Zlotnik wrote:
>
> What type of storage are the source and destination images? (e.g.
> source is a local qcow2 file on xfs, destination is a raw file on NFS)
>
>> $ gdb -p 13024 -batch -ex "thread apply all bt"
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> 0x7f98275cfaff in ppoll () from /lib64/libc.so.6
>>
>> Thread 1 (Thread 0x7f983e30ab00 (LWP 13024)):
>> #0 0x7f98275cfaff in ppoll () from /lib64/libc.so.6
>> #1 0x55b55cf59d69 in qemu_poll_ns ()
>> #2 0x55b55cf5ba45 in aio_poll ()
>> #3 0x55b55ceedc0f in bdrv_get_block_status_above ()
>> #4 0x55b55cea3611 in convert_iteration_sectors ()
>
> CCing Max Reitz in case this is familiar.

Hmm, not really, no...

The culprit I know of (sensing block status outside of qemu) would block in
lseek64() under find_allocation(). I didn't have any luck reproducing the
issue either...

Whenever I had some hang in ppoll(), it was usually during a drain, but that
doesn't seem to be the case here either.

So I have no idea. Maybe I'll test some other configurations at another
time, but so far I didn't experience any hangs and I have no idea what could
be provoking them (other than some network issue outside of qemu, but
well...).

Max

>> #5 0x55b55cea4352 in img_convert ()
>> #6 0x55b55ce9d819 in main ()
>>
>>
>> On Sun, Apr 8, 2018 at 10:28 PM, Nir Soffer wrote:
>>
>>> On Sun, Apr 8, 2018 at 9:27 PM Benny Zlotnik wrote:
>>>
>>>> Hi,
>>>>
>>>> As part of copy operation initiated by rhev got stuck for more than a day
>>>> and consumes plenty of CPU
>>>> vdsm 13024 3117 99 Apr07 ? 1-06:58:43 /usr/bin/qemu-img convert
>>>> -p -t none -T none -f qcow2
>>>> /rhev/data-center/bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/597e12b6-19f5-45bd-868f-767600c7115e/62a5492e-e120-4c25-898e-9f5f5629853e
>>>> -O raw /rhev/data-center/mnt/mantis-nfs-lif1.lab.eng.tlv2.redhat.com:_vol__service/26989331-2c39-4b34-a7ed-d7dd7703646c/images/9ece9408-9ca6-48cd-992a-6f590c710672/06d6d3c0-beb8-4b6b-ab00-56523df185da
>>>>
>>>> The target image appears to have no data yet:
>>>> qemu-img info 06d6d3c0-beb8-4b6b-ab00-56523df185da"
>>>> image: 06d6d3c0-beb8-4b6b-ab00-56523df185da
>>>> file format: raw
>>>> virtual size: 120G (128849018880 bytes)
>>>> disk size: 0
>>>>
>>>> strace -p 13024 -tt -T -f shows only:
>>>> ...
>>>> 21:13:01.309382 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.10>
>>>> 21:13:01.309411 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.09>
>>>> 21:13:01.309440 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.09>
>>>> 21:13:01.309468 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.10>
>>>>
>>>> version: qemu-img-rhev-2.9.0-16.el7_4.13.x86_64
>>>>
>>>> What could cause this? I'll provide any additional information needed
>>>
>>> A backtrace may help, try:
>>>
>>> gdb -p 13024 -batch -ex "thread apply all bt"
>>>
>>> Also adding Kevin and qemu-block.
>>>
>>> Nir
>>>
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
source: qcow2 on NFS
target: raw on NFS

source:
$ qemu-img info /rhev/data-center/bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/597e12b6-19f5-45bd-868f-767600c7115e/62a5492e-e120-4c25-898e-9f5f5629853e
image: /rhev/data-center/bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/597e12b6-19f5-45bd-868f-767600c7115e/62a5492e-e120-4c25-898e-9f5f5629853e
file format: qcow2
virtual size: 120G (128849018880 bytes)
disk size: 63G
cluster_size: 65536
backing file: 950926cc-aac6-42fd-a719-6386d4202897 (actual path: /rhev/data-center/bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/597e12b6-19f5-45bd-868f-767600c7115e/950926cc-aac6-42fd-a719-6386d4202897)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

target:
$ qemu-img info /rhev/data-center/mnt/bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/9ece9408-9ca6-48cd-992a-6f590c710672/06d6d3c0-beb8-4b6b-ab00-56523df185da
image: bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/9ece9408-9ca6-48cd-992a-6f590c710672/06d6d3c0-beb8-4b6b-ab00-56523df185da
file format: raw
virtual size: 120G (128849018880 bytes)
disk size: 0

On Mon, Apr 9, 2018 at 9:04 AM, Stefan Hajnoczi wrote:
> On Sun, Apr 08, 2018 at 10:35:16PM +0300, Benny Zlotnik wrote:
>
> What type of storage are the source and destination images? (e.g.
> source is a local qcow2 file on xfs, destination is a raw file on NFS)
>
> > $ gdb -p 13024 -batch -ex "thread apply all bt"
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > 0x7f98275cfaff in ppoll () from /lib64/libc.so.6
> >
> > Thread 1 (Thread 0x7f983e30ab00 (LWP 13024)):
> > #0 0x7f98275cfaff in ppoll () from /lib64/libc.so.6
> > #1 0x55b55cf59d69 in qemu_poll_ns ()
> > #2 0x55b55cf5ba45 in aio_poll ()
> > #3 0x55b55ceedc0f in bdrv_get_block_status_above ()
> > #4 0x55b55cea3611 in convert_iteration_sectors ()
>
> CCing Max Reitz in case this is familiar.
>
> > #5 0x55b55cea4352 in img_convert ()
> > #6 0x55b55ce9d819 in main ()
> >
> >
> > On Sun, Apr 8, 2018 at 10:28 PM, Nir Soffer wrote:
> >
> > > On Sun, Apr 8, 2018 at 9:27 PM Benny Zlotnik wrote:
> > >
> > >> Hi,
> > >>
> > >> As part of copy operation initiated by rhev got stuck for more than a day
> > >> and consumes plenty of CPU
> > >> vdsm 13024 3117 99 Apr07 ? 1-06:58:43 /usr/bin/qemu-img convert
> > >> -p -t none -T none -f qcow2
> > >> /rhev/data-center/bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/597e12b6-19f5-45bd-868f-767600c7115e/62a5492e-e120-4c25-898e-9f5f5629853e
> > >> -O raw /rhev/data-center/mnt/mantis-nfs-lif1.lab.eng.tlv2.redhat.com:_vol__service/26989331-2c39-4b34-a7ed-d7dd7703646c/images/9ece9408-9ca6-48cd-992a-6f590c710672/06d6d3c0-beb8-4b6b-ab00-56523df185da
> > >>
> > >> The target image appears to have no data yet:
> > >> qemu-img info 06d6d3c0-beb8-4b6b-ab00-56523df185da"
> > >> image: 06d6d3c0-beb8-4b6b-ab00-56523df185da
> > >> file format: raw
> > >> virtual size: 120G (128849018880 bytes)
> > >> disk size: 0
> > >>
> > >> strace -p 13024 -tt -T -f shows only:
> > >> ...
> > >> 21:13:01.309382 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.10>
> > >> 21:13:01.309411 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.09>
> > >> 21:13:01.309440 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.09>
> > >> 21:13:01.309468 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.10>
> > >>
> > >> version: qemu-img-rhev-2.9.0-16.el7_4.13.x86_64
> > >>
> > >> What could cause this? I'll provide any additional information needed
> > >>
> > >
> > > A backtrace may help, try:
> > >
> > > gdb -p 13024 -batch -ex "thread apply all bt"
> > >
> > > Also adding Kevin and qemu-block.
> > >
> > > Nir
> > >
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On Sun, Apr 08, 2018 at 10:35:16PM +0300, Benny Zlotnik wrote:

What type of storage are the source and destination images? (e.g.
source is a local qcow2 file on xfs, destination is a raw file on NFS)

> $ gdb -p 13024 -batch -ex "thread apply all bt"
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> 0x7f98275cfaff in ppoll () from /lib64/libc.so.6
>
> Thread 1 (Thread 0x7f983e30ab00 (LWP 13024)):
> #0 0x7f98275cfaff in ppoll () from /lib64/libc.so.6
> #1 0x55b55cf59d69 in qemu_poll_ns ()
> #2 0x55b55cf5ba45 in aio_poll ()
> #3 0x55b55ceedc0f in bdrv_get_block_status_above ()
> #4 0x55b55cea3611 in convert_iteration_sectors ()

CCing Max Reitz in case this is familiar.

> #5 0x55b55cea4352 in img_convert ()
> #6 0x55b55ce9d819 in main ()
>
>
> On Sun, Apr 8, 2018 at 10:28 PM, Nir Soffer wrote:
>
> > On Sun, Apr 8, 2018 at 9:27 PM Benny Zlotnik wrote:
> >
> >> Hi,
> >>
> >> As part of copy operation initiated by rhev got stuck for more than a day
> >> and consumes plenty of CPU
> >> vdsm 13024 3117 99 Apr07 ? 1-06:58:43 /usr/bin/qemu-img convert
> >> -p -t none -T none -f qcow2
> >> /rhev/data-center/bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/597e12b6-19f5-45bd-868f-767600c7115e/62a5492e-e120-4c25-898e-9f5f5629853e
> >> -O raw /rhev/data-center/mnt/mantis-nfs-lif1.lab.eng.tlv2.redhat.com:_vol__service/26989331-2c39-4b34-a7ed-d7dd7703646c/images/9ece9408-9ca6-48cd-992a-6f590c710672/06d6d3c0-beb8-4b6b-ab00-56523df185da
> >>
> >> The target image appears to have no data yet:
> >> qemu-img info 06d6d3c0-beb8-4b6b-ab00-56523df185da"
> >> image: 06d6d3c0-beb8-4b6b-ab00-56523df185da
> >> file format: raw
> >> virtual size: 120G (128849018880 bytes)
> >> disk size: 0
> >>
> >> strace -p 13024 -tt -T -f shows only:
> >> ...
> >> 21:13:01.309382 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.10>
> >> 21:13:01.309411 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.09>
> >> 21:13:01.309440 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.09>
> >> 21:13:01.309468 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.10>
> >>
> >> version: qemu-img-rhev-2.9.0-16.el7_4.13.x86_64
> >>
> >> What could cause this? I'll provide any additional information needed
> >>
> >
> > A backtrace may help, try:
> >
> > gdb -p 13024 -batch -ex "thread apply all bt"
> >
> > Also adding Kevin and qemu-block.
> >
> > Nir
> >
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
$ gdb -p 13024 -batch -ex "thread apply all bt"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x7f98275cfaff in ppoll () from /lib64/libc.so.6

Thread 1 (Thread 0x7f983e30ab00 (LWP 13024)):
#0 0x7f98275cfaff in ppoll () from /lib64/libc.so.6
#1 0x55b55cf59d69 in qemu_poll_ns ()
#2 0x55b55cf5ba45 in aio_poll ()
#3 0x55b55ceedc0f in bdrv_get_block_status_above ()
#4 0x55b55cea3611 in convert_iteration_sectors ()
#5 0x55b55cea4352 in img_convert ()
#6 0x55b55ce9d819 in main ()


On Sun, Apr 8, 2018 at 10:28 PM, Nir Soffer wrote:

> On Sun, Apr 8, 2018 at 9:27 PM Benny Zlotnik wrote:
>
>> Hi,
>>
>> As part of copy operation initiated by rhev got stuck for more than a day
>> and consumes plenty of CPU
>> vdsm 13024 3117 99 Apr07 ? 1-06:58:43 /usr/bin/qemu-img convert
>> -p -t none -T none -f qcow2
>> /rhev/data-center/bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/597e12b6-19f5-45bd-868f-767600c7115e/62a5492e-e120-4c25-898e-9f5f5629853e
>> -O raw /rhev/data-center/mnt/mantis-nfs-lif1.lab.eng.tlv2.redhat.com:_vol__service/26989331-2c39-4b34-a7ed-d7dd7703646c/images/9ece9408-9ca6-48cd-992a-6f590c710672/06d6d3c0-beb8-4b6b-ab00-56523df185da
>>
>> The target image appears to have no data yet:
>> qemu-img info 06d6d3c0-beb8-4b6b-ab00-56523df185da"
>> image: 06d6d3c0-beb8-4b6b-ab00-56523df185da
>> file format: raw
>> virtual size: 120G (128849018880 bytes)
>> disk size: 0
>>
>> strace -p 13024 -tt -T -f shows only:
>> ...
>> 21:13:01.309382 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.10>
>> 21:13:01.309411 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.09>
>> 21:13:01.309440 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.09>
>> 21:13:01.309468 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.10>
>>
>> version: qemu-img-rhev-2.9.0-16.el7_4.13.x86_64
>>
>> What could cause this? I'll provide any additional information needed
>>
>
> A backtrace may help, try:
>
> gdb -p 13024 -batch -ex "thread apply all bt"
>
> Also adding Kevin and qemu-block.
>
> Nir
>
Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck
On Sun, Apr 8, 2018 at 9:27 PM Benny Zlotnik wrote:

> Hi,
>
> As part of copy operation initiated by rhev got stuck for more than a day
> and consumes plenty of CPU
> vdsm 13024 3117 99 Apr07 ? 1-06:58:43 /usr/bin/qemu-img convert
> -p -t none -T none -f qcow2
> /rhev/data-center/bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/597e12b6-19f5-45bd-868f-767600c7115e/62a5492e-e120-4c25-898e-9f5f5629853e
> -O raw /rhev/data-center/mnt/mantis-nfs-lif1.lab.eng.tlv2.redhat.com:_vol__service/26989331-2c39-4b34-a7ed-d7dd7703646c/images/9ece9408-9ca6-48cd-992a-6f590c710672/06d6d3c0-beb8-4b6b-ab00-56523df185da
>
> The target image appears to have no data yet:
> qemu-img info 06d6d3c0-beb8-4b6b-ab00-56523df185da"
> image: 06d6d3c0-beb8-4b6b-ab00-56523df185da
> file format: raw
> virtual size: 120G (128849018880 bytes)
> disk size: 0
>
> strace -p 13024 -tt -T -f shows only:
> ...
> 21:13:01.309382 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.10>
> 21:13:01.309411 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.09>
> 21:13:01.309440 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.09>
> 21:13:01.309468 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0}, NULL, 8) = 0 (Timeout) <0.10>
>
> version: qemu-img-rhev-2.9.0-16.el7_4.13.x86_64
>
> What could cause this? I'll provide any additional information needed
>

A backtrace may help, try:

gdb -p 13024 -batch -ex "thread apply all bt"

Also adding Kevin and qemu-block.

Nir
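The strace output in the report (back-to-back ppoll calls that time out immediately) is what a busy loop looks like from the outside. A small filter can turn a strace sample into a count, as a rough sketch; the function name count_ppoll_timeouts is made up here, and the pattern assumes the line format shown in the report.

```shell
#!/bin/sh
# Hypothetical helper: count strace lines on stdin that show a ppoll call
# returning 0 (Timeout), using the line format from the report above.
count_ppoll_timeouts() {
    grep -c 'ppoll(.*= 0 (Timeout)'
}
```

For example, `timeout 5 strace -p 13024 -tt -T -f 2>&1 | count_ppoll_timeouts` samples the process for five seconds (strace writes its trace to stderr, hence the redirection) and prints how many immediate ppoll timeouts were seen. A very large count alongside high CPU use and no growth in the target image would be consistent with the symptom reported here, though it does not by itself identify the cause.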