Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-22 Thread John Snow
On 11/22/2017 07:55 AM, Alberto Garcia wrote: > On Tue 21 Nov 2017 04:18:13 PM CET, Anton Nefedov wrote: >> >> keep BlockJob referenced while it is >> >> paused (by block_job_pause/resume_all()). That should prevent it from >> >> deleting the BB. >> >> looks kind of hacky; maybe referencing

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-22 Thread Alberto Garcia
On Tue 07 Nov 2017 05:19:41 PM CET, Anton Nefedov wrote: > The small attached tweak makes iotest 30 (-qcow2 -nocache) fail 10/10 > times on my machine. I can reproduce the crash with v2.11.0-rc2 without having to modify QEMU at all using the attached test case (it's based on one of the existing

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-22 Thread Alberto Garcia
On Tue 21 Nov 2017 04:18:13 PM CET, Anton Nefedov wrote: > >> keep BlockJob referenced while it is > >> paused (by block_job_pause/resume_all()). That should prevent it from > >> deleting the BB. > > looks kind of hacky; maybe referencing in block_job_pause() (and not > just pause_all) seems

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-21 Thread John Snow
CC Jeff Cody ... who may or may not be preoccupied with Thanksgiving travel now. Convenient URL for reading past replies: https://lists.nongnu.org/archive/html/qemu-devel/2017-11/msg03844.html On 11/21/2017 10:31 AM, Alberto Garcia wrote: > On Tue 21 Nov 2017 04:18:13 PM CET, Anton Nefedov

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-21 Thread Alberto Garcia
On Tue 21 Nov 2017 04:18:13 PM CET, Anton Nefedov wrote: >>> Or, perhaps another approach, keep BlockJob referenced while it is >>> paused (by block_job_pause/resume_all()). That should prevent it >>> from deleting the BB. >> >> Yes, I tried this and it actually solves the issue. But I still
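
The idea quoted above (hold a reference on the job for as long as it is paused, so that completing it cannot release the BB underneath a paused user) can be modelled in a few lines. This is only a toy sketch: ToyJob, ToyBackend and every toy_* function are hypothetical names invented here, not QEMU's BlockJob API, and "deleting" the backend is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only; these are not QEMU types. */
typedef struct ToyBackend {
    bool deleted;              /* set when the last job reference goes away */
} ToyBackend;

typedef struct ToyJob {
    int refcnt;                /* number of references held on the job */
    int pause_count;           /* number of outstanding pause requests */
    ToyBackend *blk;           /* backend owned by the job */
} ToyJob;

static void toy_job_ref(ToyJob *job)
{
    job->refcnt++;
}

static void toy_job_unref(ToyJob *job)
{
    assert(job->refcnt > 0);
    if (--job->refcnt == 0) {
        /* Last reference dropped: only now is the backend released. */
        job->blk->deleted = true;
    }
}

/* Pausing takes an extra reference (assumption: this is the proposed tweak). */
static void toy_job_pause(ToyJob *job)
{
    toy_job_ref(job);
    job->pause_count++;
}

/* Resuming gives that reference back. */
static void toy_job_resume(ToyJob *job)
{
    assert(job->pause_count > 0);
    job->pause_count--;
    toy_job_unref(job);
}

/* Completion drops the job's own initial reference. */
static void toy_job_completed(ToyJob *job)
{
    toy_job_unref(job);
}

/* True if the backend outlives completion while paused and is released
 * once the job is resumed. */
bool toy_backend_survives_pause(void)
{
    ToyBackend blk = { .deleted = false };
    ToyJob job = { .refcnt = 1, .pause_count = 0, .blk = &blk };

    toy_job_pause(&job);       /* e.g. a drain section begins */
    toy_job_completed(&job);   /* job finishes while still paused */
    bool alive_while_paused = !blk.deleted;
    toy_job_resume(&job);      /* drain section ends, last ref dropped */

    return alive_while_paused && blk.deleted;
}
```

Whether the reference belongs in every pause call or only in the pause_all/resume_all pair is the open question in the replies; the sketch takes it in every pause purely for brevity.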

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-21 Thread Anton Nefedov
On 21/11/2017 3:51 PM, Alberto Garcia wrote: On Thu 16 Nov 2017 05:09:59 PM CET, Anton Nefedov wrote: I have the impression that one major source of headaches is the fact that the reopen queue contains nodes that don't need to be reopened at all. Ideally this should be detected early on in

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-21 Thread Alberto Garcia
On Thu 16 Nov 2017 05:09:59 PM CET, Anton Nefedov wrote: I have the impression that one major source of headaches is the fact that the reopen queue contains nodes that don't need to be reopened at all. Ideally this should be detected early on in bdrv_reopen_queue(), so there's
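
As a rough illustration of "detect early which nodes don't need to be reopened", the sketch below prunes a queue before anything else touches it. It assumes, purely for the example, that an entry is redundant when its requested flags equal the node's current flags; the thread does not spell out the real criterion. ToyNode, ToyReopenEntry and toy_reopen_queue_prune() are hypothetical and unrelated to the actual bdrv_reopen_queue() code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not QEMU types. */
typedef struct ToyNode {
    const char *name;
    int flags;                 /* flags the node is currently opened with */
} ToyNode;

typedef struct ToyReopenEntry {
    ToyNode *node;
    int new_flags;             /* flags requested by the reopen */
} ToyReopenEntry;

/* Compact the queue in place, keeping only entries whose flags would
 * actually change. Returns the number of entries kept. */
size_t toy_reopen_queue_prune(ToyReopenEntry *queue, size_t n)
{
    assert(queue != NULL || n == 0);

    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (queue[i].node->flags != queue[i].new_flags) {
            queue[kept++] = queue[i];
        }
    }
    return kept;
}
```

With such a filter in front, later stages would only ever see nodes that really change, which is the simplification the quoted paragraph is asking for.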

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-17 Thread Alberto Garcia
On Thu 16 Nov 2017 10:56:58 PM CET, John Snow wrote: I have the impression that one major source of headaches is the fact that the reopen queue contains nodes that don't need to be reopened at all. Ideally this should be detected early on in bdrv_reopen_queue(), so there's no

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-16 Thread John Snow
On 11/16/2017 10:54 AM, Alberto Garcia wrote: > On Wed 15 Nov 2017 05:31:20 PM CET, Anton Nefedov wrote: >>> I have the impression that one major source of headaches is the fact >>> that the reopen queue contains nodes that don't need to be reopened at >>> all. Ideally this should be detected

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-16 Thread Anton Nefedov
On 16/11/2017 6:54 PM, Alberto Garcia wrote: On Wed 15 Nov 2017 05:31:20 PM CET, Anton Nefedov wrote: I have the impression that one major source of headaches is the fact that the reopen queue contains nodes that don't need to be reopened at all. Ideally this should be detected early on in

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-16 Thread Alberto Garcia
On Wed 15 Nov 2017 05:31:20 PM CET, Anton Nefedov wrote: >> I have the impression that one major source of headaches is the fact >> that the reopen queue contains nodes that don't need to be reopened at >> all. Ideally this should be detected early on in bdrv_reopen_queue(), so >> there's no

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-15 Thread Anton Nefedov
On 15/11/2017 6:42 PM, Alberto Garcia wrote: On Fri 10 Nov 2017 04:02:23 AM CET, Fam Zheng wrote: I'm thinking that perhaps we should add the pause point directly to block_job_defer_to_main_loop(), to prevent any block job from running the exit function when it's paused. I was trying this

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-15 Thread Alberto Garcia
On Fri 10 Nov 2017 04:02:23 AM CET, Fam Zheng wrote: >> > I'm thinking that perhaps we should add the pause point directly to >> > block_job_defer_to_main_loop(), to prevent any block job from >> > running the exit function when it's paused. >> >> I was trying this and unfortunately this breaks

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-09 Thread Fam Zheng
On Thu, 11/09 17:26, Alberto Garcia wrote: > On Wed 08 Nov 2017 03:45:38 PM CET, Alberto Garcia wrote: > > > I'm thinking that perhaps we should add the pause point directly to > > block_job_defer_to_main_loop(), to prevent any block job from running > > the exit function when it's paused. > > I

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-09 Thread Alberto Garcia
On Wed 08 Nov 2017 03:45:38 PM CET, Alberto Garcia wrote: > I'm thinking that perhaps we should add the pause point directly to > block_job_defer_to_main_loop(), to prevent any block job from running > the exit function when it's paused. I was trying this and unfortunately this breaks the mirror
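
The proposal quoted above, holding back a job's exit function while the job is paused, can be sketched with a toy deferral helper. All names here (ToyJob, toy_job_defer_exit() and so on) are hypothetical and this is not how block_job_defer_to_main_loop() is implemented; the message above also goes on to say that trying this in QEMU broke something else, so treat it as an illustration of the idea only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void ToyExitFn(void *opaque);

/* Toy stand-in for illustration only; this is not QEMU's BlockJob. */
typedef struct ToyJob {
    int pause_count;           /* number of outstanding pause requests */
    ToyExitFn *deferred_fn;    /* pending exit function, or NULL */
    void *deferred_opaque;
} ToyJob;

/* Run the pending exit function unless the job is paused.
 * Returns true if the function was run. */
static bool toy_job_run_deferred(ToyJob *job)
{
    if (job->deferred_fn == NULL || job->pause_count > 0) {
        return false;
    }
    ToyExitFn *fn = job->deferred_fn;
    job->deferred_fn = NULL;
    fn(job->deferred_opaque);
    return true;
}

/* Schedule the exit function; it runs at once only if not paused. */
static void toy_job_defer_exit(ToyJob *job, ToyExitFn *fn, void *opaque)
{
    job->deferred_fn = fn;
    job->deferred_opaque = opaque;
    toy_job_run_deferred(job);
}

static void toy_job_pause(ToyJob *job)
{
    job->pause_count++;
}

/* The last resume releases any exit function that was held back. */
static void toy_job_resume(ToyJob *job)
{
    assert(job->pause_count > 0);
    if (--job->pause_count == 0) {
        toy_job_run_deferred(job);
    }
}

static void toy_count_exit(void *opaque)
{
    ++*(int *)opaque;
}

/* Returns how many times the exit function ran while the job was paused;
 * *calls_after_resume receives the count after the job is resumed. */
int toy_exit_calls_while_paused(int *calls_after_resume)
{
    int calls = 0;
    ToyJob job = { 0 };

    toy_job_pause(&job);
    toy_job_defer_exit(&job, toy_count_exit, &calls);
    int seen_while_paused = calls;
    toy_job_resume(&job);

    *calls_after_resume = calls;
    return seen_while_paused;
}
```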

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-09 Thread Alberto Garcia
On Thu 09 Nov 2017 07:05:26 AM CET, Fam Zheng wrote: >> > I can fix the crash by adding block_job_pause_point(&s->common) at >> > the end of stream_run() (where the 'out' label is). >> > >> > I'm thinking that perhaps we should add the pause point directly to >> > block_job_defer_to_main_loop(), to

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-08 Thread Fam Zheng
On Wed, 11/08 18:50, Anton Nefedov wrote: > > > On 8/11/2017 5:45 PM, Alberto Garcia wrote: > > On Tue 07 Nov 2017 05:19:41 PM CET, Anton Nefedov wrote: > > > BlockBackend gets deleted by another job's stream_complete(), deferred > > > to the main loop, so the fact that the job is put to sleep

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-08 Thread Anton Nefedov
On 8/11/2017 5:45 PM, Alberto Garcia wrote: On Tue 07 Nov 2017 05:19:41 PM CET, Anton Nefedov wrote: BlockBackend gets deleted by another job's stream_complete(), deferred to the main loop, so the fact that the job is put to sleep by bdrv_drain_all_begin() doesn't really stop it from

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-08 Thread Alberto Garcia
On Tue 07 Nov 2017 05:19:41 PM CET, Anton Nefedov wrote: > BlockBackend gets deleted by another job's stream_complete(), deferred > to the main loop, so the fact that the job is put to sleep by > bdrv_drain_all_begin() doesn't really stop it from execution. I was debugging this a bit, and the

Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-08 Thread Alberto Garcia
On Tue 07 Nov 2017 05:19:41 PM CET, Anton Nefedov wrote: > One more drainage-related thing. > We have recently encountered an issue with parallel block jobs and it's > not quite clear how to fix it properly, so would appreciate your ideas. > > The small attached tweak makes iotest 30 (-qcow2