Re: [Qemu-devel] [Qemu-block] [PATCH v8 00/11] Support streaming to an intermediate layer

2015-06-26 Thread Alberto Garcia
On Wed 24 Jun 2015 12:02:35 PM CEST, Alberto Garcia be...@igalia.com wrote:

 What happens is that in stream_complete() all images that have been
 removed from the chain ([A], [B] and [C]) are closed, and then [D] is
 reopened in read-only mode.

 The reopen queue contains [C], that is still listed as a child of [D]
 although it's no longer a valid pointer.

Here's a simple way to access freed memory with the current master:

$ qemu-img create -f qcow2 base.qcow2 100M
$ qemu-img create -f qcow2 -o backing_file=base.qcow2 hd.qcow2
$ valgrind qemu-system-x86_64 -enable-kvm -monitor stdio -drive 
if=virtio,file=hd.qcow2

(qemu) block_stream virtio0 
(qemu) drive_del virtio0 
==15925== Thread 1:
==15925== Invalid read of size 8
==15925==at 0x4C0C55: bdrv_close (block.c:1853)
==15925==by 0x2E12C7: hmp_drive_del (blockdev.c:2173)
==15925==by 0x1E72D5: handle_hmp_command (monitor.c:4058)
==15925==by 0x1E9A44: monitor_command_cb (monitor.c:5081)
==15925==by 0x578443: readline_handle_byte (readline.c:391)
==15925==by 0x1E999E: monitor_read (monitor.c:5064)
==15925==by 0x2E7064: qemu_chr_be_write (qemu-char.c:306)
==15925==by 0x2E88AA: fd_chr_read (qemu-char.c:1012)
==15925==by 0x9B6DB4C: g_main_context_dispatch (in 
/lib/x86_64-linux-gnu/libglib-2.0.so.0.4400.1)
==15925==by 0x4C6D7C: glib_pollfds_poll (main-loop.c:199)
==15925==by 0x4C6E59: os_host_main_loop_wait (main-loop.c:244)
==15925==by 0x4C6F18: main_loop_wait (main-loop.c:493)
==15925==  Address 0x1a86f2a0 is 16 bytes after a block of size 256 free'd
==15925==at 0x4C29E90: free (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15925==by 0x6215767: pixman_image_unref (in 
/usr/lib/x86_64-linux-gnu/libpixman-1.so.0.32.6)
==15925==by 0x490131: qemu_pixman_glyph_render (qemu-pixman.c:250)
==15925==by 0x48AC70: vga_putcharxy (console.c:435)
==15925==by 0x48B38C: console_refresh (console.c:567)
==15925==by 0x48CC57: text_console_invalidate (console.c:1155)
==15925==by 0x48A6A2: graphic_hw_invalidate (console.c:269)
==15925==by 0x48E71D: text_console_update_cursor (console.c:1831)
==15925==by 0x4C8245: timerlist_run_timers (qemu-timer.c:502)
==15925==by 0x4C828D: qemu_clock_run_timers (qemu-timer.c:513)
==15925==by 0x4C8596: qemu_clock_run_all_timers (qemu-timer.c:621)
==15925==by 0x4C6F4E: main_loop_wait (main-loop.c:499)
==15925== 

As explained in my previous e-mail, the problem is that after the
block_stream operation, the bs-children list in hd.qcow2 still keeps a
pointer to base.qcow2 even when its memory has been freed.

Berto



Re: [Qemu-devel] [Qemu-block] [PATCH v8 00/11] Support streaming to an intermediate layer

2015-06-24 Thread Alberto Garcia
On Tue 23 Jun 2015 06:48:39 PM CEST, Stefan Hajnoczi wrote:

 It seems that self.vm.qmp('block-stream', ...) is returning None in
 your case. Is that the only test that is failing?

 Yes, only this test fails.  I have pushed my tree here:
 https://github.com/stefanha/qemu/commits/berto-intermediate-streaming

 I wonder if you see this failure on your machine with my tree.

I was debugging the problem. So we have this scenario:

[A] - [B] - [C] - [D] - [E] - [F] - [G]

The test that is failing is test_stream_parallel(), that creates
several concurrent streaming operations, in this order:

 a) [B] = [C]
 b) [D] = [E]
 c) [F] = [G]

The way it works is that stream_start() reopens the destination image
in read-write mode (if needed), creates the stream_run() coroutine to
copy the data, and finally stream_complete() puts it back in read-only
mode and closes the images that are removed from the chain.

This was not causing problems earlier because each operation works on a
different set of images, but it seems that things have changed since the
reopen overhaul patches:

   https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg01587.html
   https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg01592.html

Here's what happens:

 - We launch block job a).
 - While a) is ongoing, b) enters and tries to reopen [E] in
   read-write mode.
 - With the reopen overhaul patches, the BlockReopenQueue no longer
   contains just [E] but also all its backing chain up to [A].
 - bdrv_reopen_multiple() calls bdrv_reopen_prepare() and flushes all
   images.
 - At this point job a) takes over and finishes, reopening [C] again
   in read-only mode and closing [B], removing it from memory.
 - When b) continues (still in bdrv_reopen_multiple) the queue still
   contains [B], which is no longer a valid pointer.

My first reaction was to keep a reference to each BDS in the
BlockReopenQueue during the operation, but I don't know why that queue
needs to contain all the backing chain in the first place.

Berto



Re: [Qemu-devel] [Qemu-block] [PATCH v8 00/11] Support streaming to an intermediate layer

2015-06-24 Thread Alberto Garcia
On Wed 24 Jun 2015 10:02:03 AM CEST, Alberto Garcia wrote:

 [A] - [B] - [C] - [D] - [E] - [F] - [G]

 [...]

 The way it works is that stream_start() reopens the destination image
 in read-write mode (if needed), creates the stream_run() coroutine to
 copy the data, and finally stream_complete() puts it back in read-only
 mode and closes the images that are removed from the chain.

There's more. If I perform a single streaming operation [A] = [D] this
also crashes, although in a different way.

   block.c:1764: bdrv_reopen_prepare: Assertion `reopen_state-bs-drv != 
((void *)0)' failed.

What happens is that in stream_complete() all images that have been
removed from the chain ([A], [B] and [C]) are closed, and then [D] is
reopened in read-only mode.

The reopen queue contains [C], that is still listed as a child of [D]
although it's no longer a valid pointer.

[C] is opened using bdrv_open_backing_file() and attached to [D] with
bdrv_attach_child(), but never detached afterwards.

Berto



Re: [Qemu-devel] [Qemu-block] [PATCH v8 00/11] Support streaming to an intermediate layer

2015-06-23 Thread Stefan Hajnoczi
On Tue, Jun 23, 2015 at 5:09 PM, Alberto Garcia be...@igalia.com wrote:
 On Tue 23 Jun 2015 05:36:32 PM CEST, Stefan Hajnoczi wrote:

 Did you try qemu-iotests?  I'm still getting:

 030 11s ... [failed, exit status 1] - output mismatch (see
 030.out.bad)

 Yes, it works fine here.

 +Traceback (most recent call last):
 +  File 030, line 202, in test_stream_parallel
 +self.assert_qmp(result, 'return', {})

 This creates several streaming operations in parallel. That particular
 line just checks the return value of each block-stream command.

 +  File /home/stefanha/qemu/tests/qemu-iotests/iotests.py, line 286, in 
 assert_qmp
 +result = self.dictpath(d, path)
 +  File /home/stefanha/qemu/tests/qemu-iotests/iotests.py, line 265, in 
 dictpath
 +self.fail('failed path traversal for %s in %s' % (path, str(d)))
 +AssertionError: failed path traversal for return in None
 +
  --
  Ran 17 tests

 It seems that self.vm.qmp('block-stream', ...) is returning None in your
 case. Is that the only test that is failing?

Yes, only this test fails.  I have pushed my tree here:
https://github.com/stefanha/qemu/commits/berto-intermediate-streaming

I wonder if you see this failure on your machine with my tree.

Stefan