Re: [Qemu-block] [PATCH for-2.11 v2] file-posix: Clear out first sector in hdev_create

2017-08-11 Thread Fam Zheng
On Fri, 08/11 09:42, Eric Blake wrote:
> On 08/11/2017 03:09 AM, Fam Zheng wrote:
> > People get surprised when, after "qemu-img create -f raw /dev/sdX", they
> > still see qcow2 with "qemu-img info", if previously the bdev had a qcow2
> > header. While this is natural because raw doesn't need to write any
> > magic bytes during creation, hdev_create is free to clear out the first
> > sector to make sure the stale qcow2 header doesn't cause such confusion.
> > 
> > Signed-off-by: Fam Zheng 
> > 
> > ---
> > 
> > v2: Use stack allocated buffer. [Eric]
> > Fix return value.
> > (Keep qemu_write_full instead of switching to qemu_pwritev because
> > the former handles short writes.)
> > Fix typo "qemu-img". [Changlong]
> > ---
> >  block/file-posix.c | 10 ++
> >  1 file changed, 10 insertions(+)
> 
> > 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index f4de022ae0..a63bbf2b90 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -2703,6 +2703,16 @@ static int hdev_create(const char *filename, QemuOpts *opts,
> >  ret = -ENOSPC;
> >  }
> >  
> > +if (total_size) {
> > +uint8_t buf[BDRV_SECTOR_SIZE] = { 0 };
> > +int64_t zero_size = MIN(BDRV_SECTOR_SIZE, total_size);
> > +if (lseek(fd, 0, SEEK_SET) == -1) {
> > +ret = -errno;
> > +} else {
> > +ret = qemu_write_full(fd, buf, zero_size);
> > +ret = ret == zero_size ? 0 : -errno;
> > +}
> > +}
> 
> Question: are we ever constrained by O_DIRECT where writing only 512
> bytes would be too small for a block device that mandates 4k alignment?
> If so, then we need MAX(minimum write size, MIN(BDRV_SECTOR_SIZE,
> total_size)) - it would also mean we can't stack-allocate any more, but
> that we have to do an aligned buffer allocation (where g_malloc is not
> necessarily suitably aligned).
> 
> If O_DIRECT is not a problem, then this is okay:

A few lines above:


fd = qemu_open(filename, O_WRONLY | O_BINARY);

so there is no O_DIRECT issue.
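
(For illustration only: if the fd *were* opened with O_DIRECT, a minimal
sketch of the aligned variant could look like the following, where 4096 is
an assumed stand-in for whatever alignment the device mandates, and
qemu_memalign() gives the alignment guarantee that g_malloc() does not:

    size_t align = 4096;  /* assumed device-mandated minimum */
    uint8_t *buf = qemu_memalign(align, align);
    memset(buf, 0, align);
    ret = qemu_write_full(fd, buf, align);
    qemu_vfree(buf);
    ret = ret == align ? 0 : -errno;

But with plain O_WRONLY | O_BINARY none of that is necessary.)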

Fam



Re: [Qemu-block] [PATCH v2 for 2.10] block/nbd-client: always return EIO on and after the first io channel error

2017-08-11 Thread Eric Blake
On 08/08/2017 10:26 AM, Vladimir Sementsov-Ogievskiy wrote:

Initial review, I'm still playing with this one, and will reply again on
Monday.

> Do not communicate after the first error to avoid communicating throught

s/throught/through a/

> broken channel. The only exclusion is try to send NBD_CMD_DISC anyway on
> in nbd_client_close.

I think the exclusion is wrong - if we detected the server sending us
garbage, we're probably better off disconnecting entirely rather than
trying to assume that the server will give us a clean disconnect via
NBD_CMD_DISC.

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> Hi all. Here is a patch fixing a problem noted in
> [PATCH 06/17] block/nbd-client: fix nbd_read_reply_entry
> and
> [PATCH 17/17] block/nbd-client: always return EIO on and after the first io channel error
> and discussed on list.
> 
> If it is applied to 2.10, then I'll rebase my 'nbd client refactoring and fixing'
> series on it (for 2.11). If not, I'd prefer not to rebase the series, so do not
> apply this patch to 2.11.
> 
> v2: set eio_to_all in nbd_co_send_request and nbd_co_receive_reply in case of error
> 
>  block/nbd-client.h |  1 +
>  block/nbd-client.c | 65 +++---
>  2 files changed, 48 insertions(+), 18 deletions(-)
> 
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index df80771357..28db9922c8 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -29,6 +29,7 @@ typedef struct NBDClientSession {
>  
>  Coroutine *recv_coroutine[MAX_NBD_REQUESTS];
>  NBDReply reply;
> +bool eio_to_all;

Bikeshedding - I'm wondering if there is a better name; maybe 'error' or
'server_broken'?

>  } NBDClientSession;
>  
>  NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 25dd28406b..a72cb7690a 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -49,6 +49,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>  {
>  NBDClientSession *client = nbd_get_client_session(bs);
>  
> +client->eio_to_all = true;
> +

Okay, if you set it here, then it does NOT mean 'server_broken' - and if
we reached this spot normally, we still WANT the NBD_CMD_DISC.  So do we
really need to set it here?

>  if (!client->ioc) { /* Already closed */
>  return;
>  }
> @@ -74,12 +76,16 @@ static coroutine_fn void nbd_read_reply_entry(void 
> *opaque)
>  Error *local_err = NULL;
>  
>  for (;;) {
> +if (s->eio_to_all) {
> +break;
> +}
> +
>  assert(s->reply.handle == 0);
> >  ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
>  if (ret < 0) {
>  error_report_err(local_err);
>  }
> -if (ret <= 0) {
> +if (ret <= 0 || s->eio_to_all) {

I'm still wondering if we need this condition at two points in the loop,
or if merely checking at the beginning of the cycle is sufficient (like
I said in my counterproposal thread, I'll be hammering away at your
patch under gdb over the weekend, to see if I can trigger all the
additions under some situation).

>  break;
>  }
>  
> @@ -107,6 +113,7 @@ static coroutine_fn void nbd_read_reply_entry(void 
> *opaque)
>  qemu_coroutine_yield();
>  }
>  
> +s->eio_to_all = true;

I think this one should be guarded by 'if (ret < 0)' - we also break out
of the for loop when ret == 0 (aka EOF because the server ended
cleanly), which is not the same as the server sending us garbage.
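
In sketch form, the guarded version I have in mind (my assumption of the
intent, not what this patch does):

    if (ret < 0) {
        /* real failure: garbage from the server or a broken channel */
        s->eio_to_all = true;
    }
    nbd_recv_coroutines_enter_all(s);
    s->read_reply_co = NULL;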

>  nbd_recv_coroutines_enter_all(s);
>  s->read_reply_co = NULL;
>  }
> @@ -118,6 +125,10 @@ static int nbd_co_send_request(BlockDriverState *bs,
>  NBDClientSession *s = nbd_get_client_session(bs);
>  int rc, ret, i;
>  
> +if (s->eio_to_all) {
> +return -EIO;
> +}
> +

This one is good.

> >  qemu_co_mutex_lock(&s->send_mutex);
> >  while (s->in_flight == MAX_NBD_REQUESTS) {
> >  qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
> @@ -135,15 +146,15 @@ static int nbd_co_send_request(BlockDriverState *bs,
>  assert(i < MAX_NBD_REQUESTS);
>  request->handle = INDEX_TO_HANDLE(s, i);
>  
> -if (!s->ioc) {
> +if (s->eio_to_all) {
> >  qemu_co_mutex_unlock(&s->send_mutex);
> -return -EPIPE;
> +return -EIO;
>  }

I don't know if changing the errno is wise; maybe we want to keep both
conditions (the !s->ioc and the s->eio_to_all) - especially if we only
set eio_to_all on an actual detection of server garbage rather than on
all exit paths.
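
Roughly, a sketch of keeping both checks side by side (assuming eio_to_all
is only set on detected server garbage):

    if (!s->ioc) {
        qemu_co_mutex_unlock(&s->send_mutex);
        return -EPIPE;  /* connection already torn down */
    }
    if (s->eio_to_all) {
        qemu_co_mutex_unlock(&s->send_mutex);
        return -EIO;    /* server known to be broken */
    }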

>  
>  if (qiov) {
>  qio_channel_set_cork(s->ioc, true);
>  rc = nbd_send_request(s->ioc, request);
> -if (rc >= 0) {
> +if (rc >= 0 && !s->eio_to_all) {
>  ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, false,
>NULL);
>  if (ret != request->len) {
> @@ -155,7 

Re: [Qemu-block] [Qemu-devel] [PATCH 1/2] nbd: Drop connection if broken server is detected

2017-08-11 Thread Eric Blake
On 08/11/2017 03:01 PM, Eric Blake wrote:
> On 08/11/2017 02:41 PM, Eric Blake wrote:
>>> Hmm, was it correct even before your patch? Is it safe to enter a coroutine
>>> (which we've scheduled by nbd_recv_coroutines_enter_all()), which is
>>> actually
>>> yielded inside nbd_rwv (not our yield in nbd_co_receive_reply)?
>>
>> I'm honestly not sure how to answer the question. In my testing, I was
>> unable to catch a coroutine yielding inside of nbd_rwv();
> 
> Single stepping through nbd_rwv(), I see that I/O is performed by
> sendmsg(), which either gets the message sent or, because of nonblocking
> mode, fails with EAGAIN, which gets turned into QIO_CHANNEL_ERR_BLOCK
> and indeed a call to qemu_channel_yield() within nbd_rwv() - but it's
> timing sensitive, so I still haven't been able to provoke this scenario
> using gdb.

With this compiled into the client:

diff --git i/nbd/common.c w/nbd/common.c
index a2f28f2eec..f10e991eed 100644
--- i/nbd/common.c
+++ w/nbd/common.c
@@ -36,6 +36,10 @@ ssize_t nbd_rwv(QIOChannel *ioc, struct iovec *iov,
size_t niov, size_t length,

 while (nlocal_iov > 0) {
 ssize_t len;
+static int hack;
+if (hack) {
+len = QIO_CHANNEL_ERR_BLOCK;
+} else
 if (do_read) {
 len = qio_channel_readv(ioc, local_iov, nlocal_iov, errp);
 } else {

I was able to set a breakpoint in gdb to temporarily manipulate 'hack'
at the right moment, in order to trigger what would happen if a
nbd_rwv() hit EAGAIN.  And sadly, I got a segfault using my patches,
because the last reference to ioc had been cleared in the meantime.


Program received signal SIGSEGV, Segmentation fault.
0x5562ee94 in object_class_dynamic_cast_assert
(class=0x55d9d1b0,
typename=0x556c856d "qio-channel", file=0x556c8560
"io/channel.c",
line=75, func=0x556c8670 <__func__.21506> "qio_channel_writev_full")
at qom/object.c:705
705 trace_object_class_dynamic_cast_assert(class ? class->type->name
: "(null)",


#0  0x5562ee94 in object_class_dynamic_cast_assert (
class=0x55d9d1b0, typename=0x556c856d "qio-channel",
file=0x556c8560 "io/channel.c", line=75,
func=0x556c8670 <__func__.21506> "qio_channel_writev_full")
at qom/object.c:705
#1  0x5562312d in qio_channel_writev_full (ioc=0x55d9bde0,
iov=0x55d9ec90, niov=1, fds=0x0, nfds=0, errp=0x0) at
io/channel.c:75
#2  0x55623233 in qio_channel_writev (ioc=0x55d9bde0,
iov=0x55d9ec90, niov=1, errp=0x0) at io/channel.c:102
#3  0x55603590 in nbd_rwv (ioc=0x55d9bde0, iov=0x55d9ecd0,
niov=1, length=1048576, do_read=false, errp=0x0) at nbd/common.c:46
#4  0x555ebcca in nbd_co_send_request (bs=0x55d94260,
request=0x7fffda2819f0, qiov=0x55d9ee88) at block/nbd-client.c:152


My next test is whether incrementing the ref-count to s->ioc for the
duration of nbd_co_send_request() is adequate to protect against this
problem:

diff --git i/block/nbd-client.c w/block/nbd-client.c
index 28b10f3fa2..cb0c4ebedf 100644
--- i/block/nbd-client.c
+++ w/block/nbd-client.c
@@ -147,6 +147,11 @@ static int nbd_co_send_request(BlockDriverState *bs,
 return -EPIPE;
 }

+/*
+ * Make sure ioc stays live, even if another coroutine decides to
+ * kill the connection because of a server error.
+ */
+object_ref(OBJECT(ioc));
 if (qiov) {
 qio_channel_set_cork(ioc, true);
 rc = nbd_send_request(ioc, request);
@@ -161,6 +166,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
 } else {
 rc = nbd_send_request(ioc, request);
 }
+object_unref(OBJECT(ioc));
 qemu_co_mutex_unlock(&s->send_mutex);
 return rc;
 }

and that got further, only to crash at:

(gdb) c
Continuing.
readv failed: Input/output error
aio_write failed: Input/output error
qemu-io: block/block-backend.c:1211: blk_aio_write_entry: Assertion
`!rwco->qiov || rwco->qiov->size == acb->bytes' failed.

where rwco->qiov->size was garbage.

At this point, while I still think it might be possible to come up with
a less-invasive solution than your v2 patch, I'm also at the point
where I want the bug fixed rather than me wallowing around trying to
debug coroutine interactions; and thus I'm leaning towards your v2 patch
as being more likely to be robust in the face of concurrency (it's not
killing ioc while other coroutines still exist, so much as just making
sure that every yield point checks if the kill switch has been triggered
for a short-circuit exit).  So I will probably be taking your version
and creating a pull request for -rc3 on Monday.  (Before I fully ack
your version, though, I _will_ be hammering at it under gdb the same way
I hammered at mine)


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH 1/2] nbd: Drop connection if broken server is detected

2017-08-11 Thread Eric Blake
On 08/11/2017 02:41 PM, Eric Blake wrote:
>> Hmm, was it correct even before your patch? Is it safe to enter a coroutine
>> (which we've scheduled by nbd_recv_coroutines_enter_all()), which is
>> actually
>> yielded inside nbd_rwv (not our yield in nbd_co_receive_reply)?
> 
> I'm honestly not sure how to answer the question. In my testing, I was
> unable to catch a coroutine yielding inside of nbd_rwv();

Single stepping through nbd_rwv(), I see that I/O is performed by
sendmsg(), which either gets the message sent or, because of nonblocking
mode, fails with EAGAIN, which gets turned into QIO_CHANNEL_ERR_BLOCK
and indeed a call to qemu_channel_yield() within nbd_rwv() - but it's
timing sensitive, so I still haven't been able to provoke this scenario
using gdb.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH 1/2] nbd: Drop connection if broken server is detected

2017-08-11 Thread Eric Blake
On 08/11/2017 09:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> 11.08.2017 17:15, Eric Blake wrote:
>> On 08/11/2017 02:48 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> 11.08.2017 05:37, Eric Blake wrote:
 As soon as the server is sending us garbage, we should quit
 trying to send further messages to the server, and allow all
 pending coroutines for any remaining replies to error out.
 Failure to do so can let a malicious server cause the client
 to hang, for example, if the server sends an invalid magic
 number in its response.

>> The nbd_recv_coroutines_enter_all() call schedules all pending
>> nbd_co_send_request coroutines to fire as soon as the current coroutine
>> reaches a yield point. The next yield point is during the
>> BDRV_POLL_WHILE of nbd_teardown_connection - but this is AFTER we've
>> called qio_channel_shutdown() - so as long as nbd_rwv() is called with a
>> valid ioc, the qio code should recognize that we are shutting down the
>> connection and gracefully give an error on each write attempt.
> 
> 
> Hmm, was it correct even before your patch? Is it safe to enter a coroutine
> (which we've scheduled by nbd_recv_coroutines_enter_all()), which is
> actually
> yielded inside nbd_rwv (not our yield in nbd_co_receive_reply)?

I'm honestly not sure how to answer the question. In my testing, I was
unable to catch a coroutine yielding inside of nbd_rwv(); I was able to
easily provoke a situation where the client can send two or more
commands prior to the server getting a chance to reply to either:

./qemu-io -f raw nbd://localhost:10809/foo \
   -c 'aio_read 0 512' -c 'aio_write 1k 1m'

where tracing the server proves that the server received both commands
before sending a reply; when the client sends two aio_read commands, it
was even the case that I could observe the server replying to the second
read before the first.  So I'm definitely provoking parallel coroutines.
 But even without my tentative squash patch, I haven't been able to
observe s->ioc change from valid to NULL within the body of
nbd_co_send_request - either the entire request is skipped because ioc
was already cleared, or the entire request operates on a valid ioc
(although the request may still fail with EPIPE because the ioc has
started its efforts at shutdown).  I even tried varying the size of the
aio_write; with 1M, the client got the write request sent off before the
server's reply; but 2M was large enough that the server sent the read
reply before the client could send the write.  Since we are using a
mutex, we have at most one coroutine able to attempt a write at once;
but that still says nothing about how many other parallel coroutines can
wake up to do a read.  I also think the fact that we are using qio's
set_cork around the paired writes is helping: although we have two calls
to nbd_rwv(), the first one is for the NBD_CMD_WRITE header which is
small that the qio layer waits for the second nbd_rwv() call before
actually sending anything over the wire (if anything, we are more likely
to see s->ioc change before the final set_cork call, rather than between
the two writes - if that change can even happen).
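
(For reference, the corked section in nbd_co_send_request() is essentially
the following shape - quoting the structure from memory, not the exact
source:

    qio_channel_set_cork(s->ioc, true);      /* hold back partial frames */
    rc = nbd_send_request(s->ioc, request);  /* small header, stays buffered */
    if (rc >= 0) {
        ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len,
                      false, NULL);          /* payload */
    }
    qio_channel_set_cork(s->ioc, false);     /* header + payload flushed together */

so the header and data for a write normally hit the wire as one burst.)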

>>
>> I see your point about the fact that coroutines can change hands in
>> between our two writes for an NBD_CMD_WRITE in nbd_co_send_request()
>> (the first write is nbd_send_request() for the header, the second is
>> nbd_rwv() for the data) - if between those two writes we process a
>> failing read, I see your point about us risking re-reading s->ioc as
> 
> But there are no yields between the two writes, so, if the previous logic
> is correct, if the read fails during the first write it will return an
> error and we will not go into the second write. If it fails during the
> second write, it should be OK too.

If we can ever observe s->ioc changing to NULL, then my followup squash
patch is needed (if nothing else, calling qio_channel_set_cork(NULL,
false) will crash).  But I'm not familiar enough with coroutines to know
if it is possible, or just paranoia on my part.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PULL 0/7] Block/Multiboot patches for 2.10.0-rc3

2017-08-11 Thread Peter Maydell
On 11 August 2017 at 18:10, Peter Maydell  wrote:
> I get an intermittent failure on aarch64 test-aio-multithread:
>
> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> gtester -k --verbose -m=quick tests/test-aio-multithread
> TEST: tests/test-aio-multithread... (pid=19863)
>   /aio/multi/lifecycle:OK
>   /aio/multi/schedule: OK
>   /aio/multi/mutex/contended:  OK
>   /aio/multi/mutex/handoff:OK
>   /aio/multi/mutex/mcs:**
> ERROR:/home/pm215/qemu/tests/test-aio-multithread.c:368:test_multi_fair_mutex:
> assertion failed (counter == atomic_counter): (343406 == 343407)
> FAIL
> GTester: last random seed: R02S227b39277b8c54976c98f0e990305851
> (pid=21145)
>   /aio/multi/mutex/pthread:OK
> FAIL: tests/test-aio-multithread
>
>
> but I've pushed this to master on the optimistic assumption that
> it's not the fault of anything in this pullreq... (will
> investigate further)

I can repro this on 95766c2cd043 so it definitely isn't this patchset's
fault.

thanks
-- PMM



Re: [Qemu-block] [PULL 0/7] Block/Multiboot patches for 2.10.0-rc3

2017-08-11 Thread Peter Maydell
On 11 August 2017 at 15:05, Kevin Wolf  wrote:
> The following changes since commit 95766c2cd04395e5712b4d5967b3251f35d537df:
>
>   Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' 
> into staging (2017-08-10 18:53:39 +0100)
>
> are available in the git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to 8565c3ab537e78f3e69977ec2c609dc9417a806e:
>
>   qemu-iotests: fix 185 (2017-08-11 14:44:39 +0200)
>
> 
> Block layer patches for 2.10.0-rc3
>
> 

I get an intermittent failure on aarch64 test-aio-multithread:

MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
gtester -k --verbose -m=quick tests/test-aio-multithread
TEST: tests/test-aio-multithread... (pid=19863)
  /aio/multi/lifecycle:OK
  /aio/multi/schedule: OK
  /aio/multi/mutex/contended:  OK
  /aio/multi/mutex/handoff:OK
  /aio/multi/mutex/mcs:**
ERROR:/home/pm215/qemu/tests/test-aio-multithread.c:368:test_multi_fair_mutex:
assertion failed (counter == atomic_counter): (343406 == 343407)
FAIL
GTester: last random seed: R02S227b39277b8c54976c98f0e990305851
(pid=21145)
  /aio/multi/mutex/pthread:OK
FAIL: tests/test-aio-multithread


but I've pushed this to master on the optimistic assumption that
it's not the fault of anything in this pullreq... (will
investigate further)

thanks
-- PMM



Re: [Qemu-block] [Qemu-devel] migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

2017-08-11 Thread Kevin Wolf
On 11.08.2017 at 17:34, Christian Ehrhardt wrote:
> On Fri, Aug 11, 2017 at 2:37 PM, Kevin Wolf  wrote:
> 
> > On 11.08.2017 at 14:04, Fam Zheng wrote:
> > > On Fri, 08/11 13:07, Christian Ehrhardt wrote:
> > > > Simplifying that to a smaller test:
> > > >
> >
> [...]
> 
> > > > Block node is read-only
> >
> [...]
> 
> > >
> > > This is actually caused by
> > >
> > > commit 9c5e6594f15b7364624a3ad40306c396c93a2145
> > > Author: Kevin Wolf 
> > > Date:   Thu May 4 18:52:40 2017 +0200
> > >
> > > block: Fix write/resize permissions for inactive images
> > >
> >
> 
Yeah, in the meantime an automated bisect run is through and agrees.
> Thanks for pointing out the right change triggering that so early!
> 
> Thanks Kevin for all the suggestions already, I quickly looked at the code
> but I'm lost there without taking much more time.
> Is anybody experienced in that area looking at fixing that?
Because IMHO that is kind of a blocker bug for 2.10, as it breaks formerly
working behavior (even if, as Kevin identified, it seems to have been wrong
all along).

The patch below implements what I was thinking of, and it seems to fix
this problem. However, you'll immediately run into the next one, which
is a message like 'Conflicts with use by ide0-hd0 as 'root', which does
not allow 'write' on #block172'.

The reason for this is that since commit 4417ab7adf1,
bdrv_invalidate_cache() not only activates the image, but also is the
signal for guest device BlockBackends that migration has completed and
they should now request exclusive access to the image.

As the NBD server shows, this assumption is wrong. Images can be
activated even before migration completes. Maybe this really needs to
be done in a VM state change handler.

We can't simply revert commit 4417ab7adf1 because for image file
locking, it is important that we drop locks for inactive images, so
BdrvChildRole.activate/inactivate will probably stay. Maybe only one
bool blk->disable_perm isn't enough, we might need a different one for
devices before migration completed, or at least a counter.
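
A rough sketch of the counter idea (all names hypothetical, this is not
existing block-backend.c code):

    /* count the independent reasons to keep permissions disabled */
    static void blk_perm_disable(BlockBackend *blk)
    {
        blk->disable_perm_cnt++;
    }

    static void blk_perm_enable(BlockBackend *blk)
    {
        assert(blk->disable_perm_cnt > 0);
        if (--blk->disable_perm_cnt == 0) {
            /* only now apply the permissions requested earlier */
        }
    }

Inactive-image handling and pre-migration device handling could then each
hold their own reference instead of sharing one bool.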

I'll be on vacation starting tomorrow, so someone else needs to tackle
this. Fam, I think you are relatively familiar with the op blockers?

Kevin


diff --git a/nbd/server.c b/nbd/server.c
index 82a78bf439..b68b6fb535 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1045,11 +1045,22 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
   bool writethrough, BlockBackend *on_eject_blk,
   Error **errp)
 {
+AioContext  *ctx;
 BlockBackend *blk;
 NBDExport *exp = g_malloc0(sizeof(NBDExport));
 uint64_t perm;
 int ret;
 
+/*
+ * NBD exports are used for non-shared storage migration.  Make sure
+ * that BDRV_O_INACTIVE is cleared and the image is ready for write
+ * access since the export could be available before migration handover.
+ */
+ctx = bdrv_get_aio_context(bs);
+aio_context_acquire(ctx);
+bdrv_invalidate_cache(bs, NULL);
+aio_context_release(ctx);
+
 /* Don't allow resize while the NBD server is running, otherwise we don't
  * care what happens with the node. */
 perm = BLK_PERM_CONSISTENT_READ;
@@ -1087,15 +1098,6 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
 exp->eject_notifier.notify = nbd_eject_notifier;
  blk_add_remove_bs_notifier(on_eject_blk, &exp->eject_notifier);
 }
-
-/*
- * NBD exports are used for non-shared storage migration.  Make sure
- * that BDRV_O_INACTIVE is cleared and the image is ready for write
- * access since the export could be available before migration handover.
- */
-aio_context_acquire(exp->ctx);
-blk_invalidate_cache(blk, NULL);
-aio_context_release(exp->ctx);
 return exp;
 
 fail:



Re: [Qemu-block] [Qemu-devel] migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

2017-08-11 Thread Christian Ehrhardt
On Fri, Aug 11, 2017 at 2:37 PM, Kevin Wolf  wrote:

> On 11.08.2017 at 14:04, Fam Zheng wrote:
> > On Fri, 08/11 13:07, Christian Ehrhardt wrote:
> > > Simplifying that to a smaller test:
> > >
>
[...]

> > > Block node is read-only
>
[...]

> >
> > This is actually caused by
> >
> > commit 9c5e6594f15b7364624a3ad40306c396c93a2145
> > Author: Kevin Wolf 
> > Date:   Thu May 4 18:52:40 2017 +0200
> >
> > block: Fix write/resize permissions for inactive images
> >
>

Yeah, in the meantime an automated bisect run is through and agrees.
Thanks for pointing out the right change triggering that so early!

Thanks Kevin for all the suggestions already, I quickly looked at the code
but I'm lost there without taking much more time.
Is anybody experienced in that area looking at fixing that?
Because IMHO that is kind of a blocker bug for 2.10, as it breaks formerly
working behavior (even if, as Kevin identified, it seems to have been wrong
all along).


Re: [Qemu-block] [PATCH 1/2] nbd: Drop connection if broken server is detected

2017-08-11 Thread Vladimir Sementsov-Ogievskiy

11.08.2017 17:15, Eric Blake wrote:

On 08/11/2017 02:48 AM, Vladimir Sementsov-Ogievskiy wrote:

11.08.2017 05:37, Eric Blake wrote:

As soon as the server is sending us garbage, we should quit
trying to send further messages to the server, and allow all
pending coroutines for any remaining replies to error out.
Failure to do so can let a malicious server cause the client
to hang, for example, if the server sends an invalid magic
number in its response.
@@ -107,8 +108,12 @@ static coroutine_fn void
nbd_read_reply_entry(void *opaque)
   qemu_coroutine_yield();
   }

+s->reply.handle = 0;
   nbd_recv_coroutines_enter_all(s);
   s->read_reply_co = NULL;
+if (ret < 0) {
+nbd_teardown_connection(bs);
+}

what if it happens in parallel with nbd_co_send_request?
nbd_teardown_connectin destroys s->ioc, nbd_co_send_requests
checks s->ioc only once and then calls nbd_send_request (which is
finally nbd_rwv and may yield). I think nbd_rwv is not
prepared to sudden destruction of ioc..

The nbd_recv_coroutines_enter_all() call schedules all pending
nbd_co_send_request coroutines to fire as soon as the current coroutine
reaches a yield point. The next yield point is during the
BDRV_POLL_WHILE of nbd_teardown_connection - but this is AFTER we've
called qio_channel_shutdown() - so as long as nbd_rwv() is called with a
valid ioc, the qio code should recognize that we are shutting down the
connection and gracefully give an error on each write attempt.



Hmm, was it correct even before your patch? Is it safe to enter a coroutine
(which we've scheduled by nbd_recv_coroutines_enter_all()), which is 
actually

yielded inside nbd_rwv (not our yield in nbd_co_receive_reply)?



I see your point about the fact that coroutines can change hands in
between our two writes for an NBD_CMD_WRITE in nbd_co_send_request()
(the first write is nbd_send_request() for the header, the second is
nbd_rwv() for the data) - if between those two writes we process a
failing read, I see your point about us risking re-reading s->ioc as


But there are no yields between the two writes, so, if the previous logic
is correct, if the read fails during the first write it will return an
error and we will not go into the second write. If it fails during the
second write, it should be OK too.



NULL for the second write call.  But maybe this is an appropriate fix -
hanging on to the ioc that we learned when grabbing the send_mutex:


diff --git a/block/nbd-client.c b/block/nbd-client.c
index 802d50b636..28b10f3fa2 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -122,6 +122,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
  {
  NBDClientSession *s = nbd_get_client_session(bs);
  int rc, ret, i;
+QIOChannel *ioc;

  qemu_co_mutex_lock(&s->send_mutex);
  while (s->in_flight == MAX_NBD_REQUESTS) {
@@ -139,25 +140,26 @@ static int nbd_co_send_request(BlockDriverState *bs,
  g_assert(qemu_in_coroutine());
  assert(i < MAX_NBD_REQUESTS);
  request->handle = INDEX_TO_HANDLE(s, i);
+ioc = s->ioc;

-if (!s->ioc) {
+if (!ioc) {
  qemu_co_mutex_unlock(&s->send_mutex);
  return -EPIPE;
  }

  if (qiov) {
-qio_channel_set_cork(s->ioc, true);
-rc = nbd_send_request(s->ioc, request);
+qio_channel_set_cork(ioc, true);
+rc = nbd_send_request(ioc, request);
  if (rc >= 0) {
-ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, false,
+ret = nbd_rwv(ioc, qiov->iov, qiov->niov, request->len, false,
NULL);
  if (ret != request->len) {
  rc = -EIO;
  }
  }
-qio_channel_set_cork(s->ioc, false);
+qio_channel_set_cork(ioc, false);
  } else {
-rc = nbd_send_request(s->ioc, request);
+rc = nbd_send_request(ioc, request);
  }
  qemu_co_mutex_unlock(&s->send_mutex);
  return rc;



But I'm really hoping Paolo will chime in on this thread.



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH for-2.11 v2] file-posix: Clear out first sector in hdev_create

2017-08-11 Thread Eric Blake
On 08/11/2017 03:09 AM, Fam Zheng wrote:
> People get surprised when, after "qemu-img create -f raw /dev/sdX", they
> still see qcow2 with "qemu-img info", if previously the bdev had a qcow2
> header. While this is natural because raw doesn't need to write any
> magic bytes during creation, hdev_create is free to clear out the first
> sector to make sure the stale qcow2 header doesn't cause such confusion.
> 
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> v2: Use stack allocated buffer. [Eric]
> Fix return value.
> (Keep qemu_write_full instead of switching to qemu_pwritev because
> the former handles short writes.)
> Fix typo "qemu-img". [Changlong]
> ---
>  block/file-posix.c | 10 ++
>  1 file changed, 10 insertions(+)

> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index f4de022ae0..a63bbf2b90 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2703,6 +2703,16 @@ static int hdev_create(const char *filename, QemuOpts *opts,
>  ret = -ENOSPC;
>  }
>  
> +if (total_size) {
> +uint8_t buf[BDRV_SECTOR_SIZE] = { 0 };
> +int64_t zero_size = MIN(BDRV_SECTOR_SIZE, total_size);
> +if (lseek(fd, 0, SEEK_SET) == -1) {
> +ret = -errno;
> +} else {
> +ret = qemu_write_full(fd, buf, zero_size);
> +ret = ret == zero_size ? 0 : -errno;
> +}
> +}

Question: are we ever constrained by O_DIRECT where writing only 512
bytes would be too small for a block device that mandates 4k alignment?
If so, then we need MAX(minimum write size, MIN(BDRV_SECTOR_SIZE,
total_size)) - it would also mean we can't stack-allocate any more, but
that we have to do an aligned buffer allocation (where g_malloc is not
necessarily suitably aligned).

If O_DIRECT is not a problem, then this is okay:

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH 1/2] nbd: Drop connection if broken server is detected

2017-08-11 Thread Eric Blake
On 08/11/2017 02:48 AM, Vladimir Sementsov-Ogievskiy wrote:
> 11.08.2017 05:37, Eric Blake wrote:
>> As soon as the server is sending us garbage, we should quit
>> trying to send further messages to the server, and allow all
>> pending coroutines for any remaining replies to error out.
>> Failure to do so can let a malicious server cause the client
>> to hang, for example, if the server sends an invalid magic
>> number in its response.

>> @@ -107,8 +108,12 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>>   qemu_coroutine_yield();
>>   }
>>
>> +s->reply.handle = 0;
>>   nbd_recv_coroutines_enter_all(s);
>>   s->read_reply_co = NULL;
>> +if (ret < 0) {
>> +nbd_teardown_connection(bs);
>> +}
> 
> what if it happens in parallel with nbd_co_send_request?
> nbd_teardown_connectin destroys s->ioc, nbd_co_send_requests
> checks s->ioc only once and then calls nbd_send_request (which is
> finally nbd_rwv and may yield). I think nbd_rwv is not
> prepared to sudden destruction of ioc..

The nbd_recv_coroutines_enter_all() call schedules all pending
nbd_co_send_request coroutines to fire as soon as the current coroutine
reaches a yield point. The next yield point is during the
BDRV_POLL_WHILE of nbd_teardown_connection - but this is AFTER we've
called qio_channel_shutdown() - so as long as nbd_rwv() is called with a
valid ioc, the qio code should recognize that we are shutting down the
connection and gracefully give an error on each write attempt.
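
(Concretely, nbd_teardown_connection() does roughly this - sketching from
memory, not quoting the file:

    qio_channel_shutdown(client->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
    BDRV_POLL_WHILE(bs, client->read_reply_co);  /* first yield point */

so by the time a woken coroutine retries nbd_rwv(), the channel has
already been flagged as shut down.)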

I see your point about the fact that coroutines can change hands in
between our two writes for an NBD_CMD_WRITE in nbd_co_send_request()
(the first write is nbd_send_request() for the header, the second is
nbd_rwv() for the data) - if between those two writes we process a
failing read, I see your point about us risking re-reading s->ioc as
NULL for the second write call.  But maybe this is an appropriate fix -
hanging on to the ioc that we learned when grabbing the send_mutex:


diff --git a/block/nbd-client.c b/block/nbd-client.c
index 802d50b636..28b10f3fa2 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -122,6 +122,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
 {
 NBDClientSession *s = nbd_get_client_session(bs);
 int rc, ret, i;
+QIOChannel *ioc;

 qemu_co_mutex_lock(&s->send_mutex);
 while (s->in_flight == MAX_NBD_REQUESTS) {
@@ -139,25 +140,26 @@ static int nbd_co_send_request(BlockDriverState *bs,
 g_assert(qemu_in_coroutine());
 assert(i < MAX_NBD_REQUESTS);
 request->handle = INDEX_TO_HANDLE(s, i);
+ioc = s->ioc;

-if (!s->ioc) {
+if (!ioc) {
 qemu_co_mutex_unlock(&s->send_mutex);
 return -EPIPE;
 }

 if (qiov) {
-qio_channel_set_cork(s->ioc, true);
-rc = nbd_send_request(s->ioc, request);
+qio_channel_set_cork(ioc, true);
+rc = nbd_send_request(ioc, request);
 if (rc >= 0) {
-ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, false,
+ret = nbd_rwv(ioc, qiov->iov, qiov->niov, request->len, false,
   NULL);
 if (ret != request->len) {
 rc = -EIO;
 }
 }
-qio_channel_set_cork(s->ioc, false);
+qio_channel_set_cork(ioc, false);
 } else {
-rc = nbd_send_request(s->ioc, request);
+rc = nbd_send_request(ioc, request);
 }
 qemu_co_mutex_unlock(&s->send_mutex);
 return rc;



But I'm really hoping Paolo will chime in on this thread.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





[Qemu-block] [PULL 7/7] qemu-iotests: fix 185

2017-08-11 Thread Kevin Wolf
From: Vladimir Sementsov-Ogievskiy 

185 can sometimes produce wrong output like this:

185 2s ... - output mismatch (see 185.out.bad)
--- /work/src/qemu/master/tests/qemu-iotests/185.out2017-07-14 \
15:14:29.520343805 +0300
+++ 185.out.bad 2017-08-07 16:51:02.231922900 +0300
@@ -37,7 +37,7 @@
 {"return": {}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, \
 "event": "SHUTDOWN", "data": {"guest": false}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, \
"event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", \
"len": 4194304, "offset": 4194304, "speed": 65536, "type": \
"mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, \
"event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", \
"len": 0, "offset": 0, "speed": 65536, "type": "mirror"}}

 === Start backup job and exit qemu ===

Failures: 185
Failed 1 of 1 tests

This is because, under heavy load, the quit can happen before the first
iteration of the mirror request has occurred.  To make sure we've had
time to iterate, let's just add a sleep for 0.5 seconds before quitting.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/185 | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
index 0eda371f27..f5b47e4c1a 100755
--- a/tests/qemu-iotests/185
+++ b/tests/qemu-iotests/185
@@ -156,6 +156,10 @@ _send_qemu_cmd $h \
   'speed': 65536 } }" \
 "return"
 
+# If we don't sleep here 'quit' command may be handled before
+# the first mirror iteration is done
+sleep 0.5
+
 _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 wait=1 _cleanup_qemu
 
-- 
2.13.4




[Qemu-block] [PULL 6/7] file-posix: Do runtime check for ofd lock API

2017-08-11 Thread Kevin Wolf
From: Fam Zheng 

It is reported that on Windows Subsystem for Linux, ofd operations fail
with -EINVAL. In other words, QEMU binary built with system headers that
exports F_OFD_SETLK doesn't necessarily run in an environment that
actually supports it:

$ qemu-system-aarch64 ... -drive file=test.vhdx,if=none,id=hd0 \
-device virtio-blk-pci,drive=hd0
qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock byte 100
qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock byte 100
qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to lock byte 100

As a matter of fact this is not WSL specific. It can happen when running
a QEMU compiled against a newer glibc on an older kernel, such as in
a containerized environment.

Let's do a runtime check to cope with that.

Reported-by: Andrew Baumann 
Reviewed-by: Eric Blake 
Signed-off-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index f4de022ae0..cb3bfce147 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -457,22 +457,19 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 switch (locking) {
 case ON_OFF_AUTO_ON:
 s->use_lock = true;
-#ifndef F_OFD_SETLK
-fprintf(stderr,
-"File lock requested but OFD locking syscall is unavailable, "
-"falling back to POSIX file locks.\n"
-"Due to the implementation, locks can be lost 
unexpectedly.\n");
-#endif
+if (!qemu_has_ofd_lock()) {
+fprintf(stderr,
+"File lock requested but OFD locking syscall is "
+"unavailable, falling back to POSIX file locks.\n"
+"Due to the implementation, locks can be lost "
+"unexpectedly.\n");
+}
 break;
 case ON_OFF_AUTO_OFF:
 s->use_lock = false;
 break;
 case ON_OFF_AUTO_AUTO:
-#ifdef F_OFD_SETLK
-s->use_lock = true;
-#else
-s->use_lock = false;
-#endif
+s->use_lock = qemu_has_ofd_lock();
 break;
 default:
 abort();
-- 
2.13.4




[Qemu-block] [PULL 3/7] qcow2: Drop debugging dump_refcounts()

2017-08-11 Thread Kevin Wolf
From: Eric Blake 

It's been #if 0'd since its introduction in 2006, commit 585f8587.
We can revive dead code if we need it, but in the meantime, it has
bit-rotted (for example, not checking for failure in bdrv_getlength()).

Signed-off-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
Reviewed-by: John Snow 
Signed-off-by: Kevin Wolf 
---
 block/qcow2.c | 21 -
 1 file changed, 21 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d7c600b5a2..99407403ea 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3798,27 +3798,6 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
 return spec_info;
 }
 
-#if 0
-static void dump_refcounts(BlockDriverState *bs)
-{
-BDRVQcow2State *s = bs->opaque;
-int64_t nb_clusters, k, k1, size;
-int refcount;
-
-size = bdrv_getlength(bs->file->bs);
-nb_clusters = size_to_clusters(s, size);
-for(k = 0; k < nb_clusters;) {
-k1 = k;
-refcount = get_refcount(bs, k);
-k++;
-while (k < nb_clusters && get_refcount(bs, k) == refcount)
-k++;
-printf("%" PRId64 ": refcount=%d nb=%" PRId64 "\n", k, refcount,
-   k - k1);
-}
-}
-#endif
-
 static int qcow2_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
   int64_t pos)
 {
-- 
2.13.4




[Qemu-block] [PULL 5/7] osdep: Add runtime OFD lock detection

2017-08-11 Thread Kevin Wolf
From: Fam Zheng 

Build time check of OFD lock is not sufficient and can cause image open
errors when the runtime environment doesn't support it.

Add a helper function to probe it at runtime, additionally. Also provide
a qemu_has_ofd_lock() for callers to check the status.

Signed-off-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 include/qemu/osdep.h |  1 +
 util/osdep.c | 66 
 2 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3b74f6fcb2..6855b94bbf 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -357,6 +357,7 @@ int qemu_dup(int fd);
 int qemu_lock_fd(int fd, int64_t start, int64_t len, bool exclusive);
 int qemu_unlock_fd(int fd, int64_t start, int64_t len);
 int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
+bool qemu_has_ofd_lock(void);
 
 #if defined(__HAIKU__) && defined(__i386__)
 #define FMT_pid "%ld"
diff --git a/util/osdep.c b/util/osdep.c
index a2863c8e53..a479fedc4a 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -38,14 +38,6 @@ extern int madvise(caddr_t, size_t, int);
 #include "qemu/error-report.h"
 #include "monitor/monitor.h"
 
-#ifdef F_OFD_SETLK
-#define QEMU_SETLK F_OFD_SETLK
-#define QEMU_GETLK F_OFD_GETLK
-#else
-#define QEMU_SETLK F_SETLK
-#define QEMU_GETLK F_GETLK
-#endif
-
 static bool fips_enabled = false;
 
 static const char *hw_version = QEMU_HW_VERSION;
@@ -82,6 +74,10 @@ int qemu_madvise(void *addr, size_t len, int advice)
 }
 
 #ifndef _WIN32
+
+static int fcntl_op_setlk = -1;
+static int fcntl_op_getlk = -1;
+
 /*
  * Dups an fd and sets the flags
  */
@@ -149,6 +145,54 @@ static int qemu_parse_fdset(const char *param)
 return qemu_parse_fd(param);
 }
 
+static void qemu_probe_lock_ops(void)
+{
+if (fcntl_op_setlk == -1) {
+#ifdef F_OFD_SETLK
+int fd;
+int ret;
+struct flock fl = {
+.l_whence = SEEK_SET,
+.l_start  = 0,
+.l_len= 0,
+.l_type   = F_WRLCK,
+};
+
+fd = open("/dev/null", O_RDWR);
+if (fd < 0) {
+fprintf(stderr,
+"Failed to open /dev/null for OFD lock probing: %s\n",
+strerror(errno));
+fcntl_op_setlk = F_SETLK;
+fcntl_op_getlk = F_GETLK;
+return;
+}
+ret = fcntl(fd, F_OFD_GETLK, &fl);
+close(fd);
+if (!ret) {
+fcntl_op_setlk = F_OFD_SETLK;
+fcntl_op_getlk = F_OFD_GETLK;
+} else {
+fcntl_op_setlk = F_SETLK;
+fcntl_op_getlk = F_GETLK;
+}
+#else
+fcntl_op_setlk = F_SETLK;
+fcntl_op_getlk = F_GETLK;
+#endif
+}
+}
+
+bool qemu_has_ofd_lock(void)
+{
+qemu_probe_lock_ops();
+#ifdef F_OFD_SETLK
+return fcntl_op_setlk == F_OFD_SETLK;
+#else
+return false;
+#endif
+}
+
 static int qemu_lock_fcntl(int fd, int64_t start, int64_t len, int fl_type)
 {
 int ret;
@@ -158,7 +202,8 @@ static int qemu_lock_fcntl(int fd, int64_t start, int64_t len, int fl_type)
 .l_len= len,
 .l_type   = fl_type,
 };
-ret = fcntl(fd, QEMU_SETLK, &fl);
+qemu_probe_lock_ops();
+ret = fcntl(fd, fcntl_op_setlk, &fl);
 return ret == -1 ? -errno : 0;
 }
 
@@ -181,7 +226,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive)
 .l_len= len,
 .l_type   = exclusive ? F_WRLCK : F_RDLCK,
 };
-ret = fcntl(fd, QEMU_GETLK, &fl);
+qemu_probe_lock_ops();
+ret = fcntl(fd, fcntl_op_getlk, &fl);
 if (ret == -1) {
 return -errno;
 } else {
-- 
2.13.4




[Qemu-block] [PULL 4/7] qcow2: Check failure of bdrv_getlength()

2017-08-11 Thread Kevin Wolf
From: Eric Blake 

qcow2_co_pwritev_compressed() should not call bdrv_truncate()
if determining the size failed.

Reported-by: Markus Armbruster 
Signed-off-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
Reviewed-by: John Snow 
Signed-off-by: Kevin Wolf 
---
 block/qcow2.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 99407403ea..40ba26c111 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3282,12 +3282,15 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
 z_stream strm;
 int ret, out_len;
 uint8_t *buf, *out_buf;
-uint64_t cluster_offset;
+int64_t cluster_offset;
 
 if (bytes == 0) {
 /* align end of file to a sector boundary to ease reading with
sector based I/Os */
 cluster_offset = bdrv_getlength(bs->file->bs);
+if (cluster_offset < 0) {
+return cluster_offset;
+}
 return bdrv_truncate(bs->file, cluster_offset, PREALLOC_MODE_OFF,
                      NULL);
 }
 
-- 
2.13.4




[Qemu-block] [PULL 2/7] vpc: Check failure of bdrv_getlength()

2017-08-11 Thread Kevin Wolf
From: Eric Blake 

vpc_open() was checking for bdrv_getlength() failure in one, but
not the other, location.

Reported-by: Markus Armbruster 
Signed-off-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Jeff Cody 
Reviewed-by: John Snow 
Signed-off-by: Kevin Wolf 
---
 block/vpc.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/vpc.c b/block/vpc.c
index 574879ba7c..82911ebead 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -219,6 +219,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
 uint64_t pagetable_size;
 int disk_type = VHD_DYNAMIC;
 int ret;
+int64_t bs_size;
 
 bs->file = bdrv_open_child(NULL, options, "file", bs, _file,
false, errp);
@@ -411,7 +412,13 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
 }
 }
 
-if (s->free_data_block_offset > bdrv_getlength(bs->file->bs)) {
+bs_size = bdrv_getlength(bs->file->bs);
+if (bs_size < 0) {
+error_setg_errno(errp, -bs_size, "Unable to learn image size");
+ret = bs_size;
+goto fail;
+}
+if (s->free_data_block_offset > bs_size) {
 error_setg(errp, "block-vpc: free_data_block_offset points after "
  "the end of file. The image has been truncated.");
 ret = -EINVAL;
-- 
2.13.4




[Qemu-block] [PULL 0/7] Block/Multiboot patches for 2.10.0-rc3

2017-08-11 Thread Kevin Wolf
The following changes since commit 95766c2cd04395e5712b4d5967b3251f35d537df:

  Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into 
staging (2017-08-10 18:53:39 +0100)

are available in the git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 8565c3ab537e78f3e69977ec2c609dc9417a806e:

  qemu-iotests: fix 185 (2017-08-11 14:44:39 +0200)


Block layer patches for 2.10.0-rc3


Eric Blake (4):
  tests/multiboot: Fix whitespace failure
  vpc: Check failure of bdrv_getlength()
  qcow2: Drop debugging dump_refcounts()
  qcow2: Check failure of bdrv_getlength()

Fam Zheng (2):
  osdep: Add runtime OFD lock detection
  file-posix: Do runtime check for ofd lock API

Vladimir Sementsov-Ogievskiy (1):
  qemu-iotests: fix 185

 include/qemu/osdep.h|  1 +
 block/file-posix.c  | 19 ++---
 block/qcow2.c   | 26 +++---
 block/vpc.c |  9 ++-
 util/osdep.c| 66 ++---
 tests/multiboot/run_test.sh |  2 +-
 tests/qemu-iotests/185  |  4 +++
 7 files changed, 82 insertions(+), 45 deletions(-)



[Qemu-block] [PULL 1/7] tests/multiboot: Fix whitespace failure

2017-08-11 Thread Kevin Wolf
From: Eric Blake 

Commit b43671f8 accidentally broke run_test.sh within tests/multiboot;
due to a subtle change in whitespace.

These two commands produce the same output (at least, for sane $IFS
of space-tab-newline):

echo -e "...$@..."
echo -e "...$*..."

But that's only because echo inserts spaces between multiple arguments
(the $@ case), while the $* form gives a single argument to echo with
the spaces already present.

But when converting to printf %b, there are no automatic spaces between
multiple arguments, so we HAVE to use $*.

It doesn't help that run_test.sh isn't part of 'make check'.

Signed-off-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 tests/multiboot/run_test.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/multiboot/run_test.sh b/tests/multiboot/run_test.sh
index c8f3da8f37..0278148b43 100755
--- a/tests/multiboot/run_test.sh
+++ b/tests/multiboot/run_test.sh
@@ -26,7 +26,7 @@ run_qemu() {
 local kernel=$1
 shift
 
-printf %b "\n\n=== Running test case: $kernel $@ ===\n\n" >> test.log
+printf %b "\n\n=== Running test case: $kernel $* ===\n\n" >> test.log
 
 $QEMU \
 -kernel $kernel \
-- 
2.13.4




Re: [Qemu-block] [PATCH 2/8] io: introduce a network socket listener API

2017-08-11 Thread Daniel P. Berrange
On Fri, Aug 11, 2017 at 01:39:43PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berra...@redhat.com) wrote:
> > On Fri, Aug 11, 2017 at 01:26:00PM +0100, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrange (berra...@redhat.com) wrote:
> > > > The existing QIOChannelSocket class provides the ability to
> > > > listen on a single socket at a time. This patch introduces
> > > > a QIONetListener class that provides a higher level API
> > > > concept around listening for network services, allowing
> > > > for listening on multiple sockets.
> > > 
> > > What protects against a connection on more than one of the sockets?
> > 
> > That's not the responsibility of this module. If a backend only
> > wants to allow a single client at a time, it has to unregister
> > the new client callback and re-register when it is ready to
> > accept a new client. This aspect is no different to the existing
> > case of multiple clients connecting to a single listener socket.
> 
> OK, and we guarantee that we never call accept() twice because we
> make sure we do that unregister before we get back to the main loop?

Yes, and even if 2 clients arrive at exactly the same time, and thus
both sockets show G_IO_IN on the same iteration of the main loop,
we check whether the new client callback is NULL, and so will just
drop the 2nd client.
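
In sketch form, the accept path looks roughly like this (my paraphrase of
the dispatch logic, not the exact net-listener.c code):

    static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
                                                  GIOCondition condition,
                                                  gpointer opaque)
    {
        QIONetListener *listener = QIO_NET_LISTENER(opaque);
        QIOChannelSocket *sioc;

        sioc = qio_channel_socket_accept(QIO_CHANNEL_SOCKET(ioc), NULL);
        if (!sioc) {
            return TRUE;
        }
        if (listener->io_func) {
            listener->io_func(listener, sioc, listener->io_data);
        }
        /* if no callback is registered, the client is simply dropped */
        object_unref(OBJECT(sioc));
        return TRUE;
    }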


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH v2 for 2.10] iotests: fix 185

2017-08-11 Thread Kevin Wolf
On 09.08.2017 at 17:17, Vladimir Sementsov-Ogievskiy wrote:
> 185 can sometimes produce wrong output like this:
> 
> =
> 185 2s ... - output mismatch (see 185.out.bad)
> --- /work/src/qemu/master/tests/qemu-iotests/185.out2017-07-14 \
> 15:14:29.520343805 +0300
> +++ 185.out.bad 2017-08-07 16:51:02.231922900 +0300
> @@ -37,7 +37,7 @@

For next time: This commit message really confuses 'git am' because it
thinks that this is already the real patch. Indenting a quoted diff
makes it much happier, so that's what I did instead of the '===' lines.

>  {"return": {}}
>  {"return": {}}
>  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, \
>  "event": "SHUTDOWN", "data": {"guest": false}}
> -{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, \
> "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", \
> "len": 4194304, "offset": 4194304, "speed": 65536, "type": \
> "mirror"}}
> +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, \
> "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", \
> "len": 0, "offset": 0, "speed": 65536, "type": "mirror"}}
> 
>  === Start backup job and exit qemu ===
> 
> Failures: 185
> Failed 1 of 1 tests
> =
> 
> This is because the quit can happen before the first mirror request is done
> (and, in the specified case, even before the block job's len field is set).
> To prevent it, let's just add a sleep for 0.3 seconds before quitting.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Thanks, touched up the commit message according to your discussion with
Eric and applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH 2/8] io: introduce a network socket listener API

2017-08-11 Thread Dr. David Alan Gilbert
* Daniel P. Berrange (berra...@redhat.com) wrote:
> On Fri, Aug 11, 2017 at 01:26:00PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrange (berra...@redhat.com) wrote:
> > > The existing QIOChannelSocket class provides the ability to
> > > listen on a single socket at a time. This patch introduces
> > > a QIONetListener class that provides a higher level API
> > > concept around listening for network services, allowing
> > > for listening on multiple sockets.
> > 
> > What protects against a connection on more than one of the sockets?
> 
> That's not the responsibility of this module. If a backend only
> wants to allow a single client at a time, it has to unregister
> the new client callback and re-register when it is ready to
> accept a new client. This aspect is no different to the existing
> case of multiple clients connecting to a single listener socket.

OK, and we guarantee that we never call accept() twice because we
make sure we do that unregister before we get back to the main loop?

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] [Qemu-devel] migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

2017-08-11 Thread Kevin Wolf
On 11.08.2017 at 14:04, Fam Zheng wrote:
> On Fri, 08/11 13:07, Christian Ehrhardt wrote:
> > Simplifying that to a smaller test:
> > 
> > $ qemu-img create -f qcow2 /tmp/test.qcow2 100M
> > $ qemu-system-x86_64 -S -m 512 -smp 1 -nodefaults --nographic -monitor
> > stdio -drive
> > file=/tmp/test.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -incoming
> > defer
> > QEMU 2.9.92 monitor - type 'help' for more information
> > (qemu) warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx
> > [bit 5]
> > nbd_server_start 0.0.0.0:49153
> > (qemu) nbd_server_add -w drive-virtio-disk0
> > Block node is read-only
> > (qemu) quit
> > 
> > It might be reasonable to keep the -device section to specify how it is
> > included.
> 
> This is actually caused by
> 
> commit 9c5e6594f15b7364624a3ad40306c396c93a2145
> Author: Kevin Wolf 
> Date:   Thu May 4 18:52:40 2017 +0200
> 
> block: Fix write/resize permissions for inactive images
> 
> which forbids "nbd_server_add -w" in the "inactive" state set by -incoming.
> 
> But I'm not sure what a proper fix is. Maybe revert the bdrv_is_writable()
> part of the commit? Kevin?

No, the reason why this fails is simply that nbd_export_new() does things
in the wrong order. It always had to activate the image, otherwise the
first write would run into an assertion failure, but it only does so
after requesting the permissions.

blk_insert_bs() in line 1061 requests the permissions,
blk_invalidate_cache() in line 1097 activates the image.

We could either use bdrv_invalidate_cache() instead and call that before
blk_insert_bs(), or start with perm=0, shared=BLK_PERM_ALL and then use
blk_set_perm() only after activating the image. The third option would
be to set blk->disable_perm = true, which would ensure that the
permissions are only effective after the image gets activated; but this
field isn't accessible outside block-backend.c at the moment.
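
For the second option, the shape would be roughly (untested sketch, not a
patch):

    blk = blk_new(0, BLK_PERM_ALL);          /* no permissions yet */
    ret = blk_insert_bs(blk, bs, errp);
    ...
    blk_invalidate_cache(blk, NULL);         /* activate the image first */
    ret = blk_set_perm(blk, perm, BLK_PERM_ALL, errp);  /* then take perms */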

Kevin



Re: [Qemu-block] [PATCH 2/8] io: introduce a network socket listener API

2017-08-11 Thread Daniel P. Berrange
On Fri, Aug 11, 2017 at 01:26:00PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berra...@redhat.com) wrote:
> > The existing QIOChannelSocket class provides the ability to
> > listen on a single socket at a time. This patch introduces
> > a QIONetListener class that provides a higher level API
> > concept around listening for network services, allowing
> > for listening on multiple sockets.
> 
> What protects against a connection on more than one of the sockets?

That's not the responsibility of this module. If a backend only
wants to allow a single client at a time, it has to unregister
the new client callback and re-register when it is ready to
accept a new client. This aspect is no different to the existing
case of multiple clients connecting to a single listener socket.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH 2/8] io: introduce a network socket listener API

2017-08-11 Thread Dr. David Alan Gilbert
* Daniel P. Berrange (berra...@redhat.com) wrote:
> The existing QIOChannelSocket class provides the ability to
> listen on a single socket at a time. This patch introduces
> a QIONetListener class that provides a higher level API
> concept around listening for network services, allowing
> for listening on multiple sockets.

What protects against a connection on more than one of the sockets?

Dave

> Signed-off-by: Daniel P. Berrange 
> ---
>  include/io/net-listener.h | 174 +
>  io/Makefile.objs  |   1 +
>  io/net-listener.c | 315 ++
>  3 files changed, 490 insertions(+)
>  create mode 100644 include/io/net-listener.h
>  create mode 100644 io/net-listener.c
> 
> diff --git a/include/io/net-listener.h b/include/io/net-listener.h
> new file mode 100644
> index 00..0ac5c9cc72
> --- /dev/null
> +++ b/include/io/net-listener.h
> @@ -0,0 +1,174 @@
> +/*
> + * QEMU I/O network listener
> + *
> + * Copyright (c) 2016 Red Hat, Inc.
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + *
> + */
> +
> +#ifndef QIO_NET_LISTENER_H
> +#define QIO_NET_LISTENER_H
> +
> +#include "io/channel-socket.h"
> +
> +#define TYPE_QIO_NET_LISTENER "qio-net-listener"
> +#define QIO_NET_LISTENER(obj)\
> +OBJECT_CHECK(QIONetListener, (obj), TYPE_QIO_NET_LISTENER)
> +#define QIO_NET_LISTENER_CLASS(klass)\
> +OBJECT_CLASS_CHECK(QIONetListenerClass, klass, TYPE_QIO_NET_LISTENER)
> +#define QIO_NET_LISTENER_GET_CLASS(obj)  \
> +OBJECT_GET_CLASS(QIONetListenerClass, obj, TYPE_QIO_NET_LISTENER)
> +
> +typedef struct QIONetListener QIONetListener;
> +typedef struct QIONetListenerClass QIONetListenerClass;
> +
> +typedef void (*QIONetListenerClientFunc)(QIONetListener *listener,
> + QIOChannelSocket *sioc,
> + gpointer data);
> +
> +/**
> + * QIONetListener:
> + *
> + * The QIONetListener object encapsulates the management of a
> + * listening socket. It is able to listen on multiple sockets
> + * concurrently, to deal with the scenario where IPv4 / IPv6
> + * needs separate sockets, or there is a need to listen on a
> + * subset of interface IP addresses, instead of the wildcard
> + * address.
> + */
> +struct QIONetListener {
> +Object parent;
> +
> +char *name;
> +QIOChannelSocket **sioc;
> +gulong *io_tag;
> +size_t nsioc;
> +
> +gboolean disconnected;
> +
> +QIONetListenerClientFunc io_func;
> +gpointer io_data;
> +GDestroyNotify io_notify;
> +};
> +
> +struct QIONetListenerClass {
> +ObjectClass parent;
> +};
> +
> +
> +/**
> + * qio_net_listener_new:
> + *
> + * Create a new network listener service, which is not
> + * listening on any sockets initially.
> + *
> + * Returns: the new listener
> + */
> +QIONetListener *qio_net_listener_new(void);
> +
> +
> +/**
> + * qio_net_listener_set_name:
> + * @listener: the network listener object
> + * @name: the listener name
> + *
> + * Set the name of the listener. This is used as a debugging
> + * aid, to set names on any GSource instances associated
> + * with the listener
> + */
> +void qio_net_listener_set_name(QIONetListener *listener,
> +   const char *name);
> +
> +/**
> + * qio_net_listener_open_sync:
> + * @listener: the network listener object
> + * @addr: the address to listen on
> + * @errp: pointer to a NULL initialized error object
> + *
> + * Synchronously open a listening connection on all
> + * addresses associated with @addr. This method may
> + * also be invoked multiple times, in order to have a
> + * single listener on multiple distinct addresses.
> + */
> +int qio_net_listener_open_sync(QIONetListener *listener,
> +   SocketAddress *addr,
> +   Error **errp);
> +
> +/**
> + * qio_net_listener_add:
> + * @listener: the network listener object
> + * @sioc: the socket I/O channel
> + *
> + * Associate a listening socket I/O channel with the
> + * listener. The listener will acquire a new reference
> + * on @sioc, so the caller should release its own reference
> + * if it no longer requires the object.

Re: [Qemu-block] [PATCH for-2.10? v3 0/2] block: Do OFD lock check at runtime

2017-08-11 Thread Kevin Wolf
Am 11.08.2017 um 13:44 hat Fam Zheng geschrieben:
> v3: Fix mingw build. [patchew]
> 
> v2: Probe /dev/null to save LOC. [Eric]
> Mention "new glibc + old kernel" in commit message. [Kevin, Daniel, Eric,
> Christian]
> 
> This fixes the image opening failure reported by Andrew Baumann:
> 
> > I'm running a recent Linux build of qemu on Windows Subsystem for Linux 
> > (WSL)
> > which doesn't appear to implement file locking:
> >
> > $ qemu-system-aarch64 ... -drive file=test.vhdx,if=none,id=hd0 -device 
> > virtio-blk-pci,drive=hd0
> > qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock 
> > byte 100
> > qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock 
> > byte 100
> > qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to lock 
> > byte 100
> 
> It appears that the binary is built for Linux targets, but the WSL
> runtime doesn't recognize the ops (-EINVAL).
> 
> This is also a practical problem for Linux considering it's possible to run
> QEMU which is built against a new glibc on an old kernel that doesn't have OFD
> API.
> 
> Convert to runtime check to cope with that.

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [Qemu-devel] migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

2017-08-11 Thread Fam Zheng
On Fri, 08/11 13:07, Christian Ehrhardt wrote:
> Hi,
> testing on 2.10-rc2 I ran into an issue around:
>   unable to execute QEMU command 'nbd-server-add': Block node is read-only
> 
> ### TL;DR ###
> - triggered by livbirt driven live migration with --copy-storage-all
> - boils down to nbd_server_add failing
> - can be reproduced on a single system without libvirt
> - last known working (so far) was qemu 2.8
> 
> Simple repro:
> $ qemu-img create -f qcow2 /tmp/test.qcow2 100M
> $ qemu-system-x86_64 -S -m 512 -smp 1 -nodefaults --nographic -monitor
> stdio -drive
> file=/tmp/test.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -incoming
> defer
> QEMU 2.9.92 monitor - type 'help' for more information
> (qemu) warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx
> [bit 5]
> nbd_server_start 0.0.0.0:49153
> (qemu) nbd_server_add -w drive-virtio-disk0
> Block node is read-only
> 
> ### Details ###
> 
> Trigger:
> virsh migrate --live --copy-storage-all kvmguest-artful-normal qemu+ssh://
> 10.22.69.61/system
> 
> Setup:
> - Two systems without shared storage
> - An equal image is synced before the test, so it would only migrate
> minimal remaining changes
> => Only the --copy-storage-all case triggers this, other migrations work as
> far as I tested.
> => No related apparmor denials
> 
> The last combination I knew to be successful was libvirt 3.5 and qemu 2.8.
> So I downgraded libvirt (but kept qemu) and retested.
> => Still an issue
> => So it is a qemu 2.10 issue and not libvirt 3.6
> Continuing with libvirt 3.6 to have latest updates.
> 
> On the migration target I see the following in the log:
> libvirtd[11829]: 2017-08-11 08:51:49.283+: 11842: warning :
> qemuDomainObjTaint:4545 : Domain id=2 name='kvmguest-artful-normal'
> uuid=b6f4cdab-77b0-43b1-933d-9683567f3a89 is tainted: high-privileges
> libvirtd[11829]: 2017-08-11 08:51:49.386+: 11842: error :
> qemuMonitorJSONCheckError:389 : internal error: unable to execute QEMU
> command 'nbd-server-add': Block node is read-only
> 
> I checked the images on source (active since the guest is running) and
> target (inactive and out-of-sync copies)
> source $ virsh domblklist kvmguest-artful-normal
> Target     Source
> ------------------------------------------------
> vda        /var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow
> vdb        /var/lib/uvtool/libvirt/images/kvmguest-artful-normal-ds.qcow
> 
> But when checking details on the source I didn't get any being blocked:
> qemu-img info --backing-chain
> /var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow
> qemu-img: Could not open
> '/var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow': Failed to get
> shared "write" lock
> Is another process using the image?
> 
> On the target these are inactive, so the inquiry works (content is the same
> anyway).
> All files are there and the backing chain looks correct.
> 
> I'm not sure if being unable to run qemu-img while the guest is running is
> considered an issue of its own, but this could be related to:
> (qemu) commit 244a5668106297378391b768e7288eb157616f64
> Author: Fam Zheng 
> file-posix: Add image locking to perm operations
> 
> 
> The add should be from libvirt
>  - the chain here should be
>qemuMigrationPrepareAny -> qemuMigrationStartNBDServer ->
> qemuMonitorNBDServerAdd -> qemuMonitorJSONNBDServerAdd
>  - there is a debug statement in qemuMonitorNBDServerAdd
> VIR_DEBUG("deviceID=%s", deviceID);
> This shows it is the NBD server for the first disk
>  - it seems libvirt adds an NBD server for the copy-storage-all case
>  - that qmp fails in qemu
>  - afterwards all else in the logs is libvirt cleaning up and killing the
> qemu that was prepped (with -S) on
>the target
> 
> With libvirt debugging and gdb I caught the device for the qmp command and
> stopped it while doing so.
> debug : qemuMonitorNBDServerStart:3993 : host=:: port=49153
> debug : qemuMonitorNBDServerStart:3995 : mon:0x7f6d4c016500
> vm:0x7f6d4c011580 json:1 fd:27
> debug : qemuMonitorNBDServerAdd:4006 : deviceID=drive-virtio-disk0
> debug : qemuMonitorNBDServerAdd:4008 : mon:0x7f6d4c016500 vm:0x7f6d4c011580
> json:1 fd:27
> [...]
> debug : qemuMonitorJSONCheckError:378 : unable to execute QEMU command
> {"execute":"nbd-server-add","arguments":{"device":"drive-virtio-disk0","writable":true},"id":"libvirt-24"}:
> {"id":"libvirt-24","error":{"class":"GenericError","desc":"Block node is
> read-only"}}
> 
> 
> Reducing the qemu call libvirt does to some basics that don't need libvirt,
> I tried to reproduce.
> At the -S starting point that the migration uses, the devices are already
> there as well; going on in the monitor from there:
> (qemu) info block
> drive-virtio-disk0 (#block164):
> /var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow (qcow2)
> Attached to:  /machine/peripheral/virtio-disk0/virtio-backend
> Cache mode:   writeback
> Backing file:
> 

[Qemu-block] [PATCH for-2.10? v3 2/2] file-posix: Do runtime check for ofd lock API

2017-08-11 Thread Fam Zheng
It is reported that on Windows Subsystem for Linux, OFD operations fail
with -EINVAL. In other words, a QEMU binary built with system headers that
export F_OFD_SETLK doesn't necessarily run in an environment that
actually supports it:

$ qemu-system-aarch64 ... -drive file=test.vhdx,if=none,id=hd0 \
-device virtio-blk-pci,drive=hd0
qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock 
byte 100
qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock 
byte 100
qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to lock byte 
100

As a matter of fact this is not WSL specific. It can happen when running
a QEMU compiled against a newer glibc on an older kernel, such as in
a containerized environment.

Let's do a runtime check to cope with that.

Reported-by: Andrew Baumann 
Reviewed-by: Eric Blake 
Signed-off-by: Fam Zheng 
---
 block/file-posix.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index f4de022ae0..cb3bfce147 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -457,22 +457,19 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 switch (locking) {
 case ON_OFF_AUTO_ON:
 s->use_lock = true;
-#ifndef F_OFD_SETLK
-fprintf(stderr,
-"File lock requested but OFD locking syscall is unavailable, "
-"falling back to POSIX file locks.\n"
-"Due to the implementation, locks can be lost 
unexpectedly.\n");
-#endif
+if (!qemu_has_ofd_lock()) {
+fprintf(stderr,
+"File lock requested but OFD locking syscall is "
+"unavailable, falling back to POSIX file locks.\n"
+"Due to the implementation, locks can be lost "
+"unexpectedly.\n");
+}
 break;
 case ON_OFF_AUTO_OFF:
 s->use_lock = false;
 break;
 case ON_OFF_AUTO_AUTO:
-#ifdef F_OFD_SETLK
-s->use_lock = true;
-#else
-s->use_lock = false;
-#endif
+s->use_lock = qemu_has_ofd_lock();
 break;
 default:
 abort();
-- 
2.13.4




[Qemu-block] [PATCH for-2.10? v3 0/2] block: Do OFD lock check at runtime

2017-08-11 Thread Fam Zheng
v3: Fix mingw build. [patchew]

v2: Probe /dev/null to save LOC. [Eric]
Mention "new glibc + old kernel" in commit message. [Kevin, Daniel, Eric,
Christian]

This fixes the image opening failure reported by Andrew Baumann:

> I'm running a recent Linux build of qemu on Windows Subsystem for Linux (WSL)
> which doesn't appear to implement file locking:
>
> $ qemu-system-aarch64 ... -drive file=test.vhdx,if=none,id=hd0 -device 
> virtio-blk-pci,drive=hd0
> qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock 
> byte 100
> qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock 
> byte 100
> qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to lock 
> byte 100

It appears that the binary is built for Linux targets, but the WSL
runtime doesn't recognize the ops (-EINVAL).

This is also a practical problem for Linux considering it's possible to run
QEMU which is built against a new glibc on an old kernel that doesn't have OFD
API.

Convert to runtime check to cope with that.

Fam Zheng (2):
  osdep: Add runtime OFD lock detection
  file-posix: Do runtime check for ofd lock API

 block/file-posix.c   | 19 +++
 include/qemu/osdep.h |  1 +
 util/osdep.c | 66 
 3 files changed, 65 insertions(+), 21 deletions(-)

-- 
2.13.4




[Qemu-block] [PATCH for-2.10? v3 1/2] osdep: Add runtime OFD lock detection

2017-08-11 Thread Fam Zheng
A build-time check for OFD locks is not sufficient and can cause image open
errors when the runtime environment doesn't support them.

Add a helper function to additionally probe for them at runtime, and provide
qemu_has_ofd_lock() for callers to check the status.

Signed-off-by: Fam Zheng 
---
 include/qemu/osdep.h |  1 +
 util/osdep.c | 66 
 2 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3b74f6fcb2..6855b94bbf 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -357,6 +357,7 @@ int qemu_dup(int fd);
 int qemu_lock_fd(int fd, int64_t start, int64_t len, bool exclusive);
 int qemu_unlock_fd(int fd, int64_t start, int64_t len);
 int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
+bool qemu_has_ofd_lock(void);
 
 #if defined(__HAIKU__) && defined(__i386__)
 #define FMT_pid "%ld"
diff --git a/util/osdep.c b/util/osdep.c
index a2863c8e53..a479fedc4a 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -38,14 +38,6 @@ extern int madvise(caddr_t, size_t, int);
 #include "qemu/error-report.h"
 #include "monitor/monitor.h"
 
-#ifdef F_OFD_SETLK
-#define QEMU_SETLK F_OFD_SETLK
-#define QEMU_GETLK F_OFD_GETLK
-#else
-#define QEMU_SETLK F_SETLK
-#define QEMU_GETLK F_GETLK
-#endif
-
 static bool fips_enabled = false;
 
 static const char *hw_version = QEMU_HW_VERSION;
@@ -82,6 +74,10 @@ int qemu_madvise(void *addr, size_t len, int advice)
 }
 
 #ifndef _WIN32
+
+static int fcntl_op_setlk = -1;
+static int fcntl_op_getlk = -1;
+
 /*
  * Dups an fd and sets the flags
  */
@@ -149,6 +145,54 @@ static int qemu_parse_fdset(const char *param)
 return qemu_parse_fd(param);
 }
 
+static void qemu_probe_lock_ops(void)
+{
+if (fcntl_op_setlk == -1) {
+#ifdef F_OFD_SETLK
+int fd;
+int ret;
+struct flock fl = {
+.l_whence = SEEK_SET,
+.l_start  = 0,
+.l_len= 0,
+.l_type   = F_WRLCK,
+};
+
+fd = open("/dev/null", O_RDWR);
+if (fd < 0) {
+fprintf(stderr,
+"Failed to open /dev/null for OFD lock probing: %s\n",
+strerror(errno));
+fcntl_op_setlk = F_SETLK;
+fcntl_op_getlk = F_GETLK;
+return;
+}
+ret = fcntl(fd, F_OFD_GETLK, &fl);
+close(fd);
+if (!ret) {
+fcntl_op_setlk = F_OFD_SETLK;
+fcntl_op_getlk = F_OFD_GETLK;
+} else {
+fcntl_op_setlk = F_SETLK;
+fcntl_op_getlk = F_GETLK;
+}
+#else
+fcntl_op_setlk = F_SETLK;
+fcntl_op_getlk = F_GETLK;
+#endif
+}
+}
+
+bool qemu_has_ofd_lock(void)
+{
+qemu_probe_lock_ops();
+#ifdef F_OFD_SETLK
+return fcntl_op_setlk == F_OFD_SETLK;
+#else
+return false;
+#endif
+}
+
 static int qemu_lock_fcntl(int fd, int64_t start, int64_t len, int fl_type)
 {
 int ret;
@@ -158,7 +202,8 @@ static int qemu_lock_fcntl(int fd, int64_t start, int64_t 
len, int fl_type)
 .l_len= len,
 .l_type   = fl_type,
 };
-ret = fcntl(fd, QEMU_SETLK, &fl);
+qemu_probe_lock_ops();
+ret = fcntl(fd, fcntl_op_setlk, &fl);
 return ret == -1 ? -errno : 0;
 }
 
@@ -181,7 +226,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, 
bool exclusive)
 .l_len= len,
 .l_type   = exclusive ? F_WRLCK : F_RDLCK,
 };
-ret = fcntl(fd, QEMU_GETLK, &fl);
+qemu_probe_lock_ops();
+ret = fcntl(fd, fcntl_op_getlk, &fl);
 if (ret == -1) {
 return -errno;
 } else {
-- 
2.13.4




Re: [Qemu-block] [PATCH for-2.10 v2 0/5] More bdrv_getlength() fixes

2017-08-11 Thread Kevin Wolf
Am 10.08.2017 um 17:08 hat Eric Blake geschrieben:
> On 08/10/2017 08:02 AM, Kevin Wolf wrote:
> > Am 09.08.2017 um 22:38 hat Eric Blake geschrieben:
> >> We already have a lot of bdrv_getlength() fixes in -rc2; so I think
> >> this is still okay for -rc3.
> >>
> >> v1 was here (with a typo'd subject line):
> >> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg01226.html
> >>
> >> Since v1:
> >> - patch 1: fix error message capitalization (Kevin, R-b kept)
> >> - fix locking bug in original patch 2 (Kevin)
> >> - split original patch 2 into two parts: signature update, and
> >> added error checking (Kevin)
> >> - check for unlikely integer overflow before bdrv_truncate (Jeff)
> >>
> >> 001/5:[0002] [FC] 'vpc: Check failure of bdrv_getlength()'
> >> 002/5:[down] 'qcow: Change signature of get_cluster_offset()'
> >> 003/5:[0048] [FC] 'qcow: Check failure of bdrv_getlength() and 
> >> bdrv_truncate()'
> >> 004/5:[] [--] 'qcow2: Drop debugging dump_refcounts()'
> >> 005/5:[] [--] 'qcow2: Check failure of bdrv_getlength()'
> > 
> > Looks good to me, but as the bug is far from being critical, I'd rather
> > apply the more complex qcow1 patches only to block-next. The vpc and
> > qcow2 parts seem a lot less risky, so 2.10 should be okay for them.
> > 
> > What do you think?
> 
> The argument for NOT doing the qcow changes (patches 2 and 3): the only
> place where we are not checking for failures is part of
> get_cluster_offset() - but in all likelihood, if we were unable to
> determine or change the length of the backing file, we will have nearby
> problems that will ultimately cause failure soon enough.  Furthermore,
> it's not a regression (we've had several releases with the problem), and
> qcow is not a good format (it's painfully slow, and we strongly
> recommend qcow2 instead) - so no one will be hitting any actual bugs in
> practice.
> 
> I'll trust your judgment as maintainer, so taking just 1, 4, and 5 in
> 2.10 is fine.

Thanks, applied the patches to block and block-next, respectively.

Kevin




Re: [Qemu-block] [PATCH v4 5/7] block: add throttle block filter driver

2017-08-11 Thread Manos Pitsidianakis

On Wed, Aug 09, 2017 at 01:07:32PM +0300, Manos Pitsidianakis wrote:

+static int coroutine_fn throttle_co_preadv(BlockDriverState *bs,
+   uint64_t offset, uint64_t bytes,
+   QEMUIOVector *qiov, int flags)
+{
+
+ThrottleGroupMember *tgm = bs->opaque;
+throttle_group_co_io_limits_intercept(tgm, bytes, false);
+
+return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn throttle_co_pwritev(BlockDriverState *bs,
+uint64_t offset, uint64_t bytes,
+QEMUIOVector *qiov, int flags)
+{
+ThrottleGroupMember *tgm = bs->opaque;
+throttle_group_co_io_limits_intercept(tgm, bytes, true);
+
+return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);

^
Tried some write throttling testing, noticed this. If anyone wants to
test this iteration, change this to bdrv_co_pwritev(), I will correct
this in the next version. (let's pretend this never happened!)
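
For reference, the corrected function would mirror the read path above,
with only the final call changed (sketch):

    static int coroutine_fn throttle_co_pwritev(BlockDriverState *bs,
                                                uint64_t offset, uint64_t bytes,
                                                QEMUIOVector *qiov, int flags)
    {
        ThrottleGroupMember *tgm = bs->opaque;
        throttle_group_co_io_limits_intercept(tgm, bytes, true);

        /* Writes must go to the write primitive, not the read one. */
        return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
    }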




Re: [Qemu-block] [PATCH] nbd: Fix trace message for disconnect

2017-08-11 Thread Stefan Hajnoczi
On Thu, Aug 10, 2017 at 08:57:48PM -0500, Eric Blake wrote:
> NBD_CMD_DISC is a disconnect request, not a data discard request.
> 
> Signed-off-by: Eric Blake 
> ---
> 
> Although this is not 2.10 material in isolation (it is only a
> bad trace message), I don't mind including it in a larger pull
> request; I'm still planning to fix the issue of a client hanging
> on a malicious server in time for -rc3 (whether via Vladimir's
> patch or a smaller one that I'm testing locally).
> 
>  nbd/common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 




Re: [Qemu-block] [Qemu-devel] [PATCH 2/8] io: introduce a network socket listener API

2017-08-11 Thread Daniel P. Berrange
On Fri, Aug 11, 2017 at 10:15:04AM +0100, Daniel P. Berrange wrote:
> On Thu, Aug 10, 2017 at 01:12:25PM -0500, Eric Blake wrote:
> > On 08/10/2017 11:04 AM, Daniel P. Berrange wrote:
> > > The existing QIOChannelSocket class provides the ability to
> > > listen on a single socket at a time. This patch introduces
> > > a QIONetListener class that provides a higher level API
> > > concept around listening for network services, allowing
> > > for listening on multiple sockets.
> > > 
> > > Signed-off-by: Daniel P. Berrange 
> > > ---
> > 
> > > +++ b/include/io/net-listener.h
> > > @@ -0,0 +1,174 @@
> > > +/*
> > > + * QEMU I/O network listener
> > > + *
> > > + * Copyright (c) 2016 Red Hat, Inc.
> > 
> > Want to add 2017?
> > 
> > At least it's covered by MAINTAINERS :)
> > 
> > 
> > > +/**
> > > + * qio_net_listener_is_disconnected:
> > > + * @listener: the network listener object
> > > + *
> > > + * Determine if the listener is connected to any socket
> > > + * channels
> > > + *
> > > + * Returns: TRUE if connected, FALSE otherwise
> > > + */
> > > +gboolean qio_net_listener_is_disconnected(QIONetListener *listener);
> > > +
> > 
> > Must it return gboolean, or is bool sufficient?
> 
> bool is fine.
> 
> > 
> > TRUE if connected for a function named 'is_disconnected' sounds
> > backwards.  Avoid the double negative, name it:
> > 
> > qio_net_listener_is_connected(), returning true if connected
> 
> The docs are wrong, as you noticed below
> 
> > 
> > > +++ b/io/net-listener.c
> > > @@ -0,0 +1,315 @@
> > > +/*
> > > + * QEMU network listener
> > > + *
> > > + * Copyright (c) 2016 Red Hat, Inc.
> > 
> > More 2017.  Probably for the whole series :)
> > 
> > 
> > > +static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
> > > +  GIOCondition condition,
> > > +  gpointer opaque)
> > > +{
> > 
> > Again, can we use bool instead of gboolean?
> 
> Yes

Oops, no, this one must be gboolean. The others can be bool though.


Regards,
Daniel
-- 
|: https://berrange.com  -o-  https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org  -o-  https://fstop138.berrange.com :|
|: https://entangle-photo.org  -o-  https://www.instagram.com/dberrange :|



[Qemu-block] [PATCH for-2.10? v2 1/2] osdep: Add runtime OFD lock detection

2017-08-11 Thread Fam Zheng
A build-time check for OFD locks is not sufficient and can cause image open
errors when the runtime environment doesn't support them.

Add a helper function to additionally probe for them at runtime, and provide
qemu_has_ofd_lock() for callers to check the status.

Signed-off-by: Fam Zheng 
---
 include/qemu/osdep.h |  1 +
 util/osdep.c | 63 
 2 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3b74f6fcb2..6855b94bbf 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -357,6 +357,7 @@ int qemu_dup(int fd);
 int qemu_lock_fd(int fd, int64_t start, int64_t len, bool exclusive);
 int qemu_unlock_fd(int fd, int64_t start, int64_t len);
 int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
+bool qemu_has_ofd_lock(void);
 
 #if defined(__HAIKU__) && defined(__i386__)
 #define FMT_pid "%ld"
diff --git a/util/osdep.c b/util/osdep.c
index a2863c8e53..081260b1a9 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -38,13 +38,8 @@ extern int madvise(caddr_t, size_t, int);
 #include "qemu/error-report.h"
 #include "monitor/monitor.h"
 
-#ifdef F_OFD_SETLK
-#define QEMU_SETLK F_OFD_SETLK
-#define QEMU_GETLK F_OFD_GETLK
-#else
-#define QEMU_SETLK F_SETLK
-#define QEMU_GETLK F_GETLK
-#endif
+static int fcntl_op_setlk = -1;
+static int fcntl_op_getlk = -1;
 
 static bool fips_enabled = false;
 
@@ -149,6 +144,54 @@ static int qemu_parse_fdset(const char *param)
 return qemu_parse_fd(param);
 }
 
+static void qemu_probe_lock_ops(void)
+{
+if (fcntl_op_setlk == -1) {
+#ifdef F_OFD_SETLK
+int fd;
+int ret;
+struct flock fl = {
+.l_whence = SEEK_SET,
+.l_start  = 0,
+.l_len= 0,
+.l_type   = F_WRLCK,
+};
+
+fd = open("/dev/null", O_RDWR);
+if (fd < 0) {
+fprintf(stderr,
+"Failed to open /dev/null for OFD lock probing: %s\n",
+strerror(errno));
+fcntl_op_setlk = F_SETLK;
+fcntl_op_getlk = F_GETLK;
+return;
+}
+ret = fcntl(fd, F_OFD_GETLK, &fl);
+close(fd);
+if (!ret) {
+fcntl_op_setlk = F_OFD_SETLK;
+fcntl_op_getlk = F_OFD_GETLK;
+} else {
+fcntl_op_setlk = F_SETLK;
+fcntl_op_getlk = F_GETLK;
+}
+#else
+fcntl_op_setlk = F_SETLK;
+fcntl_op_getlk = F_GETLK;
+#endif
+}
+}
+
+bool qemu_has_ofd_lock(void)
+{
+qemu_probe_lock_ops();
+#ifdef F_OFD_SETLK
+return fcntl_op_setlk == F_OFD_SETLK;
+#else
+return false;
+#endif
+}
+
 static int qemu_lock_fcntl(int fd, int64_t start, int64_t len, int fl_type)
 {
 int ret;
@@ -158,7 +201,8 @@ static int qemu_lock_fcntl(int fd, int64_t start, int64_t 
len, int fl_type)
 .l_len= len,
 .l_type   = fl_type,
 };
-ret = fcntl(fd, QEMU_SETLK, &fl);
+qemu_probe_lock_ops();
+ret = fcntl(fd, fcntl_op_setlk, &fl);
 return ret == -1 ? -errno : 0;
 }
 
@@ -181,7 +225,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, 
bool exclusive)
 .l_len= len,
 .l_type   = exclusive ? F_WRLCK : F_RDLCK,
 };
-ret = fcntl(fd, QEMU_GETLK, &fl);
+qemu_probe_lock_ops();
+ret = fcntl(fd, fcntl_op_getlk, &fl);
 if (ret == -1) {
 return -errno;
 } else {
-- 
2.13.4




[Qemu-block] [PATCH for-2.10? v2 2/2] file-posix: Do runtime check for ofd lock API

2017-08-11 Thread Fam Zheng
It is reported that on Windows Subsystem for Linux, OFD operations fail
with -EINVAL. In other words, a QEMU binary built with system headers that
export F_OFD_SETLK doesn't necessarily run in an environment that
actually supports it:

$ qemu-system-aarch64 ... -drive file=test.vhdx,if=none,id=hd0 \
-device virtio-blk-pci,drive=hd0
qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock 
byte 100
qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock 
byte 100
qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to lock byte 
100

As a matter of fact this is not WSL specific. It can happen when running
a QEMU compiled against a newer glibc on an older kernel, such as in
a containerized environment.

Let's do a runtime check to cope with that.

Reported-by: Andrew Baumann 
Reviewed-by: Eric Blake 
Signed-off-by: Fam Zheng 
---
 block/file-posix.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index f4de022ae0..cb3bfce147 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -457,22 +457,19 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 switch (locking) {
 case ON_OFF_AUTO_ON:
 s->use_lock = true;
-#ifndef F_OFD_SETLK
-fprintf(stderr,
-"File lock requested but OFD locking syscall is unavailable, "
-"falling back to POSIX file locks.\n"
-"Due to the implementation, locks can be lost 
unexpectedly.\n");
-#endif
+if (!qemu_has_ofd_lock()) {
+fprintf(stderr,
+"File lock requested but OFD locking syscall is "
+"unavailable, falling back to POSIX file locks.\n"
+"Due to the implementation, locks can be lost "
+"unexpectedly.\n");
+}
 break;
 case ON_OFF_AUTO_OFF:
 s->use_lock = false;
 break;
 case ON_OFF_AUTO_AUTO:
-#ifdef F_OFD_SETLK
-s->use_lock = true;
-#else
-s->use_lock = false;
-#endif
+s->use_lock = qemu_has_ofd_lock();
 break;
 default:
 abort();
-- 
2.13.4




[Qemu-block] [PATCH for-2.10? v2 0/2] block: Do OFD lock check at runtime

2017-08-11 Thread Fam Zheng
v2: Probe /dev/null to save LOC. [Eric]
Mention "new glibc + old kernel" in commit message. [Kevin, Daniel, Eric,
Christian]

This fixes the image opening failure reported by Andrew Baumann:

> I'm running a recent Linux build of qemu on Windows Subsystem for Linux (WSL)
> which doesn't appear to implement file locking:
>
> $ qemu-system-aarch64 ... -drive file=test.vhdx,if=none,id=hd0 -device 
> virtio-blk-pci,drive=hd0
> qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock 
> byte 100
> qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock 
> byte 100
> qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to lock 
> byte 100

It appears that the binary is built for Linux targets, but the WSL
runtime doesn't recognize the ops (-EINVAL).

This is also a practical problem for Linux considering it's possible to run
QEMU which is built against a new glibc on an old kernel that doesn't have OFD
API.

Convert to runtime check to cope with that.

Fam Zheng (2):
  osdep: Add runtime OFD lock detection
  file-posix: Do runtime check for ofd lock API

 block/file-posix.c   | 19 +++-
 include/qemu/osdep.h |  1 +
 util/osdep.c | 63 
 3 files changed, 63 insertions(+), 20 deletions(-)

-- 
2.13.4




Re: [Qemu-block] [PATCH 8/8] sockets: fix parsing of ipv4/ipv6 opts in parse_socket_addr

2017-08-11 Thread Daniel P. Berrange
On Thu, Aug 10, 2017 at 01:35:15PM -0500, Eric Blake wrote:
> On 08/10/2017 11:04 AM, Daniel P. Berrange wrote:
> > The inet_parse() function looks for 'ipv4' and 'ipv6'
> > flags, but only treats them as bare bool flags. The normal
> > QemuOpts parsing would allow on/off values to be set too.
> > 
> > This updated inet_parse() so that its handling of the
> 
> s/updated/updates/ ?
> 
> > 'ipv4' and 'ipv6' flags matches that done by QemuOpts.
> 
> Do we have a regression compared to any previous version, such that this
> patch might be considered 2.10 material?  Offhand, though, I think it's
> fine as the end of your series, waiting for 2.11.

Well this code has been like this for years, so waiting to fix
it in 2.11 isn't making our life any worse than it already
is.

The original code merely looks for a prefix so treats

   ,ipv6
   ,ipv6dumpsterfire
   ,ipv6=off
   ,ipv6=fishfood
   ,ipv6

as all meaning 'true'. The new code only treats ,ipv6 and ,ipv6=on
as meaning true, or ipv6=off as false, rejecting all others.

So yes, that is technically a regression, but IMHO it is a desirable
regression :-)
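
To illustrate, the stricter check amounts to something like this
(parse_flag_value is a hypothetical helper; the suffix is assumed to be
NUL-terminated at the next ','):

    /* Accept "", "=on" and "=off" after the flag name; reject the rest. */
    static int parse_flag_value(const char *suffix, bool *val)
    {
        if (suffix[0] == '\0') {
            *val = true;                /* bare ,ipv6 */
        } else if (strcmp(suffix, "=on") == 0) {
            *val = true;
        } else if (strcmp(suffix, "=off") == 0) {
            *val = false;
        } else {
            return -1;                  /* e.g. ,ipv6dumpsterfire */
        }
        return 0;
    }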

> 
> > 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> >  tests/test-sockets-proto.c | 13 -
> >  util/qemu-sockets.c| 36 
> >  2 files changed, 32 insertions(+), 17 deletions(-)
> > 
> 
> > +++ b/util/qemu-sockets.c
> > @@ -616,6 +616,25 @@ err:
> >  }
> >  
> >  /* compatibility wrapper */
> > +static int inet_parse_flag(const char *flagname, const char *optstr, bool 
> > *val,
> > +   Error **errp)
> > +{
> > +char *end;
> > +size_t len;
> > +
> > +end = strstr(optstr, ",");
> 
> Do we need to check that we are not landing on a ',,' escape that would
> make QemuOpts behave differently?  [That is, ipv4=on,,garbage should be
> parsed as setting ipv4 to 'on,garbage' (which is not valid), and NOT
> setting 'ipv4=on' followed by the 'garbage' or ',garbage' key - while
> the key named 'garbage' would fail, there might be other key names where
> the distinction matters for catching command line typos]
> 
> But if this is unrelated to QemuOpts escape parsing, it seems okay.

Ultimately this code should probably be converted to actually use
QemuOpts. The existing code already allows ipv4=on,,garbage but as
you point out I've not detected that case here. Should be easy
enough to fix though.
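
A sketch of an escape-aware scan, assuming the QemuOpts convention that
",," stands for a literal comma (find_opt_end is a hypothetical helper):

    static const char *find_opt_end(const char *optstr)
    {
        const char *p = optstr;

        while ((p = strchr(p, ',')) != NULL) {
            if (p[1] != ',') {
                return p;           /* a real option separator */
            }
            p += 2;                 /* ",," escapes a literal comma */
        }
        return optstr + strlen(optstr);
    }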

> 
> Reviewed-by: Eric Blake 



Regards,
Daniel
-- 
|: https://berrange.com  -o-  https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org  -o-  https://fstop138.berrange.com :|
|: https://entangle-photo.org  -o-  https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH 2/8] io: introduce a network socket listener API

2017-08-11 Thread Daniel P. Berrange
On Thu, Aug 10, 2017 at 01:12:25PM -0500, Eric Blake wrote:
> On 08/10/2017 11:04 AM, Daniel P. Berrange wrote:
> > The existing QIOChannelSocket class provides the ability to
> > listen on a single socket at a time. This patch introduces
> > a QIONetListener class that provides a higher level API
> > concept around listening for network services, allowing
> > for listening on multiple sockets.
> > 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> 
> > +++ b/include/io/net-listener.h
> > @@ -0,0 +1,174 @@
> > +/*
> > + * QEMU I/O network listener
> > + *
> > + * Copyright (c) 2016 Red Hat, Inc.
> 
> Want to add 2017?
> 
> At least it's covered by MAINTAINERS :)
> 
> 
> > +/**
> > + * qio_net_listener_is_disconnected:
> > + * @listener: the network listener object
> > + *
> > + * Determine if the listener is connected to any socket
> > + * channels
> > + *
> > + * Returns: TRUE if connected, FALSE otherwise
> > + */
> > +gboolean qio_net_listener_is_disconnected(QIONetListener *listener);
> > +
> 
> Must it return gboolean, or is bool sufficient?

bool is fine.

> 
> TRUE if connected for a function named 'is_disconnected' sounds
> backwards.  Avoid the double negative, name it:
> 
> qio_net_listener_is_connected(), returning true if connected

The docs are wrong, as you noticed below

> 
> > +++ b/io/net-listener.c
> > @@ -0,0 +1,315 @@
> > +/*
> > + * QEMU network listener
> > + *
> > + * Copyright (c) 2016 Red Hat, Inc.
> 
> More 2017.  Probably for the whole series :)
> 
> 
> > +static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
> > +  GIOCondition condition,
> > +  gpointer opaque)
> > +{
> 
> Again, can we use bool instead of gboolean?

Yes

> > +for (i = 0; i < nresaddrs; i++) {
> > +QIOChannelSocket *sioc = qio_channel_socket_new();
> > +
> > +if (qio_channel_socket_listen_sync(sioc, resaddrs[i],
> > +   err ? NULL : &err) == 0) {
> > +success = true;
> > +}
> 
> This says that as long as at least one address connected, we are
> successful...
> 
> > +
> > +qio_net_listener_add(listener, sioc);
> 
> ...but this adds sioc as a listener regardless of whether listen_sync()
> succeeded.  Is that right?

No, it should skip the add
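
I.e. roughly (sketch of the corrected loop, keeping the variables from
the quoted hunk; qio_net_listener_add takes its own reference):

    for (i = 0; i < nresaddrs; i++) {
        QIOChannelSocket *sioc = qio_channel_socket_new();

        if (qio_channel_socket_listen_sync(sioc, resaddrs[i],
                                           err ? NULL : &err) == 0) {
            success = true;
            /* Only track channels that actually started listening. */
            qio_net_listener_add(listener, sioc);
        }
        object_unref(OBJECT(sioc));
    }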

> 
> 
> > +gboolean qio_net_listener_is_disconnected(QIONetListener *listener)
> > +{
> > +return listener->disconnected;
> 
> Documentation says it returns true on connected, but here you are
> returning true on disconnected?

Bad docs.

Regards,
Daniel
-- 
|: https://berrange.com  -o-  https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org  -o-  https://fstop138.berrange.com :|
|: https://entangle-photo.org  -o-  https://www.instagram.com/dberrange :|



[Qemu-block] [PATCH for-2.11 v2] file-posix: Clear out first sector in hdev_create

2017-08-11 Thread Fam Zheng
People get surprised when, after "qemu-img create -f raw /dev/sdX", they
still see qcow2 with "qemu-img info", if previously the bdev had a qcow2
header. While this is natural because raw doesn't need to write any
magic bytes during creation, hdev_create is free to clear out the first
sector to make sure the stale qcow2 header doesn't cause such confusion.

Signed-off-by: Fam Zheng 

---

v2: Use stack allocated buffer. [Eric]
Fix return value.
(Keep qemu_write_full instead of switching to qemu_pwritev because
the former handles short writes.)
Fix typo "qemu-img". [Changlong]
---
 block/file-posix.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index f4de022ae0..a63bbf2b90 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2703,6 +2703,16 @@ static int hdev_create(const char *filename, QemuOpts 
*opts,
 ret = -ENOSPC;
 }
 
+if (total_size) {
+uint8_t buf[BDRV_SECTOR_SIZE] = { 0 };
+int64_t zero_size = MIN(BDRV_SECTOR_SIZE, total_size);
+if (lseek(fd, 0, SEEK_SET) == -1) {
+ret = -errno;
+} else {
+ret = qemu_write_full(fd, buf, zero_size);
+ret = ret == zero_size ? 0 : -errno;
+}
+}
 qemu_close(fd);
 return ret;
 }
-- 
2.13.4




Re: [Qemu-block] [Qemu-devel] [PATCH] file-posix: Clear out first sector in hdev_create

2017-08-11 Thread Fam Zheng
On Fri, 08/11 15:44, Xie Changlong wrote:
> 在 8/10/2017 4:01 PM, Fam Zheng 写道:
> > People get surprised when, after "qemu-imc create -f raw /dev/sdX", they
> 
> s/qemu-imc/qemu-img/

Thanks, will fix in v2.

Fam



Re: [Qemu-block] [PATCH 1/2] nbd: Drop connection if broken server is detected

2017-08-11 Thread Vladimir Sementsov-Ogievskiy

11.08.2017 05:37, Eric Blake wrote:

As soon as the server is sending us garbage, we should quit
trying to send further messages to the server, and allow all
pending coroutines for any remaining replies to error out.
Failure to do so can let a malicious server cause the client
to hang, for example, if the server sends an invalid magic
number in its response.

Reported by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Eric Blake 
---
 block/nbd-client.c | 9 +++++++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 25dd28406b..802d50b636 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -68,7 +68,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)

  static coroutine_fn void nbd_read_reply_entry(void *opaque)
  {
-NBDClientSession *s = opaque;
+BlockDriverState *bs = opaque;
+NBDClientSession *s = nbd_get_client_session(bs);
  uint64_t i;
  int ret;
  Error *local_err = NULL;
@@ -107,8 +108,12 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
  qemu_coroutine_yield();
  }

+s->reply.handle = 0;
  nbd_recv_coroutines_enter_all(s);
  s->read_reply_co = NULL;
+if (ret < 0) {
+nbd_teardown_connection(bs);
+}


what if it happens in parallel with nbd_co_send_request?
nbd_teardown_connection destroys s->ioc, nbd_co_send_request
checks s->ioc only once and then calls nbd_send_request (which is
finally nbd_rwv and may yield). I think nbd_rwv is not
prepared for sudden destruction of ioc..


  }

  static int nbd_co_send_request(BlockDriverState *bs,
@@ -416,7 +421,7 @@ int nbd_client_init(BlockDriverState *bs,
  /* Now that we're connected, set the socket to be non-blocking and
   * kick the reply mechanism.  */
  qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
-client->read_reply_co = qemu_coroutine_create(nbd_read_reply_entry, 
client);
+client->read_reply_co = qemu_coroutine_create(nbd_read_reply_entry, bs);
  nbd_client_attach_aio_context(bs, bdrv_get_aio_context(bs));

  logout("Established connection with NBD server\n");



--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [PATCH] file-posix: Clear out first sector in hdev_create

2017-08-11 Thread Xie Changlong

在 8/10/2017 4:01 PM, Fam Zheng 写道:

People get surprised when, after "qemu-imc create -f raw /dev/sdX", they


s/qemu-imc/qemu-img/


still see qcow2 with "qemu-img info", if previously the bdev had a qcow2


--
Thanks
-Xie



Re: [Qemu-block] [Qemu-devel] [PATCH for-2.10?] file-posix: Clear out first sector in hdev_create

2017-08-11 Thread Fam Zheng
On Fri, 08/11 15:28, Fam Zheng wrote:
> On Thu, 08/10 08:58, Eric Blake wrote:
> > On 08/10/2017 03:01 AM, Fam Zheng wrote:
> > > +if (total_size) {
> > > +int64_t zero_size = MIN(BDRV_SECTOR_SIZE, total_size);
> > > +uint8_t *buf;
> > 
> > Since BDRV_SECTOR_SIZE is small enough to stack-allocate, you could skip
> > the malloc by doing:
> > 
> > uint8_t buf[BDRV_SECTOR_SIZE] = "";
> > 
> > > +if (lseek(fd, 0, SEEK_SET) == -1) {
> > > +ret = -errno;
> > > +} else {
> > > +buf = g_malloc0(zero_size);
> > > +ret = qemu_write_full(fd, buf, zero_size);
> > 
> > Instead of doing lseek + qemu_write_full, can we just use
> > qemu_pwritev(fd, &iov, 1, 0) with an iov set up to point to the
> > appropriate amount of buf?
> 
> Neat, will update.

For stack-allocation yes, but qemu_write_full turns out to be better because of
short write handling.  So I'll keep it in v2.
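
(Roughly what qemu_write_full() does — loop until the whole buffer is
written, retrying on EINTR; a sketch, not the actual implementation:

    ssize_t write_full_sketch(int fd, const void *buf, size_t count)
    {
        size_t total = 0;

        while (total < count) {
            ssize_t ret = write(fd, (const char *)buf + total,
                                count - total);
            if (ret < 0) {
                if (errno == EINTR) {
                    continue;       /* interrupted, just retry */
                }
                break;              /* caller inspects errno */
            }
            total += ret;
        }
        return total;
    }

whereas a single pwritev() call may legitimately write fewer bytes than
requested.)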

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH for-2.10?] file-posix: Clear out first sector in hdev_create

2017-08-11 Thread Fam Zheng
On Thu, 08/10 08:58, Eric Blake wrote:
> On 08/10/2017 03:01 AM, Fam Zheng wrote:
> > People get surprised when, after "qemu-imc create -f raw /dev/sdX", they
> > still see qcow2 with "qemu-img info", if previously the bdev had a qcow2
> > header. While this is natural because raw doesn't need to write any
> > magic bytes during creation, hdev_create is free to clear out the first
> > sector to make sure the stale qcow2 header doesn't cause such a
> > confusion.
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/file-posix.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index f4de022ae0..1d8ef6f873 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -2703,6 +2703,17 @@ static int hdev_create(const char *filename, 
> > QemuOpts *opts,
> >  ret = -ENOSPC;
> >  }
> >  
> > +if (total_size) {
> > +int64_t zero_size = MIN(BDRV_SECTOR_SIZE, total_size);
> > +uint8_t *buf;
> 
> Since BDRV_SECTOR_SIZE is small enough to stack-allocate, you could skip
> the malloc by doing:
> 
> uint8_t buf[BDRV_SECTOR_SIZE] = "";
> 
> > +if (lseek(fd, 0, SEEK_SET) == -1) {
> > +ret = -errno;
> > +} else {
> > +buf = g_malloc0(zero_size);
> > +ret = qemu_write_full(fd, buf, zero_size);
> 
> Instead of doing lseek + qemu_write_full, can we just use
> qemu_pwritev(fd, &iov, 1, 0) with an iov set up to point to the
> appropriate amount of buf?

Neat, will update.

> 
> At any rate, my ideas are micro-optimizations, so I can also live with
> how you wrote it.
> 
> Reviewed-by: Eric Blake 
> 
> Are you arguing that this is a bug-fix worthy of inclusion in 2.10,
> because it helps avoid user confusion? Or are you delaying it to 2.11,
> because we've had the existing behavior for longer than one release, so
> one release more won't hurt?

IMO no need to rush for 2.10, it can wait for 2.11.

Thanks!

Fam