Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume

2016-05-10 Thread Zhou Jie

Hi, Alex
What do you think about the following solution?
1. Detect support for resume notification.
   If host vfio driver does not have resume notifier flags,
   Directly fail to boot up VM as with aer enabled.
2. Immediately notify the VM on error detected.
3. Stall any access to the device until resume is signaled.
   Disable mmaps, drop writes, return -1 for reads.
4. Delay the guest directed bus reset.
   Don't reset bus in vfio_pci_reset function.
5. Wait for resume notification.
   If we don't get the resume notification from the host after
   some timeout, we would abort the guest directed bus reset
   altogether and make the device disappear,
   Initiating an unplug of the device to prevent it from further
   interacting with the VM.
6. After get the resume notification.
   Reset bus.
   It the second bus reset. Because the host did bus reset already.
   But as you said we shouldn't necessarily design the API that
   strictly around the current behavior of the Linux AER handler.

Sincerely,
Zhou Jie

On 2016/5/7 0:39, Alex Williamson wrote:

On Fri, 6 May 2016 09:38:41 +0800
Chen Fan  wrote:


On 04/26/2016 10:48 PM, Alex Williamson wrote:

On Tue, 26 Apr 2016 11:39:02 +0800
Chen Fan  wrote:


On 04/14/2016 09:02 AM, Chen Fan wrote:

On 04/12/2016 05:38 AM, Alex Williamson wrote:

On Tue, 5 Apr 2016 19:42:02 +0800
Cao jin  wrote:


From: Chen Fan

for supporting aer recovery, host and guest would run the same aer
recovery code, that would do the secondary bus reset if the error
is fatal, the aer recovery process:
1. error_detected
2. reset_link (if fatal)
3. slot_reset/mmio_enabled
4. resume

it indicates that host will do secondary bus reset to reset
the physical devices under bus in step 2, that would cause
devices in D3 status in a short time. but in qemu, we register
an error detected handler, that would be invoked as host broadcasts
the error-detected event in step 1, in order to avoid guest do
reset_link when host do reset_link simultaneously. it may cause
fatal error. we introduce a resmue notifier to assure host reset
completely. then do guest aer injection.

Why is it safe to continue running the VM between the error detected
notification and the resume notification?  We're just pushing back the
point at which we inject the AER into the guest, potentially negating
any benefit by allowing the VM to consume bad data.  Shouldn't we
instead be immediately notifying the VM on error detected, but stalling
any access to the device until resume is signaled?  How do we know that
resume will ever be signaled?  We have both the problem that we may be
running on an older kernel that won't support a resume notification and
the problem that seeing a resume notification depends on the host being
able to successfully complete a link reset after fatal error. We can
detect support for resume notification, but we still need a strategy
for never receiving it.  Thanks,

That's make sense, but I haven't came up with a good idea. do you have
any idea, Alex?

I don't know that there are any good solutions here.  We need to
respond to the current error notifier interrupt and not regress from
our support there.  I think that means that if we want to switch from a
simple halt-on-error to a mechanism for the guest to handle recovery,
we need to disable access to the device between being notified that the
error occurred and being notified to resume.  We can do that by
disabling mmaps to the device and preventing access via the slow path
handlers.  I don't know what the best solution is for preventing access,
do we block and pause the VM or do we drop writes and return -1 for
reads, that's something that needs to be determined.  We also need to
inject the AER into the VM at the point we're notified of an error
because the VM needs to know as soon as possible to stop using the
device or trusting any data from it.  The next coordination point would
be something like the resume notifier that you've added and there are
numerous questions around the interaction of that with the guest
handling.  Clearly we can't do a guest directed bus reset until we get
the resume notifier, so do we block that execution path in QEMU until
the resume notification is received?  What happens if we don't get that
notification?  Is there any way that we can rely on the host having
done a bus reset to the point where we don't need to act on the guest
directed reset?  These are all things that need to be figured out.
Thanks,

Maybe we can simply pause the vcpu running and avoid the VM to
access the device. and add two flags in VFIO_DEVICE_GET_INFO to query
whether the vfio pci driver has a resume notifier,
if it does not have resume notifier flags, we can directly fail to boot
up VM
as with aer enabled.


We can already tell if a resume 

[Qemu-devel] [PATCH v2 2/3] block: Drop superfluous invalidating bs->file from drivers

2016-05-10 Thread Fam Zheng
Now they are invalidated by the block layer, so it's not necessary to
do this in block drivers' implementations of .bdrv_invalidate_cache.

Signed-off-by: Fam Zheng 
---
 block/qcow2.c  |  7 ---
 block/qed.c|  6 --
 block/quorum.c | 16 
 3 files changed, 29 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 470734b..d4431e0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1757,13 +1757,6 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, 
Error **errp)
 
 qcow2_close(bs);
 
-bdrv_invalidate_cache(bs->file->bs, _err);
-if (local_err) {
-error_propagate(errp, local_err);
-bs->drv = NULL;
-return;
-}
-
 memset(s, 0, sizeof(BDRVQcow2State));
 options = qdict_clone_shallow(bs->options);
 
diff --git a/block/qed.c b/block/qed.c
index 0af5274..9081941 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1594,12 +1594,6 @@ static void bdrv_qed_invalidate_cache(BlockDriverState 
*bs, Error **errp)
 
 bdrv_qed_close(bs);
 
-bdrv_invalidate_cache(bs->file->bs, _err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-
 memset(s, 0, sizeof(BDRVQEDState));
 ret = bdrv_qed_open(bs, NULL, bs->open_flags, _err);
 if (local_err) {
diff --git a/block/quorum.c b/block/quorum.c
index da15465..8f7c099 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -747,21 +747,6 @@ static int64_t quorum_getlength(BlockDriverState *bs)
 return result;
 }
 
-static void quorum_invalidate_cache(BlockDriverState *bs, Error **errp)
-{
-BDRVQuorumState *s = bs->opaque;
-Error *local_err = NULL;
-int i;
-
-for (i = 0; i < s->num_children; i++) {
-bdrv_invalidate_cache(s->children[i]->bs, _err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-}
-}
-
 static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
 {
 BDRVQuorumState *s = bs->opaque;
@@ -1070,7 +1055,6 @@ static BlockDriver bdrv_quorum = {
 
 .bdrv_aio_readv = quorum_aio_readv,
 .bdrv_aio_writev= quorum_aio_writev,
-.bdrv_invalidate_cache  = quorum_invalidate_cache,
 
 .bdrv_detach_aio_context= quorum_detach_aio_context,
 .bdrv_attach_aio_context= quorum_attach_aio_context,
-- 
2.8.2




[Qemu-devel] [PATCH v2 0/3] block: More complete inactivate/invalidate of graph nodes

2016-05-10 Thread Fam Zheng
v2: Two passes inactivation. [Kevin]

For now we only consider bs->file chain. In fact this is incomplete. For
example, if qcow2 is a quorum child, we don't properly invalidate or inactivate
it.  This series recurses into the subtrees in both bdrv_invalidate_cache_all
and bdrv_inactivate_all. This is also necessary to make the image locking
cooperate with migration.


Fam Zheng (3):
  block: Invalidate all children
  block: Drop superfluous invalidating bs->file from drivers
  block: Inactivate all children

 block.c| 67 +++---
 block/qcow2.c  |  7 --
 block/qed.c|  6 --
 block/quorum.c | 16 --
 4 files changed, 50 insertions(+), 46 deletions(-)

-- 
2.8.2




[Qemu-devel] [PATCH v2 3/3] block: Inactivate all children

2016-05-10 Thread Fam Zheng
Currently we only inactivate the top BDS. Actually bdrv_inactivate
should be the opposite of bdrv_invalidate_cache.

Recurse into the whole subtree instead.

Because a node may have multiple parents, and because once
BDRV_O_INACTIVE is set for a node, further writes are not allowed, we
cannot interleave flag settings and .bdrv_inactivate calls (that may
submit write to other nodes in a graph) within a single pass. Therefore
two passes are used here.

Signed-off-by: Fam Zheng 
---
 block.c | 47 ---
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index fa8b38f..93481c4 100644
--- a/block.c
+++ b/block.c
@@ -3258,38 +3258,63 @@ void bdrv_invalidate_cache_all(Error **errp)
 }
 }
 
-static int bdrv_inactivate(BlockDriverState *bs)
+static int bdrv_inactivate_recurse(BlockDriverState *bs,
+   bool setting_flag)
 {
+BdrvChild *child;
 int ret;
 
-if (bs->drv->bdrv_inactivate) {
+if (!setting_flag && bs->drv->bdrv_inactivate) {
 ret = bs->drv->bdrv_inactivate(bs);
 if (ret < 0) {
 return ret;
 }
 }
 
-bs->open_flags |= BDRV_O_INACTIVE;
+QLIST_FOREACH(child, >children, next) {
+ret = bdrv_inactivate_recurse(child->bs, setting_flag);
+if (ret < 0) {
+return ret;
+}
+}
+
+if (setting_flag) {
+bs->open_flags |= BDRV_O_INACTIVE;
+}
 return 0;
 }
 
 int bdrv_inactivate_all(void)
 {
 BlockDriverState *bs = NULL;
-int ret;
+int ret = 0;
+int pass;
 
 while ((bs = bdrv_next(bs)) != NULL) {
-AioContext *aio_context = bdrv_get_aio_context(bs);
+aio_context_acquire(bdrv_get_aio_context(bs));
+}
 
-aio_context_acquire(aio_context);
-ret = bdrv_inactivate(bs);
-aio_context_release(aio_context);
-if (ret < 0) {
-return ret;
+/* We do two passes of inactivation. The first pass calls to drivers'
+ * .bdrv_inactivate callbacks recursively so all cache is flushed to disk;
+ * the second pass sets the BDRV_O_INACTIVE flag so that no further write
+ * is allowed. */
+for (pass = 0; pass < 2; pass++) {
+bs = NULL;
+while ((bs = bdrv_next(bs)) != NULL) {
+ret = bdrv_inactivate_recurse(bs, pass);
+if (ret < 0) {
+goto out;
+}
 }
 }
 
-return 0;
+out:
+bs = NULL;
+while ((bs = bdrv_next(bs)) != NULL) {
+aio_context_release(bdrv_get_aio_context(bs));
+}
+
+return ret;
 }
 
 /**/
-- 
2.8.2




[Qemu-devel] [PATCH v2 1/3] block: Invalidate all children

2016-05-10 Thread Fam Zheng
Currently we only recurse to bs->file, which will miss the children in quorum
and VMDK.

Recurse into the whole subtree to avoid that.

Signed-off-by: Fam Zheng 
---
 block.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index d4939b4..fa8b38f 100644
--- a/block.c
+++ b/block.c
@@ -3201,6 +3201,7 @@ void bdrv_init_with_whitelist(void)
 
 void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
 {
+BdrvChild *child;
 Error *local_err = NULL;
 int ret;
 
@@ -3215,13 +3216,20 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error 
**errp)
 
 if (bs->drv->bdrv_invalidate_cache) {
 bs->drv->bdrv_invalidate_cache(bs, _err);
-} else if (bs->file) {
-bdrv_invalidate_cache(bs->file->bs, _err);
+if (local_err) {
+bs->open_flags |= BDRV_O_INACTIVE;
+error_propagate(errp, local_err);
+return;
+}
 }
-if (local_err) {
-bs->open_flags |= BDRV_O_INACTIVE;
-error_propagate(errp, local_err);
-return;
+
+QLIST_FOREACH(child, >children, next) {
+bdrv_invalidate_cache(child->bs, _err);
+if (local_err) {
+bs->open_flags |= BDRV_O_INACTIVE;
+error_propagate(errp, local_err);
+return;
+}
 }
 
 ret = refresh_total_sectors(bs, bs->total_sectors);
-- 
2.8.2




Re: [Qemu-devel] [PATCH 2/2] block: Inactivate all children

2016-05-10 Thread Fam Zheng
On Tue, 05/10 10:33, Kevin Wolf wrote:
> 
> Fair enough. My series didn't have a separate callback, but with yours
> that should be working.
> 
> So is the semantics of .bdrv_inactivate() basically "bdrv_flush, and I
> really mean it"?

Yes.

> 
> > > Maybe we need something like an "active reference counter", and we
> > > decrement that for all children and only call their .bdrv_inactivate()
> > > when it arrives at 0.
> > 
> > That should work, but the effect of the counters are local to one 
> > invocation of
> > bdrv_inactivate_all(), and is not really necessary if we do as above.
> 
> Agreed.

Working on another version now.

Fam



Re: [Qemu-devel] [PATCHv2 0/3] Better 'Force Unit Access' handling

2016-05-10 Thread Fam Zheng
On Tue, 05/03 16:39, Eric Blake wrote:
> I noticed some inconsistencies in FUA handling while working
> with NBD, then Kevin pointed out that my initial attempt wasn't
> quite right for iscsi which also had problems, so this has
> expanded into a series rather than a single patch.
> 
> I'm not sure if this is qemu-stable material at this point.
> 
> Depends on Kevin's block-next branch.
> 
> Also available as a tag at this location:
> git fetch git://repo.or.cz/qemu/ericb.git nbd-fua-v2
> 
> v1 was here:
> https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg04674.html
> 
> diffstat not worth posting, as the series is basically rewritten
> 
> Eric Blake (3):
>   block: Make supported_write_flags a per-bds property
>   block: Honor BDRV_REQ_FUA during write_zeroes
>   nbd: Simplify client FUA handling

Reviewed-by: Fam Zheng 

> 
>  block/nbd-client.h|  2 +-
>  include/block/block_int.h | 11 +++
>  block/io.c| 39 ++-
>  block/iscsi.c | 11 ---
>  block/nbd-client.c| 11 +++
>  block/nbd.c   | 28 +++-
>  block/raw-posix.c |  1 +
>  block/raw_bsd.c   |  3 ++-
>  8 files changed, 59 insertions(+), 47 deletions(-)
> 
> -- 
> 2.5.5
> 
> 



Re: [Qemu-devel] [PATCH v4 08/27] osdep: Add qemu_lock_fd and qemu_unlock_fd

2016-05-10 Thread Fam Zheng
On Wed, 05/11 08:48, Fam Zheng wrote:
> racy problem. Any suggestion how this could be fixed?

Reading into the subthread I see the answer: the file-private locks look
promising. Will take a look at that! Thanks.

Fam




Re: [Qemu-devel] [PATCH v4 08/27] osdep: Add qemu_lock_fd and qemu_unlock_fd

2016-05-10 Thread Fam Zheng
On Tue, 05/10 09:57, Daniel P. Berrange wrote:
> On Tue, May 10, 2016 at 10:50:40AM +0800, Fam Zheng wrote:
> > They are wrappers of POSIX fcntl file locking, with the additional
> > interception of open/close (through qemu_open and qemu_close) to offer a
> > better semantics that preserves the locks across multiple life cycles of
> > different fds on the same file.  The reason to make this semantics
> > change over the fcntl locks is to make the API cleaner for QEMU
> > subsystems.
> > 
> > More precisely, we remove this "feature" of fcntl lock:
> > 
> > If a process closes any file descriptor referring to a file, then
> > all of the process's locks on that file are released, regardless of
> > the file descriptor(s) on which the locks were obtained.
> > 
> > as long as the fd is always open/closed through qemu_open and
> > qemu_close.
> 
> You're not actually really removing that problem - this is just hacking
> around it in a manner which is racy & susceptible to silent failure.
> 
> 
> > +static int qemu_fd_close(int fd)
> > +{
> > +LockEntry *ent, *tmp;
> > +LockRecord *rec;
> > +char *path;
> > +int ret;
> > +
> > +assert(fd_to_path);
> > +path = g_hash_table_lookup(fd_to_path, GINT_TO_POINTER(fd));
> > +assert(path);
> > +g_hash_table_remove(fd_to_path, GINT_TO_POINTER(fd));
> > +rec = g_hash_table_lookup(lock_map, path);
> > +ret = close(fd);
> > +
> > +if (rec) {
> > +QLIST_FOREACH_SAFE(ent, >records, next, tmp) {
> > +int r;
> > +if (ent->fd == fd) {
> > +QLIST_REMOVE(ent, next);
> > +g_free(ent);
> > +continue;
> > +}
> > +r = qemu_lock_do(ent->fd, ent->start, ent->len,
> > + ent->readonly ? F_RDLCK : F_WRLCK);
> > +if (r == -1) {
> > +fprintf(stderr, "Failed to acquire lock on fd %d: %s\n",
> > +ent->fd, strerror(errno));
> > +}
> 
> If another app has acquired the lock between the close and this attempt
> to re-acquire the lock, then QEMU is silently carrying on without any
> lock being held. For something that's intending to provide protection
> against concurrent use I think this is not an acceptable failure
> scenario.
> 
> 
> > +int qemu_open(const char *name, int flags, ...)
> > +{
> > +int mode = 0;
> > +int ret;
> > +
> > +if (flags & O_CREAT) {
> > +va_list ap;
> > +
> > +va_start(ap, flags);
> > +mode = va_arg(ap, int);
> > +va_end(ap);
> > +}
> > +ret = qemu_fd_open(name, flags, mode);
> > +if (ret >= 0) {
> > +qemu_fd_add_record(ret, name);
> > +}
> > +return ret;
> > +}
> 
> I think the approach you have here is fundamentally not usable with
> fcntl locks, because it is still using the pattern
> 
>fd = open(path)
>lock(fd)
>if failed lock
>   close(fd)
>...do stuff.
> 
> 
> As mentioned in previous posting I believe the block layer must be
> checking whether it already has a lock held against "path" *before*
> even attempting to open the file. Once QEMU has the file descriptor
> open it is too later, because closing the FD will release pre-existing
> locks QEMU has.

There will still be valid close() calls, that will release necessary locks
temporarily.  You are right the "close + re-acquire" in qemu_fd_close() is a
racy problem. Any suggestion how this could be fixed?  Something like
introducing a "close2" syscall that doesn't drop locks on other fds?

Fam



[Qemu-devel] [PATCH v2 3/3] memory: drop some wrappers that waste cpu cycle

2016-05-10 Thread Gonglei
For better performance, we can use RAMblock
directly stored in memory_region at present.

Signed-off-by: Gonglei 
---
 exec.c  | 33 ++---
 hw/misc/ivshmem.c   |  8 +---
 hw/virtio/vhost-user.c  | 13 -
 include/exec/ram_addr.h |  4 +---
 memory.c|  2 +-
 5 files changed, 17 insertions(+), 43 deletions(-)

diff --git a/exec.c b/exec.c
index 117c9a8..f8de928 100644
--- a/exec.c
+++ b/exec.c
@@ -1812,38 +1812,9 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 }
 #endif /* !_WIN32 */
 
-int qemu_get_ram_fd(ram_addr_t addr)
+void *qemu_get_ram_block_host_ptr(RAMBlock *ram_block)
 {
-RAMBlock *block;
-int fd;
-
-rcu_read_lock();
-block = qemu_get_ram_block(addr);
-fd = block->fd;
-rcu_read_unlock();
-return fd;
-}
-
-void qemu_set_ram_fd(ram_addr_t addr, int fd)
-{
-RAMBlock *block;
-
-rcu_read_lock();
-block = qemu_get_ram_block(addr);
-block->fd = fd;
-rcu_read_unlock();
-}
-
-void *qemu_get_ram_block_host_ptr(ram_addr_t addr)
-{
-RAMBlock *block;
-void *ptr;
-
-rcu_read_lock();
-block = qemu_get_ram_block(addr);
-ptr = ramblock_ptr(block, 0);
-rcu_read_unlock();
-return ptr;
+return ramblock_ptr(ram_block, 0);
 }
 
 /* Return a host pointer to ram allocated with qemu_ram_alloc.
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index e40f23b..1e930fa 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -533,7 +533,9 @@ static void process_msg_shmem(IVShmemState *s, int fd, 
Error **errp)
 }
 memory_region_init_ram_ptr(>server_bar2, OBJECT(s),
"ivshmem.bar2", size, ptr);
-qemu_set_ram_fd(memory_region_get_ram_addr(>server_bar2), fd);
+assert(s->server_bar2.ram_block);
+s->server_bar2.ram_block->fd = fd;
+
 s->ivshmem_bar2 = >server_bar2;
 }
 
@@ -939,8 +941,8 @@ static void ivshmem_exit(PCIDevice *dev)
 error_report("Failed to munmap shared memory %s",
  strerror(errno));
 }
-
-fd = qemu_get_ram_fd(memory_region_get_ram_addr(s->ivshmem_bar2));
+assert(s->ivshmem_bar2->ram_block);
+fd = s->ivshmem_bar2->ram_block->fd;
 close(fd);
 }
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 5914e85..5082e04 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -248,17 +248,18 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
 for (i = 0; i < dev->mem->nregions; ++i) {
 struct vhost_memory_region *reg = dev->mem->regions + i;
 ram_addr_t ram_addr;
+MemoryRegion *mr;
 
 assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
-qemu_ram_addr_from_host((void *)(uintptr_t)reg->userspace_addr,
+mr = qemu_ram_addr_from_host((void *)(uintptr_t)reg->userspace_addr,
 _addr);
-fd = qemu_get_ram_fd(ram_addr);
+fd = mr->ram_block->fd;
 if (fd > 0) {
 msg.payload.memory.regions[fd_num].userspace_addr = 
reg->userspace_addr;
 msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
 msg.payload.memory.regions[fd_num].guest_phys_addr = 
reg->guest_phys_addr;
 msg.payload.memory.regions[fd_num].mmap_offset = 
reg->userspace_addr -
-(uintptr_t) qemu_get_ram_block_host_ptr(ram_addr);
+  (uintptr_t) qemu_get_ram_block_host_ptr(mr->ram_block);
 assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
 fds[fd_num++] = fd;
 }
@@ -622,11 +623,13 @@ static bool vhost_user_can_merge(struct vhost_dev *dev,
 
 mr = qemu_ram_addr_from_host((void *)(uintptr_t)start1, _addr);
 assert(mr);
-mfd = qemu_get_ram_fd(ram_addr);
+assert(mr->ram_block);
+mfd = mr->ram_block->fd;
 
 mr = qemu_ram_addr_from_host((void *)(uintptr_t)start2, _addr);
 assert(mr);
-rfd = qemu_get_ram_fd(ram_addr);
+assert(mr->ram_block);
+rfd = mr->ram_block->fd;
 
 return mfd == rfd;
 }
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 5b6e1b8..9a01d4a 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -105,9 +105,7 @@ RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, 
ram_addr_t max_size,
 uint64_t length,
 void *host),
 MemoryRegion *mr, Error **errp);
-int qemu_get_ram_fd(ram_addr_t addr);
-void qemu_set_ram_fd(ram_addr_t addr, int fd);
-void *qemu_get_ram_block_host_ptr(ram_addr_t addr);
+void *qemu_get_ram_block_host_ptr(RAMBlock *ram_block);
 void qemu_ram_free(RAMBlock *block);
 
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp);
diff --git a/memory.c b/memory.c
index 239e6da..9cff7e2 100644
--- a/memory.c
+++ 

Re: [Qemu-devel] [RFC 36/42] acpi: cpuhp: add cpu._OST handling

2016-05-10 Thread Eric Blake
On 05/02/2016 06:33 AM, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov 
> ---
>  hw/acpi/cpu.c | 83 
> +++
>  hw/acpi/ich9.c|  3 ++
>  hw/acpi/piix4.c   |  3 ++
>  include/hw/acpi/cpu.h |  4 +++
>  qapi-schema.json  |  2 +-
>  trace-events  |  2 ++
>  6 files changed, 96 insertions(+), 1 deletion(-)

> +++ b/qapi-schema.json
> @@ -4019,7 +4019,7 @@
>  #
>  # @DIMM: memory slot
>  #
> -{ 'enum': 'ACPISlotType', 'data': [ 'DIMM' ] }
> +{ 'enum': 'ACPISlotType', 'data': [ 'DIMM', 'CPU' ] }

Missing documentation of the new value.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH] Fix linking relocatable objects on Sparc

2016-05-10 Thread James Clarke
On Sparc, gcc implicitly passes --relax to the linker, but -r is
incompatible with this. Therefore, if --no-relax is supported, it should
be passed to the linker.

Signed-off-by: James Clarke 
---
 configure | 13 +
 rules.mak |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index c37fc5f..c3583b2 100755
--- a/configure
+++ b/configure
@@ -4508,6 +4508,18 @@ if compile_prog "" "" ; then
 have_fsxattr=yes
 fi
 
+#
+# Sparc implicitly links with --relax, which is
+# incompatible with -r. It does no harm to give
+# it on other platforms too.
+
+cat > $TMPC << EOF
+int foo(void) { return 0; }
+EOF
+if compile_prog "" "-nostdlib -Wl,-r -Wl,--no-relax"; then
+  LD_REL_FLAGS="-Wl,--no-relax"
+fi
+
 ##
 # End of CC checks
 # After here, no more $cc or $ld runs
@@ -5513,6 +5525,7 @@ else
 fi
 echo "LDFLAGS=$LDFLAGS" >> $config_host_mak
 echo "LDFLAGS_NOPIE=$LDFLAGS_NOPIE" >> $config_host_mak
+echo "LD_REL_FLAGS=$LD_REL_FLAGS" >> $config_host_mak
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "LIBS_TOOLS+=$libs_tools" >> $config_host_mak
 echo "EXESUF=$EXESUF" >> $config_host_mak
diff --git a/rules.mak b/rules.mak
index d1ff311..fc2c961 100644
--- a/rules.mak
+++ b/rules.mak
@@ -93,7 +93,7 @@ module-common.o: CFLAGS += $(DSO_OBJ_CFLAGS)
$(if $(findstring /,$@),$(call quiet-command,cp $@ $(subst /,-,$@), "  
CP$(subst /,-,$@)"))
 
 
-LD_REL := $(CC) -nostdlib -Wl,-r
+LD_REL := $(CC) -nostdlib -Wl,-r $(LD_REL_FLAGS)
 
 %.mo:
$(call quiet-command,$(LD_REL) -o $@ $^,"  LD -r $(TARGET_DIR)$@")
-- 
2.8.2




Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Michał Belczyk

On Tue, May 10, 2016 at 04:41:27PM +0100, Alex Bligh wrote:


On 10 May 2016, at 16:29, Eric Blake  wrote:


So the kernel is currently one of the clients that does NOT honor block
sizes, and as such, servers should be prepared for ANY size up to
UINT_MAX (other than DoS handling).


Interesting followup question:

If the kernel does not fragment TRIM requests at all (in the
same way it fragments read and write requests), I suspect
something bad may happen with TRIM requests over 2^31
in size (particularly over 2^32 in size), as the length
field in nbd only has 32 bits.

Whether it supports block size constraints or not, it is
going to need to do *some* breaking up of requests.


Doesn't the kernel break TRIM requests at max_discard_sectors?

I remember testing the following change in the nbd kmod long time ago:

#if 0
disk->queue->limits.max_discard_sectors = UINT_MAX;
#else
disk->queue->limits.max_discard_sectors = 65536;
#endif

The problem with large TRIM requests is exactly the same as with other 
(READ/WRITE) large requests -- they _may_ take loads of time and if the client 
wants to support a fast switch over to another replica it must put some 
constraints on the timeout value... which may be very easily broken if you 
allow things like a 1GB trim request on the server using DIO on the other side 
(and say heavily fragmented sparse file over XFS, never mind).


I don't think it's the problem of QEMU NBD server to fix that, the constraint 
on the server side is perfectly fine, the problem is to note that a change on 
the client side is required to limit the maximum size of the TRIM requests.
The reason for the patch I pasted above was that at the time I looked into it 
there was no other way to change the TRIM request size via 
/sys/block/$dev/queue/max_discard_sectors, but maybe the more recent kernels 
allow that?
Not to mention that the NBD client option to set that at NBD queue setup time 
would be nice...


Just my 2p.

--
Michał Belczyk Snr



[Qemu-devel] [Bug 1528718] Re: Initial monitor does not output anything on Windows (MSYS2 binary)

2016-05-10 Thread Dravion Smith
This affects both binary editions (for instance "qemu-system-i386.exe"
AND "qemu-system-i386w.exe")

dumpbin /all "C:\Program Files\qemu\qemu-system-i386.exe"|more
Dump of file C:\Program Files\qemu\qemu-system-i386.exe
PE signature found
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
8664 machine (x64)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1528718

Title:
  Initial monitor does not output anything on Windows (MSYS2 binary)

Status in QEMU:
  New

Bug description:
  When running on Windows error messages before the UI is started are
  not showing up.

  For example when I run:

  qemu-system-i386.exe -L /mingw32/etc/qemu/ -m 20G

  It should display "ram size too large", according to gdb:

  Breakpoint 1, error_report (fmt=fmt@entry=0x71bdf6
   "ram size too large") at
  C:/build/mingw/mingw-w64-qemu/src/qemu-2.4.0/util/qemu-error.c:233

  However the console does never receive that.

  As far as I could find out vfprintf is called, but it doesn't output
  anything.

  Maybe the binary is build for the "win32" subsystem which does not
  have a console window attached by default?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1528718/+subscriptions



[Qemu-devel] [Bug 1528718] Re: Initial monitor does not output anything on Windows (MSYS2 binary)

2016-05-10 Thread Dravion Smith
This affects both binary editions (for instance "qemu-system-i386.exe"
AND "qemu-system-i386w.exe")

dumpbin /all "C:\Program Files\qemu\qemu-system-i386.exe"|more
Dump of file C:\Program Files\qemu\qemu-system-i386.exe
PE signature found
File Type: EXECUTABLE IMAGE
4.00 operating system version
5.02 subsystem version: Win32

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1528718

Title:
  Initial monitor does not output anything on Windows (MSYS2 binary)

Status in QEMU:
  New

Bug description:
  When running on Windows error messages before the UI is started are
  not showing up.

  For example when I run:

  qemu-system-i386.exe -L /mingw32/etc/qemu/ -m 20G

  It should display "ram size too large", according to gdb:

  Breakpoint 1, error_report (fmt=fmt@entry=0x71bdf6
   "ram size too large") at
  C:/build/mingw/mingw-w64-qemu/src/qemu-2.4.0/util/qemu-error.c:233

  However the console does never receive that.

  As far as I could find out vfprintf is called, but it doesn't output
  anything.

  Maybe the binary is build for the "win32" subsystem which does not
  have a console window attached by default?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1528718/+subscriptions



Re: [Qemu-devel] [PATCH v3 2/3] xen: write information about supported backends

2016-05-10 Thread Anthony PERARD
On Fri, May 06, 2016 at 11:42:45AM +0200, Juergen Gross wrote:
> Add a Xenstore directory for each supported pv backend. This will allow
> Xen tools to decide which backend type to use in case there are
> multiple possibilities.
> 
> The information is added under
> /local/domain//device-model//backends
> before the "running" state is written to Xenstore. Using a directory
> for each backend enables us to add parameters for specific backends
> in the future.
> 
> This interface is documented in the Xen source repository in the file
> docs/misc/qemu-backends.txt
> 
> In order to reuse the Xenstore directory creation already present in
> hw/xen/xen_devconfig.c move the related functions to
> hw/xen/xen_backend.c where they fit better.
> 
> Signed-off-by: Juergen Gross 
> ---
> V3: Added .backend_register function to XenDevOps in order to have a
> way to let the backend decide whether all prerequisites are met
> for support.
> 
> V2: update commit message as requested by Stefano
> ---
>  hw/xen/xen_backend.c | 60 
> 
>  hw/xen/xen_devconfig.c   | 52 ++
>  include/hw/xen/xen_backend.h |  2 ++
>  3 files changed, 64 insertions(+), 50 deletions(-)
> 
> diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
> index 60575ad..6d8b3a5 100644
> --- a/hw/xen/xen_backend.c
> +++ b/hw/xen/xen_backend.c
> @@ -42,11 +42,35 @@ struct xs_handle *xenstore = NULL;
>  const char *xen_protocol;
>  
>  /* private */
> +struct xs_dirs {
> +char *xs_dir;
> +QTAILQ_ENTRY(xs_dirs) list;
> +};
> +static QTAILQ_HEAD(xs_dirs_head, xs_dirs) xs_cleanup = 
> QTAILQ_HEAD_INITIALIZER(xs_cleanup);
> +
>  static QTAILQ_HEAD(XenDeviceHead, XenDevice) xendevs = 
> QTAILQ_HEAD_INITIALIZER(xendevs);
>  static int debug = 0;
>  
>  /* - */
>  
> +static void xenstore_cleanup_dir(char *dir)
> +{
> +struct xs_dirs *d;
> +
> +d = g_malloc(sizeof(*d));
> +d->xs_dir = dir;
> +QTAILQ_INSERT_TAIL(_cleanup, d, list);
> +}
> +
> +void xen_config_cleanup(void)
> +{
> +struct xs_dirs *d;
> +
> +QTAILQ_FOREACH(d, _cleanup, list) {
> +xs_rm(xenstore, 0, d->xs_dir);
> +}
> +}
> +
>  int xenstore_write_str(const char *base, const char *node, const char *val)
>  {
>  char abspath[XEN_BUFSIZE];
> @@ -75,6 +99,28 @@ char *xenstore_read_str(const char *base, const char *node)
>  return ret;
>  }
>  
> +int xenstore_mkdir(char *path, int p)
> +{
> +struct xs_permissions perms[2] = {{
> +.id= 0, /* set owner: dom0 */
> +},{
> +.id= xen_domid,
> +.perms = p,
> +}};
> +
> +if (!xs_mkdir(xenstore, 0, path)) {
> +xen_be_printf(NULL, 0, "xs_mkdir %s: failed\n", path);
> +return -1;
> +}
> +xenstore_cleanup_dir(g_strdup(path));
> +
> +if (!xs_set_permissions(xenstore, 0, path, perms, 2)) {
> +xen_be_printf(NULL, 0, "xs_set_permissions %s: failed\n", path);
> +return -1;
> +}
> +return 0;
> +}
> +
>  int xenstore_write_int(const char *base, const char *node, int ival)
>  {
>  char val[12];
> @@ -726,6 +772,20 @@ err:
>  
>  int xen_be_register(const char *type, struct XenDevOps *ops)
>  {
> +char path[50];
> +int rc;
> +
> +if (ops->backend_register) {
> +rc = ops->backend_register();
> +if (rc) {
> +return rc;
> +}
> +}
> +
> +snprintf(path, sizeof (path), "device-model/%u/backends/%s", xen_domid,
> + type);
> +xenstore_mkdir(path, XS_PERM_READ);

Do you actually need the guest to be able to read those paths?

> +
>  return xenstore_scan(type, xen_domid, ops);
>  }
>  
> diff --git a/hw/xen/xen_devconfig.c b/hw/xen/xen_devconfig.c
> index 1f30fe4..b7d290d 100644
> --- a/hw/xen/xen_devconfig.c
> +++ b/hw/xen/xen_devconfig.c
> @@ -5,54 +5,6 @@
>  
>  /* - */
>  
> -struct xs_dirs {
> -char *xs_dir;
> -QTAILQ_ENTRY(xs_dirs) list;
> -};
> -static QTAILQ_HEAD(xs_dirs_head, xs_dirs) xs_cleanup = 
> QTAILQ_HEAD_INITIALIZER(xs_cleanup);
> -
> -static void xen_config_cleanup_dir(char *dir)
> -{
> -struct xs_dirs *d;
> -
> -d = g_malloc(sizeof(*d));
> -d->xs_dir = dir;
> -QTAILQ_INSERT_TAIL(_cleanup, d, list);
> -}
> -
> -void xen_config_cleanup(void)
> -{
> -struct xs_dirs *d;
> -
> -QTAILQ_FOREACH(d, _cleanup, list) {
> - xs_rm(xenstore, 0, d->xs_dir);
> -}
> -}
> -
> -/* - */
> -
> -static int xen_config_dev_mkdir(char *dev, int p)
> -{
> -struct xs_permissions perms[2] = {{
> -.id= 0, /* set owner: dom0 */
> -},{
> -.id= xen_domid,
> -.perms = p,
> -}};
> -

The function looks like it as been tailored to mkdir for config of
backends. So 

Re: [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Eric Blake
On 05/06/2016 02:45 AM, Quentin Casasnovas wrote:
> When running fstrim on a filesystem mounted through qemu-nbd with
> --discard=on, fstrim would fail with I/O errors:
> 
>   $ fstrim /k/spl/ice/
>   fstrim: /k/spl/ice/: FITRIM ioctl failed: Input/output error
> 
> and qemu-nbd was spitting these:
> 
>   nbd.c:nbd_co_receive_request():L1232: len (94621696) is larger than max len 
> (33554432)

Your patch duplicates what is already present in qemu:

commit eb38c3b67018ff8069e4f674a28661931a8a3e4f
Author: Paolo Bonzini 
Date:   Thu Jan 7 14:32:42 2016 +0100

nbd-server: do not check request length except for reads and writes

Only reads and writes need to allocate memory correspondent to the
request length.  Other requests can be sent to the storage without
allocating any memory, and thus any request length is acceptable.

Reported-by: Sitsofe Wheeler 
Cc: qemu-bl...@nongnu.org
Reviewed-by: Max Reitz 
Signed-off-by: Paolo Bonzini 

For the purposes of qemu-stable, it's better to backport the existing
patch than to write a new version of it.

It also helps to state what version of qemu you were testing, as it is
obviously not the (soon-to-be-released) version 2.6 which already has
the fix.

>  nbd.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/nbd.c b/nbd.c
> index b3d9654..e733669 100644
> --- a/nbd.c
> +++ b/nbd.c
> @@ -1209,6 +1209,11 @@ static ssize_t nbd_co_send_reply(NBDRequest *req, 
> struct nbd_reply *reply,
>  return rc;
>  }
>  
> +static bool nbd_should_check_request_size(const struct nbd_request *request)
> +{
> +return (request->type & NBD_CMD_MASK_COMMAND) != NBD_CMD_TRIM;
> +}
> +
>  static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request 
> *request)
>  {
>  NBDClient *client = req->client;
> @@ -1227,7 +1232,8 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, 
> struct nbd_request *reque
>  goto out;
>  }
>  
> -if (request->len > NBD_MAX_BUFFER_SIZE) {
> +if (nbd_should_check_request_size(request) &&
> +request->len > NBD_MAX_BUFFER_SIZE) {
>  LOG("len (%u) is larger than max len (%u)",
>  request->len, NBD_MAX_BUFFER_SIZE);
>  rc = -EINVAL;
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC 23/42] tests: pc-cpu-test: turn on cpu-hotplug explicily

2016-05-10 Thread Eduardo Habkost
On Mon, May 02, 2016 at 02:33:32PM +0200, Igor Mammedov wrote:
> Machine types before 2.7 have legacy CPU hotplug
> enabled by defaut to not regress existing setups
> where it's always enabled.
> 
> But since 2.7 CPU hotplug is disabled y default
> and requires explicit enabling using 'cpu-hotplug'
> parameter in '-machine' option.
> So turn it on for cpu-hotplug testcase to allow
> it test 2.7 and later machine types.
> 
> Fixes following failure:
> 
> /x86_64/cpu/pc-q35-2.7/add/1x3x2=12: **
> ERROR:tests/pc-cpu-test.c:44:test_pc_with_cpu_add:
> assertion failed: (!qdict_haskey(response, "error"))

Is this test error introduced by a patch in this series? If so,
shouldn't this fix be squashed with the patch that disables CPU
hotplug by default, so we get a bisectable tree?

-- 
Eduardo



Re: [Qemu-devel] [RFC 20/42] machine: add cpu-hotplug machine option

2016-05-10 Thread Eduardo Habkost
On Mon, May 02, 2016 at 02:33:29PM +0200, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov 

If a machine class simply doesn't support CPU hotplug at all, is
silently ignoring "cpu-hotplug=on" the right thing to do?

Shouldn't we exit with an error if the machine class doesn't
support CPU hotplug and the user tries to enable it?

-- 
Eduardo



[Qemu-devel] [Bug 1579565] Re: ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.

2016-05-10 Thread luigiburdo
Hi  Perter, 
probalby ... i found something wrongs inside my ppc64 library , im clearing and 
reinstall the glib . report it soon

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1579565

Title:
  ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.

Status in QEMU:
  Invalid

Bug description:
  cont build from last 2.6 rc4

  ~/Downloads/qemu-2.6.0-rc4$ ./configure

  ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.
 You probably need to set PKG_CONFIG_LIBDIR
 to point to the right pkg-config files for your
 build target

  Os Ubuntu Mate 16.04 PPC64

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1579565/+subscriptions



[Qemu-devel] [Bug 1579565] Re: ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.

2016-05-10 Thread luigiburdo
Yes Fixed guys.
configure reply right . can close this bug , that was not a qemu bug.

** Changed in: qemu
   Status: Fix Committed => Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1579565

Title:
  ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.

Status in QEMU:
  Invalid

Bug description:
  cont build from last 2.6 rc4

  ~/Downloads/qemu-2.6.0-rc4$ ./configure

  ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.
 You probably need to set PKG_CONFIG_LIBDIR
 to point to the right pkg-config files for your
 build target

  Os Ubuntu Mate 16.04 PPC64

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1579565/+subscriptions



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Eric Blake
On 05/10/2016 10:33 AM, Quentin Casasnovas wrote:

> Looks like there's an easier way:
> 
>  $ qemu-img create -f qcow2 foo.qcow2 10G
>  $ qemu-nbd --discard=on -c /dev/nbd0 foo.qcow2
>  $ mkfs.ext4 /dev/nbd0
>  mke2fs 1.42.13 (17-May-2015)
>  Discarding device blocks: failed - Input/output error

strace on mkfs.ext4 shows:
...
open("/dev/nbd1", O_RDWR|O_EXCL)= 3
stat("/dev/nbd1", {st_mode=S_IFBLK|0660, st_rdev=makedev(43, 1), ...}) = 0
ioctl(3, BLKDISCARDZEROES, [0]) = 0
ioctl(3, BLKROGET, [0]) = 0
uname({sysname="Linux", nodename="red", ...}) = 0
ioctl(3, BLKDISCARD, [0, 100])  = 0
write(1, "Discarding device blocks: ", 26) = 26
write(1, "   4096/2621440", 15   4096/2621440) = 15
write(1, "\10\10\10\10\10\10\10\10\10\10\10\1) = 15
ioctl(3, BLKDISCARD, [100, 8000]) = -1 EIO (Input/output error)
write(1, "   ", 15   ) = 15
write(1, "\10\10\10\10\10\10\10\10\10\10\10\1) = 15
write(1, "failed - ", 9failed - )= 9

so it does indeed look like the smaller request [0, 0x100] succeeded
while the larger [0x100, 0x8000] failed with EIO.  But I
instrumented my qemu-nbd server to output all of the incoming requests
it was receiving from the kernel, and I never saw a mention of
NBD_CMD_TRIM being received.  Maybe it's dependent on what version of
the kernel and/or NBD kernel module you are running?

I also tried fallocate(1) on /dev/nbd1, as in:
 # strace fallocate -o 0 -l 64k -p /dev/nbd1
but it looks like the kernel doesn't yet wire up fallocate(2) to forward
to the appropriate device ioctls and/or NBD_CMD_TRIM, where strace says:

open("/dev/nbd1", O_RDWR)   = 3
fallocate(3, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 0, 65536) = -1
ENODEV (No such device)

The ENODEV is a rather unusual choice of error.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC 12/42] pc: initialize legacy hotplug only for 2.6 and older machine types

2016-05-10 Thread Eduardo Habkost
On Mon, May 02, 2016 at 02:33:21PM +0200, Igor Mammedov wrote:
> on old machine types CPU hotplug was uncondtionally
> enabled since it was introduced, consuming IO ports
> and providing AML regardles of whether it was actually
> in use or not. Keep it so for 2.6 and older machines.
> 
> New machine types will have an option to turn CPU
> hotplug on if it's needed while by default it stays
> disabled not consuming extra RAM/IO resources.
> 
> Signed-off-by: Igor Mammedov 

What if people are using "-machine pc -smp N,max_cpus=M"?
Shouldn't we at least warning about missing CPU hotplug support
when using just "max_cpus" with no "cpu-hotplug=on" with pc-2.7
and newer? Should max_cpus > smp_cpus automatically set
cpu-hotplug=on?

-- 
Eduardo



Re: [Qemu-devel] [RFC 11/42] pc: add 2.7 machine

2016-05-10 Thread Eduardo Habkost
On Mon, May 02, 2016 at 02:33:20PM +0200, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov 

Reviewed-by: Eduardo Habkost 

-- 
Eduardo



Re: [Qemu-devel] [RFC 10/42] pc: piix4/ich9: add 'cpu-hotplug-legacy' property

2016-05-10 Thread Eduardo Habkost
On Mon, May 02, 2016 at 02:33:19PM +0200, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov 

As the property is useless if no code is reading
s->pm.cpu_hotplug_legacy. What about squashing patches 10/42 and
12/42 together?

-- 
Eduardo



Re: [Qemu-devel] [PATCH] target-i386: implement CPUID[0xB] (Extended Topology Enumeration)

2016-05-10 Thread Eduardo Habkost
On Mon, May 09, 2016 at 10:49:00PM +0200, Radim Krčmář wrote:
> I looked at a dozen Intel CPU that have this CPUID and all of them
> always had Core offset as 1 (a wasted bit when hyperthreading is
> disabled) and Package offset at least 4 (wasted bits at <= 4 cores).
> 
> QEMU uses more compact IDs and it doesn't make much sense to change it
> now.  I keep the SMT and Core sub-leaves even if there is just one
> thread/core;  it makes the code simpler and there should be no harm.

If in the future we really want to make the APIC ID offsets match
the CPUs you looked at, we can add extra X86CPU properties to
allow configuration of APIC ID bit offsets larger than the ones
calculated from smp_cores and smp_threads.

What about compatiblity on migration? With this patch, guests
will see CPUID data change when upgrading QEMU.

> 
> Signed-off-by: Radim Krčmář 
> ---
>  target-i386/cpu.c | 27 +++
>  target-i386/cpu.h |  5 +
>  2 files changed, 32 insertions(+)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index d0b5b691563c..4f8c3288 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -35,6 +35,7 @@
>  #include "sysemu/arch_init.h"
>  
>  #include "hw/hw.h"
> +#include "hw/i386/topology.h"
>  #if defined(CONFIG_KVM)
>  #include 
>  #endif
> @@ -2460,6 +2461,32 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> uint32_t count,
>  *edx = 0;
>  }
>  break;
> +case 0xB:
> +/* Extended Topology Enumeration Leaf */
> +*ecx = count & 0xff;
> +*edx = cpu->apic_id;
> +
> +switch (*ecx) {
> +case 0:
> +*eax = apicid_core_offset(smp_cores, smp_threads);
> +*ebx = smp_threads;
> +*ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
> +break;
> +case 1:
> +*eax = apicid_pkg_offset(smp_cores, smp_threads);
> +*ebx = smp_cores * smp_threads;
> +*ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
> +break;
> +default:
> +*eax = 0;
> +*ebx = 0;
> +*ecx |= CPUID_TOPOLOGY_LEVEL_INVALID;
> +}
> +
> +/* Preserve reserved bits.  Extremely unlikely to make a difference. 
> */
> +*eax &= 0x1f;
> +*ebx &= 0x;

That means we don't know how to handle apicid_*_offset() >= 32,
smp_threads > 2^16, or smp_cores * smp_threads > 2^16. If that
happen one day, I would prefer to see QEMU abort than silently
truncating data in CPUID[0xB]. What about an assert()?


> +break;
>  case 0xD: {
>  KVMState *s = cs->kvm_state;
>  uint64_t ena_mask;
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 732eb6d7ec79..b460c2debc1c 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -635,6 +635,11 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
>  #define CPUID_MWAIT_IBE (1U << 1) /* Interrupts can exit capability */
>  #define CPUID_MWAIT_EMX (1U << 0) /* enumeration supported */
>  
> +/* CPUID[0xB].ECX level types */
> +#define CPUID_TOPOLOGY_LEVEL_INVALID  (0U << 8)
> +#define CPUID_TOPOLOGY_LEVEL_SMT  (1U << 8)
> +#define CPUID_TOPOLOGY_LEVEL_CORE (2U << 8)
> +
>  #ifndef HYPERV_SPINLOCK_NEVER_RETRY
>  #define HYPERV_SPINLOCK_NEVER_RETRY 0x
>  #endif
> -- 
> 2.8.2
> 

-- 
Eduardo



[Qemu-devel] [Bug 1579565] Re: ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.

2016-05-10 Thread Peter Maydell
I think Daniel is right -- your glib headers are broken, and we just
weren't diagnosing it before. You need to fix your glib install.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1579565

Title:
  ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.

Status in QEMU:
  Fix Committed

Bug description:
  cont build from last 2.6 rc4

  ~/Downloads/qemu-2.6.0-rc4$ ./configure

  ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.
 You probably need to set PKG_CONFIG_LIBDIR
 to point to the right pkg-config files for your
 build target

  Os Ubuntu Mate 16.04 PPC64

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1579565/+subscriptions



[Qemu-devel] [Bug 1579565] Re: ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.

2016-05-10 Thread luigiburdo
rc5 same issue

./configure

ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.
   You probably need to set PKG_CONFIG_LIBDIR
   to point to the right pkg-config files for your
   build target

thanks

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1579565

Title:
  ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.

Status in QEMU:
  Fix Committed

Bug description:
  cont build from last 2.6 rc4

  ~/Downloads/qemu-2.6.0-rc4$ ./configure

  ERROR: sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T.
 You probably need to set PKG_CONFIG_LIBDIR
 to point to the right pkg-config files for your
 build target

  Os Ubuntu Mate 16.04 PPC64

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1579565/+subscriptions



Re: [Qemu-devel] [PATCH 3/5] cpu-exec: Move interrupt handling out of cpu_exec()

2016-05-10 Thread Sergey Fedorov
On 10/05/16 19:34, Richard Henderson wrote:
> On 05/10/2016 05:46 AM, Sergey Fedorov wrote:
>> From: Sergey Fedorov 
>>
>> Simplify cpu_exec() by extracting interrupt handling code outside of
>> cpu_exec() into a new static inline function cpu_handle_interrupt().
>>
>> Signed-off-by: Sergey Fedorov 
>> Signed-off-by: Sergey Fedorov 
>> ---
>>  cpu-exec.c | 132
>> -
>>  1 file changed, 70 insertions(+), 62 deletions(-)
>
> Reviewed-by: Richard Henderson  
>
>
>> +if (replay_mode == REPLAY_MODE_PLAY &&
>> !replay_has_interrupt()) {
>> +/* Do nothing */
>> +} else if (interrupt_request & CPU_INTERRUPT_HALT) {
>> +}
>> +else if (interrupt_request & CPU_INTERRUPT_RESET) {
>> +}
>> +else {
>> +replay_interrupt();
>> +if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
>> +*last_tb = NULL;
>> +}
>> +}
>> +/* Don't use the cached interrupt_request value,
>> +   do_interrupt may have updated the EXITTB flag. */
>> +if (cpu->interrupt_request & CPU_INTERRUPT_EXITTB) {
>
> Note for future cleanup: IMO this comment is cleaner if it's actually
> put where it's meaningful (and updated to reflect that do_interrupt no
> longer exists).  E.g.
>
> else {
> if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
> *last_tb = NULL;
> }
> /* Reload the interrupt_request value as it may have
>been updated by the target hook.  */
> interrupt_request = cpu->interrupt_request;
> }
> if (interupt_request & CPU_INTERRUPT_EXITTB) {
> ...
>
> But such a change of course belongs in a separate patch.

Cool, thanks for the suggestion. I've had feeling this could be
expressed in a better way, like you suggest :)

Kind regards,
Sergey



Re: [Qemu-devel] [PATCH 2/5] cpu-exec: Move exception handling out of cpu_exec()

2016-05-10 Thread Sergey Fedorov
On 10/05/16 19:21, Richard Henderson wrote:
> On 05/10/2016 05:46 AM, Sergey Fedorov wrote:
>> From: Sergey Fedorov 
>>
>> Simplify cpu_exec() by extracting exception handling code out of
>> cpu_exec() into a new static inline function cpu_handle_exception().
>> Also make cpu_handle_debug_exception() inline as it is used only once.
>
> If it's used only once, the compiler is going to do this anyway, and
> therefore there's no point in making the change.  Let's just leave off
> all the inline markers and trust the compiler, eh?

I agree the compiler is smart enough to decide and inline such functions
by itself. But actually, I hope such "static inline" in .c file could
indicate for a reader of the code that this function is going to be used
this way.

>
> Otherwise,
>
> Reviewed-by: Richard Henderson 

Kind regards,
Sergey



Re: [Qemu-devel] [PATCH 1/5] cpu-exec: Move halt handling out of cpu_exec()

2016-05-10 Thread Sergey Fedorov
On 10/05/16 19:13, Richard Henderson wrote:
> On 05/10/2016 05:46 AM, Sergey Fedorov wrote:
>> +#if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
>> +X86CPU *x86_cpu = X86_CPU(cpu);
>> +
>> +if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
>> +&& replay_interrupt()) {
>> +apic_poll_irq(x86_cpu->apic_state);
>
> Since you're moving this around, you might as well place the x86_cpu
> variable in the inner-most if, next to its only use.

Agree, will fix in v2.

>
> Otherwise,
>
> Reviewed-by: Richard Henderson 

Kind regards,
Sergey



Re: [Qemu-devel] [PATCH v6] Add optionrom compatible with fw_cfg DMA version

2016-05-10 Thread Laszlo Ersek
On 05/10/16 19:36, Richard W.M. Jones wrote:
> On Tue, May 10, 2016 at 07:24:28PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 09/05/2016 18:48, Richard W.M. Jones wrote:
>>>
>>> Of course we're well outside any standards here.  Can we tell clang
>>> users to use the GCC/pre-compiled option ROMs :-?  Any other ideas?  I
>>> don't think I've missed a flag (GCC has -fno-toplevel-reorder, but
>>> clang 3.8 doesn't ...)
>>
>> I guess the checksumming script (scripts/signrom.py) could take care of
>> padding the file to a multiple of 512 bytes, and fill in the size in the
>> third byte.  Then "_end" would not be necessary anymore and -m16 could
>> replace the .code16 directive.
> 
> In my rather limited testing on gcc, gcc -m16 broke booting.  However
> I've not investigated this further.  I'll do so shortly.
> 
> However I have a question: is there a formal standard or documentation
> for the option ROM format?  Are we sticking to the (ancient) "BIOS Boot
> Specification" or is there something newer?  (My copy is from 1996).

To my knowledge, the most recent specification that describes PCI
expansion ROMs is:

  PCI Firmware Specification
  Revision 3.2
  January 26, 2015

See Chapter 5, "PCI Expansion ROMs".

(Maybe a newer release has been made, not sure.)

Of course, this spec is not public (you have to be a PCI SIG member to
get it, or some such, which costs $$$), but since Red Hat is a member,
I'll send you a link off-list (and you can't share the PDF outside of
RH, of course).

Regarding general (device independent) oproms: I'm not sure.

Thanks
Laszlo



Re: [Qemu-devel] [PATCH v1 2/2] target-arm: Enable EL2 for the A53s and A57s

2016-05-10 Thread Alistair Francis
On Tue, May 10, 2016 at 3:36 AM, Edgar E. Iglesias
 wrote:
> On Tue, May 10, 2016 at 09:09:55AM +0100, Peter Maydell wrote:
>> On 10 May 2016 at 01:16, Alistair Francis  
>> wrote:
>> > It is actually a u-boot problem. I originally just assumed it was a
>> > Linux problem, but it happens before u-boot hands off to Linux.
>>
>> OK, that makes sense. u-boot tends to be a bit lower level and less
>> hardware-agnostic. I just wanted to check it wasn't caused by some
>> problem in QEMU's EL3 support we could easily fix.
>>
>> > It appears that u-boot won't work at all with EL3 enabled but EL2
>> > disabled. It always moves to EL2 before moving to EL1 and there is no
>> > code prepared to handle going from EL3 directly to EL1.
>> >
>> > Just for the record, I'm specifically talking about what happens in
>> > the do_nonsec_virt_switch() function.
>> >
>> > It looks like there are three options:
>> >  1. Add support to u-boot to drop from EL3 to EL1 (I'm assuming this
>> > is possible, as not all implementations have EL2)
>> >  2. Just wait until QEMU adds EL2 support
>> >  3. Allow a QEMU command line option to start in EL1 instead of EL3
>
> Alistair,
>
> Our usual ZynqMP boot flow is
>
> ATF at EL3
> ATF hands over to u-boot at EL2
> u-boot hands over to the Linux kernel at EL2
>
> Our ATF has a mode where it can be instructed to start u-boot in
> EL1, bypassing EL2. That would avoid this issue. You don't need to
> recompile ATF. We can discuss the details off-line.

Hey Edgar,

Unfortunately that doesn't work with JTAG boot mode, which is what we are using.

Just to confirm it though I edited the ATF source to handoff to u-boot
in EL1. Then using my device loader patches I managed to boot ATF to
u-boot to Linux. So besides the missing EL2 support the general flow
seems to work. Linux tried to execute code outside RAM or ROM, so I
never made it to a login prompt though.

>
>
>> I would be OK with any of these three from a QEMU perspective.
>> Fixing u-boot is probably conceptually the nicest but I've never
>> looked at u-boot internals so it could be simple or painful...
>>
>> Edgar, do you have the list of what we're still missing before we
>> can turn on EL2?
>
> We're missing the Data Abort ISS:s, we're missing modelling of a few registers
> (like HSTR_EL2). Some regs are quite incomplete like HCR but I think we can
> live with that. GICv2 virtualization is missing. We are also missing a few
> AT_ modes. AArch32 support is probably lacking alot of more stuff.
>
> IIRC, last time this came up we talked about enabling EL2 after ISS and
> GICv2 virtualization support gets merged + a few odd regs. Enough to allow
> Xen and KVM to work under emulation.

Do you have a rough time frame on this? Right now I can't see any way
to boot Linux on the up-streamed ZynqMP machine. If EL2 support will
take a long time we might need to look into one of the alternative
work arounds for the time being.

>
> Cheers,
> Edgar
>



Re: [Qemu-devel] [PATCH v1 2/2] target-arm: Enable EL2 for the A53s and A57s

2016-05-10 Thread Alistair Francis
On Tue, May 10, 2016 at 1:09 AM, Peter Maydell  wrote:
> On 10 May 2016 at 01:16, Alistair Francis  wrote:
>> It is actually a u-boot problem. I originally just assumed it was a
>> Linux problem, but it happens before u-boot hands off to Linux.
>
> OK, that makes sense. u-boot tends to be a bit lower level and less
> hardware-agnostic. I just wanted to check it wasn't caused by some
> problem in QEMU's EL3 support we could easily fix.
>
>> It appears that u-boot won't work at all with EL3 enabled but EL2
>> disabled. It always moves to EL2 before moving to EL1 and there is no
>> code prepared to handle going from EL3 directly to EL1.
>>
>> Just for the record, I'm specifically talking about what happens in
>> the do_nonsec_virt_switch() function.
>>
>> It looks like there are three options:
>>  1. Add support to u-boot to drop from EL3 to EL1 (I'm assuming this
>> is possible, as not all implementations have EL2)
>>  2. Just wait until QEMU adds EL2 support
>>  3. Allow a QEMU command line option to start in EL1 instead of EL3
>
> I would be OK with any of these three from a QEMU perspective.
> Fixing u-boot is probably conceptually the nicest but I've never
> looked at u-boot internals so it could be simple or painful...

I had a quick look and it didn't look easy to do unfortunately.

Thanks,

Alistair

>
> Edgar, do you have the list of what we're still missing before we
> can turn on EL2?
>
> thanks
> -- PMM
>



Re: [Qemu-devel] [PATCH 02/10] vubr: add client mode

2016-05-10 Thread Marc-André Lureau
Hi

- Original Message -
> Hi,
> 
> Just to let you know that as a client, it need to add the reconnect
> ability, as the QEMU (as the server) may restart as well. Well, I'm
> thinking that you may think it's an example bridge only, so, let it
> be simple. Then I'm sorry for the noise.

That's a different use case to me, and a separate possible addition to vubr.

thanks



[Qemu-devel] [Bug 1528718] Re: Initial monitor does not output anything on Windows (MSYS2 binary)

2016-05-10 Thread ECC_OK
** Description changed:

  When running on Windows error messages before the UI is started are not
  showing up.
  
  For example when I run:
  
  qemu-system-i386.exe -L /mingw32/etc/qemu/ -m 20G
  
  It should display "ram size too large", according to gdb:
  
  Breakpoint 1, error_report (fmt=fmt@entry=0x71bdf6 
  "ram size too large") at
  C:/build/mingw/mingw-w64-qemu/src/qemu-2.4.0/util/qemu-error.c:233
  
  However the console does never receive that.
  
  As far as I could find out vfprintf is called, but it doesn't output
  anything.
+ 
+ Maybe the binary is build for the "win32" subsystem which does not have
+ a console window attached by default?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1528718

Title:
  Initial monitor does not output anything on Windows (MSYS2 binary)

Status in QEMU:
  New

Bug description:
  When running on Windows error messages before the UI is started are
  not showing up.

  For example when I run:

  qemu-system-i386.exe -L /mingw32/etc/qemu/ -m 20G

  It should display "ram size too large", according to gdb:

  Breakpoint 1, error_report (fmt=fmt@entry=0x71bdf6
   "ram size too large") at
  C:/build/mingw/mingw-w64-qemu/src/qemu-2.4.0/util/qemu-error.c:233

  However the console does never receive that.

  As far as I could find out vfprintf is called, but it doesn't output
  anything.

  Maybe the binary is build for the "win32" subsystem which does not
  have a console window attached by default?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1528718/+subscriptions



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Paolo Bonzini


On 10/05/2016 17:38, Alex Bligh wrote:
> > and are at the
> > mercy of however the kernel currently decides to split large requests).
> 
> I am surprised TRIM doesn't get broken up the same way READ and WRITE
> do.

The payload of TRIM has constant size, so it makes sense to do that.
The kernel then self-imposes a 2GB limit in blkdev_issue_discard.

On the other hand, READ and WRITE of size n can possibly require O(n)
memory.

Paolo



Re: [Qemu-devel] [PATCH v6] Add optionrom compatible with fw_cfg DMA version

2016-05-10 Thread Richard W.M. Jones
On Tue, May 10, 2016 at 07:24:28PM +0200, Paolo Bonzini wrote:
> 
> 
> On 09/05/2016 18:48, Richard W.M. Jones wrote:
> > 
> > Of course we're well outside any standards here.  Can we tell clang
> > users to use the GCC/pre-compiled option ROMs :-?  Any other ideas?  I
> > don't think I've missed a flag (GCC has -fno-toplevel-reorder, but
> > clang 3.8 doesn't ...)
> 
> I guess the checksumming script (scripts/signrom.py) could take care of
> padding the file to a multiple of 512 bytes, and fill in the size in the
> third byte.  Then "_end" would not be necessary anymore and -m16 could
> replace the .code16 directive.

In my rather limited testing on gcc, gcc -m16 broke booting.  However
I've not investigated this further.  I'll do so shortly.

However I have a question: is there a formal standard or documentation
for the option ROM format?  Are we sticking to the (ancient) "BIOS Boot
Specification" or is there something newer?  (My copy is from 1996).

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top



Re: [Qemu-devel] [PATCH 8/9] target-i386: Use "-" instead of "_" on all feature names

2016-05-10 Thread Jiri Denemark
On Tue, May 10, 2016 at 17:19:51 +0200, Igor Mammedov wrote:
> On Fri,  6 May 2016 15:11:31 -0300
> Eduardo Habkost  wrote:
> 
> > This makes the feature name tables in feature_word_info all match
> > the actual QOM property names we use.
> > 
> > This will make the command-line interface more consistent,
> > allowing the QOM property names to be used as "-cpu" arguments
> > directly.
> 
> wouldn't that change output of '-cpu help',
> can it break libvirt?

Don't worry, for any QEMU >= 1.2.0 libvirt uses only QMP commands when
probing capabilities. And currently we don't even parse feature names
anywhere, we only use feature names when constructing -cpu command line
and that parts seems to be covered by this patch.

Jirka



Re: [Qemu-devel] [PATCH v6] Add optionrom compatible with fw_cfg DMA version

2016-05-10 Thread Paolo Bonzini


On 09/05/2016 18:48, Richard W.M. Jones wrote:
> 
> Of course we're well outside any standards here.  Can we tell clang
> users to use the GCC/pre-compiled option ROMs :-?  Any other ideas?  I
> don't think I've missed a flag (GCC has -fno-toplevel-reorder, but
> clang 3.8 doesn't ...)

I guess the checksumming script (scripts/signrom.py) could take care of
padding the file to a multiple of 512 bytes, and fill in the size in the
third byte.  Then "_end" would not be necessary anymore and -m16 could
replace the .code16 directive.

Paolo



Re: [Qemu-devel] [PATCH v6] Add optionrom compatible with fw_cfg DMA version

2016-05-10 Thread Paolo Bonzini


On 09/05/2016 23:37, Richard W.M. Jones wrote:
> FWIW the response from the LLVM developers:
> 
>   https://llvm.org/bugs/show_bug.cgi?id=27688

Why the heck are they saying that -fno-toplevel-reorder is "gone" (or
deprecated in the duplicate PR)?

Paolo



Re: [Qemu-devel] [PATCH 02/10] vubr: add client mode

2016-05-10 Thread Yuanhan Liu
Hi,

Just to let you know that as a client, it need to add the reconnect
ability, as the QEMU (as the server) may restart as well. Well, I'm
thinking that you may think it's an example bridge only, so, let it
be simple. Then I'm sorry for the noise.

--yliu

On Tue, May 10, 2016 at 06:03:52PM +0200, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> If -c is specified, vubr will try to connect to the socket instead of
> listening for connections.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  tests/vhost-user-bridge.c | 38 ++
>  1 file changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
> index 0779ba2..f907ce7 100644
> --- a/tests/vhost-user-bridge.c
> +++ b/tests/vhost-user-bridge.c
> @@ -1204,12 +1204,13 @@ vubr_accept_cb(int sock, void *ctx)
>  }
>  
>  static VubrDev *
> -vubr_new(const char *path)
> +vubr_new(const char *path, bool client)
>  {
>  VubrDev *dev = (VubrDev *) calloc(1, sizeof(VubrDev));
>  dev->nregions = 0;
>  int i;
>  struct sockaddr_un un;
> +CallbackFunc cb;
>  size_t len;
>  
>  for (i = 0; i < MAX_NR_VIRTQUEUE; i++) {
> @@ -1238,19 +1239,27 @@ vubr_new(const char *path)
>  un.sun_family = AF_UNIX;
>  strcpy(un.sun_path, path);
>  len = sizeof(un.sun_family) + strlen(path);
> -unlink(path);
>  
> -if (bind(dev->sock, (struct sockaddr *) , len) == -1) {
> -vubr_die("bind");
> -}
> +if (!client) {
> +unlink(path);
> +
> +if (bind(dev->sock, (struct sockaddr *) , len) == -1) {
> +vubr_die("bind");
> +}
>  
> -if (listen(dev->sock, 1) == -1) {
> -vubr_die("listen");
> +if (listen(dev->sock, 1) == -1) {
> +vubr_die("listen");
> +}
> +cb = vubr_accept_cb;
> +} else {
> +if (connect(dev->sock, (struct sockaddr *), len) == -1) {
> +vubr_die("connect");
> +}
> +cb = vubr_receive_cb;
>  }
>  
>  dispatcher_init(>dispatcher);
> -dispatcher_add(>dispatcher, dev->sock, (void *)dev,
> -   vubr_accept_cb);
> +dispatcher_add(>dispatcher, dev->sock, (void *)dev, cb);
>  
>  DPRINT("Waiting for connections on UNIX socket %s ...\n", path);
>  return dev;
> @@ -1369,8 +1378,9 @@ main(int argc, char *argv[])
>  {
>  VubrDev *dev;
>  int opt;
> +bool client = false;
>  
> -while ((opt = getopt(argc, argv, "l:r:u:")) != -1) {
> +while ((opt = getopt(argc, argv, "l:r:u:c")) != -1) {
>  
>  switch (opt) {
>  case 'l':
> @@ -1386,16 +1396,20 @@ main(int argc, char *argv[])
>  case 'u':
>  ud_socket_path = strdup(optarg);
>  break;
> +case 'c':
> +client = true;
> +break;
>  default:
>  goto out;
>  }
>  }
>  
> -DPRINT("ud socket: %s\n", ud_socket_path);
> +DPRINT("ud socket: %s (%s)\n", ud_socket_path,
> +   client ? "client" : "server");
>  DPRINT("local: %s:%s\n", lhost, lport);
>  DPRINT("remote:%s:%s\n", rhost, rport);
>  
> -dev = vubr_new(ud_socket_path);
> +dev = vubr_new(ud_socket_path, client);
>  if (!dev) {
>  return 1;
>  }
> -- 
> 2.7.4



Re: [Qemu-devel] [PATCH RFC 0/3] qdev: order devices by priority before creating them

2016-05-10 Thread Paolo Bonzini


On 10/05/2016 17:36, Michael S. Tsirkin wrote:
>> > Can you explain why requiring the user to specify -device in a sane
>> > order isn't good enough?
> Because it requires knowledge about the hardware.
> For example, how do users know that iommu must come before the devices?
> They don't.

Could they "know" it by adding an iommu=foo argument to e.g. the PCI
bridge, pointing to the IOMMU device?

Paolo



Re: [Qemu-devel] [PATCH 00/10] RFCv3: vhost-user: simple reconnection support

2016-05-10 Thread Yuanhan Liu
Hi Marc,

I was also thinking we should do it in this way first. That simplfies
things. So, feel free to add:

Tested-by: Yuanhan Liu 
Reviewed-by: Yuanhan Liu 

(well, I didn't review the test case carefully).

Thanks.

--yliu

On Tue, May 10, 2016 at 06:03:50PM +0200, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Hi,
> 
> In a previous series "RFCv2: vhost-user: shutdown and reconnection", I
> proposed to add a new slave request to handle graceful shutdown, for
> both qemu configuration, server or client, while keeping the guest
> running with link down status.
> 
> However, for the simple case where qemu is configured as server, and
> the backend processes packets in order and disconnects, it is possible
> for the backend to recover after reconnection by discarding the qemu
> SET_VRING_BASE value and resuming from the used->index instead. This
> simplifies the reconnection support in this particular situation (it
> would make sense to have the backend declare this behaviour with an
> extra flag)
> 
> The guest won't be notified of link status change and queues may not
> be processed in a timely manner, also qemu may assert if some
> vhost-user commands are happening while the backend is disconnected.
> So the reconnection must happen "quickly enough" here. In order to
> keep the series relatively small, these further problems will be
> addressed later.
> 
> These series demonstrates a simple reconnect support for vubr
> (configured as client and qemu as server), includes some nice to
> have fixes and a simple test.
> 
> Marc-André Lureau (8):
>   vubr: add client mode
>   vubr: workaround stale vring base
>   vhost-user: disconnect on start failure
>   vhost-net: do not crash if backend is not present
>   vhost-net: save & restore vhost-user acked features
>   vhost-net: save & restore vring enable state
>   tests: append i386 tests
>   test: start vhost-user reconnect test
> 
> Tetsuya Mukawa (2):
>   vhost-user: add ability to know vhost-user backend disconnection
>   qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd
> 
>  hw/net/vhost_net.c|  39 ++-
>  include/net/net.h |   1 +
>  include/net/vhost-user.h  |   1 +
>  include/net/vhost_net.h   |   3 ++
>  include/sysemu/char.h |   7 +++
>  net/vhost-user.c  |  32 +++-
>  qemu-char.c   |   8 +++
>  tests/Makefile|   2 +-
>  tests/vhost-user-bridge.c |  45 -
>  tests/vhost-user-test.c   | 123 
> +++---
>  10 files changed, 228 insertions(+), 33 deletions(-)
> 
> -- 
> 2.7.4



Re: [Qemu-devel] [PATCH 5/5] cpu-exec: Remove unused 'x86_cpu' and 'env' from cpu_exec()

2016-05-10 Thread Richard Henderson

On 05/10/2016 05:46 AM, Sergey Fedorov wrote:

From: Sergey Fedorov 

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---
 cpu-exec.c | 12 
 1 file changed, 12 deletions(-)


Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PATCH 4/5] cpu-exec: Move TB execution stuff out of cpu_exec()

2016-05-10 Thread Richard Henderson

On 05/10/2016 05:46 AM, Sergey Fedorov wrote:

From: Sergey Fedorov 

Simplify cpu_exec() by extracting TB execution code outside of
cpu_exec() into a new static inline function cpu_loop_exec_tb().

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---
 cpu-exec.c | 119 +
 1 file changed, 64 insertions(+), 55 deletions(-)


Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PATCH 3/5] cpu-exec: Move interrupt handling out of cpu_exec()

2016-05-10 Thread Richard Henderson

On 05/10/2016 05:46 AM, Sergey Fedorov wrote:

From: Sergey Fedorov 

Simplify cpu_exec() by extracting interrupt handling code outside of
cpu_exec() into a new static inline function cpu_handle_interrupt().

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---
 cpu-exec.c | 132 -
 1 file changed, 70 insertions(+), 62 deletions(-)


Reviewed-by: Richard Henderson  



+if (replay_mode == REPLAY_MODE_PLAY && !replay_has_interrupt()) {
+/* Do nothing */
+} else if (interrupt_request & CPU_INTERRUPT_HALT) {
+}
+else if (interrupt_request & CPU_INTERRUPT_RESET) {
+}
+else {
+replay_interrupt();
+if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
+*last_tb = NULL;
+}
+}
+/* Don't use the cached interrupt_request value,
+   do_interrupt may have updated the EXITTB flag. */
+if (cpu->interrupt_request & CPU_INTERRUPT_EXITTB) {


Note for future cleanup: IMO this comment is cleaner if it's actually put where 
it's meaningful (and updated to reflect that do_interrupt no longer exists).  E.g.


else {
if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
*last_tb = NULL;
}
/* Reload the interrupt_request value as it may have
   been updated by the target hook.  */
interrupt_request = cpu->interrupt_request;
}
if (interupt_request & CPU_INTERRUPT_EXITTB) {
...

But such a change of course belongs in a separate patch.


r~



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Quentin Casasnovas
On Tue, May 10, 2016 at 05:23:21PM +0100, Alex Bligh wrote:
> 
> On 10 May 2016, at 17:04, Quentin Casasnovas  
> wrote:
> 
> > Alright, I assumed by 'length of an NBD request', the specification was
> > talking about the length of.. well, the request as opposed to whatever is
> > in the length field of the request header.
> 
> With NBD_CMD_TRIM the length in the header field is 32 bit and specifies
> the length of data to trim, not the length of the data transferred (which
> is none).
> 
> > Is there a use case where you'd want to split up a single big TRIM request
> > in smaller ones (as in some hardware would not support it or something)?
> > Even then, it looks like this splitting up would be hardware dependant and
> > better implemented in block device drivers.
> 
> Part of the point of the block size extension is to push such limits to the
> client.
> 
> I could make up use cases (e.g. that performing a multi-gigabyte trim in
> a single threaded server will effectively block all other I/O), but the
> main reason in my book is orthogonality, and the fact the client needs
> to do some breaking up anyway.
> 
> > I'm just finding odd that something that fits inside the length field can't
> > be used.
> 
> That's a different point. That's Qemu's 'Denial of Service Attack'
> prevention, *not* maximum block sizes. It isn't dropping it because
> of a maximum block size parameter. If it doesn't support the block size
> extension which the version you're looking at does not, it's meant
> to handle requests up to 2^32-1 long EXCEPT that it MAY error requests
> so long as to cause a denial of service attack. As this doesn't fit
> into that case (it's a TRIM), it shouldn't be erroring it on that grounds.
> 
> I agree Qemu should fix that.
> 
> (So in a sense Eric and I are arguing about something irrelevant to
> your current problem, which is how this would work /with/ the block
> size extensions, as Eric is in the process of implementing them).
> 

Riight!  OK understood, thanks for the explanation.

Quentin



Re: [Qemu-devel] [RESEND PATCH v3 0/8] libqos: use standard virtio headers

2016-05-10 Thread Paolo Bonzini


On 10/05/2016 11:59, Stefan Hajnoczi wrote:
> > I still dream of having qtests written in Python; I agree that for now
> > this makes sense.
> 
> Is there any particular Python feature that you miss?

I would like to use QMP more and to autodiscover buses (so that every
SCSI test can be run on both virtio-scsi and LSI SAS, just by writing a
"driver" for LSI SAS), and I would like to write that code in anything
but C. :)

> Writing low-level device tests in C seems okay to me.

Yes, it's fine---but writing tests that also interact with the monitor
is a bit worse, similar to how qemu-iotests is part shell and part Python.

> The problem is that the longer we wait, the harder it becomes to move to
> Python.  We'd need all the PCI, virtio, etc frameworks ported to Python.

True...  but I can still dream. ;)

Paolo



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Quentin Casasnovas
On Tue, May 10, 2016 at 05:54:44PM +0200, Quentin Casasnovas wrote:
> On Tue, May 10, 2016 at 09:46:36AM -0600, Eric Blake wrote:
> > On 05/10/2016 09:41 AM, Alex Bligh wrote:
> > > 
> > > On 10 May 2016, at 16:29, Eric Blake  wrote:
> > > 
> > >> So the kernel is currently one of the clients that does NOT honor block
> > >> sizes, and as such, servers should be prepared for ANY size up to
> > >> UINT_MAX (other than DoS handling).
> > > 
> > > Interesting followup question:
> > > 
> > > If the kernel does not fragment TRIM requests at all (in the
> > > same way it fragments read and write requests), I suspect
> > > something bad may happen with TRIM requests over 2^31
> > > in size (particularly over 2^32 in size), as the length
> > > field in nbd only has 32 bits.
> > > 
> > > Whether it supports block size constraints or not, it is
> > > going to need to do *some* breaking up of requests.
> > 
> > Does anyone have an easy way to cause the kernel to request a trim
> > operation that large on a > 4G export?  I'm not familiar enough with
> > EXT4 operation to know what file system operations you can run to
> > ultimately indirectly create a file system trim operation that large.
> > But maybe there is something simpler - does the kernel let you use the
> > fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
> > FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?
> > 
> 
> It was fairly reproducible here, we just used a random qcow2 image with
> some Debian minimal system pre-installed, mounted that qcow2 image through
> qemu-nbd then compiled a whole kernel inside it.  Then you can make clean
> and run fstrim on the mount point.  I'm assuming you can go faster than
> that by just writing a big file to the qcow2 image mounted without -o
> discard, delete the big file, then remount with -o discard + run fstrim.
> 

Looks like there's an easier way:

 $ qemu-img create -f qcow2 foo.qcow2 10G
 $ qemu-nbd --discard=on -c /dev/nbd0 foo.qcow2
 $ mkfs.ext4 /dev/nbd0
 mke2fs 1.42.13 (17-May-2015)
 Discarding device blocks: failed - Input/output error
 Creating filesystem with 2621440 4k blocks and 655360 inodes
 Filesystem UUID: 25aeb51f-0dea-4c1d-8b65-61f6bcdf97e9
 Superblock backups stored on blocks:
   32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

 Allocating group tables: done
 Writing inode tables: done
 Creating journal (32768 blocks): done
 Writing superblocks and filesystem accounting information: done

Notice the "Discarding device blocks: failed - Input/output error" line, I
bet that it is mkfs.ext4 trying to trim all blocks prior to writing the
filesystem, but it gets an I/O error while doing so.  I haven't verified it
is the same problem, but it it isn't, simply mount the resulting filesystem
and run fstrim on it:

 $ mount -o discard /dev/nbd0 /tmp/foo
 $ fstrim /tmp/foo
 fstrim: /tmp/foo: FITRIM ioctl failed: Input/output error

Quentin



Re: [Qemu-devel] [PATCH 00/10] RFCv3: vhost-user: simple reconnection support

2016-05-10 Thread Marc-André Lureau
Hi

- Original Message -
> On Tue, May 10, 2016 at 06:03:50PM +0200, marcandre.lur...@redhat.com wrote:
> > From: Marc-André Lureau 
> > 
> > Hi,
> > 
> > In a previous series "RFCv2: vhost-user: shutdown and reconnection", I
> > proposed to add a new slave request to handle graceful shutdown, for
> > both qemu configuration, server or client, while keeping the guest
> > running with link down status.
> 
> OK so I would say patches 1-4 are bugfixes, looks like they
> can be Cc stable even?

4 is being used by 5 and 10.
2-3 are only for testing.

4-8 are nice to have as they avoid obvious problems/crashes when handling 
disconnected state and add basic reconnection checks.

9 was already considered for stable by Eric in a previous series

10 would be good to have if 1 is accepted, to check the minimum works as 
expected



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Alex Bligh

On 10 May 2016, at 17:04, Quentin Casasnovas  
wrote:

> Alright, I assumed by 'length of an NBD request', the specification was
> talking about the length of.. well, the request as opposed to whatever is
> in the length field of the request header.

With NBD_CMD_TRIM the length in the header field is 32 bit and specifies
the length of data to trim, not the length of the data transferred (which
is none).

> Is there a use case where you'd want to split up a single big TRIM request
> in smaller ones (as in some hardware would not support it or something)?
> Even then, it looks like this splitting up would be hardware dependant and
> better implemented in block device drivers.

Part of the point of the block size extension is to push such limits to the
client.

I could make up use cases (e.g. that performing a multi-gigabyte trim in
a single threaded server will effectively block all other I/O), but the
main reason in my book is orthogonality, and the fact the client needs
to do some breaking up anyway.

> I'm just finding odd that something that fits inside the length field can't
> be used.

That's a different point. That's Qemu's 'Denial of Service Attack'
prevention, *not* maximum block sizes. It isn't dropping it because
of a maximum block size parameter. If it doesn't support the block size
extension which the version you're looking at does not, it's meant
to handle requests up to 2^32-1 long EXCEPT that it MAY error requests
so long as to cause a denial of service attack. As this doesn't fit
into that case (it's a TRIM), it shouldn't be erroring it on that grounds.

I agree Qemu should fix that.

(So in a sense Eric and I are arguing about something irrelevant to
your current problem, which is how this would work /with/ the block
size extensions, as Eric is in the process of implementing them).

>  I do agree with your point number 3, obviously if the lenght
> field type doesn't allow something bigger than a u32, then the kernel has
> to do some breaking up in that case.


-- 
Alex Bligh







Re: [Qemu-devel] [PATCH 2/5] cpu-exec: Move exception handling out of cpu_exec()

2016-05-10 Thread Richard Henderson

On 05/10/2016 05:46 AM, Sergey Fedorov wrote:

From: Sergey Fedorov 

Simplify cpu_exec() by extracting exception handling code out of
cpu_exec() into a new static inline function cpu_handle_exception().
Also make cpu_handle_debug_exception() inline as it is used only once.


If it's used only once, the compiler is going to do this anyway, and therefore 
there's no point in making the change.  Let's just leave off all the inline 
markers and trust the compiler, eh?


Otherwise,

Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PATCH 00/10] RFCv3: vhost-user: simple reconnection support

2016-05-10 Thread Michael S. Tsirkin
On Tue, May 10, 2016 at 06:03:50PM +0200, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Hi,
> 
> In a previous series "RFCv2: vhost-user: shutdown and reconnection", I
> proposed to add a new slave request to handle graceful shutdown, for
> both qemu configuration, server or client, while keeping the guest
> running with link down status.

OK so I would say patches 1-4 are bugfixes, looks like they
can be Cc stable even?


> However, for the simple case where qemu is configured as server, and
> the backend processes packets in order and disconnects, it is possible
> for the backend to recover after reconnection by discarding the qemu
> SET_VRING_BASE value and resuming from the used->index instead. This
> simplifies the reconnection support in this particular situation (it
> would make sense to have the backend declare this behaviour with an
> extra flag)
> 
> The guest won't be notified of link status change and queues may not
> be processed in a timely manner, also qemu may assert if some
> vhost-user commands are happening while the backend is disconnected.
> So the reconnection must happen "quickly enough" here. In order to
> keep the series relatively small, these further problems will be
> addressed later.
> 
> These series demonstrates a simple reconnect support for vubr
> (configured as client and qemu as server), includes some nice to
> have fixes and a simple test.
> 
> Marc-André Lureau (8):
>   vubr: add client mode
>   vubr: workaround stale vring base
>   vhost-user: disconnect on start failure
>   vhost-net: do not crash if backend is not present
>   vhost-net: save & restore vhost-user acked features
>   vhost-net: save & restore vring enable state
>   tests: append i386 tests
>   test: start vhost-user reconnect test
> 
> Tetsuya Mukawa (2):
>   vhost-user: add ability to know vhost-user backend disconnection
>   qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd
> 
>  hw/net/vhost_net.c|  39 ++-
>  include/net/net.h |   1 +
>  include/net/vhost-user.h  |   1 +
>  include/net/vhost_net.h   |   3 ++
>  include/sysemu/char.h |   7 +++
>  net/vhost-user.c  |  32 +++-
>  qemu-char.c   |   8 +++
>  tests/Makefile|   2 +-
>  tests/vhost-user-bridge.c |  45 -
>  tests/vhost-user-test.c   | 123 
> +++---
>  10 files changed, 228 insertions(+), 33 deletions(-)
> 
> -- 
> 2.7.4



[Qemu-devel] [PATCH 01/10] vhost-user: add ability to know vhost-user backend disconnection

2016-05-10 Thread marcandre . lureau
From: Tetsuya Mukawa 

Current QEMU cannot detect vhost-user backend disconnection. The
patch adds ability to know it.
To know disconnection, add watcher to detect G_IO_HUP event. When
G_IO_HUP event is detected, the disconnected socket will be read
to cause a CHR_EVENT_CLOSED.

Signed-off-by: Tetsuya Mukawa 
Reviewed-by: Marc-André Lureau 
---
 net/vhost-user.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 1b9e73a..4a7fd5f 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -22,6 +22,7 @@ typedef struct VhostUserState {
 NetClientState nc;
 CharDriverState *chr;
 VHostNetState *vhost_net;
+int watch;
 } VhostUserState;
 
 typedef struct VhostUserChardevProps {
@@ -167,6 +168,20 @@ static NetClientInfo net_vhost_user_info = {
 .has_ufo = vhost_user_has_ufo,
 };
 
+static gboolean net_vhost_user_watch(GIOChannel *chan, GIOCondition cond,
+   void *opaque)
+{
+VhostUserState *s = opaque;
+uint8_t buf[1];
+
+/* We don't actually want to read anything, but CHR_EVENT_CLOSED will be
+ * raised as a side-effect of the read.
+ */
+qemu_chr_fe_read_all(s->chr, buf, sizeof(buf));
+
+return FALSE;
+}
+
 static void net_vhost_user_event(void *opaque, int event)
 {
 const char *name = opaque;
@@ -184,6 +199,8 @@ static void net_vhost_user_event(void *opaque, int event)
 trace_vhost_user_event(s->chr->label, event);
 switch (event) {
 case CHR_EVENT_OPENED:
+s->watch = qemu_chr_fe_add_watch(s->chr, G_IO_HUP,
+ net_vhost_user_watch, s);
 if (vhost_user_start(queues, ncs) < 0) {
 exit(1);
 }
@@ -192,6 +209,8 @@ static void net_vhost_user_event(void *opaque, int event)
 case CHR_EVENT_CLOSED:
 qmp_set_link(name, false, );
 vhost_user_stop(queues, ncs);
+g_source_remove(s->watch);
+s->watch = 0;
 break;
 }
 
-- 
2.7.4




Re: [Qemu-devel] [PATCH 1/5] cpu-exec: Move halt handling out of cpu_exec()

2016-05-10 Thread Richard Henderson

On 05/10/2016 05:46 AM, Sergey Fedorov wrote:

+#if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
+X86CPU *x86_cpu = X86_CPU(cpu);
+
+if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
+&& replay_interrupt()) {
+apic_poll_irq(x86_cpu->apic_state);


Since you're moving this around, you might as well place the x86_cpu variable 
in the inner-most if, next to its only use.


Otherwise,

Reviewed-by: Richard Henderson 


r~



[Qemu-devel] [PATCH v9 1/3] block: add support for --image-opts in block I/O tests

2016-05-10 Thread Daniel P. Berrange
Currently all block tests use the traditional syntax for images
just specifying a filename. To support the LUKS driver without
resorting to JSON, the tests need to be able to use the new
--image-opts argument to qemu-img and qemu-io.

This introduces a new env variable IMGOPTSSYNTAX. If this is
set to 'true', then qemu-img/qemu-io should use --image-opts.

Signed-off-by: Daniel P. Berrange 
---
 tests/qemu-iotests/039.out   | 20 +++---
 tests/qemu-iotests/061.out   |  8 +++---
 tests/qemu-iotests/137.out   |  4 +--
 tests/qemu-iotests/common|  7 -
 tests/qemu-iotests/common.config | 15 +--
 tests/qemu-iotests/common.rc | 57 +---
 6 files changed, 77 insertions(+), 34 deletions(-)

diff --git a/tests/qemu-iotests/039.out b/tests/qemu-iotests/039.out
index 32c8846..c6e0ac2 100644
--- a/tests/qemu-iotests/039.out
+++ b/tests/qemu-iotests/039.out
@@ -12,9 +12,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" ]; 
then
-exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
"$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
+exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
"$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
 else
-exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
+exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
 fi )
 incompatible_features 0x1
 ERROR cluster 5 refcount=0 reference=1
@@ -51,9 +51,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" ]; 
then
-exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
"$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
+exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
"$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
 else
-exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
+exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
 fi )
 incompatible_features 0x1
 ERROR cluster 5 refcount=0 reference=1
@@ -69,9 +69,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" ]; 
then
-exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
"$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
+exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
"$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
 else
-exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
+exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
 fi )
 incompatible_features 0x0
 No errors were found on the image.
@@ -92,9 +92,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" ]; 
then
-exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
"$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
+exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
"$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
 else
-exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
+exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
 fi )
 incompatible_features 0x1
 ERROR cluster 5 refcount=0 reference=1
@@ -106,9 +106,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" ]; 
then
-exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
"$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
+exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
"$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
 else
-exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
+exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
 fi )
 incompatible_features 0x0
 No errors were found on the image.
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index a03732e..a431b7f 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -58,9 +58,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 wrote 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" ]; 
then
-exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
"$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
+exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
"$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
 else
-exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
+exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
 fi )
 magic 0x514649fb
 

[Qemu-devel] [PATCH v9 3/3] block: enable testing of LUKS driver with block I/O tests

2016-05-10 Thread Daniel P. Berrange
This adds support for testing the LUKS driver with the block
I/O test framework.

   cd tests/qemu-io-tests
   ./check -luks

A handful of test cases are modified to work with luks

 - 004 - whitelist luks format
 - 012 - use TEST_IMG_FILE instead of TEST_IMG for file ops
 - 048 - use TEST_IMG_FILE instead of TEST_IMG for file ops.
 don't assume extended image contents is all zeros,
 explicitly initialize with zeros
 Make file size smaller to avoid having to decrypt
 1 GB of data.
 - 052 - don't assume initial image contents is all zeros,
 explicitly initialize with zeros
 - 100 - don't assume initial image contents is all zeros,
 explicitly initialize with zeros

With this patch applied, the results are as follows:

  Passed: 001 002 003 004 005 008 009 010 011 012 021 032 043
  047 048 049 052 087 100 134 143
  Failed: 033 120 140 145
 Skipped: 007 013 014 015 017 018 019 020 022 023 024 025 026
  027 028 029 030 031 034 035 036 037 038 039 040 041
  042 043 044 045 046 047 049 050 051 053 054 055 056
  057 058 059 060 061 062 063 064 065 066 067 068 069
  070 071 072 073 074 075 076 077 078 079 080 081 082
  083 084 085 086 087 088 089 090 091 092 093 094 095
  096 097 098 099 101 102 103 104 105 107 108 109 110
  111 112 113 114 115 116 117 118 119 121 122 123 124
  128 129 130 131 132 133 134 135 136 137 138 139 141
  142 144 146 148 150 152

The reasons for the failed tests are:

 - 033 - needs adapting to use image opts syntax with blkdebug
 and test image in order to correctly set align property
 - 120 - needs adapting to use correct -drive syntax for luks
 - 140 - needs adapting to use correct -drive syntax for luks
 - 145 - needs adapting to use correct -drive syntax for luks

The vast majority of skipped tests are exercising code that is
qcow2 specific, though a couple could probably be usefully
enabled for luks too.

Signed-off-by: Daniel P. Berrange 
---
 tests/qemu-iotests/004   |  2 +-
 tests/qemu-iotests/012   |  5 -
 tests/qemu-iotests/048   | 26 +++---
 tests/qemu-iotests/048.out   |  6 --
 tests/qemu-iotests/052   |  4 
 tests/qemu-iotests/052.out   |  4 
 tests/qemu-iotests/100   |  7 +++
 tests/qemu-iotests/100.out   | 14 ++
 tests/qemu-iotests/common|  7 +++
 tests/qemu-iotests/common.rc |  3 +++
 10 files changed, 67 insertions(+), 11 deletions(-)

diff --git a/tests/qemu-iotests/004 b/tests/qemu-iotests/004
index 67e1beb..6f2aa3d 100755
--- a/tests/qemu-iotests/004
+++ b/tests/qemu-iotests/004
@@ -37,7 +37,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt raw qcow qcow2 qed vdi vmdk vhdx
+_supported_fmt raw qcow qcow2 qed vdi vmdk vhdx luks
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/012 b/tests/qemu-iotests/012
index d1d3f22..01a770d 100755
--- a/tests/qemu-iotests/012
+++ b/tests/qemu-iotests/012
@@ -43,13 +43,16 @@ _supported_fmt generic
 _supported_proto file
 _supported_os Linux
 
+# Remove once all tests are fixed to use TEST_IMG_FILE
+# correctly and common.rc sets it unconditionally
+test -z "$TEST_IMG_FILE" && TEST_IMG_FILE=$TEST_IMG
 
 size=128M
 _make_test_img $size
 
 echo
 echo "== mark image read-only"
-chmod a-w "$TEST_IMG"
+chmod a-w "$TEST_IMG_FILE"
 
 echo
 echo "== read from read-only image"
diff --git a/tests/qemu-iotests/048 b/tests/qemu-iotests/048
index e1eeac2..203c04f 100755
--- a/tests/qemu-iotests/048
+++ b/tests/qemu-iotests/048
@@ -31,13 +31,13 @@ _cleanup()
 {
 echo "Cleanup"
 _cleanup_test_img
-rm "${TEST_IMG2}"
+rm "${TEST_IMG_FILE2}"
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
 _compare()
 {
-$QEMU_IMG compare "$@" "$TEST_IMG" "${TEST_IMG2}"
+$QEMU_IMG compare $QEMU_IMG_EXTRA_ARGS "$@" "$TEST_IMG" "${TEST_IMG2}"
 echo $?
 }
 
@@ -46,25 +46,37 @@ _compare()
 . ./common.filter
 . ./common.pattern
 
-_supported_fmt raw qcow qcow2 qed
+_supported_fmt raw qcow qcow2 qed luks
 _supported_proto file
 _supported_os Linux
 
+# Remove once all tests are fixed to use TEST_IMG_FILE
+# correctly and common.rc sets it unconditionally
+test -z "$TEST_IMG_FILE" && TEST_IMG_FILE=$TEST_IMG
+
 # Setup test basic parameters
 TEST_IMG2=$TEST_IMG.2
+TEST_IMG_FILE2=$TEST_IMG_FILE.2
 CLUSTER_SIZE=4096
-size=1024M
+size=128M
 
 _make_test_img $size
 io_pattern write 524288 $CLUSTER_SIZE $CLUSTER_SIZE 4 45
 
 # Compare identical images
-cp "$TEST_IMG" "${TEST_IMG2}"
+cp "$TEST_IMG_FILE" "${TEST_IMG_FILE2}"
 _compare
 _compare -q
 
 # Compare images with different size
-$QEMU_IMG resize -f $IMGFMT "$TEST_IMG" +512M
+if [ "$IMGOPTSSYNTAX" = "true" ]; then
+$QEMU_IMG resize $QEMU_IMG_EXTRA_ARGS "$TEST_IMG" +32M
+else
+$QEMU_IMG resize -f $IMGFMT "$TEST_IMG" +32M
+fi
+# Ensure extended space is zero-initialized
+$QEMU_IO 

[Qemu-devel] [PATCH v9 0/3] Tests for LUKS driver

2016-05-10 Thread Daniel P. Berrange
This series contains the 3 test suite patches that had to be dropped
from the v6 series during merge with the block tree:

 v6: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg04935.html
 v7: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06687.html
 v8: https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg02736.html

Changed in v9:

 - Use QEMU_IMG_EXTRA_ARGS in common.rc
 - Use 'write -z' instead of 'write -P 0'

Changed in v8:

 - Add conversion of 039.out & 137.out to new opts variable name
 - Fix 012 and 048 to not assume TEST_IMG_FILE is always set
 - Clarify why we don't change _qemu_img_wrapper in commit msg

Changed in v7:

 - Avoid setting TEST_IMG_FILE when IMGPROTO=file in common.rc
   for traditional (not --image-opts) variable setup

Daniel P. Berrange (3):
  block: add support for --image-opts in block I/O tests
  block: add support for encryption secrets in block I/O tests
  block: enable testing of LUKS driver with block I/O tests

 tests/qemu-iotests/004   |  2 +-
 tests/qemu-iotests/012   |  5 ++-
 tests/qemu-iotests/039.out   | 20 ++--
 tests/qemu-iotests/048   | 26 +++
 tests/qemu-iotests/048.out   |  6 ++--
 tests/qemu-iotests/052   |  4 +++
 tests/qemu-iotests/052.out   |  4 +++
 tests/qemu-iotests/061.out   |  8 ++---
 tests/qemu-iotests/100   |  7 
 tests/qemu-iotests/100.out   | 14 
 tests/qemu-iotests/137.out   |  4 +--
 tests/qemu-iotests/common| 15 -
 tests/qemu-iotests/common.config | 21 ++--
 tests/qemu-iotests/common.filter |  3 +-
 tests/qemu-iotests/common.rc | 69 ++--
 15 files changed, 160 insertions(+), 48 deletions(-)

-- 
2.5.5




[Qemu-devel] [PATCH v9 2/3] block: add support for encryption secrets in block I/O tests

2016-05-10 Thread Daniel P. Berrange
The LUKS block driver tests will require the ability to specify
encryption secrets with block devices. This requires using the
--object argument to qemu-img/qemu-io to create a 'secret'
object.

When the IMGKEYSECRET env variable is set, it provides the
password to be associated with a secret called 'keysec0'

The _qemu_img_wrapper function isn't modified as that needs
to cope with differing syntax for subcommands, so can't be
made to use the image opts syntax unconditionally.

Signed-off-by: Daniel P. Berrange 
---
 tests/qemu-iotests/common| 1 +
 tests/qemu-iotests/common.config | 6 ++
 tests/qemu-iotests/common.filter | 3 ++-
 tests/qemu-iotests/common.rc | 9 +++--
 4 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index fe3b1a0..e87287c 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -53,6 +53,7 @@ export QEMU_IO_OPTIONS=""
 export CACHEMODE_IS_DEFAULT=true
 export QEMU_OPTIONS="-nodefaults"
 export VALGRIND_QEMU=
+export IMGKEYSECRET=
 export IMGOPTSSYNTAX=false
 
 for r
diff --git a/tests/qemu-iotests/common.config b/tests/qemu-iotests/common.config
index ea698da..f6384fb 100644
--- a/tests/qemu-iotests/common.config
+++ b/tests/qemu-iotests/common.config
@@ -126,6 +126,9 @@ _qemu_io_wrapper()
 local QEMU_IO_ARGS="$QEMU_IO_OPTIONS"
 if [ "$IMGOPTSSYNTAX" = "true" ]; then
 QEMU_IO_ARGS="--image-opts $QEMU_IO_ARGS"
+if [ -n "$IMGKEYSECRET" ]; then
+QEMU_IO_ARGS="--object secret,id=keysec0,data=$IMGKEYSECRET 
$QEMU_IO_ARGS"
+fi
 fi
 local RETVAL
 (
@@ -161,6 +164,9 @@ export QEMU_NBD=_qemu_nbd_wrapper
 QEMU_IMG_EXTRA_ARGS=
 if [ "$IMGOPTSSYNTAX" = "true" ]; then
 QEMU_IMG_EXTRA_ARGS="--image-opts $QEMU_IMG_EXTRA_ARGS"
+if [ -n "$IMGKEYSECRET" ]; then
+QEMU_IMG_EXTRA_ARGS="--object secret,id=keysec0,data=$IMGKEYSECRET 
$QEMU_IMG_EXTRA_ARGS"
+fi
 fi
 export QEMU_IMG_EXTRA_ARGS
 
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 8a6e1b5..d8188a5 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -97,7 +97,8 @@ _filter_img_create()
 -e "s# block_state_zero=\\(on\\|off\\)##g" \
 -e "s# log_size=[0-9]\\+##g" \
 -e "s/archipelago:a/TEST_DIR\//g" \
--e "s# refcount_bits=[0-9]\\+##g"
+-e "s# refcount_bits=[0-9]\\+##g" \
+-e "s# key-secret=[a-zA-Z0-9]\\+##g"
 }
 
 _filter_img_info()
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 080f1bc..164792d 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -129,6 +129,7 @@ _make_test_img()
 local img_name=""
 local use_backing=0
 local backing_file=""
+local object_options=""
 
 if [ -n "$TEST_IMG_FILE" ]; then
 img_name=$TEST_IMG_FILE
@@ -139,6 +140,10 @@ _make_test_img()
 if [ -n "$IMGOPTS" ]; then
 optstr=$(_optstr_add "$optstr" "$IMGOPTS")
 fi
+if [ -n "$IMGKEYSECRET" ]; then
+object_options="--object secret,id=keysec0,data=$IMGKEYSECRET"
+optstr=$(_optstr_add "$optstr" "key-secret=keysec0")
+fi
 
 if [ "$1" = "-b" ]; then
 use_backing=1
@@ -156,9 +161,9 @@ _make_test_img()
 # XXX(hch): have global image options?
 (
  if [ $use_backing = 1 ]; then
-$QEMU_IMG create -f $IMGFMT $extra_img_options -b "$backing_file" 
"$img_name" $image_size 2>&1
+$QEMU_IMG create $object_options -f $IMGFMT $extra_img_options -b 
"$backing_file" "$img_name" $image_size 2>&1
  else
-$QEMU_IMG create -f $IMGFMT $extra_img_options "$img_name" $image_size 
2>&1
+$QEMU_IMG create $object_options -f $IMGFMT $extra_img_options 
"$img_name" $image_size 2>&1
  fi
 ) | _filter_img_create
 
-- 
2.5.5




[Qemu-devel] [PATCH 07/10] vhost-net: save & restore vhost-user acked features

2016-05-10 Thread marcandre . lureau
From: Marc-André Lureau 

The initial vhost-user connection sets the features to be negotiated
with the driver. Renegotiation isn't possible without device reset.

To handle reconnection of vhost-user backend, ensure the same set of
features are provided, and reuse already acked features.

Signed-off-by: Marc-André Lureau 
---
 hw/net/vhost_net.c   | 21 -
 include/net/vhost-user.h |  1 +
 include/net/vhost_net.h  |  3 +++
 net/vhost-user.c | 10 ++
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 805a0df..f3df18c 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -120,6 +120,11 @@ uint64_t vhost_net_get_max_queues(VHostNetState *net)
 return net->dev.max_queues;
 }
 
+uint64_t vhost_net_get_acked_features(VHostNetState *net)
+{
+return net->dev.acked_features;
+}
+
 static int vhost_net_get_fd(NetClientState *backend)
 {
 switch (backend->info->type) {
@@ -136,6 +141,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
 int r;
 bool backend_kernel = options->backend_type == VHOST_BACKEND_TYPE_KERNEL;
 struct vhost_net *net = g_malloc(sizeof *net);
+uint64_t features = 0;
 
 if (!options->net_backend) {
 fprintf(stderr, "vhost-net requires net backend to be setup\n");
@@ -183,8 +189,21 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
 goto fail;
 }
 }
+
 /* Set sane init value. Override when guest acks. */
-vhost_net_ack_features(net, 0);
+if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
+features = vhost_user_get_acked_features(net->nc);
+if (~net->dev.features & features) {
+fprintf(stderr, "vhost lacks feature mask %" PRIu64
+" for backend\n",
+(uint64_t)(~net->dev.features & features));
+vhost_dev_cleanup(>dev);
+goto fail;
+}
+}
+
+vhost_net_ack_features(net, features);
+
 return net;
 fail:
 g_free(net);
diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
index 85109f6..efae35d 100644
--- a/include/net/vhost-user.h
+++ b/include/net/vhost-user.h
@@ -13,5 +13,6 @@
 
 struct vhost_net;
 struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc);
+uint64_t vhost_user_get_acked_features(NetClientState *nc);
 
 #endif /* VHOST_USER_H_ */
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index 3389b41..0bd4877 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -31,4 +31,7 @@ int vhost_net_notify_migration_done(VHostNetState *net, char* 
mac_addr);
 VHostNetState *get_vhost_net(NetClientState *nc);
 
 int vhost_set_vring_enable(NetClientState * nc, int enable);
+
+uint64_t vhost_net_get_acked_features(VHostNetState *net);
+
 #endif
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 41ddb4b..d72ce9b 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -23,6 +23,7 @@ typedef struct VhostUserState {
 CharDriverState *chr;
 VHostNetState *vhost_net;
 int watch;
+uint64_t acked_features;
 } VhostUserState;
 
 typedef struct VhostUserChardevProps {
@@ -37,6 +38,13 @@ VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
 return s->vhost_net;
 }
 
+uint64_t vhost_user_get_acked_features(NetClientState *nc)
+{
+VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
+return s->acked_features;
+}
+
 static int vhost_user_running(VhostUserState *s)
 {
 return (s->vhost_net) ? 1 : 0;
@@ -56,6 +64,8 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
 }
 
 if (s->vhost_net) {
+/* save acked features */
+s->acked_features = vhost_net_get_acked_features(s->vhost_net);
 vhost_net_cleanup(s->vhost_net);
 s->vhost_net = NULL;
 }
-- 
2.7.4




[Qemu-devel] [PATCH 06/10] vhost-net: do not crash if backend is not present

2016-05-10 Thread marcandre . lureau
From: Marc-André Lureau 

Do not crash when backend is not present while enabling the ring. A
following patch will save the enabled state so it can be restored once
the backend is started.

Signed-off-by: Marc-André Lureau 
---
 hw/net/vhost_net.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6e1032f..805a0df 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -401,8 +401,13 @@ VHostNetState *get_vhost_net(NetClientState *nc)
 int vhost_set_vring_enable(NetClientState *nc, int enable)
 {
 VHostNetState *net = get_vhost_net(nc);
-const VhostOps *vhost_ops = net->dev.vhost_ops;
+const VhostOps *vhost_ops;
+
+if (!net) {
+return 0;
+}
 
+vhost_ops = net->dev.vhost_ops;
 if (vhost_ops->vhost_set_vring_enable) {
 return vhost_ops->vhost_set_vring_enable(>dev, enable);
 }
-- 
2.7.4




[Qemu-devel] [PATCH 10/10] test: start vhost-user reconnect test

2016-05-10 Thread marcandre . lureau
From: Marc-André Lureau 

This is a simple reconnect test, that simply checks if vhost-user
reconnection is possible and restore the state. A more complete test
would actually manipulate and check the ring contents (such extended
testing would benefit from the libvhost-user proposed in QEMU list to
avoid duplication of ring manipulations)

Signed-off-by: Marc-André Lureau 
---
 tests/vhost-user-test.c | 123 +---
 1 file changed, 106 insertions(+), 17 deletions(-)

diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
index 6961596..1734a2b 100644
--- a/tests/vhost-user-test.c
+++ b/tests/vhost-user-test.c
@@ -34,7 +34,7 @@
 #define QEMU_CMD_ACCEL  " -machine accel=tcg"
 #define QEMU_CMD_MEM" -m %d -object memory-backend-file,id=mem,size=%dM,"\
 "mem-path=%s,share=on -numa node,memdev=mem"
-#define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s"
+#define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s%s"
 #define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=%s,vhostforce"
 #define QEMU_CMD_NET" -device 
virtio-net-pci,netdev=net0,romfile=./pc-bios/pxe-virtio.rom"
 
@@ -246,7 +246,12 @@ static void chr_read(void *opaque, const uint8_t *buf, int 
size)
 
 if (msg.size) {
 p += VHOST_USER_HDR_SIZE;
-g_assert_cmpint(qemu_chr_fe_read_all(chr, p, msg.size), ==, msg.size);
+size = qemu_chr_fe_read_all(chr, p, msg.size);
+if (size != msg.size) {
+g_test_message("Wrong message size received %d != %d\n",
+   size, msg.size);
+return;
+}
 }
 
 switch (msg.request) {
@@ -363,17 +368,10 @@ static const char *init_hugepagefs(const char *path)
 static TestServer *test_server_new(const gchar *name)
 {
 TestServer *server = g_new0(TestServer, 1);
-gchar *chr_path;
 
 server->socket_path = g_strdup_printf("%s/%s.sock", tmpfs, name);
 server->mig_path = g_strdup_printf("%s/%s.mig", tmpfs, name);
-
-chr_path = g_strdup_printf("unix:%s,server,nowait", server->socket_path);
 server->chr_name = g_strdup_printf("chr-%s", name);
-server->chr = qemu_chr_new(server->chr_name, chr_path, NULL);
-g_free(chr_path);
-
-qemu_chr_add_handlers(server->chr, chr_can_read, chr_read, NULL, server);
 
 g_mutex_init(>data_mutex);
 g_cond_init(>data_cond);
@@ -383,13 +381,34 @@ static TestServer *test_server_new(const gchar *name)
 return server;
 }
 
-#define GET_QEMU_CMD(s)
\
-g_strdup_printf(QEMU_CMD, 512, 512, (root), (s)->chr_name, 
\
-(s)->socket_path, (s)->chr_name)
+static void test_server_create_chr(TestServer *server, const gchar *opt)
+{
+gchar *chr_path;
+
+chr_path = g_strdup_printf("unix:%s%s", server->socket_path, opt);
+server->chr = qemu_chr_new(server->chr_name, chr_path, NULL);
+g_free(chr_path);
 
-#define GET_QEMU_CMDE(s, mem, extra, ...)  
\
-g_strdup_printf(QEMU_CMD extra, (mem), (mem), (root), (s)->chr_name,   
\
-(s)->socket_path, (s)->chr_name, ##__VA_ARGS__)
+qemu_chr_add_handlers(server->chr, chr_can_read, chr_read, NULL, server);
+}
+
+static void test_server_listen(TestServer *server)
+{
+test_server_create_chr(server, ",server,nowait");
+}
+
+static void test_server_connect(TestServer *server)
+{
+test_server_create_chr(server, ",reconnect=1");
+}
+
+#define GET_QEMU_CMD(s) \
+g_strdup_printf(QEMU_CMD, 512, 512, (root), (s)->chr_name,  \
+(s)->socket_path, "", (s)->chr_name)
+
+#define GET_QEMU_CMDE(s, mem, chr_opts, extra, ...) \
+g_strdup_printf(QEMU_CMD extra, (mem), (mem), (root), (s)->chr_name, \
+(s)->socket_path, (chr_opts), (s)->chr_name, ##__VA_ARGS__)
 
 static gboolean _test_server_free(TestServer *server)
 {
@@ -537,7 +556,10 @@ static void test_migrate(void)
 guint8 *log;
 guint64 size;
 
-cmd = GET_QEMU_CMDE(s, 2, "");
+test_server_listen(s);
+test_server_listen(dest);
+
+cmd = GET_QEMU_CMDE(s, 2, "", "");
 from = qtest_start(cmd);
 g_free(cmd);
 
@@ -545,7 +567,7 @@ static void test_migrate(void)
 size = get_log_size(s);
 g_assert_cmpint(size, ==, (2 * 1024 * 1024) / (VHOST_LOG_PAGE * 8));
 
-cmd = GET_QEMU_CMDE(dest, 2, " -incoming %s", uri);
+cmd = GET_QEMU_CMDE(dest, 2, "", " -incoming %s", uri);
 to = qtest_init(cmd);
 g_free(cmd);
 
@@ -605,6 +627,71 @@ static void test_migrate(void)
 global_qtest = global;
 }
 
+static void wait_for_rings_started(TestServer *s, size_t count)
+{
+gint64 end_time;
+
+g_mutex_lock(>data_mutex);
+end_time = g_get_monotonic_time() + 5 * G_TIME_SPAN_SECOND;
+while (ctpop64(s->rings) != count) {
+if 

[Qemu-devel] [PATCH 05/10] vhost-user: disconnect on start failure

2016-05-10 Thread marcandre . lureau
From: Marc-André Lureau 

If the backend failed to start (for example feature negociation failed),
do not exit, but disconnect the char device instead. Slightly more
robust for reconnect case.

Signed-off-by: Marc-André Lureau 
---
 net/vhost-user.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 4a7fd5f..41ddb4b 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -202,7 +202,8 @@ static void net_vhost_user_event(void *opaque, int event)
 s->watch = qemu_chr_fe_add_watch(s->chr, G_IO_HUP,
  net_vhost_user_watch, s);
 if (vhost_user_start(queues, ncs) < 0) {
-exit(1);
+qemu_chr_disconnect(s->chr);
+return;
 }
 qmp_set_link(name, true, );
 break;
-- 
2.7.4




[Qemu-devel] [PATCH 08/10] vhost-net: save & restore vring enable state

2016-05-10 Thread marcandre . lureau
From: Marc-André Lureau 

A driver may change the vring enable state at run time but vhost-user
backend may not be present (a contrived example is when the backend is
disconnected and the device is reconfigured after driver rebinding)

Restore the vring state when the vhost-user backend is started, so it
can process the ring.

Signed-off-by: Marc-André Lureau 
---
 hw/net/vhost_net.c | 11 +++
 include/net/net.h  |  1 +
 2 files changed, 12 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index f3df18c..722e3f3 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -329,6 +329,15 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 if (r < 0) {
 goto err_start;
 }
+
+if (ncs[i].peer->vring_enable) {
+/* restore vring enable state */
+r = vhost_set_vring_enable(ncs[i].peer, ncs[i].peer->vring_enable);
+
+if (r < 0) {
+goto err_start;
+}
+}
 }
 
 return 0;
@@ -422,6 +431,8 @@ int vhost_set_vring_enable(NetClientState *nc, int enable)
 VHostNetState *net = get_vhost_net(nc);
 const VhostOps *vhost_ops;
 
+nc->vring_enable = enable;
+
 if (!net) {
 return 0;
 }
diff --git a/include/net/net.h b/include/net/net.h
index 73e4c46..2c4b23a 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -92,6 +92,7 @@ struct NetClientState {
 NetClientDestructor *destructor;
 unsigned int queue_index;
 unsigned rxfilter_notify_enabled:1;
+int vring_enable;
 QTAILQ_HEAD(NetFilterHead, NetFilterState) filters;
 };
 
-- 
2.7.4




[Qemu-devel] [PATCH 04/10] qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd

2016-05-10 Thread marcandre . lureau
From: Tetsuya Mukawa 

The patch introduces qemu_chr_disconnect(). The function is used for
closing a fd accepted by listen fd. Though we already have qemu_chr_delete(),
but it closes not only accepted fd but also listen fd. This new function
is used when we still want to keep listen fd.

Signed-off-by: Tetsuya Mukawa 
Reviewed-by: Marc-André Lureau 
---
 include/sysemu/char.h | 7 +++
 qemu-char.c   | 8 
 2 files changed, 15 insertions(+)

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index 307fd8f..2c39987 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -75,6 +75,7 @@ struct CharDriverState {
 IOReadHandler *chr_read;
 void *handler_opaque;
 void (*chr_close)(struct CharDriverState *chr);
+void (*chr_disconnect)(struct CharDriverState *chr);
 void (*chr_accept_input)(struct CharDriverState *chr);
 void (*chr_set_echo)(struct CharDriverState *chr, bool echo);
 void (*chr_set_fe_open)(struct CharDriverState *chr, int fe_open);
@@ -143,6 +144,12 @@ void qemu_chr_parse_common(QemuOpts *opts, ChardevCommon 
*backend);
  */
 CharDriverState *qemu_chr_new(const char *label, const char *filename,
   void (*init)(struct CharDriverState *s));
+/**
+ * @qemu_chr_disconnect:
+ *
+ * Close a fd accpeted by character backend.
+ */
+void qemu_chr_disconnect(CharDriverState *chr);
 
 /**
  * @qemu_chr_new_noreplay:
diff --git a/qemu-char.c b/qemu-char.c
index b597ee1..6efbc0d 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -4012,6 +4012,13 @@ void qemu_chr_fe_release(CharDriverState *s)
 s->avail_connections++;
 }
 
+void qemu_chr_disconnect(CharDriverState *chr)
+{
+if (chr->chr_disconnect) {
+chr->chr_disconnect(chr);
+}
+}
+
 static void qemu_chr_free_common(CharDriverState *chr)
 {
 g_free(chr->filename);
@@ -4405,6 +4412,7 @@ static CharDriverState *qmp_chardev_open_socket(const 
char *id,
 chr->chr_write = tcp_chr_write;
 chr->chr_sync_read = tcp_chr_sync_read;
 chr->chr_close = tcp_chr_close;
+chr->chr_disconnect = tcp_chr_disconnect;
 chr->get_msgfds = tcp_get_msgfds;
 chr->set_msgfds = tcp_set_msgfds;
 chr->chr_add_client = tcp_chr_add_client;
-- 
2.7.4




[Qemu-devel] [PATCH 02/10] vubr: add client mode

2016-05-10 Thread marcandre . lureau
From: Marc-André Lureau 

If -c is specified, vubr will try to connect to the socket instead of
listening for connections.

Signed-off-by: Marc-André Lureau 
---
 tests/vhost-user-bridge.c | 38 ++
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
index 0779ba2..f907ce7 100644
--- a/tests/vhost-user-bridge.c
+++ b/tests/vhost-user-bridge.c
@@ -1204,12 +1204,13 @@ vubr_accept_cb(int sock, void *ctx)
 }
 
 static VubrDev *
-vubr_new(const char *path)
+vubr_new(const char *path, bool client)
 {
 VubrDev *dev = (VubrDev *) calloc(1, sizeof(VubrDev));
 dev->nregions = 0;
 int i;
 struct sockaddr_un un;
+CallbackFunc cb;
 size_t len;
 
 for (i = 0; i < MAX_NR_VIRTQUEUE; i++) {
@@ -1238,19 +1239,27 @@ vubr_new(const char *path)
 un.sun_family = AF_UNIX;
 strcpy(un.sun_path, path);
 len = sizeof(un.sun_family) + strlen(path);
-unlink(path);
 
-if (bind(dev->sock, (struct sockaddr *) , len) == -1) {
-vubr_die("bind");
-}
+if (!client) {
+unlink(path);
+
+if (bind(dev->sock, (struct sockaddr *) , len) == -1) {
+vubr_die("bind");
+}
 
-if (listen(dev->sock, 1) == -1) {
-vubr_die("listen");
+if (listen(dev->sock, 1) == -1) {
+vubr_die("listen");
+}
+cb = vubr_accept_cb;
+} else {
+if (connect(dev->sock, (struct sockaddr *), len) == -1) {
+vubr_die("connect");
+}
+cb = vubr_receive_cb;
 }
 
 dispatcher_init(>dispatcher);
-dispatcher_add(>dispatcher, dev->sock, (void *)dev,
-   vubr_accept_cb);
+dispatcher_add(>dispatcher, dev->sock, (void *)dev, cb);
 
 DPRINT("Waiting for connections on UNIX socket %s ...\n", path);
 return dev;
@@ -1369,8 +1378,9 @@ main(int argc, char *argv[])
 {
 VubrDev *dev;
 int opt;
+bool client = false;
 
-while ((opt = getopt(argc, argv, "l:r:u:")) != -1) {
+while ((opt = getopt(argc, argv, "l:r:u:c")) != -1) {
 
 switch (opt) {
 case 'l':
@@ -1386,16 +1396,20 @@ main(int argc, char *argv[])
 case 'u':
 ud_socket_path = strdup(optarg);
 break;
+case 'c':
+client = true;
+break;
 default:
 goto out;
 }
 }
 
-DPRINT("ud socket: %s\n", ud_socket_path);
+DPRINT("ud socket: %s (%s)\n", ud_socket_path,
+   client ? "client" : "server");
 DPRINT("local: %s:%s\n", lhost, lport);
 DPRINT("remote:%s:%s\n", rhost, rport);
 
-dev = vubr_new(ud_socket_path);
+dev = vubr_new(ud_socket_path, client);
 if (!dev) {
 return 1;
 }
-- 
2.7.4




[Qemu-devel] [PATCH 09/10] tests: append i386 tests

2016-05-10 Thread marcandre . lureau
From: Marc-André Lureau 

Do not overwrite x86-64 tests, re-enable vhost-user-test.

Signed-off-by: Marc-André Lureau 
---
 tests/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/Makefile b/tests/Makefile
index 9194f18..50a99f8 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -224,7 +224,7 @@ endif
 check-qtest-i386-y += tests/test-netfilter$(EXESUF)
 check-qtest-i386-y += tests/test-filter-mirror$(EXESUF)
 check-qtest-i386-y += tests/test-filter-redirector$(EXESUF)
-check-qtest-x86_64-y = $(check-qtest-i386-y)
+check-qtest-x86_64-y += $(check-qtest-i386-y)
 gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c
 gcov-files-x86_64-y = $(subst 
i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y))
 check-qtest-mips-y = tests/endianness-test$(EXESUF)
-- 
2.7.4




[Qemu-devel] [PATCH 03/10] vubr: workaround stale vring base

2016-05-10 Thread marcandre . lureau
From: Marc-André Lureau 

This patch is a similar solution to what Yuanhan Liu/Huawei Xie have
suggested for DPDK. When vubr quits (killed or crashed), a restart of
vubr would get stale vring base from QEMU. That would break the kernel
virtio net completely, making it non-work any more, unless a driver
reset is done.

So, instead of getting the stale vring base from QEMU, Huawei suggested
we could get a proper one from used->idx. This works because the queues
packets are processed in order.

Signed-off-by: Marc-André Lureau 
---
 tests/vhost-user-bridge.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
index f907ce7..38963e4 100644
--- a/tests/vhost-user-bridge.c
+++ b/tests/vhost-user-bridge.c
@@ -946,6 +946,13 @@ vubr_set_vring_addr_exec(VubrDev *dev, VhostUserMsg *vmsg)
 DPRINT("vring_avail at %p\n", vq->avail);
 
 vq->last_used_index = vq->used->idx;
+
+if (vq->last_avail_index != vq->used->idx) {
+DPRINT("Last avail index != used index: %d != %d, resuming",
+   vq->last_avail_index, vq->used->idx);
+vq->last_avail_index = vq->used->idx;
+}
+
 return 0;
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH 00/10] RFCv3: vhost-user: simple reconnection support

2016-05-10 Thread marcandre . lureau
From: Marc-André Lureau 

Hi,

In a previous series "RFCv2: vhost-user: shutdown and reconnection", I
proposed to add a new slave request to handle graceful shutdown, for
both qemu configuration, server or client, while keeping the guest
running with link down status.

However, for the simple case where qemu is configured as server, and
the backend processes packets in order and disconnects, it is possible
for the backend to recover after reconnection by discarding the qemu
SET_VRING_BASE value and resuming from the used->index instead. This
simplifies the reconnection support in this particular situation (it
would make sense to have the backend declare this behaviour with an
extra flag)

The guest won't be notified of link status change and queues may not
be processed in a timely manner, also qemu may assert if some
vhost-user commands are happening while the backend is disconnected.
So the reconnection must happen "quickly enough" here. In order to
keep the series relatively small, these further problems will be
addressed later.

These series demonstrates a simple reconnect support for vubr
(configured as client and qemu as server), includes some nice to
have fixes and a simple test.

Marc-André Lureau (8):
  vubr: add client mode
  vubr: workaround stale vring base
  vhost-user: disconnect on start failure
  vhost-net: do not crash if backend is not present
  vhost-net: save & restore vhost-user acked features
  vhost-net: save & restore vring enable state
  tests: append i386 tests
  test: start vhost-user reconnect test

Tetsuya Mukawa (2):
  vhost-user: add ability to know vhost-user backend disconnection
  qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd

 hw/net/vhost_net.c|  39 ++-
 include/net/net.h |   1 +
 include/net/vhost-user.h  |   1 +
 include/net/vhost_net.h   |   3 ++
 include/sysemu/char.h |   7 +++
 net/vhost-user.c  |  32 +++-
 qemu-char.c   |   8 +++
 tests/Makefile|   2 +-
 tests/vhost-user-bridge.c |  45 -
 tests/vhost-user-test.c   | 123 +++---
 10 files changed, 228 insertions(+), 33 deletions(-)

-- 
2.7.4




Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu

2016-05-10 Thread Neo Jia
On Tue, May 10, 2016 at 03:52:27PM +0800, Jike Song wrote:
> On 05/05/2016 05:27 PM, Tian, Kevin wrote:
> >> From: Song, Jike
> >>
> >> IIUC, an api-only domain is a VFIO domain *without* underlying IOMMU
> >> hardware. It just, as you said in another mail, "rather than
> >> programming them into an IOMMU for a device, it simply stores the
> >> translations for use by later requests".
> >>
> >> That imposes a constraint on gfx driver: hardware IOMMU must be disabled.
> >> Otherwise, if IOMMU is present, the gfx driver eventually programs
> >> the hardware IOMMU with IOVA returned by pci_map_page or dma_map_page;
> >> Meanwhile, the IOMMU backend for vgpu only maintains GPA <-> HPA
> >> translations without any knowledge about hardware IOMMU, how is the
> >> device model supposed to do to get an IOVA for a given GPA (thereby HPA
> >> by the IOMMU backend here)?
> >>
> >> If things go as guessed above, as vfio_pin_pages() indicates, it
> >> pin & translate vaddr to PFN, then it will be very difficult for the
> >> device model to figure out:
> >>
> >>1, for a given GPA, how to avoid calling dma_map_page multiple times?
> >>2, for which page to call dma_unmap_page?
> >>
> >> --
> > 
> > We have to support both w/ iommu and w/o iommu case, since
> > that fact is out of GPU driver control. A simple way is to use
> > dma_map_page which internally will cope with w/ and w/o iommu
> > case gracefully, i.e. return HPA w/o iommu and IOVA w/ iommu.
> > Then in this file we only need to cache GPA to whatever dmadr_t
> > returned by dma_map_page.
> > 
> 
> Hi Alex, Kirti and Neo, any thought on the IOMMU compatibility here?

Hi Jike,

With mediated passthru, you still can use hardware iommu, but more important
that part is actually orthogonal to what we are discussing here as we will only
cache the mapping between , once we 
have pinned pages later with the help of above info, you can map it into the
proper iommu domain if the system has configured so.

Thanks,
Neo

> 
> --
> Thanks,
> Jike
> 



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Quentin Casasnovas
On Tue, May 10, 2016 at 04:49:57PM +0100, Alex Bligh wrote:
> 
> On 10 May 2016, at 16:45, Quentin Casasnovas  
> wrote:
> 
> > I'm by no mean an expert in this, but why would the kernel break up those
> > TRIM commands?  After all, breaking things up makes sense when the length
> > of the request is big, not that much when it only contains the request
> > header, which is the case for TRIM commands.
> 
> 1. You are assuming that the only reason for limiting the size of operations 
> is
>limiting the data transferred within one request. That is not necessarily
>the case. There are good reasons (if only orthogonality) to have limits
>in place even where no data is transferred.
> 
> 2. As and when the blocksize extension is implemented in the kernel (it isn't
>now), the protocol requires it.
> 
> 3. The maximum length of an NBD trim operation is 2^32. The maximum length
>of a trim operation is larger. Therefore the kernel needs to do at least
>some breaking up.

Alright, I assumed by 'length of an NBD request', the specification was
talking about the length of.. well, the request as opposed to whatever is
in the length field of the request header.

Is there a use case where you'd want to split up a single big TRIM request
in smaller ones (as in some hardware would not support it or something)?
Even then, it looks like this splitting up would be hardware dependant and
better implemented in block device drivers.

I'm just finding odd that something that fits inside the length field can't
be used.  I do agree with your point number 3, obviously if the lenght
field type doesn't allow something bigger than a u32, then the kernel has
to do some breaking up in that case.

Quentin



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Alex Bligh

On 10 May 2016, at 16:46, Eric Blake  wrote:

> Does anyone have an easy way to cause the kernel to request a trim
> operation that large on a > 4G export?  I'm not familiar enough with
> EXT4 operation to know what file system operations you can run to
> ultimately indirectly create a file system trim operation that large.
> But maybe there is something simpler - does the kernel let you use the
> fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
> FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?

Not tried it, but fallocate(1) with -p ?

http://man7.org/linux/man-pages/man1/fallocate.1.html

As it takes length and offset in TB, PB, EB, ZB and YB, it
seems to be 64 bit aware :-)

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Quentin Casasnovas
On Tue, May 10, 2016 at 09:46:36AM -0600, Eric Blake wrote:
> On 05/10/2016 09:41 AM, Alex Bligh wrote:
> > 
> > On 10 May 2016, at 16:29, Eric Blake  wrote:
> > 
> >> So the kernel is currently one of the clients that does NOT honor block
> >> sizes, and as such, servers should be prepared for ANY size up to
> >> UINT_MAX (other than DoS handling).
> > 
> > Interesting followup question:
> > 
> > If the kernel does not fragment TRIM requests at all (in the
> > same way it fragments read and write requests), I suspect
> > something bad may happen with TRIM requests over 2^31
> > in size (particularly over 2^32 in size), as the length
> > field in nbd only has 32 bits.
> > 
> > Whether it supports block size constraints or not, it is
> > going to need to do *some* breaking up of requests.
> 
> Does anyone have an easy way to cause the kernel to request a trim
> operation that large on a > 4G export?  I'm not familiar enough with
> EXT4 operation to know what file system operations you can run to
> ultimately indirectly create a file system trim operation that large.
> But maybe there is something simpler - does the kernel let you use the
> fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
> FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?
> 

It was fairly reproducible here, we just used a random qcow2 image with
some Debian minimal system pre-installed, mounted that qcow2 image through
qemu-nbd then compiled a whole kernel inside it.  Then you can make clean
and run fstrim on the mount point.  I'm assuming you can go faster than
that by just writing a big file to the qcow2 image mounted without -o
discard, delete the big file, then remount with -o discard + run fstrim.

Quentin



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Alex Bligh

On 10 May 2016, at 16:45, Quentin Casasnovas  
wrote:

> I'm by no mean an expert in this, but why would the kernel break up those
> TRIM commands?  After all, breaking things up makes sense when the length
> of the request is big, not that much when it only contains the request
> header, which is the case for TRIM commands.

1. You are assuming that the only reason for limiting the size of operations is
   limiting the data transferred within one request. That is not necessarily
   the case. There are good reasons (if only orthogonality) to have limits
   in place even where no data is transferred.

2. As and when the blocksize extension is implemented in the kernel (it isn't
   now), the protocol requires it.

3. The maximum length of an NBD trim operation is 2^32. The maximum length
   of a trim operation is larger. Therefore the kernel needs to do at least
   some breaking up.

-- 
Alex Bligh







[Qemu-devel] [PATCH 5/5] cpu-exec: Remove unused 'x86_cpu' and 'env' from cpu_exec()

2016-05-10 Thread Sergey Fedorov
From: Sergey Fedorov 

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---
 cpu-exec.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 60f565074618..2526df0e3e40 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -571,10 +571,6 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, 
TranslationBlock *tb,
 int cpu_exec(CPUState *cpu)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
-#ifdef TARGET_I386
-X86CPU *x86_cpu = X86_CPU(cpu);
-CPUArchState *env = _cpu->env;
-#endif
 int ret;
 SyncClocks sc;
 
@@ -630,18 +626,10 @@ int cpu_exec(CPUState *cpu)
  * Newer versions of gcc would complain about this code 
(-Wclobbered). */
 cpu = current_cpu;
 cc = CPU_GET_CLASS(cpu);
-#ifdef TARGET_I386
-x86_cpu = X86_CPU(cpu);
-env = _cpu->env;
-#endif
 #else /* buggy compiler */
 /* Assert that the compiler does not smash local variables. */
 g_assert(cpu == current_cpu);
 g_assert(cc == CPU_GET_CLASS(cpu));
-#ifdef TARGET_I386
-g_assert(x86_cpu == X86_CPU(cpu));
-g_assert(env == _cpu->env);
-#endif
 #endif /* buggy compiler */
 cpu->can_do_io = 1;
 tb_lock_reset();
-- 
1.9.1




Re: [Qemu-devel] [PATCH 0/5] cpu-exec: Restructure cpu_exec()

2016-05-10 Thread Sergey Fedorov
On 10/05/16 18:46, Sergey Fedorov wrote:
> From: Sergey Fedorov 
>
> cpu_exec() was a huge function also sprinkled with some preprocessor
> directives. It was hard to read and see the main loop crowded by all
> this code. Restructure cpu_exec() by moving its conceptual parts into
> separate static inline functions. That makes it possible to see the
> whole main loop at once, especially its sigsetjmp() handling part.
>
> This series is based on commit
> 40f646483a11 (cpu-exec: Remove relic orphaned comment)
> from
> git://github.com/rth7680/qemu.git tcg-next
> and is available at
> git://github.com/sergefdrv/qemu.git cpu_exec-restructure
>
> Sergey Fedorov (5):
>   cpu-exec: Move halt handling out of cpu_exec()
>   cpu-exec: Move exception handling out of cpu_exec()
>   cpu-exec: Move interrupt handling out of cpu_exec()
>   cpu-exec: Move TB execution stuff out of cpu_exec()
>   cpu-exec: Remove unused 'x86_cpu' and 'env' from cpu_exec()
>
>  cpu-exec.c | 395 
> +
>  1 file changed, 211 insertions(+), 184 deletions(-)
>

Sorry, +CC.

Regards,
Sergey



[Qemu-devel] [PATCH] qemu-options.hx: Specify the units for -machine kvm_shadow_mem

2016-05-10 Thread Peter Maydell
The -machine kvm_shadow_mem option takes a size in bytes; say
so explicitly in its documentation.

Signed-off-by: Peter Maydell 
Reported-by: Tobi (github.com/tobimensch)
---
 qemu-options.hx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 6106520..0b86d9f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -35,7 +35,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
 "kernel_irqchip=on|off controls accelerated irqchip 
support\n"
 "kernel_irqchip=on|off|split controls accelerated irqchip 
support (default=off)\n"
 "vmport=on|off|auto controls emulation of vmport (default: 
auto)\n"
-"kvm_shadow_mem=size of KVM shadow MMU\n"
+"kvm_shadow_mem=size of KVM shadow MMU in bytes\n"
 "dump-guest-core=on|off include guest memory in a core 
dump (default=on)\n"
 "mem-merge=on|off controls memory merge support (default: 
on)\n"
 "iommu=on|off controls emulated Intel IOMMU (VT-d) support 
(default=off)\n"
-- 
1.9.1




[Qemu-devel] [PATCH 1/5] cpu-exec: Move halt handling out of cpu_exec()

2016-05-10 Thread Sergey Fedorov
From: Sergey Fedorov 

Simplify cpu_exec() by extracting CPU halt state handling code out of
cpu_exec() into a new static inline function cpu_handle_halt().

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---
 cpu-exec.c | 39 +--
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index d55faa5114c7..fb72f102742c 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -352,6 +352,29 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
 return tb;
 }
 
+static inline bool cpu_handle_halt(CPUState *cpu)
+{
+if (cpu->halted) {
+#if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
+X86CPU *x86_cpu = X86_CPU(cpu);
+
+if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
+&& replay_interrupt()) {
+apic_poll_irq(x86_cpu->apic_state);
+cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
+}
+#endif
+if (!cpu_has_work(cpu)) {
+current_cpu = NULL;
+return true;
+}
+
+cpu->halted = 0;
+}
+
+return false;
+}
+
 static void cpu_handle_debug_exception(CPUState *cpu)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
@@ -383,20 +406,8 @@ int cpu_exec(CPUState *cpu)
 /* replay_interrupt may need current_cpu */
 current_cpu = cpu;
 
-if (cpu->halted) {
-#if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
-if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
-&& replay_interrupt()) {
-apic_poll_irq(x86_cpu->apic_state);
-cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
-}
-#endif
-if (!cpu_has_work(cpu)) {
-current_cpu = NULL;
-return EXCP_HALTED;
-}
-
-cpu->halted = 0;
+if (cpu_handle_halt(cpu)) {
+return EXCP_HALTED;
 }
 
 atomic_mb_set(_current_cpu, cpu);
-- 
1.9.1




[Qemu-devel] [PATCH 4/5] cpu-exec: Move TB execution stuff out of cpu_exec()

2016-05-10 Thread Sergey Fedorov
From: Sergey Fedorov 

Simplify cpu_exec() by extracting TB execution code outside of
cpu_exec() into a new static inline function cpu_loop_exec_tb().

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---
 cpu-exec.c | 119 +
 1 file changed, 64 insertions(+), 55 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 0760d5dc274d..60f565074618 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -506,6 +506,66 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
 }
 }
 
+static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
+TranslationBlock **last_tb, int *tb_exit,
+SyncClocks *sc)
+{
+uintptr_t ret;
+
+if (unlikely(cpu->exit_request)) {
+return;
+}
+
+trace_exec_tb(tb, tb->pc);
+ret = cpu_tb_exec(cpu, tb);
+*last_tb = (TranslationBlock *)(ret & ~TB_EXIT_MASK);
+*tb_exit = ret & TB_EXIT_MASK;
+switch (*tb_exit) {
+case TB_EXIT_REQUESTED:
+/* Something asked us to stop executing
+ * chained TBs; just continue round the main
+ * loop. Whatever requested the exit will also
+ * have set something else (eg exit_request or
+ * interrupt_request) which we will handle
+ * next time around the loop.  But we need to
+ * ensure the tcg_exit_req read in generated code
+ * comes before the next read of cpu->exit_request
+ * or cpu->interrupt_request.
+ */
+smp_rmb();
+*last_tb = NULL;
+break;
+case TB_EXIT_ICOUNT_EXPIRED:
+{
+/* Instruction counter expired.  */
+#ifdef CONFIG_USER_ONLY
+abort();
+#else
+int insns_left = cpu->icount_decr.u32;
+if (cpu->icount_extra && insns_left >= 0) {
+/* Refill decrementer and continue execution.  */
+cpu->icount_extra += insns_left;
+insns_left = MIN(0x, cpu->icount_extra);
+cpu->icount_extra -= insns_left;
+cpu->icount_decr.u16.low = insns_left;
+} else {
+if (insns_left > 0) {
+/* Execute remaining instructions.  */
+cpu_exec_nocache(cpu, insns_left, *last_tb, false);
+align_clocks(sc, cpu);
+}
+cpu->exception_index = EXCP_INTERRUPT;
+*last_tb = NULL;
+cpu_loop_exit(cpu);
+}
+break;
+#endif
+}
+default:
+break;
+}
+}
+
 /* main execution loop */
 
 int cpu_exec(CPUState *cpu)
@@ -516,8 +576,6 @@ int cpu_exec(CPUState *cpu)
 CPUArchState *env = _cpu->env;
 #endif
 int ret;
-TranslationBlock *tb, *last_tb;
-int tb_exit = 0;
 SyncClocks sc;
 
 /* replay_interrupt may need current_cpu */
@@ -544,6 +602,9 @@ int cpu_exec(CPUState *cpu)
 init_delay_params(, cpu);
 
 for(;;) {
+TranslationBlock *tb, *last_tb;
+int tb_exit = 0;
+
 /* prepare setjmp context for exception handling */
 if (sigsetjmp(cpu->jmp_env, 0) == 0) {
 /* if an exception is pending, we execute it here */
@@ -556,59 +617,7 @@ int cpu_exec(CPUState *cpu)
 for(;;) {
 cpu_handle_interrupt(cpu, _tb);
 tb = tb_find_fast(cpu, _tb, tb_exit);
-if (likely(!cpu->exit_request)) {
-uintptr_t ret;
-trace_exec_tb(tb, tb->pc);
-/* execute the generated code */
-ret = cpu_tb_exec(cpu, tb);
-last_tb = (TranslationBlock *)(ret & ~TB_EXIT_MASK);
-tb_exit = ret & TB_EXIT_MASK;
-switch (tb_exit) {
-case TB_EXIT_REQUESTED:
-/* Something asked us to stop executing
- * chained TBs; just continue round the main
- * loop. Whatever requested the exit will also
- * have set something else (eg exit_request or
- * interrupt_request) which we will handle
- * next time around the loop.  But we need to
- * ensure the tcg_exit_req read in generated code
- * comes before the next read of cpu->exit_request
- * or cpu->interrupt_request.
- */
-smp_rmb();
-last_tb = NULL;
-break;
-case TB_EXIT_ICOUNT_EXPIRED:
-{
-/* Instruction counter expired.  */
-#ifdef CONFIG_USER_ONLY
-abort();
-#else
-int insns_left = cpu->icount_decr.u32;
-if (cpu->icount_extra && insns_left >= 0) {
- 

[Qemu-devel] [PATCH 3/5] cpu-exec: Move interrupt handling out of cpu_exec()

2016-05-10 Thread Sergey Fedorov
From: Sergey Fedorov 

Simplify cpu_exec() by extracting interrupt handling code outside of
cpu_exec() into a new static inline function cpu_handle_interrupt().

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---
 cpu-exec.c | 132 -
 1 file changed, 70 insertions(+), 62 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index ad95ace38dd9..0760d5dc274d 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -438,6 +438,74 @@ static inline bool cpu_handle_exception(CPUState *cpu, int 
*ret)
 return false;
 }
 
+static inline void cpu_handle_interrupt(CPUState *cpu,
+TranslationBlock **last_tb)
+{
+CPUClass *cc = CPU_GET_CLASS(cpu);
+int interrupt_request = cpu->interrupt_request;
+
+if (unlikely(interrupt_request)) {
+if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
+/* Mask out external interrupts for this step. */
+interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
+}
+if (interrupt_request & CPU_INTERRUPT_DEBUG) {
+cpu->interrupt_request &= ~CPU_INTERRUPT_DEBUG;
+cpu->exception_index = EXCP_DEBUG;
+cpu_loop_exit(cpu);
+}
+if (replay_mode == REPLAY_MODE_PLAY && !replay_has_interrupt()) {
+/* Do nothing */
+} else if (interrupt_request & CPU_INTERRUPT_HALT) {
+replay_interrupt();
+cpu->interrupt_request &= ~CPU_INTERRUPT_HALT;
+cpu->halted = 1;
+cpu->exception_index = EXCP_HLT;
+cpu_loop_exit(cpu);
+}
+#if defined(TARGET_I386)
+else if (interrupt_request & CPU_INTERRUPT_INIT) {
+X86CPU *x86_cpu = X86_CPU(cpu);
+CPUArchState *env = _cpu->env;
+replay_interrupt();
+cpu_svm_check_intercept_param(env, SVM_EXIT_INIT, 0);
+do_cpu_init(x86_cpu);
+cpu->exception_index = EXCP_HALTED;
+cpu_loop_exit(cpu);
+}
+#else
+else if (interrupt_request & CPU_INTERRUPT_RESET) {
+replay_interrupt();
+cpu_reset(cpu);
+cpu_loop_exit(cpu);
+}
+#endif
+/* The target hook has 3 exit conditions:
+   False when the interrupt isn't processed,
+   True when it is, and we should restart on a new TB,
+   and via longjmp via cpu_loop_exit.  */
+else {
+replay_interrupt();
+if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
+*last_tb = NULL;
+}
+}
+/* Don't use the cached interrupt_request value,
+   do_interrupt may have updated the EXITTB flag. */
+if (cpu->interrupt_request & CPU_INTERRUPT_EXITTB) {
+cpu->interrupt_request &= ~CPU_INTERRUPT_EXITTB;
+/* ensure that no TB jump will be modified as
+   the program flow was changed */
+*last_tb = NULL;
+}
+}
+if (unlikely(cpu->exit_request || replay_has_interrupt())) {
+cpu->exit_request = 0;
+cpu->exception_index = EXCP_INTERRUPT;
+cpu_loop_exit(cpu);
+}
+}
+
 /* main execution loop */
 
 int cpu_exec(CPUState *cpu)
@@ -447,7 +515,7 @@ int cpu_exec(CPUState *cpu)
 X86CPU *x86_cpu = X86_CPU(cpu);
 CPUArchState *env = _cpu->env;
 #endif
-int ret, interrupt_request;
+int ret;
 TranslationBlock *tb, *last_tb;
 int tb_exit = 0;
 SyncClocks sc;
@@ -486,67 +554,7 @@ int cpu_exec(CPUState *cpu)
 last_tb = NULL; /* forget the last executed TB after exception */
 cpu->tb_flushed = false; /* reset before first TB lookup */
 for(;;) {
-interrupt_request = cpu->interrupt_request;
-if (unlikely(interrupt_request)) {
-if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
-/* Mask out external interrupts for this step. */
-interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
-}
-if (interrupt_request & CPU_INTERRUPT_DEBUG) {
-cpu->interrupt_request &= ~CPU_INTERRUPT_DEBUG;
-cpu->exception_index = EXCP_DEBUG;
-cpu_loop_exit(cpu);
-}
-if (replay_mode == REPLAY_MODE_PLAY
-&& !replay_has_interrupt()) {
-/* Do nothing */
-} else if (interrupt_request & CPU_INTERRUPT_HALT) {
-replay_interrupt();
-cpu->interrupt_request &= ~CPU_INTERRUPT_HALT;
-cpu->halted = 1;
-cpu->exception_index = EXCP_HLT;
-cpu_loop_exit(cpu);
-}
-#if defined(TARGET_I386)
-

[Qemu-devel] [PATCH 2/5] cpu-exec: Move exception handling out of cpu_exec()

2016-05-10 Thread Sergey Fedorov
From: Sergey Fedorov 

Simplify cpu_exec() by extracting exception handling code out of
cpu_exec() into a new static inline function cpu_handle_exception().
Also make cpu_handle_debug_exception() inline as it is used only once.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
---
 cpu-exec.c | 93 +++---
 1 file changed, 52 insertions(+), 41 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index fb72f102742c..ad95ace38dd9 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -375,7 +375,7 @@ static inline bool cpu_handle_halt(CPUState *cpu)
 return false;
 }
 
-static void cpu_handle_debug_exception(CPUState *cpu)
+static inline void cpu_handle_debug_exception(CPUState *cpu)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
 CPUWatchpoint *wp;
@@ -389,6 +389,55 @@ static void cpu_handle_debug_exception(CPUState *cpu)
 cc->debug_excp_handler(cpu);
 }
 
+static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
+{
+if (cpu->exception_index >= 0) {
+if (cpu->exception_index >= EXCP_INTERRUPT) {
+/* exit request from the cpu execution loop */
+*ret = cpu->exception_index;
+if (*ret == EXCP_DEBUG) {
+cpu_handle_debug_exception(cpu);
+}
+cpu->exception_index = -1;
+return true;
+} else {
+#if defined(CONFIG_USER_ONLY)
+/* if user mode only, we simulate a fake exception
+   which will be handled outside the cpu execution
+   loop */
+#if defined(TARGET_I386)
+CPUClass *cc = CPU_GET_CLASS(cpu);
+cc->do_interrupt(cpu);
+#endif
+*ret = cpu->exception_index;
+cpu->exception_index = -1;
+return true;
+#else
+if (replay_exception()) {
+CPUClass *cc = CPU_GET_CLASS(cpu);
+cc->do_interrupt(cpu);
+cpu->exception_index = -1;
+} else if (!replay_has_interrupt()) {
+/* give a chance to iothread in replay mode */
+*ret = EXCP_INTERRUPT;
+return true;
+}
+#endif
+}
+#ifndef CONFIG_USER_ONLY
+} else if (replay_has_exception()
+   && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
+/* try to cause an exception pending in the log */
+TranslationBlock *last_tb = NULL; /* Avoid chaining TBs */
+cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, _tb, 0), true);
+*ret = -1;
+return true;
+#endif
+}
+
+return false;
+}
+
 /* main execution loop */
 
 int cpu_exec(CPUState *cpu)
@@ -426,50 +475,12 @@ int cpu_exec(CPUState *cpu)
  */
 init_delay_params(, cpu);
 
-/* prepare setjmp context for exception handling */
 for(;;) {
+/* prepare setjmp context for exception handling */
 if (sigsetjmp(cpu->jmp_env, 0) == 0) {
 /* if an exception is pending, we execute it here */
-if (cpu->exception_index >= 0) {
-if (cpu->exception_index >= EXCP_INTERRUPT) {
-/* exit request from the cpu execution loop */
-ret = cpu->exception_index;
-if (ret == EXCP_DEBUG) {
-cpu_handle_debug_exception(cpu);
-}
-cpu->exception_index = -1;
-break;
-} else {
-#if defined(CONFIG_USER_ONLY)
-/* if user mode only, we simulate a fake exception
-   which will be handled outside the cpu execution
-   loop */
-#if defined(TARGET_I386)
-cc->do_interrupt(cpu);
-#endif
-ret = cpu->exception_index;
-cpu->exception_index = -1;
-break;
-#else
-if (replay_exception()) {
-cc->do_interrupt(cpu);
-cpu->exception_index = -1;
-} else if (!replay_has_interrupt()) {
-/* give a chance to iothread in replay mode */
-ret = EXCP_INTERRUPT;
-break;
-}
-#endif
-}
-#ifndef CONFIG_USER_ONLY
-} else if (replay_has_exception()
-   && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
-/* try to cause an exception pending in the log */
-last_tb = NULL; /* Avoid chaining TBs */
-cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, _tb, 0), true);
-ret = -1;
+if (cpu_handle_exception(cpu, )) {
 break;
-#endif
 }
 
 last_tb = NULL; /* forget the last executed TB after exception */
-- 
1.9.1




[Qemu-devel] [PATCH 0/5] cpu-exec: Restructure cpu_exec()

2016-05-10 Thread Sergey Fedorov
From: Sergey Fedorov 

cpu_exec() was a huge function also sprinkled with some preprocessor
directives. It was hard to read and see the main loop crowded by all
this code. Restructure cpu_exec() by moving its conceptual parts into
separate static inline functions. That makes it possible to see the
whole main loop at once, especially its sigsetjmp() handling part.

This series is based on commit
40f646483a11 (cpu-exec: Remove relic orphaned comment)
from
git://github.com/rth7680/qemu.git tcg-next
and is available at
git://github.com/sergefdrv/qemu.git cpu_exec-restructure

Sergey Fedorov (5):
  cpu-exec: Move halt handling out of cpu_exec()
  cpu-exec: Move exception handling out of cpu_exec()
  cpu-exec: Move interrupt handling out of cpu_exec()
  cpu-exec: Move TB execution stuff out of cpu_exec()
  cpu-exec: Remove unused 'x86_cpu' and 'env' from cpu_exec()

 cpu-exec.c | 395 +
 1 file changed, 211 insertions(+), 184 deletions(-)

-- 
1.9.1




Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Eric Blake
On 05/10/2016 09:41 AM, Alex Bligh wrote:
> 
> On 10 May 2016, at 16:29, Eric Blake  wrote:
> 
>> So the kernel is currently one of the clients that does NOT honor block
>> sizes, and as such, servers should be prepared for ANY size up to
>> UINT_MAX (other than DoS handling).
> 
> Interesting followup question:
> 
> If the kernel does not fragment TRIM requests at all (in the
> same way it fragments read and write requests), I suspect
> something bad may happen with TRIM requests over 2^31
> in size (particularly over 2^32 in size), as the length
> field in nbd only has 32 bits.
> 
> Whether it supports block size constraints or not, it is
> going to need to do *some* breaking up of requests.

Does anyone have an easy way to cause the kernel to request a trim
operation that large on a > 4G export?  I'm not familiar enough with
EXT4 operation to know what file system operations you can run to
ultimately indirectly create a file system trim operation that large.
But maybe there is something simpler - does the kernel let you use the
fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Quentin Casasnovas
On Tue, May 10, 2016 at 04:38:29PM +0100, Alex Bligh wrote:
> Eric,
> 
> On 10 May 2016, at 16:29, Eric Blake  wrote:
> >>> Maybe we should revisit that in the spec, and/or advertise yet another
> >>> block size (since the maximum size for a trim and/or write_zeroes
> >>> request may indeed be different than the maximum size for a read/write).
> >> 
> >> I think it's up to the server to either handle large requests, or
> >> for the client to break these up.
> > 
> > But the question at hand here is whether we should permit servers to
> > advertise multiple maximum block sizes (one for read/write, another one
> > for trim/write_zero, or even two [at least qemu tracks a separate
> > maximum trim vs. write_zero sizing in its generic block layer]), or
> > merely stick with the current wording that requires clients that honor
> > maximum block size to obey the same maximum for ALL commands, regardless
> > of amount of data sent over the wire.
> 
> In my view, we should not change this. Block sizes maxima are not there
> to support DoS prevention (that's a separate phrase). They are there
> to impose maximum block sizes. Adding a different maximum block size
> for different commands is way too overengineered. There are after
> all reasons (especially without structured replies) why you'd want
> different maximum block sizes for writes and reads. If clients support
> block sizes, they will necessarily have to have the infrastructure
> to break requests up.
> 
> IE maximum block size should continue to mean maximum block size.
> 
> >> 
> >> The core problem here is that the kernel (and, ahem, most servers) are
> >> ignorant of the block size extension, and need to guess how to break
> >> things up. In my view the client (kernel in this case) should
> >> be breaking the trim requests up into whatever size it uses as the
> >> maximum size write requests. But then it would have to know about block
> >> sizes which are in (another) experimental extension.
> > 
> > Correct - no one has yet patched the kernel to honor block sizes
> > advertised through what is currently an experimental extension.
> 
> Unsurprising at it's still experimental, and only settled down a couple
> of weeks ago :-)
> 
> >  (We
> > have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
> > minimum block size,
> 
> Technically that is 'out of band transmission of block size
> constraints' :-)
> 
> > but I haven't audited whether the kernel actually
> > guarantees that all client requests are sent aligned to the value passed
> > that way - but we have nothing to set the maximum size,
> 
> indeed
> 
> > and are at the
> > mercy of however the kernel currently decides to split large requests).
> 
> I am surprised TRIM doesn't get broken up the same way READ and WRITE
> do.
>

I'm by no mean an expert in this, but why would the kernel break up those
TRIM commands?  After all, breaking things up makes sense when the length
of the request is big, not that much when it only contains the request
header, which is the case for TRIM commands.

What am I missing?

Quentin



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Alex Bligh

On 10 May 2016, at 16:29, Eric Blake  wrote:

> So the kernel is currently one of the clients that does NOT honor block
> sizes, and as such, servers should be prepared for ANY size up to
> UINT_MAX (other than DoS handling).

Interesting followup question:

If the kernel does not fragment TRIM requests at all (in the
same way it fragments read and write requests), I suspect
something bad may happen with TRIM requests over 2^31
in size (particularly over 2^32 in size), as the length
field in nbd only has 32 bits.

Whether it supports block size constraints or not, it is
going to need to do *some* breaking up of requests.

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-devel] [PATCH for-2.7 2/4] pc: Set CPU model-id on compat_props for pc <= 2.4

2016-05-10 Thread Michael S. Tsirkin
On Sat, Apr 09, 2016 at 05:37:45PM -0300, Eduardo Habkost wrote:
> Instead of relying on x86_cpudef_setup() calling
> qemu_hw_version(), just make old machines set model-id explicitly
> on compat_props for qemu64, qemu32, and athlon. This will allow
> us to eliminate x86_cpudef_setup() later.
> 
> Signed-off-by: Eduardo Habkost 

Reviewed-by: Michael S. Tsirkin 

Feel free to merge directly.

> ---
>  hw/i386/pc_piix.c| 12 +++-
>  include/hw/i386/pc.h | 29 +
>  2 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 7f50116..25e04e4 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -583,6 +583,7 @@ DEFINE_I440FX_MACHINE(v1_4, "pc-i440fx-1.4", 
> pc_compat_1_4,
>  
>  #define PC_COMPAT_1_3 \
>  PC_COMPAT_1_4 \
> +PC_CPU_MODEL_IDS("1.3.0") \
>  {\
>  .driver   = "usb-tablet",\
>  .property = "usb_version",\
> @@ -615,6 +616,7 @@ DEFINE_I440FX_MACHINE(v1_3, "pc-1.3", pc_compat_1_3,
>  
>  #define PC_COMPAT_1_2 \
>  PC_COMPAT_1_3 \
> +PC_CPU_MODEL_IDS("1.2.0") \
>  {\
>  .driver   = "nec-usb-xhci",\
>  .property = "msi",\
> @@ -654,6 +656,7 @@ DEFINE_I440FX_MACHINE(v1_2, "pc-1.2", pc_compat_1_2,
>  
>  #define PC_COMPAT_1_1 \
>  PC_COMPAT_1_2 \
> +PC_CPU_MODEL_IDS("1.1.0") \
>  {\
>  .driver   = "virtio-scsi-pci",\
>  .property = "hotplug",\
> @@ -697,6 +700,7 @@ DEFINE_I440FX_MACHINE(v1_1, "pc-1.1", pc_compat_1_2,
>  
>  #define PC_COMPAT_1_0 \
>  PC_COMPAT_1_1 \
> +PC_CPU_MODEL_IDS("1.0") \
>  {\
>  .driver   = TYPE_ISA_FDC,\
>  .property = "check_media_rate",\
> @@ -727,7 +731,8 @@ DEFINE_I440FX_MACHINE(v1_0, "pc-1.0", pc_compat_1_2,
>  
>  
>  #define PC_COMPAT_0_15 \
> -PC_COMPAT_1_0
> +PC_COMPAT_1_0 \
> +PC_CPU_MODEL_IDS("0.15")
>  
>  static void pc_i440fx_0_15_machine_options(MachineClass *m)
>  {
> @@ -742,6 +747,7 @@ DEFINE_I440FX_MACHINE(v0_15, "pc-0.15", pc_compat_1_2,
>  
>  #define PC_COMPAT_0_14 \
>  PC_COMPAT_0_15 \
> +PC_CPU_MODEL_IDS("0.14") \
>  {\
>  .driver   = "virtio-blk-pci",\
>  .property = "event_idx",\
> @@ -781,6 +787,7 @@ DEFINE_I440FX_MACHINE(v0_14, "pc-0.14", pc_compat_1_2,
>  
>  #define PC_COMPAT_0_13 \
>  PC_COMPAT_0_14 \
> +PC_CPU_MODEL_IDS("0.13") \
>  {\
>  .driver   = TYPE_PCI_DEVICE,\
>  .property = "command_serr_enable",\
> @@ -818,6 +825,7 @@ DEFINE_I440FX_MACHINE(v0_13, "pc-0.13", pc_compat_0_13,
>  
>  #define PC_COMPAT_0_12 \
>  PC_COMPAT_0_13 \
> +PC_CPU_MODEL_IDS("0.12") \
>  {\
>  .driver   = "virtio-serial-pci",\
>  .property = "max_ports",\
> @@ -853,6 +861,7 @@ DEFINE_I440FX_MACHINE(v0_12, "pc-0.12", pc_compat_0_13,
>  
>  #define PC_COMPAT_0_11 \
>  PC_COMPAT_0_12 \
> +PC_CPU_MODEL_IDS("0.11") \
>  {\
>  .driver   = "virtio-blk-pci",\
>  .property = "vectors",\
> @@ -884,6 +893,7 @@ DEFINE_I440FX_MACHINE(v0_11, "pc-0.11", pc_compat_0_13,
>  
>  #define PC_COMPAT_0_10 \
>  PC_COMPAT_0_11 \
> +PC_CPU_MODEL_IDS("0.10") \
>  {\
>  .driver   = "virtio-blk-pci",\
>  .property = "class",\
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 96f0b66..5639979 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -359,9 +359,30 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t 
> *);
>  #define PC_COMPAT_2_5 \
>  HW_COMPAT_2_5
>  
> +/* Helper for setting model-id for CPU models that changed model-id
> + * depending on QEMU versions up to QEMU 2.4.
> + */
> +#define PC_CPU_MODEL_IDS(v) \
> +{\
> +.driver   = "qemu32-" TYPE_X86_CPU,\
> +.property = "model-id",\
> +.value= "QEMU Virtual CPU version " v,\
> +},\
> +{\
> +.driver   = "qemu64-" TYPE_X86_CPU,\
> +.property = "model-id",\
> +.value= "QEMU Virtual CPU version " v,\
> +},\
> +{\
> +.driver   = "athlon-" TYPE_X86_CPU,\
> +.property = "model-id",\
> +.value= "QEMU Virtual CPU version " v,\
> +},
> +
>  #define PC_COMPAT_2_4 \
>  PC_COMPAT_2_5 \
>  HW_COMPAT_2_4 \
> +PC_CPU_MODEL_IDS("2.4.0") \
>  {\
>  .driver   = "Haswell-" TYPE_X86_CPU,\
>  .property = "abm",\
> @@ -433,6 +454,7 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t 
> *);
>  #define PC_COMPAT_2_3 \
>  PC_COMPAT_2_4 \
>  HW_COMPAT_2_3 \
> +PC_CPU_MODEL_IDS("2.3.0") \
>  {\
>  .driver   = TYPE_X86_CPU,\
>  .property = "arat",\
> @@ -514,6 +536,7 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t 
> *);
>  #define PC_COMPAT_2_2 \
>  PC_COMPAT_2_3 

Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Alex Bligh
Eric,

On 10 May 2016, at 16:29, Eric Blake  wrote:
>>> Maybe we should revisit that in the spec, and/or advertise yet another
>>> block size (since the maximum size for a trim and/or write_zeroes
>>> request may indeed be different than the maximum size for a read/write).
>> 
>> I think it's up to the server to either handle large requests, or
>> for the client to break these up.
> 
> But the question at hand here is whether we should permit servers to
> advertise multiple maximum block sizes (one for read/write, another one
> for trim/write_zero, or even two [at least qemu tracks a separate
> maximum trim vs. write_zero sizing in its generic block layer]), or
> merely stick with the current wording that requires clients that honor
> maximum block size to obey the same maximum for ALL commands, regardless
> of amount of data sent over the wire.

In my view, we should not change this. Block sizes maxima are not there
to support DoS prevention (that's a separate phrase). They are there
to impose maximum block sizes. Adding a different maximum block size
for different commands is way too overengineered. There are after
all reasons (especially without structured replies) why you'd want
different maximum block sizes for writes and reads. If clients support
block sizes, they will necessarily have to have the infrastructure
to break requests up.

IE maximum block size should continue to mean maximum block size.

>> 
>> The core problem here is that the kernel (and, ahem, most servers) are
>> ignorant of the block size extension, and need to guess how to break
>> things up. In my view the client (kernel in this case) should
>> be breaking the trim requests up into whatever size it uses as the
>> maximum size write requests. But then it would have to know about block
>> sizes which are in (another) experimental extension.
> 
> Correct - no one has yet patched the kernel to honor block sizes
> advertised through what is currently an experimental extension.

Unsurprising at it's still experimental, and only settled down a couple
of weeks ago :-)

>  (We
> have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
> minimum block size,

Technically that is 'out of band transmission of block size
constraints' :-)

> but I haven't audited whether the kernel actually
> guarantees that all client requests are sent aligned to the value passed
> that way - but we have nothing to set the maximum size,

indeed

> and are at the
> mercy of however the kernel currently decides to split large requests).

I am surprised TRIM doesn't get broken up the same way READ and WRITE
do.

> So the kernel is currently one of the clients that does NOT honor block
> sizes, and as such, servers should be prepared for ANY size up to
> UINT_MAX (other than DoS handling).

Or not to permit a connection.

>  My question above only applies to
> clients that use the experimental block size extensions.

Indeed.

>> What surprises me is that a release kernel is using experimental
>> NBD extensions; there's no guarantee these won't change. Or does
>> fstrim work some other way?
> 
> No extension in play.  The kernel is obeying NBD_FLAG_SEND_TRIM, which
> is in the normative standard, and unrelated to the INFO/BLOCK_SIZE
> extensions.

My mistake. I was confusing 'WRITE_ZEROES' with 'TRIM'.

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-devel] [PATCH RFC 0/3] qdev: order devices by priority before creating them

2016-05-10 Thread Michael S. Tsirkin
On Tue, May 10, 2016 at 10:28:43AM +0200, Markus Armbruster wrote:
> Marcel Apfelbaum  writes:
> 
> > This series aims to allow more devices to be used with '-device'
> > by sorting the devices based on a predefined creation order flag
> > before creating them.
> >
> > Devices like IOMMU need to be created before others, so they can leverage
> > the DeviceCreationPriority flag introduced by the first patch to 
> > DeviceClass.
> >
> > The second patch sorts the devices by their DeviceCreationPriority
> > before creating them.
> >
> > Finally, the last patch demonstrates how it can be used to ensure
> > the creation of host-bridges before the pci-bridges and pci-bridges before
> > the others.
> >
> > I preferred to combine all the priorities into a single enum
> > to better manage the creation order.
> >
> > This is an RFC because I only wanted to know if it seems like the right way 
> > to go.
> > Comments are appreciated,
> 
> Can you explain why requiring the user to specify -device in a sane
> order isn't good enough?

Because it requires knowledge about the hardware.
For example, how do users know that iommu must come before the devices?
They don't. They often don't know their devices are pci devices even.

-- 
MST



Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Eric Blake
On 05/10/2016 09:08 AM, Alex Bligh wrote:
> Eric,
> 
>> Hmm. The current wording of the experimental block size additions does
>> NOT allow the client to send a NBD_CMD_TRIM with a size larger than the
>> maximum NBD_CMD_WRITE:
>> https://github.com/yoe/nbd/blob/extension-info/doc/proto.md#block-size-constraints
> 
> Correct
> 
>> Maybe we should revisit that in the spec, and/or advertise yet another
>> block size (since the maximum size for a trim and/or write_zeroes
>> request may indeed be different than the maximum size for a read/write).
> 
> I think it's up to the server to either handle large requests, or
> for the client to break these up.

But the question at hand here is whether we should permit servers to
advertise multiple maximum block sizes (one for read/write, another one
for trim/write_zero, or even two [at least qemu tracks a separate
maximum trim vs. write_zero sizing in its generic block layer]), or
merely stick with the current wording that requires clients that honor
maximum block size to obey the same maximum for ALL commands, regardless
of amount of data sent over the wire.

> 
> The core problem here is that the kernel (and, ahem, most servers) are
> ignorant of the block size extension, and need to guess how to break
> things up. In my view the client (kernel in this case) should
> be breaking the trim requests up into whatever size it uses as the
> maximum size write requests. But then it would have to know about block
> sizes which are in (another) experimental extension.

Correct - no one has yet patched the kernel to honor block sizes
advertised through what is currently an experimental extension.  (We
have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
minimum block size, but I haven't audited whether the kernel actually
guarantees that all client requests are sent aligned to the value passed
that way - but we have nothing to set the maximum size, and are at the
mercy of however the kernel currently decides to split large requests).
 So the kernel is currently one of the clients that does NOT honor block
sizes, and as such, servers should be prepared for ANY size up to
UINT_MAX (other than DoS handling).  My question above only applies to
clients that use the experimental block size extensions.

> 
> I've nothing against you fixing it qemu's server (after all I don't
> think there is anything to /prohibit/ a server working on something
> larger than the maximum block size), and ...
> 
>> But since the kernel is the one sending the large length request, and
>> since you are right that this is not a denial-of-service in the amount
>> of data being sent in a single NBD message,
> 
> ... currently there isn't actually a maximum block size advertised (that
> being in another experimental addition), so this is just DoS prevention,
> which qemu is free to do how it wishes.

Okay, sounds like that part is uncontroversial, and it is indeed worth
improving qemu's behavior.

> 
> What surprises me is that a release kernel is using experimental
> NBD extensions; there's no guarantee these won't change. Or does
> fstrim work some other way?

No extension in play.  The kernel is obeying NBD_FLAG_SEND_TRIM, which
is in the normative standard, and unrelated to the INFO/BLOCK_SIZE
extensions.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH for-2.7 v3 00/12] vl: graphics stubs + #ifdef cleanup + DT_NOGRAPHIC cleanup

2016-05-10 Thread Eduardo Habkost
On Tue, May 10, 2016 at 03:55:11PM +0200, Paolo Bonzini wrote:
> On 19/04/2016 21:55, Eduardo Habkost wrote:
> > * Clean up the graphics initialization code to reduce the
> >   number of #ifdefs;
> > * Remove the display_type == DT_NOGRAPHIC checks from hardware
> >   emulation code;
> > * Make the display_type global variable a local variable on
> >   main();
> > * Make the display_remote static variable a local variable on
> >   main().
> > 
[...]
> 
> Very nice, I suppose you'll be sending the pull request?

Now that I got some feedback, I plan to apply it, yes. :)

-- 
Eduardo



Re: [Qemu-devel] [RFC PATCH v4] fw/pci: Add support for mapping Intel IGD via QEMU

2016-05-10 Thread Alex Williamson
On Tue, 23 Feb 2016 10:22:33 -0500
"Kevin O'Connor"  wrote:

> On Tue, Feb 16, 2016 at 02:39:27PM -0700, Alex Williamson wrote:
> > QEMU provides two fw_cfg files to support IGD.  The first holds the
> > OpRegion data which holds the Video BIOS Table (VBT).  This needs to
> > be copied into reserved memory and the address stored in the ASL
> > Storage register of the device at 0xFC offset in PCI config space.
> > The OpRegion is generally 8KB.  This file is named "etc/igd-opregion".
> > 
> > The second file tells us the required size of the stolen memory space
> > for the device.  This is a dummy file, it has no backing so we only
> > allocate the space without copying anything into it.  This space
> > requires 1MB alignment and is generally either 1MB or 2MB, depending
> > on the hardware config.  If the user has opted in QEMU to expose
> > additional stolen memory beyond the GTT (GGMS), the GMS may add an
> > additional 32MB to 512MB.  The base address of the reserved memory
> > allocated for this is written back to the Base Data of Stolen Memory
> > register (BDSM) at PCI config offset 0x5C on the device.  This file is
> > named "etc/igd-bdsm".  
> 
> Thanks.  I'd be interested in seeing how the discussion with Michael
> and Igor works out (wrt using a standardized interface for
> communicating addresses between qemu and firmware) before adopting
> this interface in SeaBIOS.

Apologies if I missed it, but I haven't seen any work in this
direction, please point me to it if I'm wrong.  Now that QEMU 2.6 is
wrapping up, I'd like to get IGD assignment support in for 2.7, but we
can't do that without BIOS support.  How shall we proceed?  Thanks,

Alex



Re: [Qemu-devel] [PATCH 8/9] target-i386: Use "-" instead of "_" on all feature names

2016-05-10 Thread Igor Mammedov
On Fri,  6 May 2016 15:11:31 -0300
Eduardo Habkost  wrote:

> This makes the feature name tables in feature_word_info all match
> the actual QOM property names we use.
> 
> This will make the command-line interface more consistent,
> allowing the QOM property names to be used as "-cpu" arguments
> directly.

wouldn't that change output of '-cpu help',
can it break libvirt?


> Add extra feat2prop() calls to x86_cpu_parse_featurestr() to keep
> compatibility with the old that had underscores.
> 
> Cc: Jiri Denemark 
> Cc: libvir-l...@redhat.com
> Signed-off-by: Eduardo Habkost 
> ---
>  target-i386/cpu.c | 32 
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 309ef55..e7365d1 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -187,7 +187,7 @@ static const char *feature_name[] = {
>  };
>  static const char *ext_feature_name[] = {
>  "pni|sse3" /* Intel,AMD sse3 */, "pclmulqdq|pclmuldq", "dtes64", 
> "monitor",
> -"ds_cpl", "vmx", "smx", "est",
> +"ds-cpl", "vmx", "smx", "est",
>  "tm2", "ssse3", "cid", NULL,
>  "fma", "cx16", "xtpr", "pdcm",
>  NULL, "pcid", "dca", "sse4.1|sse4_1",
> @@ -207,17 +207,17 @@ static const char *ext2_feature_name[] = {
>  NULL /* mtrr */, NULL /* pge */, NULL /* mca */, NULL /* cmov */,
>  NULL /* pat */, NULL /* pse36 */, NULL, NULL /* Linux mp */,
>  "nx|xd", NULL, "mmxext", NULL /* mmx */,
> -NULL /* fxsr */, "fxsr_opt|ffxsr", "pdpe1gb" /* AMD Page1GB */, "rdtscp",
> +NULL /* fxsr */, "fxsr-opt|ffxsr", "pdpe1gb" /* AMD Page1GB */, "rdtscp",
>  NULL, "lm|i64", "3dnowext", "3dnow",
>  };
>  static const char *ext3_feature_name[] = {
> -"lahf_lm" /* AMD LahfSahf */, "cmp_legacy", "svm", "extapic" /* AMD 
> ExtApicSpace */,
> +"lahf-lm" /* AMD LahfSahf */, "cmp-legacy", "svm", "extapic" /* AMD 
> ExtApicSpace */,
>  "cr8legacy" /* AMD AltMovCr8 */, "abm", "sse4a", "misalignsse",
>  "3dnowprefetch", "osvw", "ibs", "xop",
>  "skinit", "wdt", NULL, "lwp",
> -"fma4", "tce", NULL, "nodeid_msr",
> -NULL, "tbm", "topoext", "perfctr_core",
> -"perfctr_nb", NULL, NULL, NULL,
> +"fma4", "tce", NULL, "nodeid-msr",
> +NULL, "tbm", "topoext", "perfctr-core",
> +"perfctr-nb", NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL,
>  };
>  
> @@ -233,8 +233,8 @@ static const char *ext4_feature_name[] = {
>  };
>  
>  static const char *kvm_feature_name[] = {
> -"kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvmclock",
> -"kvm_asyncpf", "kvm_steal_time", "kvm_pv_eoi", "kvm_pv_unhalt",
> +"kvmclock", "kvm-nopiodelay", "kvm-mmu", "kvmclock",
> +"kvm-asyncpf", "kvm-steal-time", "kvm-pv-eoi", "kvm-pv-unhalt",
>  NULL, NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL,
> @@ -244,9 +244,9 @@ static const char *kvm_feature_name[] = {
>  };
>  
>  static const char *svm_feature_name[] = {
> -"npt", "lbrv", "svm_lock", "nrip_save",
> -"tsc_scale", "vmcb_clean",  "flushbyasid", "decodeassists",
> -NULL, NULL, "pause_filter", NULL,
> +"npt", "lbrv", "svm-lock", "nrip-save",
> +"tsc-scale", "vmcb-clean",  "flushbyasid", "decodeassists",
> +NULL, NULL, "pause-filter", NULL,
>  "pfthreshold", NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL,
> @@ -255,7 +255,7 @@ static const char *svm_feature_name[] = {
>  };
>  
>  static const char *cpuid_7_0_ebx_feature_name[] = {
> -"fsgsbase", "tsc_adjust", NULL, "bmi1", "hle", "avx2", NULL, "smep",
> +"fsgsbase", "tsc-adjust", NULL, "bmi1", "hle", "avx2", NULL, "smep",
>  "bmi2", "erms", "invpcid", "rtm", NULL, NULL, "mpx", NULL,
>  "avx512f", NULL, "rdseed", "adx", "smap", NULL, "pcommit", "clflushopt",
>  "clwb", NULL, "avx512pf", "avx512er", "avx512cd", NULL, NULL, NULL,
> @@ -1894,8 +1894,8 @@ static PropertyInfo qdev_prop_spinlocks = {
>  .set   = x86_set_hv_spinlocks,
>  };
>  
> -/* Convert all '_' in a feature string option name to '-', to make feature
> - * name conform to QOM property naming rule, which uses '-' instead of '_'.
> +/* Convert all '_' in a feature string option name to '-', to keep 
> compatibility
> + * with old feature names that used "_" instead of "-".
>   */
>  static inline void feat2prop(char *s)
>  {
> @@ -1925,8 +1925,10 @@ static void x86_cpu_parse_featurestr(CPUState *cs, 
> char *features,
>  while (featurestr) {
>  char *val;
>  if (featurestr[0] == '+') {
> +feat2prop(featurestr);
>  add_flagname_to_bitmaps(featurestr + 1, plus_features, 
> _err);
>  } else if (featurestr[0] == '-') {
> +feat2prop(featurestr);
>  add_flagname_to_bitmaps(featurestr + 1, minus_features, 
> _err);
>  } else if ((val = strchr(featurestr, '='))) {
>  *val = 0; val++;
> @@ 

Re: [Qemu-devel] [PATCH 3/9] target-i386: Call cpu_exec_init() on realize

2016-05-10 Thread Igor Mammedov
On Fri,  6 May 2016 15:11:26 -0300
Eduardo Habkost  wrote:

> QOM instance_init functions are not supposed to have any side-effects,
> as new objects may be created at any moment for querying property
> information (see qmp_device_list_properties()).
> 
> Calling cpu_exec_init() also affects QEMU's ability to handle errors
> during CPU creation, as some actions done by cpu_exec_init() can't be
> reverted.
> 
> Move cpu_exec_init() call to realize so a simple object_new() won't
> trigger it, and so that it is called after some basic validation of CPU
> parameters.
> 
> Signed-off-by: Eduardo Habkost 
Reviewed-by: Igor Mammedov 

> ---
> Changes v1 -> v2:
>  * Call cpu_exec_init() after basic parameter validation
> ---
>  target-i386/cpu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index bde649a..26a5a6b 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -2901,6 +2901,8 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
> **errp)
>  }
>  
>  
> +cpu_exec_init(cs, _abort);
> +
>  if (tcg_enabled()) {
>  tcg_x86_init();
>  }
> @@ -3093,7 +3095,6 @@ static void x86_cpu_initfn(Object *obj)
>  FeatureWord w;
>  
>  cs->env_ptr = env;
> -cpu_exec_init(cs, _abort);
>  
>  object_property_add(obj, "family", "int",
>  x86_cpuid_version_get_family,




Re: [Qemu-devel] [PATCH v7 00/19] block: kill sector-based blk_write/read

2016-05-10 Thread Kevin Wolf
Am 06.05.2016 um 18:26 hat Eric Blake geschrieben:
> 2.7 material, depends on Kevin's block-next:
> git://repo.or.cz/qemu/kevin.git block-next
> 
> Previously posted as part of a larger NBD series [1] and as v6 [3].
> Mostly orthogonal to Kevin's recent work to also kill sector
> interfaces from the driver.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg03526.html
> [2] https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg00557.html
> 
> Also available as a tag at this location:
> git fetch git://repo.or.cz/qemu/ericb.git nbd-block-v7

Thanks, fixed up scsi-disk and applied to block-next.

Kevin



Re: [Qemu-devel] [PATCH 2/9] target-i386: Move TCG initialization to realize time

2016-05-10 Thread Igor Mammedov
On Fri,  6 May 2016 15:11:25 -0300
Eduardo Habkost  wrote:

> QOM instance_init functions are not supposed to have any side-effects,
> as new objects may be created at any moment for querying property
> information (see qmp_device_list_properties()).
> 
> Move TCG initialization to realize time so it won't be called when just
> doing object_new() on a X86CPU subclass.
> 
> Signed-off-by: Eduardo Habkost 
Reviewed-by: Igor Mammedov 

> ---
> Changes v1 -> v2:
>  * Now the inited/tcg_initialized variable doesn't exist anymore
>  * Move tcg_x86_init() call after basic parameter validation inside
>realizefn
> ---
>  target-i386/cpu.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index a689fec..bde649a 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -2901,6 +2901,10 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
> **errp)
>  }
>  
>  
> +if (tcg_enabled()) {
> +tcg_x86_init();
> +}
> +
>  #ifndef CONFIG_USER_ONLY
>  qemu_register_reset(x86_cpu_machine_reset_cb, cpu);
>  
> @@ -3135,11 +3139,6 @@ static void x86_cpu_initfn(Object *obj)
>  }
>  
>  x86_cpu_load_def(cpu, xcc->cpu_def, _abort);
> -
> -/* init various static tables used in TCG mode */
> -if (tcg_enabled()) {
> -tcg_x86_init();
> -}
>  }
>  
>  static int64_t x86_cpu_get_arch_id(CPUState *cs)




Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Alex Bligh
Eric,

> Hmm. The current wording of the experimental block size additions does
> NOT allow the client to send a NBD_CMD_TRIM with a size larger than the
> maximum NBD_CMD_WRITE:
> https://github.com/yoe/nbd/blob/extension-info/doc/proto.md#block-size-constraints

Correct

> Maybe we should revisit that in the spec, and/or advertise yet another
> block size (since the maximum size for a trim and/or write_zeroes
> request may indeed be different than the maximum size for a read/write).

I think it's up to the server to either handle large requests, or
for the client to break these up.

The core problem here is that the kernel (and, ahem, most servers) are
ignorant of the block size extension, and need to guess how to break
things up. In my view the client (kernel in this case) should
be breaking the trim requests up into whatever size it uses as the
maximum size write requests. But then it would have to know about block
sizes which are in (another) experimental extension.

I've nothing against you fixing it qemu's server (after all I don't
think there is anything to /prohibit/ a server working on something
larger than the maximum block size), and ...

> But since the kernel is the one sending the large length request, and
> since you are right that this is not a denial-of-service in the amount
> of data being sent in a single NBD message,

... currently there isn't actually a maximum block size advertised (that
being in another experimental addition), so this is just DoS prevention,
which qemu is free to do how it wishes.

> I definitely agree that qemu
> would be wise as a quality-of-implementation to allow the larger size,
> for maximum interoperability, even if it exceeds advertised limits (that
> is, when no limits are advertised, we should handle everything possible
> if it is not so large as to be construed a denial-of-service, and
> NBD_CMD_TRIM is not large; and when limits ARE advertised, a client that
> violates limits is out of spec but we can still be liberal and respond
> successfully to such a client rather than having to outright reject it).
> So I think this patch is headed in the right direction.

I'd agree with that.

What surprises me is that a release kernel is using experimental
NBD extensions; there's no guarantee these won't change. Or does
fstrim work some other way?

Alex

> 
>> 
>> and looking at the kernel side implementation of the nbd device
>> (drivers/block/nbd.c) where it only sends the request header with no data
>> for a NBD_CMD_TRIM.
>> 
>> With this fix in, I am now able to run fstrim on my qcow2 images and keep
>> them small (or at least keep their size proportional to the amount of data
>> present on them).
>> 
>> Signed-off-by: Quentin Casasnovas 
>> CC: Paolo Bonzini 
>> CC: 
>> CC: 
>> CC: 
> 
> This is NOT trivial material and should not go in through that tree.
> However, I concur that it qualifies for a backport on a stable branch.
> 
>> ---
>> nbd.c | 8 +++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>> 
>> diff --git a/nbd.c b/nbd.c
>> index b3d9654..e733669 100644
>> --- a/nbd.c
>> +++ b/nbd.c
>> @@ -1209,6 +1209,11 @@ static ssize_t nbd_co_send_reply(NBDRequest *req, 
>> struct nbd_reply *reply,
>> return rc;
>> }
>> 
>> +static bool nbd_should_check_request_size(const struct nbd_request *request)
>> +{
>> +return (request->type & NBD_CMD_MASK_COMMAND) != NBD_CMD_TRIM;
>> +}
>> +
>> static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request 
>> *request)
>> {
>> NBDClient *client = req->client;
>> @@ -1227,7 +1232,8 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, 
>> struct nbd_request *reque
>> goto out;
>> }
>> 
>> -if (request->len > NBD_MAX_BUFFER_SIZE) {
>> +if (nbd_should_check_request_size(request) &&
>> +request->len > NBD_MAX_BUFFER_SIZE) {
> 
> I'd rather sort out the implications of this on the NBD protocol before
> taking anything into qemu.  We've got time on our hand, so let's use it
> to get this right.  (That, and I have several pending patches that
> conflict with this as part of adding WRITE_ZEROES and INFO_BLOCK_SIZE
> support, where it may be easier to resubmit this fix on top of my
> pending patches).
> 
>> LOG("len (%u) is larger than max len (%u)",
>> request->len, NBD_MAX_BUFFER_SIZE);
>> rc = -EINVAL;
>> 
> 
> --
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 
> --
> Mobile security can be enabling, not merely restricting. Employees who
> bring their own devices (BYOD) to work are irked by the imposition of MDM
> restrictions. Mobile Device Manager Plus allows you to control only the
> apps on BYO-devices by containerizing them, leaving personal data untouched!
> 

Re: [Qemu-devel] [PATCH 1/9] target-i386: Move TCG initialization check to tcg_x86_init()

2016-05-10 Thread Igor Mammedov
On Fri,  6 May 2016 15:11:24 -0300
Eduardo Habkost  wrote:

> Instead of requiring cpu.c to check if TCG was already initialized,
> simply let the function be called multiple times.
> 
> Suggested-by: Igor Mammedov 
> Signed-off-by: Eduardo Habkost 
Reviewed-by: Igor Mammedov 

> ---
>  target-i386/cpu.c   | 4 +---
>  target-i386/translate.c | 6 ++
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 4856cd4..a689fec 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -3087,7 +3087,6 @@ static void x86_cpu_initfn(Object *obj)
>  X86CPUClass *xcc = X86_CPU_GET_CLASS(obj);
>  CPUX86State *env = >env;
>  FeatureWord w;
> -static int inited;
>  
>  cs->env_ptr = env;
>  cpu_exec_init(cs, _abort);
> @@ -3138,8 +3137,7 @@ static void x86_cpu_initfn(Object *obj)
>  x86_cpu_load_def(cpu, xcc->cpu_def, _abort);
>  
>  /* init various static tables used in TCG mode */
> -if (tcg_enabled() && !inited) {
> -inited = 1;
> +if (tcg_enabled()) {
>  tcg_x86_init();
>  }
>  }
> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index 1a1214d..92570b4 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -8133,6 +8133,12 @@ void tcg_x86_init(void)
>  "bnd0_ub", "bnd1_ub", "bnd2_ub", "bnd3_ub"
>  };
>  int i;
> +static bool initialized = false;
> +
> +if (initialized) {
> +return;
> +}
> +initialized = true;
>  
>  cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
>  cpu_cc_op = tcg_global_mem_new_i32(cpu_env,




Re: [Qemu-devel] [PATCH v7 06/19] scsi-disk: Switch to byte-based aio block access

2016-05-10 Thread Kevin Wolf
Am 10.05.2016 um 14:56 hat Eric Blake geschrieben:
> On 05/10/2016 02:55 AM, Kevin Wolf wrote:
> > Am 06.05.2016 um 18:26 hat Eric Blake geschrieben:
> >> Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
> >> to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.
> >>
> >> As part of the cleanup, scsi_init_iovec() no longer needs to return
> >> a value, and reword a comment.
> >>
> >> Signed-off-by: Eric Blake 
> >>
> 
> >> @@ -118,7 +118,6 @@ static uint32_t scsi_init_iovec(SCSIDiskReq *r, size_t 
> >> size)
> >>  }
> >>  r->iov.iov_len = MIN(r->sector_count * 512, r->buflen);
> >>  qemu_iovec_init_external(>qiov, >iov, 1);
> >> -return r->qiov.size / 512;
> > 
> > The return value was MIN(r->sector_count, SCSI_DMA_BUF_SIZE / 512).
> > 
> >>  }
> >>
> >>  static void scsi_disk_save_request(QEMUFile *f, SCSIRequest *req)
> 
> >> -n = scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
> >> +scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
> >>  block_acct_start(blk_get_stats(s->qdev.conf.blk), >acct,
> >> - n * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> >> -r->req.aiocb = blk_aio_readv(s->qdev.conf.blk, r->sector, 
> >> >qiov, n,
> >> - scsi_read_complete, r);
> >> + SCSI_DMA_BUF_SIZE, BLOCK_ACCT_READ);
> > 
> > But here you use SCSI_DMA_BUF_SIZE without considering r->sector_count.
> > Is this correct or are requests that are smaller than SCSI_DMA_BUF_SIZE
> > now accounted for more data than they actually read?
> > 
> > Would r->qiov.size be more obviously correct? You did that for writes.
> 
> Yes. Is that something you are able to fix on commit, or should I submit
> a fixup?

Okay, I can fix it up while applying.

Kevin


pgpepWPO0LrxT.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v14 0/3] qapi: child add/delete support

2016-05-10 Thread Max Reitz
On 10.05.2016 09:36, Changlong Xie wrote:
> ChangLog:
> v14:
> 1. Address commets from Betro and Max
> p2: introduce bdrv_drained_begin/end, rename last_index, remove redundant
> assert codes
> v13:
> 1. Rebase to the newest codes
> 2. Address commets from Betro and Max
> p1. Add R-B, fix incorrect syntax
> p2. Add missing "qemu/cutils.h" since 2.6, and rewrite quorum_add/del_child
> p3. Remove unnecessary "id", add "since 2.7"
> v11~v12:
> 1. Address comments from Max
> p1. Add R-B
> p2. Add R-B, remove unnecessary "endptr" "value"
> p3. Add R-B
> v10~v11:
> 1. Rebase to the newest codes
> 2. Address comment from Max
> Don't use contractions in error messages,
> p1: Remove R-Bs, and use "BdrvChild *child" in bdrv_del_child
> p2: Fix error logic in get_new_child_index/remove_child_index, and prefect
> child->name parsing
> p3: Make bdrv_find_child return BdrvChild *, and add missing explanation
> 
> v9~v10:
> 1. Rebase to the newest codes
> 2. Address comments from Berto and Max, update the documents in 
> block-core.json 
>and qmp-commands.hx 
> 3. Remove redundant codes in quorum_add_child() and quorum_del_child()
> v8:
> 1. Rebase to the newest codes
> 2. Address the comments from Eric Blake
> v7:
> 1. Remove the qmp command x-blockdev-change's parameter operation according
>to Kevin's comments.
> 2. Remove the hmp command.
> v6:
> 1. Use a single qmp command x-blockdev-change to replace x-blockdev-child-add
>and x-blockdev-child-delete
> v5:
> 1. Address Eric Blake's comments
> v4:
> 1. drop nbd driver's implementation. We can use human-monitor-command
>to do it.
> 2. Rename the command name.
> v3:
> 1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
>created by the QMP command blockdev-add.
> 2. The driver NBD can support filename, path, host:port now.
> v2:
> 1. Use bdrv_get_device_or_node_name() instead of new function
>bdrv_get_id_or_node_name()
> 2. Update the error message
> 3. Update the documents in block-core.json
> 
> Wen Congyang (3):
>   Add new block driver interface to add/delete a BDS's child
>   quorum: implement bdrv_add_child() and bdrv_del_child()
>   qmp: add monitor command to add/remove a child
> 
>  block.c   | 57 +++---
>  block/quorum.c| 78 
> +--
>  blockdev.c| 55 +
>  include/block/block.h |  8 +
>  include/block/block_int.h |  5 +++
>  qapi/block-core.json  | 32 +++
>  qmp-commands.hx   | 53 
>  7 files changed, 282 insertions(+), 6 deletions(-)

Thanks, I've applied the series to my block-next branch:

https://github.com/XanClic/qemu/commits/block-next

Max



signature.asc
Description: OpenPGP digital signature


  1   2   3   >