Re: [Qemu-devel] [PATCH] target-i386: add a list of enforceable CPU models to the help output

2015-08-26 Thread Peter Lieven


 Am 26.08.2015 um 21:00 schrieb Eduardo Habkost ehabk...@redhat.com:
 
 On Wed, Aug 26, 2015 at 08:46:42PM +0200, Peter Lieven wrote:
 Am 26.08.2015 um 20:38 schrieb Eduardo Habkost:
 On Mon, Aug 24, 2015 at 09:36:23PM +0200, Peter Lieven wrote:
 Am 24.08.2015 um 17:46 schrieb Eric Blake:
 On 08/24/2015 03:17 AM, Peter Lieven wrote:
 this patch adds a probe that lists all enforceable and migratable
 CPU models to the -cpu help output. The idea is to know a priori
 which CPU models can be exposed to the user without losing any
 feature flags.
 
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
 target-i386/cpu.c | 49 +
 1 file changed, 49 insertions(+)
 Is this same sort of listing available through QMP? Parsing '-cpu help'
 output is undesirable from libvirt point of view.
 
 A good point. But is there a QMP command to list available CPU types?
 In this case it should be easy to extend.
 Yes, that's query-cpu-definitions. See past discussion at:
 http://thread.gmane.org/gmane.comp.emulators.qemu/332554
 
 Some of the assumptions at that thread changed. See:
 http://thread.gmane.org/gmane.comp.emulators.qemu/342582/focus=346651
 That means runnability should depend only on the accelerator type, and
 not on the machine-type anymore.
 
 Thanks for the pointer. But is it possible to query cpu definitions without
 a running QEMU? Like passing a QMP command on the command line and
 receiving the answer on stdout?
 
 Well, it's impossible to check if a CPU model is runnable without running 
 QEMU.
 :)
 
 I don't think you can send a QMP command through command-line arguments, but
 you can easily start a QMP monitor on stdin/stdio.
 
 Example:
  $ (echo '{ "execute": "qmp_capabilities" }'; echo '{"execute":"query-cpu-definitions"}'; echo '{"execute":"quit"}';) | ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -S -qmp stdio -machine none -nographic
  {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 2}, "package": ""}, "capabilities": []}}
  {"return": {}}
  {"return": [{"name": "Opteron_G5"}, {"name": "Opteron_G4"}, {"name": "Opteron_G3"}, {"name": "Opteron_G2"}, {"name": "Opteron_G1"}, {"name": "Broadwell"}, {"name": "Broadwell-noTSX"}, {"name": "Haswell"}, {"name": "Haswell-noTSX"}, {"name": "IvyBridge"}, {"name": "SandyBridge"}, {"name": "Westmere"}, {"name": "Nehalem"}, {"name": "Penryn"}, {"name": "Conroe"}, {"name": "n270"}, {"name": "athlon"}, {"name": "pentium3"}, {"name": "pentium2"}, {"name": "pentium"}, {"name": "486"}, {"name": "coreduo"}, {"name": "kvm32"}, {"name": "qemu32"}, {"name": "kvm64"}, {"name": "core2duo"}, {"name": "phenom"}, {"name": "qemu64"}]}
  {"return": {}}
  {"timestamp": {"seconds": 1440615228, "microseconds": 854114}, "event": "SHUTDOWN"}
  $
 
 


Ok, now I have a list of CPU models.
What I could do now is fire up a QEMU process for each model with enforce and
look at the exit code to see if it is enforceable on my host (see the sketch below).

For me this would work as well.

Or add a QMP command that lists all enforceable models.
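Roughly sketched, such a probe loop could look like this (only a sketch: it
assumes KVM is usable on the host, that a failed ",enforce" check makes QEMU
exit non-zero before the monitor "quit" is read, and the model list is just an
illustrative subset rather than the output of query-cpu-definitions):

  #!/bin/sh
  # Probe which CPU models start under KVM with ",enforce" on this host.
  for model in qemu64 kvm64 Westmere SandyBridge Haswell Broadwell; do
      if echo quit | qemu-system-x86_64 -enable-kvm -machine pc \
              -cpu "$model",enforce -S -display none -nodefaults \
              -monitor stdio >/dev/null 2>&1; then
          echo "$model: enforceable on this host"
      else
          echo "$model: NOT enforceable on this host"
      fi
  done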

Thanks for your help.
Peter


 
 
 
 But, I wonder how to issue a QMP command before the vserver is actually
 running? Is there a common way to do it?
 What's a vserver?
 
 
 A Virtual Server.
 
 You mean the virtual machine? Yes, it is possible to run QMP commands before
 the machine is running (you just need to use -S). You can also use -machine
 none if you don't want any machine-specific initialization code to run at 
 all.
 
 -- 
 Eduardo



Re: [Qemu-devel] [PATCH] target-i386: add a list of enforceable CPU models to the help output

2015-08-26 Thread Peter Lieven
Am 26.08.2015 um 20:38 schrieb Eduardo Habkost:
 On Mon, Aug 24, 2015 at 09:36:23PM +0200, Peter Lieven wrote:
 Am 24.08.2015 um 17:46 schrieb Eric Blake:
 On 08/24/2015 03:17 AM, Peter Lieven wrote:
 this patch adds a probe that lists all enforceable and migratable
 CPU models to the -cpu help output. The idea is to know a priori
 which CPU models can be exposed to the user without losing any
 feature flags.

 Signed-off-by: Peter Lieven p...@kamp.de
 ---
  target-i386/cpu.c | 49 +
  1 file changed, 49 insertions(+)
 Is this same sort of listing available through QMP? Parsing '-cpu help'
 output is undesirable from libvirt point of view.

 A good point. But is there a QMP command to list available CPU types?
 In this case it should be easy to extend.
 Yes, that's query-cpu-definitions. See past discussion at:
 http://thread.gmane.org/gmane.comp.emulators.qemu/332554

 Some of the assumptions at that thread changed. See:
 http://thread.gmane.org/gmane.comp.emulators.qemu/342582/focus=346651
 That means runnability should depend only on the accelerator type, and
 not on the machine-type anymore.

Thanks for the pointer. But is it possible to query cpu definitions without
a running QEMU? Like passing a QMP command on the command line and
receiving the answer on stdout?


 But, I wonder how to issue a QMP command before the vserver is actually
 running? Is there a common way to do it?
 What's a vserver?


A Virtual Server.

Thanks,
Peter




Re: [Qemu-devel] [PATCHv2] block/nfs: cache allocated filesize for read-only files

2015-08-26 Thread Peter Lieven
Am 26.08.2015 um 17:31 schrieb Jeff Cody:
 On Mon, Aug 24, 2015 at 10:13:16PM +0200, Max Reitz wrote:
 On 24.08.2015 21:34, Peter Lieven wrote:
 Am 24.08.2015 um 20:39 schrieb Max Reitz:
 On 24.08.2015 10:06, Peter Lieven wrote:
 If the file is read-only it's not expected to grow, so
 save the blocking call to nfs_fstat_async and use
 the value saved at connection time. Also important:
 the monitor (and thus the main loop) will not hang
 if block device info is queried and the NFS share
 is unresponsive.

 Signed-off-by: Peter Lieven p...@kamp.de
 ---
 v1-v2: update cache on reopen_prepare [Max]

  block/nfs.c | 35 +++
  1 file changed, 35 insertions(+)
 Reviewed-by: Max Reitz mre...@redhat.com

  I hope you're ready for the "Stale actual-size value with
  cache=direct,read-only=on,format=raw files on NFS" reports. :-)
 Actually a good point; maybe the cache should only be used if

 !(bs->open_flags & BDRV_O_NOCACHE)
 Good enough a point to fix it? ;-)

 Max

 It seems more in line with expected behavior to add the cache check
 before using the size cache.  Would you be opposed to a v3 with
 this check added in?

Of course, will send it tomorrow.
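A rough sketch of the incremental change for v3, on top of this v2 patch
(untested, just to illustrate where the BDRV_O_NOCACHE check would go in
nfs_get_allocated_file_size):

  -    if (bdrv_is_read_only(bs)) {
  +    if (bdrv_is_read_only(bs) && !(bs->open_flags & BDRV_O_NOCACHE)) {
           return client->st_blocks * 512;
       }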


 One other concern I have is similar to a concern Max raised earlier -
 about an external program modifying the raw image, while QEMU has it
 opened r/o.  In particular, I wonder about an NFS server making an
 image either sparse / non-sparse.  If it was exported read-only, it
 may be a valid assumption that this could be done safely, as it would
 not change the reported file size or contents, just the allocated size
 on disk.

This might be a use case. But if I allow caching, the allocated file size
might not always be correct. This is even the case on an NFS share mounted
through the kernel, where some attributes are cached for some time.

Anyway, would it hurt here if the actual file size was too small?
In fact it has been incorrect since libnfs support was added :-)

Peter




Re: [Qemu-devel] CPU Model kvm64 and Windows2012R2

2015-08-24 Thread Peter Lieven

Am 24.08.2015 um 13:50 schrieb Paolo Bonzini:

Hi, some very old 64-bit processors had virtualization extensions but not 
lahf_lm. However, they are very rare, so it is probably a good idea to add it 
for the 2.5 machine type.


Maybe the same applies to sep?

Will you send a patch?

BR,
Peter




[Qemu-devel] [PATCHv2] block/nfs: cache allocated filesize for read-only files

2015-08-24 Thread Peter Lieven
If the file is read-only it's not expected to grow, so
save the blocking call to nfs_fstat_async and use
the value saved at connection time. Also important:
the monitor (and thus the main loop) will not hang
if block device info is queried and the NFS share
is unresponsive.

Signed-off-by: Peter Lieven p...@kamp.de
---
v1-v2: update cache on reopen_prepare [Max]

 block/nfs.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index 02eb4e4..a52e9d5 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -43,6 +43,7 @@ typedef struct NFSClient {
     int events;
     bool has_zero_init;
     AioContext *aio_context;
+    blkcnt_t st_blocks;
 } NFSClient;
 
 typedef struct NFSRPC {
@@ -374,6 +375,7 @@ static int64_t nfs_client_open(NFSClient *client, const char *filename,
     }
 
     ret = DIV_ROUND_UP(st.st_size, BDRV_SECTOR_SIZE);
+    client->st_blocks = st.st_blocks;
     client->has_zero_init = S_ISREG(st.st_mode);
     goto out;
 fail:
@@ -464,6 +466,10 @@ static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
     NFSRPC task = {0};
     struct stat st;
 
+    if (bdrv_is_read_only(bs)) {
+        return client->st_blocks * 512;
+    }
+
     task.st = &st;
     if (nfs_fstat_async(client->context, client->fh, nfs_co_generic_cb,
                         &task) != 0) {
@@ -484,6 +490,34 @@ static int nfs_file_truncate(BlockDriverState *bs, int64_t offset)
     return nfs_ftruncate(client->context, client->fh, offset);
 }
 
+/* Note that this will not re-establish a connection with the NFS server
+ * - it is effectively a NOP.  */
+static int nfs_reopen_prepare(BDRVReopenState *state,
+                              BlockReopenQueue *queue, Error **errp)
+{
+    NFSClient *client = state->bs->opaque;
+    struct stat st;
+    int ret = 0;
+
+    if (state->flags & BDRV_O_RDWR && bdrv_is_read_only(state->bs)) {
+        error_setg(errp, "Cannot open a read-only mount as read-write");
+        return -EACCES;
+    }
+
+    /* Update cache for read-only reopens */
+    if (!(state->flags & BDRV_O_RDWR)) {
+        ret = nfs_fstat(client->context, client->fh, &st);
+        if (ret < 0) {
+            error_setg(errp, "Failed to fstat file: %s",
+                       nfs_get_error(client->context));
+            return ret;
+        }
+        client->st_blocks = st.st_blocks;
+    }
+
+    return 0;
+}
+
 static BlockDriver bdrv_nfs = {
     .format_name                    = "nfs",
     .protocol_name                  = "nfs",
@@ -499,6 +533,7 @@ static BlockDriver bdrv_nfs = {
     .bdrv_file_open                 = nfs_file_open,
     .bdrv_close                     = nfs_file_close,
     .bdrv_create                    = nfs_file_create,
+    .bdrv_reopen_prepare            = nfs_reopen_prepare,
 
 .bdrv_co_readv  = nfs_co_readv,
 .bdrv_co_writev = nfs_co_writev,
-- 
1.9.1




[Qemu-devel] [PATCH] target-i386: add a list of enforceable CPU models to the help output

2015-08-24 Thread Peter Lieven
this patch adds a probe that lists all enforceable and migratable
CPU models to the -cpu help output. The idea is to know a priori
which CPU models can be exposed to the user without losing any
feature flags.

Signed-off-by: Peter Lieven p...@kamp.de
---
 target-i386/cpu.c | 49 +
 1 file changed, 49 insertions(+)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index cfb8aa7..3a56d3f 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1961,6 +1961,45 @@ static void listflags(FILE *f, fprintf_function print, const char **featureset)
     }
 }
 
+/*
+ * Check if the CPU Definition is enforceable on the current host CPU
+ * and contains no unmigratable flags.
+ *
+ * Returns: true if the CPU can be enforced and migrated.
+ */
+static bool x86_cpu_enforce_and_migratable(X86CPUDefinition *def)
+{
+    int i;
+    for (i = 0; i < ARRAY_SIZE(feature_word_info); i++) {
+        FeatureWordInfo *fw = &feature_word_info[i];
+        uint32_t eax, ebx, ecx, edx, host;
+        host_cpuid(fw->cpuid_eax, 0, &eax, &ebx, &ecx, &edx);
+        switch (fw->cpuid_reg) {
+        case R_EAX:
+            host = eax;
+            break;
+        case R_EBX:
+            host = ebx;
+            break;
+        case R_ECX:
+            host = ecx;
+            break;
+        case R_EDX:
+            host = edx;
+            break;
+        default:
+            return false;
+        }
+        if (def->features[i] & ~host) {
+            return false;
+        }
+        if (def->features[i] & fw->unmigratable_flags) {
+            return false;
+        }
+    }
+    return true;
+}
+
 /* generate CPU information. */
 void x86_cpu_list(FILE *f, fprintf_function cpu_fprintf)
 {
@@ -1987,6 +2026,16 @@ void x86_cpu_list(FILE *f, fprintf_function cpu_fprintf)
         listflags(f, cpu_fprintf, fw->feat_names);
         (*cpu_fprintf)(f, "\n");
     }
+
+    (*cpu_fprintf)(f, "\nEnforceable and migratable x86 CPU models in KVM mode:\n");
+    (*cpu_fprintf)(f, " ");
+    for (i = 0; i < ARRAY_SIZE(builtin_x86_defs); i++) {
+        def = &builtin_x86_defs[i];
+        if (x86_cpu_enforce_and_migratable(def)) {
+            (*cpu_fprintf)(f, " %s", def->name);
+        }
+    }
+    (*cpu_fprintf)(f, "\n");
 }
 
 CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
-- 
1.9.1




[Qemu-devel] CPU Model kvm64 and Windows2012R2

2015-08-24 Thread Peter Lieven

Hi,

I noticed that Win2012R2 will not boot with CPU model kvm64.
The reason is that the feature lahf_lm is missing.

Is that an error or correct?

Thanks,
Peter



Re: [Qemu-devel] [PATCH] target-i386: add a list of enforceable CPU models to the help output

2015-08-24 Thread Peter Lieven
Am 24.08.2015 um 17:46 schrieb Eric Blake:
 On 08/24/2015 03:17 AM, Peter Lieven wrote:
 this patch adds a probe that lists all enforceable and migratable
 CPU models to the -cpu help output. The idea is to know a priori
 which CPU models can be exposed to the user without losing any
 feature flags.

 Signed-off-by: Peter Lieven p...@kamp.de
 ---
  target-i386/cpu.c | 49 +
  1 file changed, 49 insertions(+)
 Is this same sort of listing available through QMP? Parsing '-cpu help'
 output is undesirable from libvirt point of view.


A good point. But is there a QMP command to list available CPU types?
In this case it should be easy to extend.

But, I wonder how to issue a QMP command before the vserver is actually
running? Is there a common way to do it?

Peter



Re: [Qemu-devel] [PATCH] target-i386: add a list of enforceable CPU models to the help output

2015-08-24 Thread Peter Lieven


 Am 24.08.2015 um 22:22 schrieb Andreas Färber afaer...@suse.de:
 
 Hi,
 
 Am 24.08.2015 um 03:17 schrieb Peter Lieven:
 this patch adds a probe that lists all enforceable and migrateable
 CPU models to the -cpu help output. The idea is to know a priory
 which CPU modules can be exposed to the user without loosing any
 
 models
 
 feature flags.
 
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
 target-i386/cpu.c | 49 +
 1 file changed, 49 insertions(+)
 
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index cfb8aa7..3a56d3f 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 [...]
 @@ -1987,6 +2026,16 @@ void x86_cpu_list(FILE *f, fprintf_function cpu_fprintf)
          listflags(f, cpu_fprintf, fw->feat_names);
          (*cpu_fprintf)(f, "\n");
      }
 +
 +    (*cpu_fprintf)(f, "\nEnforceable and migratable x86 CPU models in KVM mode:\n");
 +    (*cpu_fprintf)(f, " ");
 +    for (i = 0; i < ARRAY_SIZE(builtin_x86_defs); i++) {
 +        def = &builtin_x86_defs[i];
 +        if (x86_cpu_enforce_and_migratable(def)) {
 +            (*cpu_fprintf)(f, " %s", def->name);
 +        }
 +    }
 +    (*cpu_fprintf)(f, "\n");
 }
 
 CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
 
 I don't think adding a new section is such a good idea here, it may add
 more confusion than it helps. I would rather suggest adding some
 annotation to the existing list.

I had also thought of this first, but found it confusing as well. But maybe
you have a good idea how to format it?

I am also thinking of Eric's concern that it might be a good idea to get a JSON
output of supported models somehow.

Peter

 
 Regards,
 Andreas
 
 -- 
 SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
 GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)



Re: [Qemu-devel] [PATCHv2] block/nfs: cache allocated filesize for read-only files

2015-08-24 Thread Peter Lieven
Am 24.08.2015 um 20:39 schrieb Max Reitz:
 On 24.08.2015 10:06, Peter Lieven wrote:
 If the file is read-only it's not expected to grow, so
 save the blocking call to nfs_fstat_async and use
 the value saved at connection time. Also important:
 the monitor (and thus the main loop) will not hang
 if block device info is queried and the NFS share
 is unresponsive.

 Signed-off-by: Peter Lieven p...@kamp.de
 ---
 v1-v2: update cache on reopen_prepare [Max]

  block/nfs.c | 35 +++
  1 file changed, 35 insertions(+)
 Reviewed-by: Max Reitz mre...@redhat.com

  I hope you're ready for the "Stale actual-size value with
  cache=direct,read-only=on,format=raw files on NFS" reports. :-)
Actually a good point; maybe the cache should only be used if

!(bs->open_flags & BDRV_O_NOCACHE)

For my CDROM stuff this is still ok.

Peter





Re: [Qemu-devel] [PATCH] block/nfs: cache allocated filesize for read-only files

2015-08-21 Thread Peter Lieven
Am 21.08.2015 um 18:46 schrieb Max Reitz:
 On 2015-08-21 at 00:49, Peter Lieven wrote:
  If the file is read-only it's not expected to grow, so
  save the blocking call to nfs_fstat_async and use
  the value saved at connection time. Also important:
  the monitor (and thus the main loop) will not hang
  if block device info is queried and the NFS share
  is unresponsive.

 Signed-off-by: Peter Lieven p...@kamp.de
 ---
   block/nfs.c | 6 ++
   1 file changed, 6 insertions(+)

 First, I don't like the idea of this patch very much, but since I've never 
 used qemu's native NFS client, it's not up to me to decide whether it's worth 
 it.

I am trying to solve the problem that a stale NFS server with a CDROM ISO on it
can hang QEMU's main loop. One of the things that happens is that
you query 'info block' in HMP or query-block via QMP and indirectly call
bdrv_get_allocated_file_size and bang, QEMU hangs. Also, I don't
know if it's worth issuing an RPC call for each execution of 'info block'.



 When it comes to breaking this, what comes to mind first is some external 
 program opening the image read-write outside of qemu and writing to it. Maybe 
 that's a case we generally don't want, but maybe that's something some people 
 do on purpose, knowing what they're doing (with raw images), you never know.

I would consider this bad behaviour. However, allocated file size shouldn't
matter for raw images. If you resize the image externally you have to call
bdrv_truncate anyway to make QEMU aware of that change.


 Other than that, there's reopening. As far as I'm aware, qemu can reopen a 
 R/W image read-only, and if that happens, st_blocks may be stale.

That's a valid point. But it can be solved by implementing .bdrv_reopen_prepare
and updating st_blocks there.

Thanks for your thoughts,
Peter



[Qemu-devel] [PATCH] block/nfs: cache allocated filesize for read-only files

2015-08-21 Thread Peter Lieven
If the file is read-only it's not expected to grow, so
save the blocking call to nfs_fstat_async and use
the value saved at connection time. Also important:
the monitor (and thus the main loop) will not hang
if block device info is queried and the NFS share
is unresponsive.

Signed-off-by: Peter Lieven p...@kamp.de
---
 block/nfs.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index 02eb4e4..dc9ed21 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -43,6 +43,7 @@ typedef struct NFSClient {
     int events;
     bool has_zero_init;
     AioContext *aio_context;
+    blkcnt_t st_blocks;
 } NFSClient;
 
 typedef struct NFSRPC {
@@ -374,6 +375,7 @@ static int64_t nfs_client_open(NFSClient *client, const char *filename,
     }
 
     ret = DIV_ROUND_UP(st.st_size, BDRV_SECTOR_SIZE);
+    client->st_blocks = st.st_blocks;
     client->has_zero_init = S_ISREG(st.st_mode);
     goto out;
 fail:
@@ -464,6 +466,10 @@ static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
     NFSRPC task = {0};
     struct stat st;
 
+    if (bdrv_is_read_only(bs)) {
+        return client->st_blocks * 512;
+    }
+
     task.st = &st;
     if (nfs_fstat_async(client->context, client->fh, nfs_co_generic_cb,
                         &task) != 0) {
-- 
1.9.1




[Qemu-devel] [PATCH 2/2] ide/atapi: partially avoid deadlock if the storage backend is dead

2015-08-20 Thread Peter Lieven
the blk_drain_all() that is executed if the guest issues a DMA cancel
leads to a stuck main loop if the storage backend (e.g. an NFS share)
is unresponsive.

This scenario is a common case for CDROM images mounted from an
NFS share. In this case a broken NFS server can take down the
whole VM even if the mounted CDROM is not used and was just not
unmounted after usage.

This approach avoids the blk_drain_all for read-only media and
cancels the AIO locally, making the callback a NOP if the
original request is completed after the NFS share is responsive
again.

Signed-off-by: Peter Lieven p...@kamp.de
---
 hw/ide/pci.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index d31ff88..a8b4175 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -240,21 +240,25 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
     /* Ignore writes to SSBM if it keeps the old value */
     if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
         if (!(val & BM_CMD_START)) {
-            /*
-             * We can't cancel Scatter Gather DMA in the middle of the
-             * operation or a partial (not full) DMA transfer would reach
-             * the storage so we wait for completion instead (we beahve
-             * like if the DMA was completed by the time the guest trying
-             * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
-             * set).
-             *
-             * In the future we'll be able to safely cancel the I/O if the
-             * whole DMA operation will be submitted to disk with a single
-             * aio operation with preadv/pwritev.
-             */
             if (bm->bus->dma->aiocb) {
-                blk_drain_all();
-                assert(bm->bus->dma->aiocb == NULL);
+                if (!bdrv_is_read_only(bm->bus->dma->aiocb->bs)) {
+                    /* We can't cancel Scatter Gather DMA in the middle of the
+                     * operation or a partial (not full) DMA transfer would
+                     * reach the storage so we wait for completion instead
+                     * (we beahve like if the DMA was completed by the time the
+                     * guest trying to cancel dma with bmdma_cmd_writeb with
+                     * BM_CMD_START not set). */
+                    blk_drain_all();
+                    assert(bm->bus->dma->aiocb == NULL);
+                } else {
+                    /* On a read-only device (e.g. CDROM) we can't cause incon-
+                     * sistencies and thus cancel the AIOCB locally and avoid
+                     * to be called back later if the original request is
+                     * completed. */
+                    BlockAIOCB *aiocb = bm->bus->dma->aiocb;
+                    aiocb->cb(aiocb->opaque, -ECANCELED);
+                    aiocb->cb = NULL;
+                }
             }
             bm->status &= ~BM_STATUS_DMAING;
         } else {
-- 
1.9.1




[Qemu-devel] [PATCH 0/2] ide/atapi: partially avoid deadlock if the storage backend is dead

2015-08-20 Thread Peter Lieven
the blk_drain_all() that is executed if the guest issues a DMA cancel
leads to a stuck main loop if the storage backend (e.g. an NFS share)
is unresponsive.

This scenario is a common case for CDROM images mounted from an
NFS share. In this case a broken NFS server can take down the
whole VM even if the mounted CDROM is not used and was just not
unmounted after usage.

This approach avoids the blk_drain_all for read-only media and
cancels the AIO locally, making the callback a NOP if the
original request is completed after the NFS share is responsive
again.

Peter Lieven (2):
  block/io: allow AIOCB without callback
  ide/atapi: partially avoid deadlock if the storage backend is dead

 block/io.c   |  8 ++--
 hw/ide/pci.c | 32 ++--
 2 files changed, 24 insertions(+), 16 deletions(-)

-- 
1.9.1




[Qemu-devel] [PATCH] block/nfs: fix calculation of allocated file size

2015-08-20 Thread Peter Lieven
st.st_blocks is always counted in 512-byte units. Do not
use st.st_blksize as multiplier, which may be larger.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/nfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nfs.c b/block/nfs.c
index c026ff6..02eb4e4 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -475,7 +475,7 @@ static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
         aio_poll(client->aio_context, true);
     }
 
-    return (task.ret < 0 ? task.ret : st.st_blocks * st.st_blksize);
+    return (task.ret < 0 ? task.ret : st.st_blocks * 512);
 }
 
 static int nfs_file_truncate(BlockDriverState *bs, int64_t offset)
-- 
1.9.1




[Qemu-devel] [PATCH 1/2] block/io: allow AIOCB without callback

2015-08-20 Thread Peter Lieven
If the backend storage is unresponsive and we cancel a request due to
a timeout we cannot immediately destroy the AIOCB because the storage
might complete the original request later if it is responsive again.
For this purpose allow setting the callback to NULL and ignore it in
that case.

Signed-off-by: Peter Lieven p...@kamp.de
---
 block/io.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index d4bc83b..e628581 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2007,7 +2007,9 @@ static void bdrv_aio_bh_cb(void *opaque)
         qemu_iovec_from_buf(acb->qiov, 0, acb->bounce, acb->qiov->size);
     }
     qemu_vfree(acb->bounce);
-    acb->common.cb(acb->common.opaque, acb->ret);
+    if (acb->common.cb) {
+        acb->common.cb(acb->common.opaque, acb->ret);
+    }
     qemu_bh_delete(acb->bh);
     acb->bh = NULL;
     qemu_aio_unref(acb);
@@ -2075,7 +2077,9 @@ static const AIOCBInfo bdrv_em_co_aiocb_info = {
 static void bdrv_co_complete(BlockAIOCBCoroutine *acb)
 {
     if (!acb->need_bh) {
-        acb->common.cb(acb->common.opaque, acb->req.error);
+        if (acb->common.cb) {
+            acb->common.cb(acb->common.opaque, acb->req.error);
+        }
         qemu_aio_unref(acb);
     }
 }
-- 
1.9.1




Re: [Qemu-devel] Help debugging a regression in KVM Module

2015-08-18 Thread Peter Lieven

Am 14.08.2015 um 22:01 schrieb Alex Bennée:

Peter Lieven p...@kamp.de writes:


Hi,

some time ago I stumbled across a regression in the KVM module that has been
introduced somewhere between 3.17 and 3.19.

I have a rather old openSUSE guest with an XFS filesystem which reliably
crashes after some live migrations.
I originally believed that the issue might be related to my setup with a 3.12 
host kernel and kvm-kmod 3.19,
but I now found that it is also still present with a 3.19 host kernel with 
included 3.19 kvm module.

My idea was to continue testing on a 3.12 host kernel and then bisect all 
commits to the kvm related parts.

Now my question is how to best bisect only kvm related changes (those
that go into kvm-kmod)?

In general I don't bother. As it is a bisection you eliminate half the
commits at a time, so you get there fairly quickly anyway. However you can
tell bisect which parts of the tree you care about:

   git bisect start -- arch/arm64/kvm include/linux/kvm* 
include/uapi/linux/kvm* virt/kvm/


After some experiments I was able to find the bad commit that introduced
the regression:

commit f30ebc312ca9def25650b4e1d01cdb425c310dca
Author: Radim Krčmář rkrc...@redhat.com
Date:   Thu Oct 30 15:06:47 2014 +0100

    KVM: x86: optimize some accesses to LVTT and SPIV

It seems that this optimisation is not working reliably after live migration.
I can't reproduce it if I take a 3.19 kernel and revert this single commit.

Peter



Re: [Qemu-devel] Help debugging a regression in KVM Module

2015-08-18 Thread Peter Lieven


 Am 18.08.2015 um 17:25 schrieb Radim Krčmář rkrc...@redhat.com:
 
 2015-08-18 16:54+0200, Peter Lieven:
 After some experiments I was able to find out the bad commit that introduced 
 the regression:
 
 commit f30ebc312ca9def25650b4e1d01cdb425c310dca
 Author: Radim Krčmář rkrc...@redhat.com
 Date:   Thu Oct 30 15:06:47 2014 +0100
 
 It seems that this optimisation is not working reliably after live
 migration. I can't reproduce it if
 I take a 3.19 kernel and revert this single commit.
 
 Hello, this bug has gone unnoticed for a long time so it is fixed only
 since v4.1 (and v3.19.stable was dead at that point).

Thanks for the pointer. I noticed the regression some time ago, but never found
the time to debug it. Some distros rely on 3.19, e.g. Ubuntu LTS 14.04.2. I will
try to ping the maintainer.

Peter

 
 commit b6ac069532218027f2991cba01d7a72a200688b0
 Author: Radim Krčmář rkrc...@redhat.com
 Date:   Fri Jun 5 20:57:41 2015 +0200
 
KVM: x86: fix lapic.timer_mode on restore
 
lapic.timer_mode was not properly initialized after migration, which
broke few useful things, like login, by making every sleep eternal.
 
Fix this by calling apic_update_lvtt in kvm_apic_post_state_restore.
 
There are other slowpaths that update lvtt, so this patch makes sure
something similar doesn't happen again by calling apic_update_lvtt
after every modification.
 
Cc: sta...@vger.kernel.org
Fixes: f30ebc312ca9 (KVM: x86: optimize some accesses to LVTT and SPIV)
Signed-off-by: Radim Krčmář rkrc...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com



Re: [Qemu-devel] Help debugging a regression in KVM Module

2015-08-17 Thread Peter Lieven

Am 14.08.2015 um 22:01 schrieb Alex Bennée:

Peter Lieven p...@kamp.de writes:


Hi,

some time ago I stumbled across a regression in the KVM module that has been
introduced somewhere between 3.17 and 3.19.

I have a rather old openSUSE guest with an XFS filesystem which reliably
crashes after some live migrations.
I originally believed that the issue might be related to my setup with a 3.12 
host kernel and kvm-kmod 3.19,
but I now found that it is also still present with a 3.19 host kernel with 
included 3.19 kvm module.

My idea was to continue testing on a 3.12 host kernel and then bisect all 
commits to the kvm related parts.

Now my question is how to best bisect only kvm related changes (those
that go into kvm-kmod)?

In general I don't bother. As it is a bisection you eliminate half the
commits at a time, so you get there fairly quickly anyway. However you can
tell bisect which parts of the tree you care about:

   git bisect start -- arch/arm64/kvm include/linux/kvm* 
include/uapi/linux/kvm* virt/kvm/


Yes, I just have to find out how exactly that works out if I want to bisect the 
linux submodule
of the kvm-kmod repository. But thanks for the pointer on how to limit the 
directories.
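For the x86 case, bisecting inside the kernel tree could look roughly like this
(a sketch only; it assumes the kernel sources are checked out in a "linux"
subdirectory of the kvm-kmod repository and uses illustrative good/bad tags):

  cd kvm-kmod/linux
  git bisect start v3.19 v3.17 -- arch/x86/kvm include/linux/kvm* \
      include/uapi/linux/kvm* virt/kvm/
  # rebuild kvm-kmod against the checked-out tree, run the migration test, then
  git bisect good    # or: git bisect bad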

Thanks,
Peter




Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-08-15 Thread Peter Lieven
Am 14.08.2015 um 16:45 schrieb Peter Lieven:
 Am 14.08.2015 um 16:08 schrieb Kevin Wolf:
 Am 14.08.2015 um 15:43 hat Peter Lieven geschrieben:
 Am 22.06.2015 um 23:54 schrieb John Snow:
 On 06/22/2015 09:09 AM, Peter Lieven wrote:
 Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:
 On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven p...@kamp.de wrote:
 Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
 On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven p...@kamp.de wrote:
 Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
 Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
 Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
 Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
 Thread 2 (Thread 0x75550700 (LWP 2636)):
 #0  0x75d87aa3 in ppoll () from
 /lib/x86_64-linux-gnu/libc.so.6
 No symbol table info available.
 #1  0x55955d91 in qemu_poll_ns (fds=0x563889c0,
 nfds=3,
   timeout=4999424576) at qemu-timer.c:326
   ts = {tv_sec = 4, tv_nsec = 999424576}
   tvsec = 4
 #2  0x55956feb in aio_poll (ctx=0x563528e0,
 blocking=true)
   at aio-posix.c:231
   node = 0x0
   was_dispatching = false
   ret = 1
   progress = false
 #3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
 offset=4292007936,
   qiov=0x7554f760, is_write=false, flags=0) at
 block.c:2699
   aio_context = 0x563528e0
   co = 0x563888a0
   rwco = {bs = 0x5637eae0, offset = 4292007936,
 qiov = 0x7554f760, is_write = false, ret =
 2147483647,
 flags = 0}
 #4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
 sector_num=8382828,
   buf=0x744cc800 (, nb_sectors=4, is_write=false,
 flags=0)
   at block.c:2722
   qiov = {iov = 0x7554f780, niov = 1, nalloc = -1,
 size =
 2048}
   iov = {iov_base = 0x744cc800, iov_len = 2048}
 #5  0x5594b008 in bdrv_read (bs=0x5637eae0,
 sector_num=8382828,
   buf=0x744cc800 (, nb_sectors=4) at block.c:2730
 No locals.
 #6  0x5599acef in blk_read (blk=0x56376820,
 sector_num=8382828,
   buf=0x744cc800 (, nb_sectors=4) at
 block/block-backend.c:404
 No locals.
 #7  0x55833ed2 in cd_read_sector (s=0x56408f88,
 lba=2095707,
   buf=0x744cc800 (, sector_size=2048) at
 hw/ide/atapi.c:116
   ret = 32767
 Here is the problem: The ATAPI emulation uses synchronous
 blk_read()
 instead of the AIO or coroutine interfaces. This means that it
 keeps
 polling for request completion while it holds the BQL until the
 request
 is completed.
 I will look at this.
 I need some further help. My way to emulate a hung NFS Server is to
 block it in the Firewall. Currently I face the problem that I
 cannot mount
 a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
 tried with
 a kernel NFS mount). It reads a few sectors and then stalls (maybe
 another
 bug):

 (gdb) thread apply all bt full

 Thread 3 (Thread 0x70c21700 (LWP 29710)):
 #0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
 util/qemu-thread-posix.c:120
  err = optimized out
  __func__ = qemu_cond_broadcast
 #1  0x55911164 in rfifolock_unlock
 (r=r@entry=0x56259910) at
 util/rfifolock.c:75
  __PRETTY_FUNCTION__ = rfifolock_unlock
 #2  0x55875921 in aio_context_release
 (ctx=ctx@entry=0x562598b0)
 at async.c:329
 No locals.
 #3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
 blocking=blocking@entry=true) at aio-posix.c:272
  node = optimized out
  was_dispatching = false
  i = optimized out
  ret = optimized out
  progress = false
  timeout = 611734526
  __PRETTY_FUNCTION__ = aio_poll
 #4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
 offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
 is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
 block/io.c:552
  aio_context = 0x562598b0
  co = optimized out
  rwco = {bs = 0x5627c0f0, offset = 7038976, qiov =
 0x70c208f0, is_write = false, ret = 2147483647, flags =
 (unknown: 0)}
 #5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0,
 sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
 nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
  flags=flags@entry=(unknown: 0)) at block/io.c:575
  qiov = {iov = 0x70c208e0, niov = 1, nalloc = -1, size
 = 2048}
  iov = {iov_base = 0x57874800, iov_len = 2048}
 #6  0x558bc593 in bdrv_read (bs=optimized out,
 sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
 nb_sectors=nb_sectors@entry=4) at block/io.c:583
 No locals.
 #7  0x558af75d in blk_read (blk=optimized out,
 sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
 nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
  ret = optimized out
 #8  0x557abb88

[Qemu-devel] Help debugging a regression in KVM Module

2015-08-14 Thread Peter Lieven
Hi,

some time ago I stumbled across a regression in the KVM module that has been
introduced somewhere between 3.17 and 3.19.

I have a rather old openSUSE guest with an XFS filesystem which reliably
crashes after some live migrations.
I originally believed that the issue might be related to my setup with a 3.12 
host kernel and kvm-kmod 3.19,
but I now found that it is also still present with a 3.19 host kernel with 
included 3.19 kvm module.

My idea was to continue testing on a 3.12 host kernel and then bisect all 
commits to the kvm related parts.

Now my question is how to best bisect only kvm related changes (those that go 
into kvm-kmod)?

Thanks,
Peter




[Qemu-devel] [PATCH] block/iscsi: validate block size returned from target

2015-08-14 Thread Peter Lieven
It has been reported that at least tgtd returns a block size of 0
for LUN 0. To avoid running into divide-by-zero later on and to protect
against other problematic block sizes, validate the block size right
at connection time.

Cc: qemu-sta...@nongnu.org
Reported-by: Andrey Korolyov and...@xdel.ru
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 4 
 dtc   | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 5002916..fac3a7a 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1214,6 +1214,10 @@ static void iscsi_readcapacity_sync(IscsiLun *iscsilun, Error **errp)
 
     if (task == NULL || task->status != SCSI_STATUS_GOOD) {
         error_setg(errp, "iSCSI: failed to send readcapacity10 command.");
+    } else if (!iscsilun->block_size ||
+               iscsilun->block_size % BDRV_SECTOR_SIZE) {
+        error_setg(errp, "iSCSI: the target returned an invalid "
+                   "block size of %d.", iscsilun->block_size);
     }
     if (task) {
         scsi_free_scsi_task(task);



Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-08-14 Thread Peter Lieven
Am 22.06.2015 um 23:54 schrieb John Snow:

 On 06/22/2015 09:09 AM, Peter Lieven wrote:
 Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:
 On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven p...@kamp.de wrote:
 Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
 On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven p...@kamp.de wrote:
 Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
 Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
 Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
 Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
 Thread 2 (Thread 0x75550700 (LWP 2636)):
 #0  0x75d87aa3 in ppoll () from
 /lib/x86_64-linux-gnu/libc.so.6
 No symbol table info available.
 #1  0x55955d91 in qemu_poll_ns (fds=0x563889c0,
 nfds=3,
   timeout=4999424576) at qemu-timer.c:326
   ts = {tv_sec = 4, tv_nsec = 999424576}
   tvsec = 4
 #2  0x55956feb in aio_poll (ctx=0x563528e0,
 blocking=true)
   at aio-posix.c:231
   node = 0x0
   was_dispatching = false
   ret = 1
   progress = false
 #3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
 offset=4292007936,
   qiov=0x7554f760, is_write=false, flags=0) at
 block.c:2699
   aio_context = 0x563528e0
   co = 0x563888a0
   rwco = {bs = 0x5637eae0, offset = 4292007936,
 qiov = 0x7554f760, is_write = false, ret =
 2147483647,
 flags = 0}
 #4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
 sector_num=8382828,
   buf=0x744cc800 (, nb_sectors=4, is_write=false,
 flags=0)
   at block.c:2722
   qiov = {iov = 0x7554f780, niov = 1, nalloc = -1,
 size =
 2048}
   iov = {iov_base = 0x744cc800, iov_len = 2048}
 #5  0x5594b008 in bdrv_read (bs=0x5637eae0,
 sector_num=8382828,
   buf=0x744cc800 (, nb_sectors=4) at block.c:2730
 No locals.
 #6  0x5599acef in blk_read (blk=0x56376820,
 sector_num=8382828,
   buf=0x744cc800 (, nb_sectors=4) at
 block/block-backend.c:404
 No locals.
 #7  0x55833ed2 in cd_read_sector (s=0x56408f88,
 lba=2095707,
   buf=0x744cc800 (, sector_size=2048) at
 hw/ide/atapi.c:116
   ret = 32767
 Here is the problem: The ATAPI emulation uses synchronous
 blk_read()
 instead of the AIO or coroutine interfaces. This means that it
 keeps
 polling for request completion while it holds the BQL until the
 request
 is completed.
 I will look at this.
 I need some further help. My way to emulate a hung NFS Server is to
 block it in the Firewall. Currently I face the problem that I
 cannot mount
 a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
 tried with
 a kernel NFS mount). It reads a few sectors and then stalls (maybe
 another
 bug):

 (gdb) thread apply all bt full

 Thread 3 (Thread 0x70c21700 (LWP 29710)):
 #0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
 util/qemu-thread-posix.c:120
  err = optimized out
  __func__ = qemu_cond_broadcast
 #1  0x55911164 in rfifolock_unlock
 (r=r@entry=0x56259910) at
 util/rfifolock.c:75
  __PRETTY_FUNCTION__ = rfifolock_unlock
 #2  0x55875921 in aio_context_release
 (ctx=ctx@entry=0x562598b0)
 at async.c:329
 No locals.
 #3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
 blocking=blocking@entry=true) at aio-posix.c:272
  node = optimized out
  was_dispatching = false
  i = optimized out
  ret = optimized out
  progress = false
  timeout = 611734526
  __PRETTY_FUNCTION__ = aio_poll
 #4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
 offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
 is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
 block/io.c:552
  aio_context = 0x562598b0
  co = optimized out
  rwco = {bs = 0x5627c0f0, offset = 7038976, qiov =
 0x70c208f0, is_write = false, ret = 2147483647, flags =
 (unknown: 0)}
 #5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0,
 sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
 nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
  flags=flags@entry=(unknown: 0)) at block/io.c:575
  qiov = {iov = 0x70c208e0, niov = 1, nalloc = -1, size
 = 2048}
  iov = {iov_base = 0x57874800, iov_len = 2048}
 #6  0x558bc593 in bdrv_read (bs=optimized out,
 sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
 nb_sectors=nb_sectors@entry=4) at block/io.c:583
 No locals.
 #7  0x558af75d in blk_read (blk=optimized out,
 sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
 nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
  ret = optimized out
 #8  0x557abb88 in cd_read_sector (sector_size=optimized out,
 buf=0x57874800 (, lba=3437, s=0x5760db70) at
 hw/ide/atapi.c:116
  ret = optimized out
 #9

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-08-14 Thread Peter Lieven
Am 14.08.2015 um 16:08 schrieb Kevin Wolf:
 Am 14.08.2015 um 15:43 hat Peter Lieven geschrieben:
 Am 22.06.2015 um 23:54 schrieb John Snow:
 On 06/22/2015 09:09 AM, Peter Lieven wrote:
 Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:
 On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven p...@kamp.de wrote:
 Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
 On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven p...@kamp.de wrote:
 Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
 Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
 Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
 Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
 Thread 2 (Thread 0x75550700 (LWP 2636)):
 #0  0x75d87aa3 in ppoll () from
 /lib/x86_64-linux-gnu/libc.so.6
 No symbol table info available.
 #1  0x55955d91 in qemu_poll_ns (fds=0x563889c0,
 nfds=3,
   timeout=4999424576) at qemu-timer.c:326
   ts = {tv_sec = 4, tv_nsec = 999424576}
   tvsec = 4
 #2  0x55956feb in aio_poll (ctx=0x563528e0,
 blocking=true)
   at aio-posix.c:231
   node = 0x0
   was_dispatching = false
   ret = 1
   progress = false
 #3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
 offset=4292007936,
   qiov=0x7554f760, is_write=false, flags=0) at
 block.c:2699
   aio_context = 0x563528e0
   co = 0x563888a0
   rwco = {bs = 0x5637eae0, offset = 4292007936,
 qiov = 0x7554f760, is_write = false, ret =
 2147483647,
 flags = 0}
 #4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
 sector_num=8382828,
   buf=0x744cc800 (, nb_sectors=4, is_write=false,
 flags=0)
   at block.c:2722
   qiov = {iov = 0x7554f780, niov = 1, nalloc = -1,
 size =
 2048}
   iov = {iov_base = 0x744cc800, iov_len = 2048}
 #5  0x5594b008 in bdrv_read (bs=0x5637eae0,
 sector_num=8382828,
   buf=0x744cc800 (, nb_sectors=4) at block.c:2730
 No locals.
 #6  0x5599acef in blk_read (blk=0x56376820,
 sector_num=8382828,
   buf=0x744cc800 (, nb_sectors=4) at
 block/block-backend.c:404
 No locals.
 #7  0x55833ed2 in cd_read_sector (s=0x56408f88,
 lba=2095707,
   buf=0x744cc800 (, sector_size=2048) at
 hw/ide/atapi.c:116
   ret = 32767
 Here is the problem: The ATAPI emulation uses synchronous
 blk_read()
 instead of the AIO or coroutine interfaces. This means that it
 keeps
 polling for request completion while it holds the BQL until the
 request
 is completed.
 I will look at this.
 I need some further help. My way to emulate a hung NFS Server is to
 block it in the Firewall. Currently I face the problem that I
 cannot mount
 a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
 tried with
 a kernel NFS mount). It reads a few sectors and then stalls (maybe
 another
 bug):

 (gdb) thread apply all bt full

 Thread 3 (Thread 0x70c21700 (LWP 29710)):
 #0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
 util/qemu-thread-posix.c:120
  err = optimized out
  __func__ = qemu_cond_broadcast
 #1  0x55911164 in rfifolock_unlock
 (r=r@entry=0x56259910) at
 util/rfifolock.c:75
  __PRETTY_FUNCTION__ = rfifolock_unlock
 #2  0x55875921 in aio_context_release
 (ctx=ctx@entry=0x562598b0)
 at async.c:329
 No locals.
 #3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
 blocking=blocking@entry=true) at aio-posix.c:272
  node = optimized out
  was_dispatching = false
  i = optimized out
  ret = optimized out
  progress = false
  timeout = 611734526
  __PRETTY_FUNCTION__ = aio_poll
 #4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
 offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
 is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
 block/io.c:552
  aio_context = 0x562598b0
  co = optimized out
  rwco = {bs = 0x5627c0f0, offset = 7038976, qiov =
 0x70c208f0, is_write = false, ret = 2147483647, flags =
 (unknown: 0)}
 #5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0,
 sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
 nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
  flags=flags@entry=(unknown: 0)) at block/io.c:575
  qiov = {iov = 0x70c208e0, niov = 1, nalloc = -1, size
 = 2048}
  iov = {iov_base = 0x57874800, iov_len = 2048}
 #6  0x558bc593 in bdrv_read (bs=optimized out,
 sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
 nb_sectors=nb_sectors@entry=4) at block/io.c:583
 No locals.
 #7  0x558af75d in blk_read (blk=optimized out,
 sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
 nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
  ret = optimized out
 #8  0x557abb88 in cd_read_sector (sector_size=optimized out,
 buf

Re: [Qemu-devel] Help debugging a regression in KVM Module

2015-08-14 Thread Peter Lieven
Am 14.08.2015 um 15:01 schrieb Paolo Bonzini:

 - Original Message -
 From: Peter Lieven p...@kamp.de
 To: qemu-devel@nongnu.org, k...@vger.kernel.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Sent: Friday, August 14, 2015 1:11:34 PM
 Subject: Help debugging a regression in KVM Module

 Hi,

 some time ago I stumbled across a regression in the KVM module that has been
 introduced somewhere between 3.17 and 3.19.

 I have a rather old openSUSE guest with an XFS filesystem which reliably
 crashes after some live migrations.
 I originally believed that the issue might be related to my setup with a 3.12
 host kernel and kvm-kmod 3.19,
 but I now found that it is also still present with a 3.19 host kernel with
 included 3.19 kvm module.

 My idea was to continue testing on a 3.12 host kernel and then bisect all
 commits to the kvm related parts.

 Now my question is how to best bisect only kvm related changes (those that go
 into kvm-kmod)?
 I haven't forgotten this.  Sorry. :(

 Unfortunately I'll be away for three weeks, but I'll make it a priority
 when I'm back.

It's not time critical, but I think it's worth investigating as it might affect
other systems as well - maybe XFS is just particularly sensitive.

I suppose you are going on vacation. Enjoy!

Peter



Re: [Qemu-devel] [Bug 1483070] [NEW] VIRTIO Sequential Write IOPS limits not working

2015-08-13 Thread Peter Lieven

Am 10.08.2015 um 11:59 schrieb Stefan Hajnoczi stefa...@gmail.com:

 On Mon, Aug 10, 2015 at 12:15:25AM -, James Watson wrote:
 IOPS limit does not work for VIRTIO devices if the disk workload is a 
 sequential write.
 
 To confirm:
 IDE disk devices - the IOPS limit works fine. Disk transfer speed limit 
 works fine.
 VIRTIO disk devices - the IOPS limit works fine for random IO (write/read) 
 and sequential read, but not for sequential write. Disk transfer speed 
 limits work fine.
 
 Tested on Windows 7,10 and 2k12 (Fedora drivers used and here is the twist):
 virtio-win-0.1.96 (stable) or older won't limit write IO if workload is 
 sequential.
 virtio-win-0.1.105 (latest) or newer will limit but I have had two test 
 machines crash when under high workload using IOPS limit.
 
 For Linux:
 The issue is also apparent, tested on Ubuntu 14.04
 
 On the hypervisor (using KVM) machine I have tried with Qemu 2.1.2
 (3.16.0-4-amd64 - Debian 8) and Qemu 2.3.0 (3.19.8-1-pve - Proxmox 3.4
 and 4) using multiple machines but all are 64bit intel.
 
 Even though the latest VIRTIO guest drivers fix the problem, the guest
 drivers shouldn't be able to ignore the limits the host puts in place or
 am I missing something??
 
 This is probably due to I/O request merging:
 
 Your benchmark application may generate 32 x 4KB write requests, but
 they are merged by the virtio-blk device into just 1 x 128KB write
 request.
 
 The merging can happen inside the guest, depending on your benchmark
 application and the guest kernel's I/O stack.  It also happens in QEMU's
 virtio-blk emulation.
 
 The most recent versions of QEMU merge both read and write requests.
 Older versions only merged write requests.
 
 It would be more intuitive for request merging to happen after QEMU I/O
 throttling checks.  Currently QEMU's I/O queue plug/unplug isn't
 advanced enough to do the request merging, so it's done earlier in the
 virtio-blk emulation code.
 
 I've CCed Kevin Wolf, Alberto Garcia, and Peter Lieven who may have more
 thoughts on this.

I wouldn't consider this behavior bad. Instead of virtio-blk merging the
requests, the guest could have issued big requests right from the beginning.
If you are concerned that big I/O is harming your storage, you can define a
maximum iops_size for throttling or limit the maximum bandwidth.
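For example, a sketch using the throttling.* -drive suboptions (option names as
in recent QEMU versions; the values are arbitrary):

  -drive file=disk.raw,format=raw,if=virtio,throttling.iops-total=200,throttling.iops-size=4096

With iops-size set to 4096, a merged 128 KB write is accounted as 32 I/O
operations against the 200 IOPS budget instead of just one, so merging no
longer lets a guest exceed the intended limit.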

Peter





Re: [Qemu-devel] [Qemu-stable] Recent patches for 2.4

2015-08-06 Thread Peter Lieven

Am 05.08.2015 um 10:39 schrieb Paolo Bonzini:


On 05/08/2015 01:23, ronnie sahlberg wrote:

You only get 0 from this call if there are actual bytes available to read.

For context,  the problem was that


   75 static void nfs_process_read(void *arg)
   76 {
   77     NFSClient *client = arg;
   78     nfs_service(client->context, POLLIN);
   79     nfs_set_events(client);
   80 }

sometimes triggers and calls nfs_service(POLLIN) even though the socket is
not readable.

Does read() return -1/EAGAIN or 0?

If it returns 0, then this is expected: it means that the other side has
shutdown the socket's write-side.  libnfs should either reconnect, or
disable POLLIN, treat all pending operations as aborted and stop
submitting more.


I also went on reproducing this and can confirm that I see FIONREAD return
0 bytes available when booting a CDROM in QEMU with libnfs before commit
cf420d3 ("socket: use FIONREAD ioctl only for UDP"). But a lot has changed
in rpc_read_from_socket in libnfs since we dropped FIONREAD. Most importantly,
we changed from recv without flags to recv with MSG_DONTWAIT. And with
current master I never receive spurious readiness any more. So there is
nothing we need to fix here except handling recv returning 0 bytes and
then reconnecting. This fix is already in the current master of libnfs, and it
solves the deadlock in qemu-img I have observed.

Thanks for your help,
Peter



Re: [Qemu-devel] [Qemu-stable] Recent patches for 2.4

2015-08-04 Thread Peter Lieven

Am 04.08.2015 um 14:29 schrieb Peter Lieven:

Am 04.08.2015 um 14:09 schrieb Paolo Bonzini:


On 04/08/2015 13:57, Peter Lieven wrote:

Okay, what I found out is that in aio_poll I get revents = POLLIN for
the nfs file descriptor. But there is no data available on the socket.

Does read return 0 or EAGAIN?

If it returns EAGAIN, the bug is in the QEMU main loop or the kernel.
It should never happen that poll returns POLLIN and read returns EAGAIN.

If it returns 0, it means the other side called shutdown(fd, SHUT_WR).
Then I think the bug is in the libnfs driver or more likely libnfs.  You
should stop polling the POLLIN event after read has returned 0 once.


You might be right. Ronnie originally used the FIONREAD ioctl before every read
and considered the socket as disconnected if the available bytes returned were 0.
I found that I get available bytes == 0 from that ioctl even if the socket was
not closed.
This seems to be some kind of bug in Linux - at least that's what I thought.

See BUGS in the select(2) manpage.

   Under Linux, select() may report a socket file descriptor as ready for
   reading, while nevertheless a subsequent read blocks. This could for example
   happen when data has arrived but upon examination has wrong checksum and is
   discarded. There may be other circumstances in which a file descriptor is
   spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on
   sockets that should not block.

I will debug further, but it seems to be that I receive a POLLIN even if there 
is no data available. I see 0 bytes from the recv call inside libnfs and 
continue without a deadlock - at least
so far.

Would it be a good idea to count the number of 0 bytes from recv and react 
after I received 0 bytes for a number of consecutive times?

And then: stop polling POLLIN or reconnect?


Okay, got it. Ronnie was using FIONREAD without checking for EAGAIN or EINTR.

I will send a patch for libnfs to reconnect if count == 0. Libiscsi is not 
affected, it reconnects if count is 0.
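Just to illustrate the pattern (this is not the actual libnfs code;
reconnect_to_server and process_bytes are hypothetical placeholders for the
driver's own logic):

  #include <errno.h>
  #include <sys/socket.h>
  #include <sys/types.h>

  /* Hypothetical placeholders standing in for the real driver logic. */
  static int reconnect_to_server(int fd) { (void)fd; return 0; }
  static int process_bytes(const char *buf, ssize_t n) { (void)buf; (void)n; return 0; }

  static int handle_pollin(int fd)
  {
      char buf[4096];
      ssize_t count = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

      if (count < 0) {
          if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR) {
              return 0;                   /* spurious POLLIN: nothing to read */
          }
          return -1;                      /* genuine socket error */
      }
      if (count == 0) {
          return reconnect_to_server(fd); /* peer closed its write side */
      }
      return process_bytes(buf, count);   /* real payload */
  }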

Peter




Re: [Qemu-devel] [Qemu-stable] Recent patches for 2.4

2015-08-04 Thread Peter Lieven

Am 04.08.2015 um 14:09 schrieb Paolo Bonzini:


On 04/08/2015 13:57, Peter Lieven wrote:

Okay, what I found out is that in aio_poll I get revents = POLLIN for
the nfs file descriptor. But there is no data available on the socket.

Does read return 0 or EAGAIN?

If it returns EAGAIN, the bug is in the QEMU main loop or the kernel.
It should never happen that poll returns POLLIN and read returns EAGAIN.

If it returns 0, it means the other side called shutdown(fd, SHUT_WR).
Then I think the bug is in the libnfs driver or more likely libnfs.  You
should stop polling the POLLIN event after read has returned 0 once.


You might be right. Ronnie originally used the FIONREAD ioctl before every read
and considered the socket as disconnected if the available bytes returned were 0.
I found that I get available bytes == 0 from that ioctl even if the socket was
not closed. This seems to be some kind of bug in Linux - at least that is what
I thought.

See BUGS in the select(2) manpage.

   Under Linux, select() may report a socket file descriptor as ready for
reading, while nevertheless a subsequent read blocks. This could for example
happen when data has arrived but upon examination has wrong checksum and is
discarded. There may be other circumstances in which a file descriptor is
spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on
sockets that should not block.

I will debug further, but it seems that I receive a POLLIN even if there
is no data available. I see 0 bytes from the recv call inside libnfs and
continue without a deadlock - at least so far.

Would it be a good idea to count the number of 0 bytes from recv and react 
after I received 0 bytes for a number of consecutive times?

And then: stop polling POLLIN or reconnect?

Thanks,
Peter



Re: [Qemu-devel] [Qemu-stable] Recent patches for 2.4

2015-08-04 Thread Peter Lieven

Am 04.08.2015 um 13:53 schrieb Paolo Bonzini:


On 04/08/2015 11:22, Peter Lieven wrote:

edec47c main-loop: fix qemu_notify_event for aio_notify optimization

Part of the above AioContext series.

So either the whole series or none of them I guess?

It's a separate bug, and theoretically it's there in 2.3.1 as well, but
no one ever reproduced it (it would hang in make check) so not
worthwhile.

Can you give me a pointer to what the symptoms were?

If a thread tries to wake up the main thread using qemu_notify_event(),
the main thread will never wake up.  This for example could happen if
the first thread calls qemu_set_fd_handler() or timer_mod().


I have a qemu-img convert job on x86_64 that reproducibly hangs on
bdrv_drain_all at the end of the convert process.
I convert from nfs:// to local storage here. I try to figure out which BS
reports busy. Qemu here is still 2.2.1.

qemu-img does not use main-loop, so this cannot be the cause.

The AioContext bugs only happen when you have a thread executing the
main loop and one thread executing aio_poll, so they can also be
excluded as the cause of qemu-img problems.


Okay, what I found out is that in aio_poll I get revents = POLLIN for
the nfs file descriptor. But there is no data available on the socket.
As a consequence, progress is true and we loop here forever.

I have seen that it is a common bug in Linux to return POLLIN on an fd
even when there is no data available. I don't have this problem in general
(if I did, no qemu-img or qemu process would ever terminate when NFS is
involved), but in this special case it happens reproducibly.

Peter




Re: [Qemu-devel] [Qemu-stable] Recent patches for 2.4

2015-08-04 Thread Peter Lieven

Am 31.07.2015 um 11:29 schrieb Paolo Bonzini:


On 31/07/2015 10:35, Peter Lieven wrote:

Am 31.07.2015 um 10:22 schrieb Paolo Bonzini:

52c91da memory: do not add a reference to the owner of aliased regions

This could be backported, yes.  Feel free to send it to qemu-stable.
However, the bug was only visible with virtio 1.

Applies cleanly to 2.3.1-staging.

Good.


edec47c main-loop: fix qemu_notify_event for aio_notify optimization

Part of the above AioContext series.

So either the whole series or none of them I guess?

It's a separate bug, and theoretically it's there in 2.3.1 as well, but
no one ever reproduced it (it would hang in make check) so not worthwhile.


Can you give me a pointer to what the symptoms were?
I have a qemu-img convert job on x86_64 that reproducibly hangs on 
bdrv_drain_all
at the end of the convert process.
I convert from nfs:// to local storage here. I try to figure out which BS
reports busy. Qemu here is still 2.2.1.

Peter



Re: [Qemu-devel] Forbid to pass lun 0 to iscsi driver

2015-08-03 Thread Peter Lieven

Am 02.08.2015 um 13:42 schrieb Andrey Korolyov:

Hello,

As we will never pass LUN#0 as a storage lun, it would be better to
prohibit this at least in iscsi.c, otherwise it will result in an FPU
exception and emulator crash:

traps: qemu-system-x86[32430] trap divide error ip:7f1dab7b5073
sp:7f1d713e4ae0 error:0 in block-iscsi.so[7f1dab7b+8000]

 353 static bool is_request_lun_aligned(int64_t sector_num, int nb_sectors,
 354   IscsiLun *iscsilun)
 355 {
 356 if ((sector_num * BDRV_SECTOR_SIZE) % iscsilun->block_size ||
 357 (nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size) {

As far as I can see the LUN#0 can be thrown out on a top level, as one
will never use it directly as an iSCSI backend. Please correct me if
I`m wrong in this assumption.


Hi Andrey,

LUN 0 is quite common on iSCSI targets. I think what causes the problem
is not the LUN ID, but the target which returns 0 for the blocksize. Which
target are you using?

Peter



Re: [Qemu-devel] Forbid to pass lun 0 to iscsi driver

2015-08-03 Thread Peter Lieven

Am 03.08.2015 um 09:47 schrieb Andrey Korolyov:

On Mon, Aug 3, 2015 at 9:45 AM, Peter Lieven p...@kamp.de wrote:

Am 02.08.2015 um 13:42 schrieb Andrey Korolyov:

Hello,

As we will never pass LUN#0 as a storage lun, it would be better to
prohibit this at least in iscsi.c, otherwise it will result in an FPU
exception and emulator crash:

traps: qemu-system-x86[32430] trap divide error ip:7f1dab7b5073
sp:7f1d713e4ae0 error:0 in block-iscsi.so[7f1dab7b+8000]

  353 static bool is_request_lun_aligned(int64_t sector_num, int
nb_sectors,
  354   IscsiLun *iscsilun)
  355 {
  356 if ((sector_num * BDRV_SECTOR_SIZE) % iscsilun->block_size ||
  357 (nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size) {

As far as I can see the LUN#0 can be thrown out on a top level, as one
will never use it directly as an iSCSI backend. Please correct me if
I`m wrong in this assumption.


Hi Andrey,

LUN 0 is quite common on iSCSI targets. I think what causes the problem
is not the LUN ID, but the target which returns 0 for the blocksize. Which
target are you using?

Peter

Hi Peter,

I`ve mistyped lun for tgtd upon volume hotplug, which resulted in an
accidental crash, there is nothing but human factor. Until only LUN0
may possess such unusual properties, I`d vote to explicitly work it
around instead of adding generic protection from volumes with
advertized zero block size.


I will work out a patch that forbids mounting a LUN with zero blocksize.

Peter



Re: [Qemu-devel] Forbid to pass lun 0 to iscsi driver

2015-08-03 Thread Peter Lieven

Am 03.08.2015 um 10:13 schrieb Paolo Bonzini:


On 03/08/2015 09:47, Andrey Korolyov wrote:

I`ve mistyped lun for tgtd upon volume hotplug, which resulted in an
accidental crash, there is nothing but human factor. Until only LUN0
may possess such unusual properties, I`d vote to explicitly work it
around instead of adding generic protection from volumes with
advertized zero block size.

It's only tgtd, as far as I know, that makes LUN0 special.  With lio
(in-kernel target), for example, you can associate any kind of LUN to
any LUN number.


I would error out in iscsi_readcapacity_sync if the block_size is
not divisible by BDRV_SECTOR_SIZE or is 0. Ok?
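
Roughly this kind of check (just a sketch of the idea, not the final patch;
iscsilun->block_size and BDRV_SECTOR_SIZE are the names already used in
block/iscsi.c, and the early return assumes the errp-style error reporting of
iscsi_readcapacity_sync):

    /* sketch: reject targets that report an unusable block size */
    if (iscsilun->block_size == 0 ||
        iscsilun->block_size % BDRV_SECTOR_SIZE != 0) {
        error_setg(errp, "iSCSI: target returned an unsupported "
                   "block size of %u bytes", iscsilun->block_size);
        return;
    }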

Peter



[Qemu-devel] QMP greeting sometimes does not show up

2015-07-31 Thread Peter Lieven
Hi,

I recently had some cases of the QMP greeting not showing up on a vServer
configured for incoming migration - or at least not showing up within a
reasonable amount of time (5 second timeout in my case). I retry and retry,
and in some cases the greeting finally shows up 30-60 seconds after vServer
setup.

Has anyone observed something like this before? The systems in question run Qemu 2.2.1.

The qmp settings are like this:
-qmp tcp:0:3011,server,nowait,nodelay

Thanks,
Peter




[Qemu-devel] Recent patches for 2.4

2015-07-31 Thread Peter Lieven
Hi Paolo, hi Stefan,

you submitted some fixes for 2.4 recently. None of the following had qemu-stable 
in CC. Is this not stable material?

ca96ac4 AioContext: force event loop iteration using BH
a076972 AioContext: avoid leaking BHs on cleanup
fed105e virtio-blk-dataplane: delete bottom half before the AioContext is freed
05e514b AioContext: optimize clearing the EventNotifier
21a03d1 AioContext: fix broken placement of event_notifier_test_and_clear
eabc977 AioContext: fix broken ctx-dispatching optimization
52c91da memory: do not add a reference to the owner of aliased regions
edec47c main-loop: fix qemu_notify_event for aio_notify optimization
deb809e memory: count number of active VGA logging clients
ab28bd2 rcu: actually register threads that have RCU read-side critical sections
9172f42 qemu-char: handle EINTR for TCP character devices

Thanks,
Peter




Re: [Qemu-devel] Recent patches for 2.4

2015-07-31 Thread Peter Lieven
Am 31.07.2015 um 10:22 schrieb Paolo Bonzini:

 On 31/07/2015 10:12, Peter Lieven wrote:
 Hi Paolo, hi Stefan,

 you submitted some fixes for 2.4 recently. None of the following had 
 qemu-stable in CC. Is this not stable material?
 In general I send less and less patches to qemu-stable as we get closer
 to the new release, because I'm not sure anymore if they apply.

My background was that Michael Roth just sent an announcement for an upcoming 
2.3.1 release.


 Let's look at your list:

 ca96ac4 AioContext: force event loop iteration using BH
 a076972 AioContext: avoid leaking BHs on cleanup
 fed105e virtio-blk-dataplane: delete bottom half before the AioContext is 
 freed
 05e514b AioContext: optimize clearing the EventNotifier
 21a03d1 AioContext: fix broken placement of event_notifier_test_and_clear
 eabc977 AioContext: fix broken ctx-dispatching optimization
 These could in principle be backported, but the bug was only visible on
 aarch64 hosts and only in relatively special circumstances (UEFI
 firmware, KVM, virtio-scsi, many disks) so I didn't Cc qemu-stable.

I was not aware that all those patches belong to a single bug on aarch64.


 52c91da memory: do not add a reference to the owner of aliased regions
 This could be backported, yes.  Feel free to send it to qemu-stable.
 However, the bug was only visible with virtio 1.

Applies cleanly to 2.3.1-staging.


 edec47c main-loop: fix qemu_notify_event for aio_notify optimization
 Part of the above AioContext series.

So either the whole series or none of them I guess?


 deb809e memory: count number of active VGA logging clients
 Not this one, it's new in 2.5.

 ab28bd2 rcu: actually register threads that have RCU read-side critical 
 sections
 Probably has conflicts, but can be sent to qemu-stable.

Has conflicts and I am not 100% sure how to resolve them.


 9172f42 qemu-char: handle EINTR for TCP character devices
 Can be sent to qemu-stable if it applies to 2.4.

Applies cleanly to 2.3.1-staging.

Peter



Re: [Qemu-devel] [Qemu-stable] [PULL 0/3] Cve 2015 5154 patches

2015-07-27 Thread Peter Lieven

Am 27.07.2015 um 15:38 schrieb Kevin Wolf:

Am 27.07.2015 um 15:25 hat Stefan Priebe - Profihost AG geschrieben:

Am 27.07.2015 um 14:28 schrieb John Snow:


On 07/27/2015 08:10 AM, Stefan Priebe - Profihost AG wrote:

Am 27.07.2015 um 14:01 schrieb John Snow:

The following changes since commit f793d97e454a56d17e404004867985622ca1a63b:

   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
staging (2015-07-24 13:07:10 +0100)

are available in the git repository at:

   https://github.com/jnsnow/qemu.git tags/cve-2015-5154-pull-request

Any details on this CVE? Is RCE possible? Only if IDE is used?

Stefan


It's a heap overflow. The most likely outcome is a segfault, but the
guest is allowed to continue writing past the end of the PIO buffer at
its leisure. This makes it similar to CVE-2015-3456.

This CVE can be mitigated unlike CVE-2015-3456 by just removing the
CD-ROM drive until the patch can be applied.

Thanks. The seclist article explicitly references xen. So it does not
apply to qemu/kvm? Sorry for asking may be stupid questions.

The IDE emulation is shared between Xen and KVM, so both are affected.
The reason why the seclist mail only mentions Xen is probably because
the Xen security team posted it.

Meanwhile there is also a Red Hat CVE page available, which mentions
qemu-kvm: https://access.redhat.com/security/cve/CVE-2015-5154


The Red Hat advisory says that some Red Hat versions are not affected
because they did not backport the upstream commit that introduced this issue.

Can you point out which commit exactly introduced the issue?

Thanks,
Peter


Re: [Qemu-devel] [Qemu-stable] [PULL 0/3] Cve 2015 5154 patches

2015-07-27 Thread Peter Lieven

Am 27.07.2015 um 15:54 schrieb Kevin Wolf:

Am 27.07.2015 um 15:46 hat Peter Lieven geschrieben:

Am 27.07.2015 um 15:38 schrieb Kevin Wolf:

 Am 27.07.2015 um 15:25 hat Stefan Priebe - Profihost AG geschrieben:

 Am 27.07.2015 um 14:28 schrieb John Snow:


 On 07/27/2015 08:10 AM, Stefan Priebe - Profihost AG wrote:

 Am 27.07.2015 um 14:01 schrieb John Snow:

 The following changes since commit 
f793d97e454a56d17e404004867985622ca1a63b:

   Merge remote-tracking branch 
'remotes/bonzini/tags/for-upstream' into staging (2015-07-24 13:07:10 +0100)

 are available in the git repository at:

   https://github.com/jnsnow/qemu.git 
tags/cve-2015-5154-pull-request

 Any details on this CVE? Is RCE possible? Only if IDE is used?

 Stefan


 It's a heap overflow. The most likely outcome is a segfault, but 
the
 guest is allowed to continue writing past the end of the PIO 
buffer at
 its leisure. This makes it similar to CVE-2015-3456.

 This CVE can be mitigated unlike CVE-2015-3456 by just removing the
 CD-ROM drive until the patch can be applied.

 Thanks. The seclist article explicitly references xen. So it does not
 apply to qemu/kvm? Sorry for asking may be stupid questions.

 The IDE emulation is shared between Xen and KVM, so both are affected.
 The reason why the seclist mail only mentions Xen is probably because
 the Xen security team posted it.

 Meanwhile there is also a Red Hat CVE page available, which mentions
 qemu-kvm: https://access.redhat.com/security/cve/CVE-2015-5154


The Red Hat advisory says that some Red Hat versions are not affected
because they did not backport the upstream commit that introduced this issue.

Can you point out which commit exactly introduced the issue?

That's the commit that introduced the code fixed in patch 2: Commit
ce560dcf ('ATAPI: STARTSTOPUNIT only eject/load media if powercondition
is 0').


Okay, so as far as I can see this is in any Qemu >= 1.3.0.

Peter



Re: [Qemu-devel] [PATCH for-2.5 1/4] block: add BlockLimits.max_iov field

2015-07-08 Thread Peter Lieven

Am 08.07.2015 um 17:30 schrieb Stefan Hajnoczi:

The maximum number of struct iovec elements depends on the
BlockDriverState.  The raw-posix protocol has a maximum of IOV_MAX but
others could have different values.

Instead of assuming raw-posix and hardcoding IOV_MAX in several places,
put the limit into BlockLimits.

Cc: Peter Lieven p...@kamp.de
Suggested-by: Kevin Wolf kw...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
Peter Lieven: I think the SCSI LUN level does not have a maximum
scatter-gather segments constraint.  That is probably only at the HBA
level.  CCed you anyway in case you think block/iscsi.c should set the
max_iov field.


libiscsi will send the iovec array straight to readv and writev to
read/write from the TCP socket. So we need IOV_MAX here as well.
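
Concretely that should just be a one-line addition in the iscsi driver (a
sketch, assuming the BlockLimits.max_iov field from this series is merged and
the assignment lands in the driver's refresh_limits callback):

    /* in block/iscsi.c: libiscsi passes the iovec array straight through to
     * readv()/writev() on the TCP socket, so the host's IOV_MAX is the
     * effective limit here as well */
    bs->bl.max_iov = IOV_MAX;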



Kevin: The default is now INT_MAX.  This means non-raw-posix users will
now be able to merge more requests than before.  They were limited to
IOV_MAX previously.  This could expose limits in other BlockDrivers
which we weren't aware of...


Why raise the default to INT_MAX and not leave it at IOV_MAX?
Is there any case where we expect that many iovecs coming in?

Peter




Re: [Qemu-devel] vpc size reporting problem

2015-07-07 Thread Peter Lieven

Am 07.07.2015 um 07:59 schrieb Chun Yan Liu:



On 7/7/2015 at 01:50 PM, in message 559b68b2.5060...@kamp.de, Peter Lieven

p...@kamp.de wrote:

Am 07.07.2015 um 03:50 schrieb Chun Yan Liu:

On 7/6/2015 at 06:42 PM, in message 559a5b79.4010...@kamp.de, Peter Lieven

p...@kamp.de wrote:

Am 06.07.2015 um 11:44 schrieb Chun Yan Liu:

While testing with a 1GB VHD file created on win7, found that the VHD file
size reported on Windows is different from that is reported by qemu-img
info or within a Linux KVM guest.

Created a dynamic VHD file on win7, on Windows, it is reported 1024MB
(2097152 sectors). But with qemu-img info or within a Linux KVM guest,
it is reported 1023MB (2096640 sectors).

The values in the footer_buf are as follows:
creator_app: win 
cylinders: 0x820 (2080)
heads: 0x10 (16)
cyl/sec: 0x3f (63)
current_size: 0x4000 (1G)

So, if using current_size, it's correct; but using CHS will get a smaller size.

Should we add a check in this case and use current_size instead of
CHS?

As far as I remember the issue was and still is that there is no official
spec that says use current_size in case A and CHS in case B.

Understand.


If current_size is greater than CHS and Windows would use CHS (we don't
know that) we might run into issues if Qemu uses current_size. In this
case we would write data beyond the end of the container (from Windows
perspective).

That's right. The fact is in our testing we found Windows does not use CHS
but current_size (from testing result), we create and get the VHD parted on
Windows, then take the VHD file into Linux KVM guest, it fails to show
partition table (since the reported disk size is shrinking, some of the
partitions extend beyond the end of the disk).
  
Which version of Windows are you referring to?

Tested with WS2012R2 and Win7.


Which storage driver?

I had a look at the specs and in fact they more or less say: Use current_size
and if you have an ATA controller derive the disk size from CHS.

Peter




Re: [Qemu-devel] vpc size reporting problem

2015-07-07 Thread Peter Lieven

Am 07.07.2015 um 08:34 schrieb Chun Yan Liu:



On 7/7/2015 at 02:19 PM, in message 559B6F79.237 : 102 : 21807, Chun Yan Liu

wrote:
  

On 7/7/2015 at 02:03 PM, in message 559b6bbe.3050...@kamp.de, Peter Lieven

p...@kamp.de wrote:

Am 07.07.2015 um 07:59 schrieb Chun Yan Liu:
  

On 7/7/2015 at 01:50 PM, in message 559b68b2.5060...@kamp.de, Peter Lieven

p...@kamp.de wrote:

Am 07.07.2015 um 03:50 schrieb Chun Yan Liu:

On 7/6/2015 at 06:42 PM, in message 559a5b79.4010...@kamp.de, Peter Lieven

p...@kamp.de wrote:

Am 06.07.2015 um 11:44 schrieb Chun Yan Liu:

While testing with a 1GB VHD file created on win7, found that the VHD file
size reported on Windows is different from that is reported by qemu-img
info or within a Linux KVM guest.
  
Created a dynamic VHD file on win7, on Windows, it is reported 1024MB

(2097152 sectors). But with qemu-img info or within a Linux KVM guest,
it is reported 1023MB (2096640 sectors).
  
The values in the footer_buf are as follows:

creator_app: win 
cylinders: 0x820 (2080)
heads: 0x10 (16)
cyl/sec: 0x3f (63)
current_size: 0x4000 (1G)
  
So, if using current_size, it's correct; but using CHS will get a smaller size.

Should we add a check in this case and use current_size instead of
CHS?

As far as I remember the issue was and still is that there is no official
spec that says use current_size in case A and CHS in case B.

Understand.

If current_size is greater than CHS and Windows would use CHS (we don't
know that) we might run into issues if Qemu uses current_size. In this
case we would write data beyond the end of the container (from Windows
perspective).

That's right. The fact is in our testing we found Windows does not use CHS
but current_size (from testing result), we create and get the VHD parted on
Windows, then take the VHD file into Linux KVM guest, it fails to show
partition table (since the reported disk size is shrinking, some of the
partitions extend beyond the end of the disk).
 
Which version of Windows are you referring to?

Tested with WS2012R2 and Win7.
   
Which storage driver?

And imported to a Win7 guest on KVM as IDE device, it's also reported as
1024MB (not CHS value, CHS is 1023MB).


And what storage driver reports 1023MB under Qemu?

Peter




[Qemu-devel] [PATCH V3] block/nfs: add support for setting debug level

2015-07-07 Thread Peter Lieven
upcoming libnfs versions will support logging debug messages. Add
support for it in qemu through a per-drive option.

Examples:
 qemu -drive if=virtio,file=nfs://...,file.debug=2
 qemu-img create -o debug=2 nfs://... 10G

Signed-off-by: Peter Lieven p...@kamp.de
---
v2->v3: use a per-drive option instead of a global one. [Stefan]
v1->v2: reworked patch to accept the debug level as a cmdline
parameter instead of an URI parameter [Stefan]

 block/nfs.c  | 28 
 qapi/block-core.json | 20 
 2 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index c026ff6..72a4247 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -233,6 +233,11 @@ static QemuOptsList runtime_opts = {
 .type = QEMU_OPT_STRING,
 .help = URL to the NFS file,
 },
+{
+.name = debug,
+.type = QEMU_OPT_NUMBER,
+.help = Set libnfs debug level (default 0 = no debug),
+},
 { /* end of list */ }
 },
 };
@@ -277,9 +282,9 @@ static void nfs_file_close(BlockDriverState *bs)
 }
 
 static int64_t nfs_client_open(NFSClient *client, const char *filename,
-   int flags, Error **errp)
+   int flags, QemuOpts *opts, Error **errp)
 {
-int ret = -EINVAL, i;
+int ret = -EINVAL, i, debug;
 struct stat st;
 URI *uri;
 QueryParams *qp = NULL;
@@ -343,6 +348,16 @@ static int64_t nfs_client_open(NFSClient *client, const 
char *filename,
 }
 }
 
+debug = qemu_opt_get_number(opts, debug, 0);
+if (debug) {
+#ifdef LIBNFS_FEATURE_DEBUG
+nfs_set_debug(client-context, debug);
+#else
+error_report(NFS Warning: The linked version of libnfs does
+  not support setting debug levels);
+#endif
+}
+
 ret = nfs_mount(client-context, uri-server, uri-path);
 if (ret  0) {
 error_setg(errp, Failed to mount nfs share: %s,
@@ -405,7 +420,7 @@ static int nfs_file_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 ret = nfs_client_open(client, qemu_opt_get(opts, filename),
   (flags  BDRV_O_RDWR) ? O_RDWR : O_RDONLY,
-  errp);
+  opts, errp);
 if (ret  0) {
 goto out;
 }
@@ -425,6 +440,11 @@ static QemuOptsList nfs_create_opts = {
 .type = QEMU_OPT_SIZE,
 .help = Virtual disk size
 },
+{
+.name = debug,
+.type = QEMU_OPT_NUMBER,
+.help = Set libnfs debug level (default 0 = no debug),
+},
 { /* end of list */ }
 }
 };
@@ -441,7 +461,7 @@ static int nfs_file_create(const char *url, QemuOpts *opts, 
Error **errp)
 total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
   BDRV_SECTOR_SIZE);
 
-ret = nfs_client_open(client, url, O_CREAT, errp);
+ret = nfs_client_open(client, url, O_CREAT, opts, errp);
 if (ret  0) {
 goto out;
 }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7b2efb8..f43a1b1 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1381,9 +1381,9 @@
 { 'enum': 'BlockdevDriver',
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
 'dmg', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
-'host_floppy', 'http', 'https', 'null-aio', 'null-co', 'parallels',
-'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp', 'vdi', 'vhdx',
-'vmdk', 'vpc', 'vvfat' ] }
+'host_floppy', 'http', 'https', 'nfs', 'null-aio', 'null-co',
+'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp',
+'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
 
 ##
 # @BlockdevOptionsBase
@@ -1635,6 +1635,18 @@
 '*vport': 'int',
 '*segment': 'str' } }
 
+##
+# @BlockdevOptionsNFS
+#
+# Driver specific block device options for NFS.
+#
+# @debug:   #optional set libnfs debug level (default: 0 = disabled)
+#
+# Since: 2.4
+##
+{ 'struct': 'BlockdevOptionsNFS',
+  'base': 'BlockdevOptionsFile',
+  'data': { '*debug': 'int' } }
 
 ##
 # @BlkdebugEvent
@@ -1816,7 +1828,7 @@
   'https':  'BlockdevOptionsFile',
 # TODO iscsi: Wait for structured options
 # TODO nbd: Should take InetSocketAddress for 'host'?
-# TODO nfs: Wait for structured options
+  'nfs':'BlockdevOptionsNFS',
   'null-aio':   'BlockdevOptionsNull',
   'null-co':'BlockdevOptionsNull',
   'parallels':  'BlockdevOptionsGenericFormat',
-- 
1.9.1




Re: [Qemu-devel] vpc size reporting problem

2015-07-07 Thread Peter Lieven

Am 07.07.2015 um 08:19 schrieb Chun Yan Liu:



On 7/7/2015 at 02:03 PM, in message 559b6bbe.3050...@kamp.de, Peter Lieven

p...@kamp.de wrote:

Am 07.07.2015 um 07:59 schrieb Chun Yan Liu:

On 7/7/2015 at 01:50 PM, in message 559b68b2.5060...@kamp.de, Peter Lieven

p...@kamp.de wrote:

Am 07.07.2015 um 03:50 schrieb Chun Yan Liu:

On 7/6/2015 at 06:42 PM, in message 559a5b79.4010...@kamp.de, Peter Lieven

p...@kamp.de wrote:

Am 06.07.2015 um 11:44 schrieb Chun Yan Liu:

While testing with a 1GB VHD file created on win7, found that the VHD file
size reported on Windows is different from that is reported by qemu-img
info or within a Linux KVM guest.

Created a dynamic VHD file on win7, on Windows, it is reported 1024MB
(2097152 sectors). But with qemu-img info or within a Linux KVM guest,
it is reported 1023MB (2096640 sectors).

The values in the footer_buf are as follows:
creator_app: win 
cylinders: 0x820 (2080)
heads: 0x10 (16)
cyl/sec: 0x3f (63)
current_size: 0x4000 (1G)

So, if using current_size, it's correct; but using CHS will get a smaller size.

Should we add a check in this case and use current_size instead of
CHS?

As far as I remember the issue was and still is that there is no official
spec that says use current_size in case A and CHS in case B.

Understand.

If current_size is greater than CHS and Windows would use CHS (we don't
know that) we might run into issues if Qemu uses current_size. In this
case we would write data beyond the end of the container (from Windows
perspective).

That's right. The fact is in our testing we found Windows does not use CHS
but current_size (from testing result), we create and get the VHD parted on
Windows, then take the VHD file into Linux KVM guest, it fails to show
partition table (since the reported disk size is shrinking, some of the
partitions extend beyond the end of the disk).

Which version of Windows are you referring to?

Tested with WS2012R2 and Win7.
  
Which storage driver?

I'm not sure. See from device management - disk drive, it's named as Msft
virtual disk SCSI disk device, and from storage controller, it has a separate
controller named as Microsoft VHD HBA. Anyway, seems not controlled by
ATA/ATAPI.


Can you change that to IDE or ATA and check which size windows reports then?

Thanks,
Peter




Re: [Qemu-devel] [PULL for-2.4 3/7] block/nfs: add support for setting debug level

2015-07-07 Thread Peter Lieven

Am 07.07.2015 um 14:08 schrieb Stefan Hajnoczi:

From: Peter Lieven p...@kamp.de

upcoming libnfs versions will support logging debug messages. Add
support for it in qemu through a per-drive option.

Examples:
  qemu -drive if=virtio,file=nfs://...,file.debug=2
  qemu-img create -o debug=2 nfs://... 10G

Signed-off-by: Peter Lieven p...@kamp.de
Reviewed-by: Fam Zheng f...@redhat.com
Message-id: 1436251847-16875-1-git-send-email...@kamp.de
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com


Thanks for pulling this.

Question: Is it possible to insert a CDROM other than with the 'change' command?
The change command obviously does not support file.debug parameters...

Peter




Re: [Qemu-devel] [PULL for-2.4 3/7] block/nfs: add support for setting debug level

2015-07-07 Thread Peter Lieven

Am 07.07.2015 um 15:24 schrieb Stefan Hajnoczi:

On Tue, Jul 7, 2015 at 1:08 PM, Stefan Hajnoczi stefa...@redhat.com wrote:

From: Peter Lieven p...@kamp.de

upcoming libnfs versions will support logging debug messages. Add
support for it in qemu through a per-drive option.

Examples:
  qemu -drive if=virtio,file=nfs://...,file.debug=2
  qemu-img create -o debug=2 nfs://... 10G

Signed-off-by: Peter Lieven p...@kamp.de
Reviewed-by: Fam Zheng f...@redhat.com
Message-id: 1436251847-16875-1-git-send-email...@kamp.de
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
  block/nfs.c  | 28 
  qapi/block-core.json | 20 
  2 files changed, 40 insertions(+), 8 deletions(-)

Kevin has pointed out at the QAPI portion of this patch isn't ready
for prime time yet, so I will revert the patch and resend the pull
request.


Yes please, I was not aware that adding the QAPI part
has such an impact. I would appreciate it if someone who has
more experience with the QAPI stuff would look at this and
make a proposal for how to handle the NFS (and also the iSCSI)
case. I'm happy to look at the implementation then.

Peter




Re: [Qemu-devel] vpc size reporting problem

2015-07-06 Thread Peter Lieven

Am 06.07.2015 um 11:44 schrieb Chun Yan Liu:

While testing with a 1GB VHD file created on win7, found that the VHD file
size reported on Windows is different from that is reported by qemu-img
info or within a Linux KVM guest.

Created a dynamic VHD file on win7, on Windows, it is reported 1024MB
(2097152 sectors). But with qemu-img info or within a Linux KVM guest,
it is reported 1023MB (2096640 sectors).

The values in the footer_buf are as follows:
creator_app: win 
cylinders: 0x820 (2080)
heads: 0x10 (16)
cyl/sec: 0x3f (63)
current_size: 0x4000 (1G)

So, if using current_size, it's correct; but using CHS will get a smaller size.

Should we add a check in this case and use current_size instead of
CHS?


As far as I remember the issue was and still is that there is no official spec 
that says
use current_size in case A and CHS in case B.

If current_size is greater than CHS and Windows would use CHS (we don't
know that) we might run into issues if Qemu uses current_size. In this
case we would write data beyond the end of the container (from Windows
perspective).



BTW, before commit 0444dceee, there is a similar check for 'd2v',
if creator_app is 'd2v', using size instead of CHS. But in commit
0444dceee, this check is removed. To me, the new check and 'd2v'
check seem to be two different cases, why removing 'd2v' check?


d2v always writes a magic combination of 65535x16x255 for CHS. So commit
0444dceee just changed the behaviour to always use current_size in
case CHS is 65535x16x255 (including d2v).

I personally wouldn't mind always using current_size; that is what e.g.
VirtualBox does. Or use current_size if it is greater than the size derived
from CHS. But this might break things.
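
To make the two alternatives concrete, a sketch of the size selection
(illustrative only, not a patch; field names roughly as in block/vpc.c):

    /* size derived from the CHS fields of the footer */
    bs->total_sectors = (int64_t)be16_to_cpu(footer->cyls) *
                        footer->heads * footer->secs_per_cyl;

    /* alternative heuristic: prefer current_size whenever it is larger than
     * what CHS expresses (always trusting it would match VirtualBox) */
    if (be64_to_cpu(footer->current_size) / BDRV_SECTOR_SIZE >
        bs->total_sectors) {
        bs->total_sectors = be64_to_cpu(footer->current_size) /
                            BDRV_SECTOR_SIZE;
    }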

Peter



Re: [Qemu-devel] vpc size reporting problem

2015-07-06 Thread Peter Lieven

Am 07.07.2015 um 03:50 schrieb Chun Yan Liu:



On 7/6/2015 at 06:42 PM, in message 559a5b79.4010...@kamp.de, Peter Lieven

p...@kamp.de wrote:

Am 06.07.2015 um 11:44 schrieb Chun Yan Liu:

While testing with a 1GB VHD file created on win7, found that the VHD file
size reported on Windows is different from that is reported by qemu-img
info or within a Linux KVM guest.

Created a dynamic VHD file on win7, on Windows, it is reported 1024MB
(2097152 sectors). But with qemu-img info or within a Linux KVM guest,
it is reported 1023MB (2096640 sectors).

The values in the footer_buf are as follows:
creator_app: win 
cylinders: 0x820 (2080)
heads: 0x10 (16)
cyl/sec: 0x3f (63)
current_size: 0x4000 (1G)

So, if using current_size, it's correct; but using CHS will get a smaller size.

Should we add a check in this case and use current_size instead of
CHS?

As far as I remember the issue was and still is that there is no official
spec that says use current_size in case A and CHS in case B.

Understand.

If current_size is greater than CHS and Windows would use CHS (we don't
know that) we might run into issues if Qemu uses current_size. In this
case we would write data beyond the end of the container (from Windows
perspective).

That's right. The fact is in our testing we found Windows does not use CHS
but current_size (from testing result), we create and get the VHD parted on
Windows, then take the VHD file into Linux KVM guest, it fails to show partition
table (since the reported disk size is shrinking, some of the partitions extend
beyond the end of the disk).


Which version of Windows are you referring to?

I personally think that it might be ok to use current_size if it's greater
than the size derived from CHS. Our current implementation when
creating an image assumes we have to choose CHS to be equal
to or greater than current_size. This can cause the same issue in the
other direction.

Peter



[Qemu-devel] [PATCH] block/iscsi: restore compatiblity with libiscsi 1.9.0

2015-06-26 Thread Peter Lieven
RHEL7 and others are stuck with libiscsi 1.9.0 since there
unfortunately was an ABI breakage after that release.

Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c   |   31 ++-
 configure   |6 +++---
 qemu-options.hx |3 ++-
 3 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index f19a56a..551d583 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -168,6 +168,19 @@ static inline unsigned exp_random(double mean)
 return -mean * log((double)rand() / RAND_MAX);
 }
 
+/* SCSI_STATUS_TASK_SET_FULL and SCSI_STATUS_TIMEOUT were introduced
+ * in libiscsi 1.10.0 as part of an enum. The LIBISCSI_API_VERSION
+ * macro was introduced in 1.11.0. So use the API_VERSION macro as
+ * a hint that the macros are defined and define them ourselves
+ * otherwise to keep the required libiscsi version at 1.9.0 */
+#if !defined(LIBISCSI_API_VERSION)
+#define _SCSI_STATUS_TASK_SET_FULL  0x28
+#define _SCSI_STATUS_TIMEOUT        0x0f02
+#else
+#define _SCSI_STATUS_TASK_SET_FULL  SCSI_STATUS_TASK_SET_FULL
+#define _SCSI_STATUS_TIMEOUT        SCSI_STATUS_TIMEOUT
+#endif
+
 static void
 iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
 void *command_data, void *opaque)
@@ -188,11 +201,11 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int 
status,
 iTask-do_retry = 1;
 goto out;
 }
-if (status == SCSI_STATUS_BUSY || status == SCSI_STATUS_TIMEOUT ||
-status == SCSI_STATUS_TASK_SET_FULL) {
+if (status == SCSI_STATUS_BUSY || status == _SCSI_STATUS_TIMEOUT ||
+status == _SCSI_STATUS_TASK_SET_FULL) {
 unsigned retry_time =
 exp_random(iscsi_retry_times[iTask-retries - 1]);
-if (status == SCSI_STATUS_TIMEOUT) {
+if (status == _SCSI_STATUS_TIMEOUT) {
 /* make sure the request is rescheduled AFTER the
  * reconnect is initiated */
 retry_time = EVENT_INTERVAL * 2;
@@ -1358,7 +1371,7 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
 QemuOpts *opts;
 Error *local_err = NULL;
 const char *filename;
-int i, ret = 0;
+int i, ret = 0, timeout = 0;
 
 opts = qemu_opts_create(runtime_opts, NULL, 0, error_abort);
 qemu_opts_absorb_qdict(opts, options, local_err);
@@ -1428,7 +1441,15 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto out;
 }
 
-iscsi_set_timeout(iscsi, parse_timeout(iscsi_url-target));
+/* timeout handling is broken in libiscsi before 1.15.0 */
+timeout = parse_timeout(iscsi_url-target);
+#if defined(LIBISCSI_API_VERSION)  LIBISCSI_API_VERSION = 20150621
+iscsi_set_timeout(iscsi, timeout);
+#else
+if (timeout) {
+error_report(iSCSI: ignoring timeout value for libiscsi 1.15.0);
+}
+#endif
 
 if (iscsi_full_connect_sync(iscsi, iscsi_url-portal, iscsi_url-lun) != 
0) {
 error_setg(errp, iSCSI: Failed to connect to LUN : %s,
diff --git a/configure b/configure
index 2ed9db2..222694f 100755
--- a/configure
+++ b/configure
@@ -3660,15 +3660,15 @@ if compile_prog   ; then
 fi
 
 ##
-# Do we have libiscsi >= 1.10.0
+# Do we have libiscsi >= 1.9.0
 if test $libiscsi != no ; then
-  if $pkg_config --atleast-version=1.10.0 libiscsi; then
+  if $pkg_config --atleast-version=1.9.0 libiscsi; then
 libiscsi=yes
 libiscsi_cflags=$($pkg_config --cflags libiscsi)
 libiscsi_libs=$($pkg_config --libs libiscsi)
   else
 if test $libiscsi = yes ; then
-  feature_not_found libiscsi Install libiscsi >= 1.10.0
+  feature_not_found libiscsi Install libiscsi >= 1.9.0
 fi
 libiscsi=no
   fi
diff --git a/qemu-options.hx b/qemu-options.hx
index ca37481..2d23330 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2293,7 +2293,8 @@ line or a configuration file.
 
 Since version Qemu 2.4 it is possible to specify a iSCSI request timeout to 
detect
 stalled requests and force a reestablishment of the session. The timeout
-is specified in seconds. The default is 0 which means no timeout.
+is specified in seconds. The default is 0 which means no timeout. Libiscsi
+1.15.0 is required for this feature.
 
 Example (without authentication):
 @example
-- 
1.7.9.5




Re: [Qemu-devel] [Qemu-block] [PATCH] block/nfs: add support for setting debug level

2015-06-26 Thread Peter Lieven
Am 26.06.2015 um 11:14 schrieb Stefan Hajnoczi:
 On Thu, Jun 25, 2015 at 03:26:46PM +0200, Peter Lieven wrote:
 Am 25.06.2015 um 15:18 schrieb Stefan Hajnoczi:
 On Tue, Jun 23, 2015 at 10:12:15AM +0200, Peter Lieven wrote:
 upcoming libnfs versions will support logging debug messages. Add
 support for it in qemu through an URL parameter.

 Signed-off-by: Peter Lieven p...@kamp.de
 ---
  block/nfs.c | 4 
  1 file changed, 4 insertions(+)

 diff --git a/block/nfs.c b/block/nfs.c
 index ca9e24e..f7388a3 100644
 --- a/block/nfs.c
 +++ b/block/nfs.c
 @@ -329,6 +329,10 @@ static int64_t nfs_client_open(NFSClient *client, 
 const char *filename,
  } else if (!strcmp(qp-p[i].name, readahead)) {
  nfs_set_readahead(client-context, val);
  #endif
 +#ifdef LIBNFS_FEATURE_DEBUG
 +} else if (!strcmp(qp-p[i].name, debug)) {
 +nfs_set_debug(client-context, val);
 +#endif
  } else {
  error_setg(errp, Unknown NFS parameter name: %s,
 qp-p[i].name);
 Untrusted users may be able to set these options since they are encoded
 in the URI.  I'm imagining a hosting or cloud scenario like OpenStack.

 A verbose debug level spams stderr and could consume a lot of disk
 space.

 (The uid and gid options are probably okay since the NFS server cannot
 trust the uid/gid coming from QEMU anyway.)

 I think we can merge this patch for QEMU 2.4 but I'd like to have a
 discussion about the security risk of encoding libnfs options in the
 URI.

 CCed Eric Blake in case libvirt is affected.

 Has anyone thought about this and what are the rules?
 Good point. In general I think there should be some kind of sanitization of 
 the parameters
 before they are passed on to Qemu. In our use case the user cannot pass any 
 kind of URIs himself,
 but this might be different in other backends. The readahead value is as 
 dangerous as well
 if not sanitized.

 I had a discussion with Ronnie in the past if we should encode parameters in 
 the URI or via environment
 like it is done in libiscsi. If I remember correctly we came up with the URI 
 parameters for better usability,
 but hadn't attack scenarios in mind.

 I am also open to only allow uncritical parameters in the URI and pass 
 others via a new -nfs cmdline option.
 Or limit max readahead and max debug level settable via URI.
 I'd feel safer if the option was in runtime_opts instead.  The the
 management tool has to pass them explicitly and the end user cannot
 influence them via the URI.

 If an option is needed both at open and create time, then it must also
 be parsed from nfs_file_create() opts.

Ok, I will send a patch that follows this approach. And also a second one to
limit the readahead size to a reasonable value.

Peter



[Qemu-devel] [PATCH] block/nfs: limit maximum readahead size to 1MB

2015-06-26 Thread Peter Lieven
a malicious caller could otherwise specify a very
large value via the URI and force libnfs to allocate
a large amount of memory for the readahead buffer.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/nfs.c |7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index 43d48ae..4fb1937 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -35,6 +35,8 @@
 #include sysemu/sysemu.h
 #include nfsc/libnfs.h
 
+#define QEMU_NFS_MAX_READAHEAD_SIZE 1048576
+
 typedef struct NFSClient {
 struct nfs_context *context;
 struct nfsfh *fh;
@@ -351,6 +353,11 @@ static int64_t nfs_client_open(NFSClient *client, const 
char *filename,
 nfs_set_tcp_syncnt(client-context, val);
 #ifdef LIBNFS_FEATURE_READAHEAD
 } else if (!strcmp(qp-p[i].name, readahead)) {
+if (val  QEMU_NFS_MAX_READAHEAD_SIZE) {
+error_report(NFS Warning: Truncating NFS readahead
+  size to %d, QEMU_NFS_MAX_READAHEAD_SIZE);
+val = QEMU_NFS_MAX_READAHEAD_SIZE;
+}
 nfs_set_readahead(client-context, val);
 #endif
 } else {
-- 
1.7.9.5




[Qemu-devel] [PATCH V2] block/iscsi: restore compatiblity with libiscsi 1.9.0

2015-06-26 Thread Peter Lieven
RHEL7 and others are stuck with libiscsi 1.9.0 since there
unfortunately was an ABI breakage after that release.

Signed-off-by: Peter Lieven p...@kamp.de
---
v1->v2: - do not use reserved macro names [Paolo]
- change text in qemu-options.hx to libiscsi 1.15.0 or greater

 block/iscsi.c   |   32 +++-
 configure   |6 +++---
 qemu-options.hx |3 ++-
 3 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index f19a56a..fd4a66e 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -168,6 +168,19 @@ static inline unsigned exp_random(double mean)
 return -mean * log((double)rand() / RAND_MAX);
 }
 
+/* SCSI_STATUS_TASK_SET_FULL and SCSI_STATUS_TIMEOUT were introduced
+ * in libiscsi 1.10.0 as part of an enum. The LIBISCSI_API_VERSION
+ * macro was introduced in 1.11.0. So use the API_VERSION macro as
+ * a hint that the macros are defined and define them ourselves
+ * otherwise to keep the required libiscsi version at 1.9.0 */
+#if !defined(LIBISCSI_API_VERSION)
+#define QEMU_SCSI_STATUS_TASK_SET_FULL  0x28
+#define QEMU_SCSI_STATUS_TIMEOUT        0x0f02
+#else
+#define QEMU_SCSI_STATUS_TASK_SET_FULL  SCSI_STATUS_TASK_SET_FULL
+#define QEMU_SCSI_STATUS_TIMEOUT        SCSI_STATUS_TIMEOUT
+#endif
+
 static void
 iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
 void *command_data, void *opaque)
@@ -188,11 +201,12 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int 
status,
 iTask-do_retry = 1;
 goto out;
 }
-if (status == SCSI_STATUS_BUSY || status == SCSI_STATUS_TIMEOUT ||
-status == SCSI_STATUS_TASK_SET_FULL) {
+if (status == SCSI_STATUS_BUSY ||
+status == QEMU_SCSI_STATUS_TIMEOUT ||
+status == QEMU_SCSI_STATUS_TASK_SET_FULL) {
 unsigned retry_time =
 exp_random(iscsi_retry_times[iTask-retries - 1]);
-if (status == SCSI_STATUS_TIMEOUT) {
+if (status == QEMU_SCSI_STATUS_TIMEOUT) {
 /* make sure the request is rescheduled AFTER the
  * reconnect is initiated */
 retry_time = EVENT_INTERVAL * 2;
@@ -1358,7 +1372,7 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
 QemuOpts *opts;
 Error *local_err = NULL;
 const char *filename;
-int i, ret = 0;
+int i, ret = 0, timeout = 0;
 
 opts = qemu_opts_create(runtime_opts, NULL, 0, error_abort);
 qemu_opts_absorb_qdict(opts, options, local_err);
@@ -1428,7 +1442,15 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto out;
 }
 
-iscsi_set_timeout(iscsi, parse_timeout(iscsi_url->target));
+/* timeout handling is broken in libiscsi before 1.15.0 */
+timeout = parse_timeout(iscsi_url->target);
+#if defined(LIBISCSI_API_VERSION) && LIBISCSI_API_VERSION >= 20150621
+iscsi_set_timeout(iscsi, timeout);
+#else
+if (timeout) {
+error_report("iSCSI: ignoring timeout value for libiscsi <1.15.0");
+}
+#endif
 
 if (iscsi_full_connect_sync(iscsi, iscsi_url-portal, iscsi_url-lun) != 
0) {
 error_setg(errp, iSCSI: Failed to connect to LUN : %s,
diff --git a/configure b/configure
index 2ed9db2..222694f 100755
--- a/configure
+++ b/configure
@@ -3660,15 +3660,15 @@ if compile_prog   ; then
 fi
 
 ##
-# Do we have libiscsi >= 1.10.0
+# Do we have libiscsi >= 1.9.0
 if test $libiscsi != no ; then
-  if $pkg_config --atleast-version=1.10.0 libiscsi; then
+  if $pkg_config --atleast-version=1.9.0 libiscsi; then
 libiscsi=yes
 libiscsi_cflags=$($pkg_config --cflags libiscsi)
 libiscsi_libs=$($pkg_config --libs libiscsi)
   else
 if test $libiscsi = yes ; then
-  feature_not_found libiscsi Install libiscsi >= 1.10.0
+  feature_not_found libiscsi Install libiscsi >= 1.9.0
 fi
 libiscsi=no
   fi
diff --git a/qemu-options.hx b/qemu-options.hx
index ca37481..389cf64 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2293,7 +2293,8 @@ line or a configuration file.
 
 Since version Qemu 2.4 it is possible to specify a iSCSI request timeout to 
detect
 stalled requests and force a reestablishment of the session. The timeout
-is specified in seconds. The default is 0 which means no timeout.
+is specified in seconds. The default is 0 which means no timeout. Libiscsi
+1.15.0 or greater is required for this feature.
 
 Example (without authentication):
 @example
-- 
1.7.9.5




[Qemu-devel] [PATCH V2] block/nfs: add support for setting debug level

2015-06-26 Thread Peter Lieven
upcoming libnfs versions will support logging debug messages. Add
support for it in qemu through a cmdline parameter.

Example
 qemu -nfs debug=99 -cdrom nfs://...

Signed-off-by: Peter Lieven p...@kamp.de
---
v1->v2: reworked patch to accept the debug level as a cmdline
parameter instead of an URI parameter [Stefan]

 block/nfs.c |   40 
 qemu-options.hx |   21 +
 vl.c|8 
 3 files changed, 69 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index ca9e24e..43d48ae 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -274,6 +274,30 @@ static void nfs_file_close(BlockDriverState *bs)
 nfs_client_close(client);
 }
 
+static void nfs_parse_options(NFSClient *client)
+{
+QemuOptsList *list;
+QemuOpts *opts;
+const char *debug;
+
+list = qemu_find_opts(nfs);
+if (list) {
+opts = QTAILQ_FIRST(list-head);
+if (opts) {
+debug = qemu_opt_get(opts, debug);
+if (debug) {
+#ifdef LIBNFS_FEATURE_DEBUG
+nfs_set_debug(client-context, atoi(debug));
+#else
+error_report(NFS Warning: The linked version of libnfs does
+  not support setting debug levels);
+#endif
+}
+}
+}
+}
+
+
 static int64_t nfs_client_open(NFSClient *client, const char *filename,
int flags, Error **errp)
 {
@@ -336,6 +360,8 @@ static int64_t nfs_client_open(NFSClient *client, const 
char *filename,
 }
 }
 
+nfs_parse_options(client);
+
 ret = nfs_mount(client-context, uri-server, uri-path);
 if (ret  0) {
 error_setg(errp, Failed to mount nfs share: %s,
@@ -501,9 +527,23 @@ static BlockDriver bdrv_nfs = {
 .bdrv_attach_aio_context= nfs_attach_aio_context,
 };
 
+static QemuOptsList qemu_nfs_opts = {
+.name = nfs,
+.head = QTAILQ_HEAD_INITIALIZER(qemu_nfs_opts.head),
+.desc = {
+{
+.name = debug,
+.type = QEMU_OPT_NUMBER,
+.help = Set libnfs debug level (default 0 = no debug),
+},
+{ /* end of list */ }
+},
+};
+
 static void nfs_block_init(void)
 {
 bdrv_register(bdrv_nfs);
+qemu_add_opts(qemu_nfs_opts);
 }
 
 block_init(nfs_block_init);
diff --git a/qemu-options.hx b/qemu-options.hx
index 389cf64..63c60e7 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2329,6 +2329,26 @@ STEXI
 iSCSI parameters such as username and password can also be specified via
 a configuration file. See qemu-doc for more information and examples.
 
+@item NFS
+NFS support allows QEMU to access NFS shares directly and use as
+images for the guest storage. Both disk and cdrom images are supported.
+
+Syntax for specifying NFS shares is
+``nfs://server-ip/export/filename''
+
+Example (setting debug level to 2 and use ISO from NFS share as CDROM):
+@example
+qemu-system-i386 -nfs debug=2 -cdrom nfs://127.0.0.1/isos/cdimage.iso
+@end example
+
+NFS support is an optional feature of QEMU and only available when
+compiled and linked against libnfs.
+ETEXI
+DEF(nfs, HAS_ARG, QEMU_OPTION_nfs,
+-nfs   [debug=debug]\n
+NFS connection parameters\n, QEMU_ARCH_ALL)
+STEXI
+
 @item NBD
 QEMU supports NBD (Network Block Devices) both using TCP protocol as well
 as Unix Domain Sockets.
@@ -2480,6 +2500,7 @@ ETEXI
 STEXI
 @end table
 ETEXI
+DEFHEADING()
 
 DEFHEADING(Bluetooth(R) options:)
 STEXI
diff --git a/vl.c b/vl.c
index 9542095..4317572 100644
--- a/vl.c
+++ b/vl.c
@@ -3141,6 +3141,14 @@ int main(int argc, char **argv, char **envp)
 }
 break;
 #endif
+#ifdef CONFIG_LIBNFS
+case QEMU_OPTION_nfs:
+opts = qemu_opts_parse(qemu_find_opts(nfs), optarg, 0);
+if (!opts) {
+exit(1);
+}
+break;
+#endif
 #ifdef CONFIG_SLIRP
 case QEMU_OPTION_tftp:
 legacy_tftp_prefix = optarg;
-- 
1.7.9.5




Re: [Qemu-devel] [Qemu-block] [PATCH] block/nfs: add support for setting debug level

2015-06-25 Thread Peter Lieven

Am 25.06.2015 um 15:18 schrieb Stefan Hajnoczi:

On Tue, Jun 23, 2015 at 10:12:15AM +0200, Peter Lieven wrote:

upcoming libnfs versions will support logging debug messages. Add
support for it in qemu through an URL parameter.

Signed-off-by: Peter Lieven p...@kamp.de
---
  block/nfs.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index ca9e24e..f7388a3 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -329,6 +329,10 @@ static int64_t nfs_client_open(NFSClient *client, const 
char *filename,
  } else if (!strcmp(qp-p[i].name, readahead)) {
  nfs_set_readahead(client-context, val);
  #endif
+#ifdef LIBNFS_FEATURE_DEBUG
+} else if (!strcmp(qp-p[i].name, debug)) {
+nfs_set_debug(client-context, val);
+#endif
  } else {
  error_setg(errp, Unknown NFS parameter name: %s,
 qp-p[i].name);

Untrusted users may be able to set these options since they are encoded
in the URI.  I'm imagining a hosting or cloud scenario like OpenStack.

A verbose debug level spams stderr and could consume a lot of disk
space.

(The uid and gid options are probably okay since the NFS server cannot
trust the uid/gid coming from QEMU anyway.)

I think we can merge this patch for QEMU 2.4 but I'd like to have a
discussion about the security risk of encoding libnfs options in the
URI.

CCed Eric Blake in case libvirt is affected.

Has anyone thought about this and what are the rules?


Good point. In general I think there should be some kind of sanitization of the
parameters before they are passed on to Qemu. In our use case the user cannot
pass any kind of URIs himself, but this might be different in other backends.
The readahead value is just as dangerous if not sanitized.

I had a discussion with Ronnie in the past about whether we should encode
parameters in the URI or via the environment like it is done in libiscsi. If I
remember correctly we came up with the URI parameters for better usability,
but didn't have attack scenarios in mind.

I am also open to only allowing uncritical parameters in the URI and passing
others via a new -nfs cmdline option, or limiting the maximum readahead and
debug level settable via the URI.

Peter



Re: [Qemu-devel] [PATCH] block/iscsi: add support for request timeouts

2015-06-25 Thread Peter Lieven
Am 25.06.2015 um 23:08 schrieb Paolo Bonzini:

 On 16/06/2015 13:45, Peter Lieven wrote:
 libiscsi starting with 1.15 will properly support timeout of iscsi
 commands. The default will remain no timeout, but this can
 be changed via cmdline parameters, e.g.:

 qemu -iscsi timeout=30 -drive file=iscsi://...

 If a timeout occurs a reconnect is scheduled and the timed out command
 will be requeued for processing after a successful reconnect.

 The required API call iscsi_set_timeout is present since libiscsi
 1.10 which was released in October 2013. However, due to some bugs
 in the libiscsi code the use is not recommended before version 1.15.
 If so, QEMU should not allow it if libiscsi is older than 1.15.

Not accept a timeout parameter or ignore it and print a warning?


 Please note that this patch bumps the libiscsi requirement to 1.10
 to have all function and macros defined.
 This is not acceptable, unfortunately.  I explained this two months ago
 (https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01847.html)
 and it is still true.

Sorry, I missed that. Can you verify if 1.15.0 has a soname that
makes it possible to jump to 1.15.0 at some point in the future?


 libiscsi keeps breaking ABI compatibility and for a while did not even
 bump the soname when they do.  This makes it completely impossible for
 distros to upgrade to a newer libiscsi, and RHEL7 is thus stuck with 1.9.

 Yes, it is 2 years old.  It doesn't matter.  If libiscsi upstream only
 _tried_ to preserve ABI compatibility, they wouldn't be in this
 situation.  And I know that it is not even trying, because it broke
 again sometime between 1.11 and 1.14 for a totally trivial reason:

 --- a/iscsi/iscsi.h
 +++ b/iscsi/iscsi.h
 @@ -91,6 +136,8 @@ struct iscsi_url {
 char target[MAX_STRING_SIZE + 1];
 char user[MAX_STRING_SIZE + 1];
 char passwd[MAX_STRING_SIZE + 1];
 +   char target_user[MAX_STRING_SIZE + 1];
 +   char target_passwd[MAX_STRING_SIZE + 1];
 int lun;
 struct iscsi_context *iscsi;
  };


 This is the only change between these releases that breaks the ABI, but
 it is already one too much. :(

 (Also, the parsing of URLs into iscsi_url doesn't even try to obey the
 RFCs...).

 The patch fixes also a
 off-by-one error in the NOP timeout calculation which was fixed
 while touching these code parts.
 Can you please separate this part anyway?

Sure.

I will send a v2 that is compatible with 1.9.0 and enables the timeout
stuff only for libiscsi >= 1.15.0.

Peter



Re: [Qemu-devel] [PATCH] block/iscsi: add support for request timeouts

2015-06-25 Thread Peter Lieven

Am 23.06.2015 um 01:03 schrieb ronnie sahlberg:

LGTM

It is good to finally have timeouts that work in libiscsi,  and a consumer that 
can use and benefit from it.


Paolo, Kevin, Stefan, do you think this is something for 2.4?

Peter



Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-23 Thread Peter Lieven

Am 22.06.2015 um 23:54 schrieb John Snow:


On 06/22/2015 09:09 AM, Peter Lieven wrote:

Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:

On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven p...@kamp.de wrote:

Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:

On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven p...@kamp.de wrote:

Am 18.06.2015 um 10:42 schrieb Kevin Wolf:

Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:

Am 18.06.2015 um 09:45 schrieb Kevin Wolf:

Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from
/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0,
nfds=3,
   timeout=4999424576) at qemu-timer.c:326
   ts = {tv_sec = 4, tv_nsec = 999424576}
   tvsec = 4
#2  0x55956feb in aio_poll (ctx=0x563528e0,
blocking=true)
   at aio-posix.c:231
   node = 0x0
   was_dispatching = false
   ret = 1
   progress = false
#3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
offset=4292007936,
   qiov=0x7554f760, is_write=false, flags=0) at
block.c:2699
   aio_context = 0x563528e0
   co = 0x563888a0
   rwco = {bs = 0x5637eae0, offset = 4292007936,
 qiov = 0x7554f760, is_write = false, ret =
2147483647,
flags = 0}
#4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
sector_num=8382828,
   buf=0x744cc800 (, nb_sectors=4, is_write=false,
flags=0)
   at block.c:2722
   qiov = {iov = 0x7554f780, niov = 1, nalloc = -1,
size =
2048}
   iov = {iov_base = 0x744cc800, iov_len = 2048}
#5  0x5594b008 in bdrv_read (bs=0x5637eae0,
sector_num=8382828,
   buf=0x744cc800 (, nb_sectors=4) at block.c:2730
No locals.
#6  0x5599acef in blk_read (blk=0x56376820,
sector_num=8382828,
   buf=0x744cc800 (, nb_sectors=4) at
block/block-backend.c:404
No locals.
#7  0x55833ed2 in cd_read_sector (s=0x56408f88,
lba=2095707,
   buf=0x744cc800 (, sector_size=2048) at
hw/ide/atapi.c:116
   ret = 32767

Here is the problem: The ATAPI emulation uses synchronous
blk_read()
instead of the AIO or coroutine interfaces. This means that it
keeps
polling for request completion while it holds the BQL until the
request
is completed.

I will look at this.

I need some further help. My way to emulate a hung NFS Server is to
block it in the Firewall. Currently I face the problem that I
cannot mount
a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
tried with
a kernel NFS mount). It reads a few sectors and then stalls (maybe
another
bug):

(gdb) thread apply all bt full

Thread 3 (Thread 0x70c21700 (LWP 29710)):
#0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
util/qemu-thread-posix.c:120
  err = optimized out
  __func__ = qemu_cond_broadcast
#1  0x55911164 in rfifolock_unlock
(r=r@entry=0x56259910) at
util/rfifolock.c:75
  __PRETTY_FUNCTION__ = rfifolock_unlock
#2  0x55875921 in aio_context_release
(ctx=ctx@entry=0x562598b0)
at async.c:329
No locals.
#3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
blocking=blocking@entry=true) at aio-posix.c:272
  node = optimized out
  was_dispatching = false
  i = optimized out
  ret = optimized out
  progress = false
  timeout = 611734526
  __PRETTY_FUNCTION__ = aio_poll
#4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
block/io.c:552
  aio_context = 0x562598b0
  co = optimized out
  rwco = {bs = 0x5627c0f0, offset = 7038976, qiov =
0x70c208f0, is_write = false, ret = 2147483647, flags =
(unknown: 0)}
#5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
  flags=flags@entry=(unknown: 0)) at block/io.c:575
  qiov = {iov = 0x70c208e0, niov = 1, nalloc = -1, size
= 2048}
  iov = {iov_base = 0x57874800, iov_len = 2048}
#6  0x558bc593 in bdrv_read (bs=optimized out,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
nb_sectors=nb_sectors@entry=4) at block/io.c:583
No locals.
#7  0x558af75d in blk_read (blk=optimized out,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
  ret = optimized out
#8  0x557abb88 in cd_read_sector (sector_size=optimized out,
buf=0x57874800 (, lba=3437, s=0x5760db70) at
hw/ide/atapi.c:116
  ret = optimized out
#9  ide_atapi_cmd_reply_end (s=0x5760db70) at hw/ide/atapi.c:190

[Qemu-devel] [PATCH] block/nfs: add support for setting debug level

2015-06-23 Thread Peter Lieven
Upcoming libnfs versions will support logging debug messages. Add
support for it in qemu through a URL parameter.

Signed-off-by: Peter Lieven p...@kamp.de
---
 block/nfs.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index ca9e24e..f7388a3 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -329,6 +329,10 @@ static int64_t nfs_client_open(NFSClient *client, const 
char *filename,
         } else if (!strcmp(qp->p[i].name, "readahead")) {
             nfs_set_readahead(client->context, val);
 #endif
+#ifdef LIBNFS_FEATURE_DEBUG
+        } else if (!strcmp(qp->p[i].name, "debug")) {
+            nfs_set_debug(client->context, val);
+#endif
         } else {
             error_setg(errp, "Unknown NFS parameter name: %s",
                        qp->p[i].name);
-- 
1.9.1
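For reference, with this applied the debug level is selected per image via the URL query string, just like the existing readahead parameter; a hypothetical invocation (host, export and value are made up) would look roughly like:

qemu -drive file=nfs://nfs-server/export/test.iso?debug=2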




Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-22 Thread Peter Lieven

Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:

On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven p...@kamp.de wrote:

Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:

On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven p...@kamp.de wrote:

Am 18.06.2015 um 10:42 schrieb Kevin Wolf:

Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:

Am 18.06.2015 um 09:45 schrieb Kevin Wolf:

Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
  timeout=4999424576) at qemu-timer.c:326
  ts = {tv_sec = 4, tv_nsec = 999424576}
  tvsec = 4
#2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
  at aio-posix.c:231
  node = 0x0
  was_dispatching = false
  ret = 1
  progress = false
#3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
offset=4292007936,
  qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
  aio_context = 0x563528e0
  co = 0x563888a0
  rwco = {bs = 0x5637eae0, offset = 4292007936,
qiov = 0x7554f760, is_write = false, ret = 2147483647,
flags = 0}
#4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
sector_num=8382828,
  buf=0x744cc800 (, nb_sectors=4, is_write=false, flags=0)
  at block.c:2722
  qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size =
2048}
  iov = {iov_base = 0x744cc800, iov_len = 2048}
#5  0x5594b008 in bdrv_read (bs=0x5637eae0,
sector_num=8382828,
  buf=0x744cc800 (, nb_sectors=4) at block.c:2730
No locals.
#6  0x5599acef in blk_read (blk=0x56376820,
sector_num=8382828,
  buf=0x744cc800 (, nb_sectors=4) at block/block-backend.c:404
No locals.
#7  0x55833ed2 in cd_read_sector (s=0x56408f88,
lba=2095707,
  buf=0x744cc800 (, sector_size=2048) at hw/ide/atapi.c:116
  ret = 32767

Here is the problem: The ATAPI emulation uses synchronous blk_read()
instead of the AIO or coroutine interfaces. This means that it keeps
polling for request completion while it holds the BQL until the request
is completed.

I will look at this.

I need some further help. My way to emulate a hung NFS server is to
block it in the firewall. Currently I face the problem that I cannot mount
a CD ISO via libnfs (nfs://) without hanging Qemu (I previously tried with
a kernel NFS mount). It reads a few sectors and then stalls (maybe another
bug):

(gdb) thread apply all bt full

Thread 3 (Thread 0x70c21700 (LWP 29710)):
#0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
util/qemu-thread-posix.c:120
 err = optimized out
 __func__ = qemu_cond_broadcast
#1  0x55911164 in rfifolock_unlock (r=r@entry=0x56259910) at
util/rfifolock.c:75
 __PRETTY_FUNCTION__ = rfifolock_unlock
#2  0x55875921 in aio_context_release (ctx=ctx@entry=0x562598b0)
at async.c:329
No locals.
#3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
blocking=blocking@entry=true) at aio-posix.c:272
 node = optimized out
 was_dispatching = false
 i = optimized out
 ret = optimized out
 progress = false
 timeout = 611734526
 __PRETTY_FUNCTION__ = aio_poll
#4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
block/io.c:552
 aio_context = 0x562598b0
 co = optimized out
 rwco = {bs = 0x5627c0f0, offset = 7038976, qiov =
0x70c208f0, is_write = false, ret = 2147483647, flags = (unknown: 0)}
#5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
 flags=flags@entry=(unknown: 0)) at block/io.c:575
 qiov = {iov = 0x70c208e0, niov = 1, nalloc = -1, size = 2048}
 iov = {iov_base = 0x57874800, iov_len = 2048}
#6  0x558bc593 in bdrv_read (bs=optimized out,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
nb_sectors=nb_sectors@entry=4) at block/io.c:583
No locals.
#7  0x558af75d in blk_read (blk=optimized out,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
 ret = optimized out
#8  0x557abb88 in cd_read_sector (sector_size=optimized out,
buf=0x57874800 (, lba=3437, s=0x5760db70) at hw/ide/atapi.c:116
 ret = optimized out
#9  ide_atapi_cmd_reply_end (s=0x5760db70) at hw/ide/atapi.c:190
 byte_count_limit = optimized out
 size = optimized out
 ret = 2

This is still the same scenario Kevin explained

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-19 Thread Peter Lieven
Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
 On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven p...@kamp.de wrote:
 Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
 Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
 Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
 Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
 Thread 2 (Thread 0x75550700 (LWP 2636)):
 #0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
 No symbol table info available.
 #1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
  timeout=4999424576) at qemu-timer.c:326
  ts = {tv_sec = 4, tv_nsec = 999424576}
  tvsec = 4
 #2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
  at aio-posix.c:231
  node = 0x0
  was_dispatching = false
  ret = 1
  progress = false
 #3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
 offset=4292007936,
  qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
  aio_context = 0x563528e0
  co = 0x563888a0
  rwco = {bs = 0x5637eae0, offset = 4292007936,
qiov = 0x7554f760, is_write = false, ret = 2147483647,
 flags = 0}
 #4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
 sector_num=8382828,
  buf=0x744cc800 (, nb_sectors=4, is_write=false, flags=0)
  at block.c:2722
  qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size =
 2048}
  iov = {iov_base = 0x744cc800, iov_len = 2048}
 #5  0x5594b008 in bdrv_read (bs=0x5637eae0,
 sector_num=8382828,
  buf=0x744cc800 (, nb_sectors=4) at block.c:2730
 No locals.
 #6  0x5599acef in blk_read (blk=0x56376820,
 sector_num=8382828,
  buf=0x744cc800 (, nb_sectors=4) at block/block-backend.c:404
 No locals.
 #7  0x55833ed2 in cd_read_sector (s=0x56408f88,
 lba=2095707,
  buf=0x744cc800 (, sector_size=2048) at hw/ide/atapi.c:116
  ret = 32767
 Here is the problem: The ATAPI emulation uses synchronous blk_read()
 instead of the AIO or coroutine interfaces. This means that it keeps
 polling for request completion while it holds the BQL until the request
 is completed.
 I will look at this.

 I need some further help. My way to emulate a hung NFS server is to
 block it in the firewall. Currently I face the problem that I cannot mount
 a CD ISO via libnfs (nfs://) without hanging Qemu (I previously tried with
 a kernel NFS mount). It reads a few sectors and then stalls (maybe another
 bug):

 (gdb) thread apply all bt full

 Thread 3 (Thread 0x70c21700 (LWP 29710)):
 #0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
 util/qemu-thread-posix.c:120
 err = optimized out
 __func__ = qemu_cond_broadcast
 #1  0x55911164 in rfifolock_unlock (r=r@entry=0x56259910) at
 util/rfifolock.c:75
 __PRETTY_FUNCTION__ = rfifolock_unlock
 #2  0x55875921 in aio_context_release (ctx=ctx@entry=0x562598b0)
 at async.c:329
 No locals.
 #3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
 blocking=blocking@entry=true) at aio-posix.c:272
 node = optimized out
 was_dispatching = false
 i = optimized out
 ret = optimized out
 progress = false
 timeout = 611734526
 __PRETTY_FUNCTION__ = aio_poll
 #4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
 offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
 is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
 block/io.c:552
 aio_context = 0x562598b0
 co = optimized out
 rwco = {bs = 0x5627c0f0, offset = 7038976, qiov =
 0x70c208f0, is_write = false, ret = 2147483647, flags = (unknown: 0)}
 #5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0,
 sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
 nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
 flags=flags@entry=(unknown: 0)) at block/io.c:575
 qiov = {iov = 0x70c208e0, niov = 1, nalloc = -1, size = 2048}
 iov = {iov_base = 0x57874800, iov_len = 2048}
 #6  0x558bc593 in bdrv_read (bs=optimized out,
 sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
 nb_sectors=nb_sectors@entry=4) at block/io.c:583
 No locals.
 #7  0x558af75d in blk_read (blk=optimized out,
 sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
 nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
 ret = optimized out
 #8  0x557abb88 in cd_read_sector (sector_size=optimized out,
 buf=0x57874800 (, lba=3437, s=0x5760db70) at hw/ide/atapi.c:116
 ret = optimized out
 #9  ide_atapi_cmd_reply_end (s=0x5760db70) at hw/ide/atapi.c:190
 byte_count_limit = optimized out
 size = optimized out
 ret = 2
 This is still the same scenario Kevin explained.

 The ATAPI CD-ROM emulation code is using synchronous

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Peter Lieven

Am 17.06.2015 um 10:35 schrieb Kevin Wolf:

Am 16.06.2015 um 17:34 hat Stefan Hajnoczi geschrieben:

On Tue, Jun 16, 2015 at 3:44 PM, Peter Lieven p...@kamp.de wrote:

I wonder how difficult it would be to have the IDE CDROM run in its own
thread?
We usually have ISOs mounted on an NFS share as CDROM. Problem: If the NFS
Share
goes down, it takes down monitor, qmp, vnc etc. with it.

Maybe it's already possible to do this via cmdline args?

Any ideas, comments?

If QEMU hangs in the read/write/flush/discard code path due to NFS
downtime it is a bug.

QEMU is expected to hang in open/reopen because those are performed in
a blocking fashion.

Which of these cases applies to what you are seeing?  Maybe it can be fixed.

Don't forget bdrv_drain_all(), which is called a lot by the monitor. So
no matter what you do (and this includes moving to a thread as in a
hypothetical ATAPI dataplane), you end up with a hang sooner or later.


It seems like the mainloop is waiting here:

#0  0x7606c89c in __lll_lock_wait ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x76068065 in _L_lock_858 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x76067eba in pthread_mutex_lock ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x559f2557 in qemu_mutex_lock (mutex=0x55ed6d40)
at util/qemu-thread-posix.c:76
err = 0
__func__ = qemu_mutex_lock
#4  0x556306ef in qemu_mutex_lock_iothread ()
at /usr/src/qemu-2.2.0/cpus.c:1123
No locals.
#5  0x55954a87 in os_host_main_loop_wait (timeout=79413589)
at main-loop.c:242
ret = 1
spin_counter = 0
#6  0x55954b5f in main_loop_wait (nonblocking=0) at main-loop.c:494
ret = 15
timeout = 4294967295
timeout_ns = 79413589
#7  0x5575e702 in main_loop () at vl.c:1882
nonblocking = false
last_io = 1

Peter




Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Peter Lieven

Am 18.06.2015 um 08:59 schrieb Paolo Bonzini:


On 18/06/2015 08:39, Peter Lieven wrote:

It seems like the mainloop is waiting here:

#0  0x7606c89c in __lll_lock_wait ()
from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x76068065 in _L_lock_858 ()
from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x76067eba in pthread_mutex_lock ()
from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x559f2557 in qemu_mutex_lock (mutex=0x55ed6d40)
 at util/qemu-thread-posix.c:76
 err = 0
 __func__ = qemu_mutex_lock
#4  0x556306ef in qemu_mutex_lock_iothread ()
 at /usr/src/qemu-2.2.0/cpus.c:1123
No locals.

This means the VCPU is busy with some synchronous activity---maybe a
bdrv_aio_cancel?


Here is what the other threads are doing (dropped VNC thread):

Thread 3 (Thread 0x74d4f700 (LWP 2637)):
#0  0x7606c89c in __lll_lock_wait ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x76068065 in _L_lock_858 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x76067eba in pthread_mutex_lock ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x559f2557 in qemu_mutex_lock (mutex=0x55ed6d40)
at util/qemu-thread-posix.c:76
err = 0
__func__ = qemu_mutex_lock
#4  0x556306ef in qemu_mutex_lock_iothread ()
at /usr/src/qemu-2.2.0/cpus.c:1123
No locals.
#5  0x5564b9ac in kvm_cpu_exec (cpu=0x563cb870)
at /usr/src/qemu-2.2.0/kvm-all.c:1770
run = 0x77ee2000
ret = 65536
run_ret = -4
#6  0x556301dc in qemu_kvm_cpu_thread_fn (arg=0x563cb870)
at /usr/src/qemu-2.2.0/cpus.c:953
cpu = 0x563cb870
r = 65536
#5  0x75d9338d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#6  0x in ?? ()
No symbol table info available.

Thread 3 (Thread 0x74d4f700 (LWP 2637)):
#0  0x7606c89c in __lll_lock_wait ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x76068065 in _L_lock_858 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x76067eba in pthread_mutex_lock ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x559f2557 in qemu_mutex_lock (mutex=0x55ed6d40)
at util/qemu-thread-posix.c:76
err = 0
__func__ = qemu_mutex_lock
#4  0x556306ef in qemu_mutex_lock_iothread ()
at /usr/src/qemu-2.2.0/cpus.c:1123
No locals.
#5  0x5564b9ac in kvm_cpu_exec (cpu=0x563cb870)
at /usr/src/qemu-2.2.0/kvm-all.c:1770
run = 0x77ee2000
ret = 65536
run_ret = -4
#6  0x556301dc in qemu_kvm_cpu_thread_fn (arg=0x563cb870)
at /usr/src/qemu-2.2.0/cpus.c:953
cpu = 0x563cb870
r = 65536
#7  0x76065e9a in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x75d9338d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#9  0x in ?? ()
No symbol table info available.

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
timeout=4999424576) at qemu-timer.c:326
ts = {tv_sec = 4, tv_nsec = 999424576}
tvsec = 4
#2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
at aio-posix.c:231
node = 0x0
was_dispatching = false
ret = 1
progress = false
#3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0, offset=4292007936,
qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
aio_context = 0x563528e0
co = 0x563888a0
rwco = {bs = 0x5637eae0, offset = 4292007936,
  qiov = 0x7554f760, is_write = false, ret = 2147483647, flags = 0}
#4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0, sector_num=8382828,
buf=0x744cc800 (, nb_sectors=4, is_write=false, flags=0)
at block.c:2722
qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size = 2048}
iov = {iov_base = 0x744cc800, iov_len = 2048}
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x75d9338d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#9  0x in ?? ()
No symbol table info available.

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
timeout

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Peter Lieven

Am 18.06.2015 um 09:03 schrieb Peter Lieven:

Am 18.06.2015 um 08:59 schrieb Paolo Bonzini:


On 18/06/2015 08:39, Peter Lieven wrote:

It seems like the mainloop is waiting here:

#0  0x7606c89c in __lll_lock_wait ()
from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x76068065 in _L_lock_858 ()
from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x76067eba in pthread_mutex_lock ()
from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x559f2557 in qemu_mutex_lock (mutex=0x55ed6d40)
 at util/qemu-thread-posix.c:76
 err = 0
 __func__ = qemu_mutex_lock
#4  0x556306ef in qemu_mutex_lock_iothread ()
 at /usr/src/qemu-2.2.0/cpus.c:1123
No locals.

This means the VCPU is busy with some synchronous activity---maybe a
bdrv_aio_cancel?


Here is what the other threads are doing (dropped VNC thread):


Sorry, something got messed up while copying the buffer. Here is the correct
output:

(gdb) thread apply all bt full

Thread 4 (Thread 0x7fffee9ff700 (LWP 2640)):
#0  0x76069d84 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x559f27ae in qemu_cond_wait (cond=0x563beed0,
mutex=0x563bef00) at util/qemu-thread-posix.c:135
err = 0
__func__ = qemu_cond_wait
#2  0x5593f12e in vnc_worker_thread_loop (queue=0x563beed0)
at ui/vnc-jobs.c:222
job = 0x5637bbd0
entry = 0x0
tmp = 0x0
vs = {csock = -1, dirty = {{0, 0, 0} repeats 2048 times},
  lossy_rect = 0x563ecd10, vd = 0x74465010, need_update = 0,
  force_update = 0, has_dirty = 0, features = 195, absolute = 0,
  last_x = 0, last_y = 0, last_bmask = 0, client_width = 0,
  client_height = 0, share_mode = 0, vnc_encoding = 5, major = 0,
  minor = 0, auth = 0, challenge = '\000' repeats 15 times,
  info = 0x0, output = {capacity = 6257, offset = 1348,
buffer = 0x7fffe4000d10 }, input = {capacity = 0, offset = 0,
buffer = 0x0},
  write_pixels = 0x55925d57 vnc_write_pixels_copy, client_pf = {
bits_per_pixel = 32 ' ', bytes_per_pixel = 4 '\004',
depth = 24 '\030', rmask = 16711680, gmask = 65280, bmask = 255,
amask = 0, rshift = 16 '\020', gshift = 8 '\b', bshift = 0 '\000',
ashift = 24 '\030', rmax = 255 '\377', gmax = 255 '\377',
bmax = 255 '\377', amax = 0 '\000', rbits = 8 '\b',
gbits = 8 '\b', bbits = 8 '\b', abits = 0 '\000'},
  client_format = 0, client_be = false, audio_cap = 0x0, as = {
freq = 0, nchannels = 0, fmt = AUD_FMT_U8, endianness = 0},
  read_handler = 0, read_handler_expect = 0,
  modifiers_state = '\000' repeats 255 times, led = 0x0,
  abort = false, initialized = false, output_mutex = {lock = {
  __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
__kind = 0, __spins = 0, __list = {__prev = 0x0,
  __next = 0x0}}, __size = '\000' repeats 39 times,
  __align = 0}}, bh = 0x0, jobs_buffer = {capacity = 0,
offset = 0, buffer = 0x0}, tight = {type = 0,
quality = 255 '\377', compression = 9 '\t', pixel24 = 0 '\000',
tight = {capacity = 0, offset = 0, buffer = 0x0}, tmp = {
  capacity = 0, offset = 0, buffer = 0x0}, zlib = {capacity = 0,
  offset = 0, buffer = 0x0}, gradient = {capacity = 0, offset = 0,
  buffer = 0x0}, levels = {0, 0, 0, 0}, stream = {{next_in = 0x0,
avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0,
total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0,
opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {
next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0,
avail_out = 0, total_out = 0, msg = 0x0, state = 0x0,
zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0,
reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0,
next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0,
state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0,
data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0,
avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0,
total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0,
opaque = 0x0, data_type = 0, adler = 0, reserved = 0}}},
  zlib = {zlib = {capacity = 0, offset = 0, buffer = 0x0}, tmp = {
  capacity = 0, offset = 0, buffer = 0x0}, stream = {
  next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0,
  avail_out = 0, total_out = 0, msg = 0x0, state = 0x0

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Peter Lieven

Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:

On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven p...@kamp.de wrote:

Am 18.06.2015 um 10:42 schrieb Kevin Wolf:

Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:

Am 18.06.2015 um 09:45 schrieb Kevin Wolf:

Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
  timeout=4999424576) at qemu-timer.c:326
  ts = {tv_sec = 4, tv_nsec = 999424576}
  tvsec = 4
#2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
  at aio-posix.c:231
  node = 0x0
  was_dispatching = false
  ret = 1
  progress = false
#3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
offset=4292007936,
  qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
  aio_context = 0x563528e0
  co = 0x563888a0
  rwco = {bs = 0x5637eae0, offset = 4292007936,
qiov = 0x7554f760, is_write = false, ret = 2147483647,
flags = 0}
#4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
sector_num=8382828,
  buf=0x744cc800 (, nb_sectors=4, is_write=false, flags=0)
  at block.c:2722
  qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size =
2048}
  iov = {iov_base = 0x744cc800, iov_len = 2048}
#5  0x5594b008 in bdrv_read (bs=0x5637eae0,
sector_num=8382828,
  buf=0x744cc800 (, nb_sectors=4) at block.c:2730
No locals.
#6  0x5599acef in blk_read (blk=0x56376820,
sector_num=8382828,
  buf=0x744cc800 (, nb_sectors=4) at block/block-backend.c:404
No locals.
#7  0x55833ed2 in cd_read_sector (s=0x56408f88,
lba=2095707,
  buf=0x744cc800 (, sector_size=2048) at hw/ide/atapi.c:116
  ret = 32767

Here is the problem: The ATAPI emulation uses synchronous blk_read()
instead of the AIO or coroutine interfaces. This means that it keeps
polling for request completion while it holds the BQL until the request
is completed.

I will look at this.


I need some further help. My way to emulate a hung NFS server is to
block it in the firewall. Currently I face the problem that I cannot mount
a CD ISO via libnfs (nfs://) without hanging Qemu (I previously tried with
a kernel NFS mount). It reads a few sectors and then stalls (maybe another
bug):

(gdb) thread apply all bt full

Thread 3 (Thread 0x70c21700 (LWP 29710)):
#0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
util/qemu-thread-posix.c:120
 err = optimized out
 __func__ = qemu_cond_broadcast
#1  0x55911164 in rfifolock_unlock (r=r@entry=0x56259910) at
util/rfifolock.c:75
 __PRETTY_FUNCTION__ = rfifolock_unlock
#2  0x55875921 in aio_context_release (ctx=ctx@entry=0x562598b0)
at async.c:329
No locals.
#3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
blocking=blocking@entry=true) at aio-posix.c:272
 node = optimized out
 was_dispatching = false
 i = optimized out
 ret = optimized out
 progress = false
 timeout = 611734526
 __PRETTY_FUNCTION__ = aio_poll
#4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
block/io.c:552
 aio_context = 0x562598b0
 co = optimized out
 rwco = {bs = 0x5627c0f0, offset = 7038976, qiov =
0x70c208f0, is_write = false, ret = 2147483647, flags = (unknown: 0)}
#5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
 flags=flags@entry=(unknown: 0)) at block/io.c:575
 qiov = {iov = 0x70c208e0, niov = 1, nalloc = -1, size = 2048}
 iov = {iov_base = 0x57874800, iov_len = 2048}
#6  0x558bc593 in bdrv_read (bs=optimized out,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
nb_sectors=nb_sectors@entry=4) at block/io.c:583
No locals.
#7  0x558af75d in blk_read (blk=optimized out,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (,
nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
 ret = optimized out
#8  0x557abb88 in cd_read_sector (sector_size=optimized out,
buf=0x57874800 (, lba=3437, s=0x5760db70) at hw/ide/atapi.c:116
 ret = optimized out
#9  ide_atapi_cmd_reply_end (s=0x5760db70) at hw/ide/atapi.c:190
 byte_count_limit = optimized out
 size = optimized out
 ret = 2

This is still the same scenario Kevin explained.

The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
function holds the QEMU global mutex while

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Peter Lieven

Am 18.06.2015 um 10:42 schrieb Kevin Wolf:

Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:

Am 18.06.2015 um 09:45 schrieb Kevin Wolf:

Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
 timeout=4999424576) at qemu-timer.c:326
 ts = {tv_sec = 4, tv_nsec = 999424576}
 tvsec = 4
#2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
 at aio-posix.c:231
 node = 0x0
 was_dispatching = false
 ret = 1
 progress = false
#3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0, offset=4292007936,
 qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
 aio_context = 0x563528e0
 co = 0x563888a0
 rwco = {bs = 0x5637eae0, offset = 4292007936,
   qiov = 0x7554f760, is_write = false, ret = 2147483647, flags = 0}
#4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0, sector_num=8382828,
 buf=0x744cc800 (, nb_sectors=4, is_write=false, flags=0)
 at block.c:2722
 qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size = 2048}
 iov = {iov_base = 0x744cc800, iov_len = 2048}
#5  0x5594b008 in bdrv_read (bs=0x5637eae0, sector_num=8382828,
 buf=0x744cc800 (, nb_sectors=4) at block.c:2730
No locals.
#6  0x5599acef in blk_read (blk=0x56376820, sector_num=8382828,
 buf=0x744cc800 (, nb_sectors=4) at block/block-backend.c:404
No locals.
#7  0x55833ed2 in cd_read_sector (s=0x56408f88, lba=2095707,
 buf=0x744cc800 (, sector_size=2048) at hw/ide/atapi.c:116
 ret = 32767

Here is the problem: The ATAPI emulation uses synchronous blk_read()
instead of the AIO or coroutine interfaces. This means that it keeps
polling for request completion while it holds the BQL until the request
is completed.

I will look at this.


We can (and should) fix that, otherwise the VCPU is blocked while we're
reading from the image, even without a hang. It doesn't fully fix your
problem, though, as bdrv_drain_all() and friends still exist.

Any idea which commands actually call bdrv_drain_all?

At least 'stop' and all commands changing the BDS graph (block jobs,
snapshots, commit, etc.). For a full list, I would have to inspect each
command in the code.

The guest can even trigger bdrv_drain_all() by stopping a running DMA
operation.


Unfortunately, exactly this is happening...
Is there any way to avoid the bdrv_drain_all() in bmdma_cmd_writeb?
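For context, the path in question looks roughly like this in the qemu 2.2-era hw/ide/pci.c (a simplified sketch of mine, not verbatim): when the guest clears the start bit while a DMA request is still in flight, the whole block layer is drained under the global mutex, so a hung backend stalls everything:

static void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
{
    /* sketch: only the branch relevant to this discussion is shown */
    if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
        if (!(val & BM_CMD_START)) {
            /* guest cancels a running DMA: instead of cancelling the
             * in-flight request mid-transfer, wait for it to complete */
            if (bm->bus->dma->aiocb) {
                bdrv_drain_all();   /* blocks here if the storage hangs */
            }
            bm->status &= ~BM_STATUS_DMAING;
        }
    }
    bm->cmd = val & 0x09;
}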

Peter



Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Peter Lieven

Am 18.06.2015 um 09:45 schrieb Kevin Wolf:

Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
 timeout=4999424576) at qemu-timer.c:326
 ts = {tv_sec = 4, tv_nsec = 999424576}
 tvsec = 4
#2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
 at aio-posix.c:231
 node = 0x0
 was_dispatching = false
 ret = 1
 progress = false
#3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0, offset=4292007936,
 qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
 aio_context = 0x563528e0
 co = 0x563888a0
 rwco = {bs = 0x5637eae0, offset = 4292007936,
   qiov = 0x7554f760, is_write = false, ret = 2147483647, flags = 0}
#4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0, sector_num=8382828,
 buf=0x744cc800 (, nb_sectors=4, is_write=false, flags=0)
 at block.c:2722
 qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size = 2048}
 iov = {iov_base = 0x744cc800, iov_len = 2048}
#5  0x5594b008 in bdrv_read (bs=0x5637eae0, sector_num=8382828,
 buf=0x744cc800 (, nb_sectors=4) at block.c:2730
No locals.
#6  0x5599acef in blk_read (blk=0x56376820, sector_num=8382828,
 buf=0x744cc800 (, nb_sectors=4) at block/block-backend.c:404
No locals.
#7  0x55833ed2 in cd_read_sector (s=0x56408f88, lba=2095707,
 buf=0x744cc800 (, sector_size=2048) at hw/ide/atapi.c:116
 ret = 32767

Here is the problem: The ATAPI emulation uses synchronous blk_read()
instead of the AIO or coroutine interfaces. This means that it keeps
polling for request completion while it holds the BQL until the request
is completed.


I will look at this.



We can (and should) fix that, otherwise the VCPU is blocked while we're
reading from the image, even without a hang. It doesn't fully fix your
problem, though, as bdrv_drain_all() and friends still exist.


Any idea which commands actually call bdrv_drain_all?

Peter
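As an illustration of the direction discussed above, here is a minimal sketch of what replacing the synchronous blk_read() in cd_read_sector() with the AIO interface could look like. Assumptions on my side: blk_aio_readv() with a completion callback, the existing iov/qiov fields in IDEState, and ide_atapi_io_error() for the error path; the hard part — making the ATAPI reply state machine restartable from the callback — is deliberately not shown:

/* sketch: read one 2048-byte CD sector without holding up the BQL */
static void cd_read_sector_cb(void *opaque, int ret)
{
    IDEState *s = opaque;

    if (ret < 0) {
        ide_atapi_io_error(s, ret);
        return;
    }
    /* resume the ATAPI reply state machine once the data has arrived */
    ide_atapi_cmd_reply_end(s);
}

static void cd_read_sector_async(IDEState *s, int lba, uint8_t *buf)
{
    s->iov.iov_base = buf;
    s->iov.iov_len  = 2048;
    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
    /* one 2048-byte CD frame is four 512-byte block-layer sectors */
    blk_aio_readv(s->blk, (int64_t)lba * 4, &s->qiov, 4,
                  cd_read_sector_cb, s);
}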



Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Peter Lieven

Am 18.06.2015 um 10:42 schrieb Kevin Wolf:

Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:

Am 18.06.2015 um 09:45 schrieb Kevin Wolf:

Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
 timeout=4999424576) at qemu-timer.c:326
 ts = {tv_sec = 4, tv_nsec = 999424576}
 tvsec = 4
#2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
 at aio-posix.c:231
 node = 0x0
 was_dispatching = false
 ret = 1
 progress = false
#3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0, offset=4292007936,
 qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
 aio_context = 0x563528e0
 co = 0x563888a0
 rwco = {bs = 0x5637eae0, offset = 4292007936,
   qiov = 0x7554f760, is_write = false, ret = 2147483647, flags = 0}
#4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0, sector_num=8382828,
 buf=0x744cc800 (, nb_sectors=4, is_write=false, flags=0)
 at block.c:2722
 qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size = 2048}
 iov = {iov_base = 0x744cc800, iov_len = 2048}
#5  0x5594b008 in bdrv_read (bs=0x5637eae0, sector_num=8382828,
 buf=0x744cc800 (, nb_sectors=4) at block.c:2730
No locals.
#6  0x5599acef in blk_read (blk=0x56376820, sector_num=8382828,
 buf=0x744cc800 (, nb_sectors=4) at block/block-backend.c:404
No locals.
#7  0x55833ed2 in cd_read_sector (s=0x56408f88, lba=2095707,
 buf=0x744cc800 (, sector_size=2048) at hw/ide/atapi.c:116
 ret = 32767

Here is the problem: The ATAPI emulation uses synchronous blk_read()
instead of the AIO or coroutine interfaces. This means that it keeps
polling for request completion while it holds the BQL until the request
is completed.

I will look at this.


I need some further help. My way to emulate a hung NFS server is to
block it in the firewall. Currently I face the problem that I cannot mount
a CD ISO via libnfs (nfs://) without hanging Qemu (I previously tried with
a kernel NFS mount). It reads a few sectors and then stalls (maybe another bug):

(gdb) thread apply all bt full

Thread 3 (Thread 0x70c21700 (LWP 29710)):
#0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at 
util/qemu-thread-posix.c:120
err = optimized out
__func__ = qemu_cond_broadcast
#1  0x55911164 in rfifolock_unlock (r=r@entry=0x56259910) at 
util/rfifolock.c:75
__PRETTY_FUNCTION__ = rfifolock_unlock
#2  0x55875921 in aio_context_release (ctx=ctx@entry=0x562598b0) at 
async.c:329
No locals.
#3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0, 
blocking=blocking@entry=true) at aio-posix.c:272
node = optimized out
was_dispatching = false
i = optimized out
ret = optimized out
progress = false
timeout = 611734526
__PRETTY_FUNCTION__ = aio_poll
#4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0, 
offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0, 
is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:552
aio_context = 0x562598b0
co = optimized out
rwco = {bs = 0x5627c0f0, offset = 7038976, qiov = 0x70c208f0, 
is_write = false, ret = 2147483647, flags = (unknown: 0)}
#5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0, 
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (, 
nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
flags=flags@entry=(unknown: 0)) at block/io.c:575
qiov = {iov = 0x70c208e0, niov = 1, nalloc = -1, size = 2048}
iov = {iov_base = 0x57874800, iov_len = 2048}
#6  0x558bc593 in bdrv_read (bs=optimized out, 
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (, 
nb_sectors=nb_sectors@entry=4) at block/io.c:583
No locals.
#7  0x558af75d in blk_read (blk=optimized out, 
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 (, 
nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
ret = optimized out
#8  0x557abb88 in cd_read_sector (sector_size=optimized out, buf=0x57874800 
(, lba=3437, s=0x5760db70) at hw/ide/atapi.c:116
ret = optimized out
#9  ide_atapi_cmd_reply_end (s=0x5760db70) at hw/ide/atapi.c:190
byte_count_limit = optimized out
size = optimized out
ret = 2
#10 0x556398a6 in memory_region_write_accessor (mr=0x577f85d0, addr=optimized 
out, value=0x70c20a68, size=2, shift=optimized out, mask=optimized out, 
attrs=...)
at /home/lieven/git/qemu/memory.c:459
tmp = optimized out
#11 0x5563956b in access_with_adjusted_size (addr

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Peter Lieven

Am 17.06.2015 um 10:35 schrieb Kevin Wolf:

Am 16.06.2015 um 17:34 hat Stefan Hajnoczi geschrieben:

On Tue, Jun 16, 2015 at 3:44 PM, Peter Lieven p...@kamp.de wrote:

I wonder how difficult it would be to have the IDE CDROM run in its own
thread?
We usually have ISOs mounted on an NFS share as CDROM. Problem: If the NFS
Share
goes down, it takes down monitor, qmp, vnc etc. with it.

Maybe it's already possible to do this via cmdline args?

Any ideas, comments?

If QEMU hangs in the read/write/flush/discard code path due to NFS
downtime it is a bug.

QEMU is expected to hang in open/reopen because those are performed in
a blocking fashion.

Which of these cases applies to what you are seeing?  Maybe it can be fixed.

Don't forget bdrv_drain_all(), which is called a lot by the monitor. So
no matter what you do (and this includes moving to a thread as in a
hypothetical ATAPI dataplane), you end up with a hang sooner or later.


I will have a look at where qemu hangs. The problem exists with an NFS share
mounted by the kernel and also with libnfs, so it might be a bdrv_drain_all().
I regularly query 'info block' and 'info blockstats'. Do these commands always
call bdrv_drain_all()?

Peter




Re: [Qemu-devel] Windows2012R2 virtio-win drivers and IPv6

2015-06-16 Thread Peter Lieven

Hi Yan,

thanks for the reply.



* When you experience the issues is it between VMs on the same host?


I can reproduce it between a Windows VM on Host A and a Linux VM on Host B.
Linux and Windows are not on the same host systems due to licensing.

Between Linux VMs and/or if I download to the Windows VM there is no issue.
Only outbound TCP traffic from the Windows vServer is affected and it only 
affects IPv6 traffic.



* Can you describe your host network configuration?


Windows VM on Qemu -> tap -> Linux Bridge -> eth (tagged) -> Network -> eth (tagged)
-> Linux Bridge -> tap -> Linux VM on Qemu.



* Can you provide the output of "ethtool -k" for every NIC (including tap devices)
in the host that is involved with the VM that has a problem?


Offload parameters for tap4:
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

Offload parameters for br136:
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off





  * What are the host kernel version, QEMU version and guest driver version you 
are using?


host: 3.13.0-53-generic
qemu: 2.2.1
guest driver: 0.1.100

Thank You,
Peter




[Qemu-devel] [PATCH] block/iscsi: add support for request timeouts

2015-06-16 Thread Peter Lieven
libiscsi starting with 1.15 will properly support timeout of iscsi
commands. The default will remain no timeout, but this can
be changed via cmdline parameters, e.g.:

qemu -iscsi timeout=30 -drive file=iscsi://...

If a timeout occurs a reconnect is scheduled and the timed out command
will be requeued for processing after a successful reconnect.

The required API call iscsi_set_timeout is present since libiscsi
1.10 which was released in October 2013. However, due to some bugs
in the libiscsi code the use is not recommended before version 1.15.

Please note that this patch bumps the libiscsi requirement to 1.10
to have all functions and macros defined. The patch also fixes an
off-by-one error in the NOP timeout calculation, which was noticed
while touching these code parts.

Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c   | 87 ++---
 configure   |  6 ++--
 qemu-options.hx |  4 +++
 3 files changed, 72 insertions(+), 25 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 14e97a6..f19a56a 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -69,6 +69,7 @@ typedef struct IscsiLun {
 bool dpofua;
 bool has_write_same;
 bool force_next_flush;
+bool request_timed_out;
 } IscsiLun;
 
 typedef struct IscsiTask {
@@ -99,7 +100,8 @@ typedef struct IscsiAIOCB {
 #endif
 } IscsiAIOCB;
 
-#define EVENT_INTERVAL 250
+/* libiscsi uses time_t so its enough to process events every second */
+#define EVENT_INTERVAL 1000
 #define NOP_INTERVAL 5000
 #define MAX_NOP_FAILURES 3
 #define ISCSI_CMD_RETRIES ARRAY_SIZE(iscsi_retry_times)
@@ -186,13 +188,18 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
             iTask->do_retry = 1;
             goto out;
         }
-        /* status 0x28 is SCSI_TASK_SET_FULL. It was first introduced
-         * in libiscsi 1.10.0. Hardcode this value here to avoid
-         * the need to bump the libiscsi requirement to 1.10.0 */
-        if (status == SCSI_STATUS_BUSY || status == 0x28) {
+        if (status == SCSI_STATUS_BUSY || status == SCSI_STATUS_TIMEOUT ||
+            status == SCSI_STATUS_TASK_SET_FULL) {
             unsigned retry_time =
                 exp_random(iscsi_retry_times[iTask->retries - 1]);
-            error_report("iSCSI Busy/TaskSetFull (retry #%u in %u ms): %s",
+            if (status == SCSI_STATUS_TIMEOUT) {
+                /* make sure the request is rescheduled AFTER the
+                 * reconnect is initiated */
+                retry_time = EVENT_INTERVAL * 2;
+                iTask->iscsilun->request_timed_out = true;
+            }
+            error_report("iSCSI Busy/TaskSetFull/TimeOut"
+                         " (retry #%u in %u ms): %s",
                          iTask->retries, retry_time,
                          iscsi_get_error(iscsi));
             aio_timer_init(iTask->iscsilun->aio_context,
@@ -276,20 +283,26 @@ iscsi_set_events(IscsiLun *iscsilun)
                            iscsilun);
         iscsilun->events = ev;
     }
-
-    /* newer versions of libiscsi may return zero events. In this
-     * case start a timer to ensure we are able to return to service
-     * once this situation changes. */
-    if (!ev) {
-        timer_mod(iscsilun->event_timer,
-                  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + EVENT_INTERVAL);
-    }
 }
 
-static void iscsi_timed_set_events(void *opaque)
+static void iscsi_timed_check_events(void *opaque)
 {
     IscsiLun *iscsilun = opaque;
+
+    /* check for timed out requests */
+    iscsi_service(iscsilun->iscsi, 0);
+
+    if (iscsilun->request_timed_out) {
+        iscsilun->request_timed_out = false;
+        iscsi_reconnect(iscsilun->iscsi);
+    }
+
+    /* newer versions of libiscsi may return zero events. Ensure we are able
+     * to return to service once this situation changes. */
     iscsi_set_events(iscsilun);
+
+    timer_mod(iscsilun->event_timer,
+              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + EVENT_INTERVAL);
 }
 
 static void
@@ -1096,16 +1109,37 @@ static char *parse_initiator_name(const char *target)
     return iscsi_name;
 }
 
+static int parse_timeout(const char *target)
+{
+    QemuOptsList *list;
+    QemuOpts *opts;
+    const char *timeout;
+
+    list = qemu_find_opts("iscsi");
+    if (list) {
+        opts = qemu_opts_find(list, target);
+        if (!opts) {
+            opts = QTAILQ_FIRST(&list->head);
+        }
+        if (opts) {
+            timeout = qemu_opt_get(opts, "timeout");
+            if (timeout) {
+                return atoi(timeout);
+            }
+        }
+    }
+
+    return 0;
+}
+
 static void iscsi_nop_timed_event(void *opaque)
 {
     IscsiLun *iscsilun = opaque;
 
-    if (iscsi_get_nops_in_flight(iscsilun->iscsi) > MAX_NOP_FAILURES) {
+    if (iscsi_get_nops_in_flight(iscsilun->iscsi) >= MAX_NOP_FAILURES) {
         error_report("iSCSI: NOP timeout

Re: [Qemu-devel] Windows2012R2 virtio-win drivers and IPv6

2015-06-08 Thread Peter Lieven

Am 28.05.2015 um 16:24 schrieb Vadim Rozenfeld:

On Thu, 2015-05-28 at 16:07 +0200, Peter Lieven wrote:

Hi,

we observed problems with Windows Update Services and Uploading Files from a 
Windows System to other systems.
We tracked this down to IPv6 connections only. IPv4, and IPv6 in the receive
direction, do not seem to be a problem.
It turned out that setting TCP/UDP Checksum Offloading for IPv6 from "RX/TX
enabled" to "RX enabled" in the
Network Driver settings solved the issue. I remember similar problems existed 
with Linux some time ago.
Linux guests on the same host are also not an issue.

Is this a known issue? I tried latest and stable drivers from the Fedora 
Website.


Adding Yan.


Hi Yan,

have you had a chance to look at this or a hint where to start?

Thanks,
Peter




[Qemu-devel] Windows2012R2 virtio-win drivers and IPv6

2015-05-28 Thread Peter Lieven

Hi,

we observed problems with Windows Update Services and Uploading Files from a 
Windows System to other systems.
We tracked this down to IPv6 connections only. IPv4, and IPv6 in the receive
direction, do not seem to be a problem.
It turned out that setting TCP/UDP Checksum Offloading for IPv6 from "RX/TX
enabled" to "RX enabled" in the
Network Driver settings solved the issue. I remember similar problems existed 
with Linux some time ago.
Linux guests on the same host are also not an issue.

Is this a known issue? I tried latest and stable drivers from the Fedora 
Website.

Thanks,
Peter




Re: [Qemu-devel] [Qemu-stable] [PATCH v5 3/8] mirror: Do zero write on target if sectors not allocated

2015-05-25 Thread Peter Lieven

Am 26.05.2015 um 05:36 schrieb Fam Zheng:

If guest discards a source cluster, mirroring with bdrv_aio_readv is overkill.
Some protocols do zero upon discard, where it's best to use
bdrv_aio_write_zeroes, otherwise, bdrv_aio_discard will be enough.

Signed-off-by: Fam Zheng f...@redhat.com
---
  block/mirror.c | 20 ++--
  1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 85995b2..ba33254 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -164,6 +164,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
     uint64_t delay_ns = 0;
     MirrorOp *op;
+    int pnum;
+    int64_t ret;
 
     s->sector_num = hbitmap_iter_next(&s->hbi);
     if (s->sector_num < 0) {
@@ -290,8 +292,22 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     s->in_flight++;
     s->sectors_in_flight += nb_sectors;
     trace_mirror_one_iteration(s, sector_num, nb_sectors);
-    bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
-                   mirror_read_complete, op);
+
+    ret = bdrv_get_block_status_above(source, NULL, sector_num,
+                                      nb_sectors, &pnum);
+    if (ret < 0 || pnum < nb_sectors ||
+            (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
+        bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
+                       mirror_read_complete, op);
+    } else if (ret & BDRV_BLOCK_ZERO) {
+        bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
+                              s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
+                              mirror_write_complete, op);
+    } else {
+        assert(!(ret & BDRV_BLOCK_DATA));
+        bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
+                         mirror_write_complete, op);
+    }


I wonder what happens if on the destination the discard is a NOP, which is legal
(at least in SCSI).
In this case we might end up having different contents at source and
destination. Or is this
not a problem?
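For illustration, one way to guard against that would be to fall back to an explicit zero write whenever the target cannot guarantee that discarded blocks read back as zeroes. A minimal sketch reusing the calls from the patch above — the discards_read_as_zeroes flag is purely hypothetical and stands in for whatever capability check the block layer exposes:

/* sketch: keep source and target contents identical even if discard is a NOP */
if (target_discards_read_as_zeroes) {        /* hypothetical capability flag */
    bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
                     mirror_write_complete, op);
} else {
    bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
                          BDRV_REQ_MAY_UNMAP, mirror_write_complete, op);
}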

Peter



Re: [Qemu-devel] [Qemu-stable] [PATCH] fdc: force the fifo access to be in bounds of the allocated buffer

2015-05-13 Thread Peter Lieven
Am 13.05.2015 um 21:09 schrieb Stefan Priebe:
 Am 13.05.2015 um 21:05 schrieb Stefan Weil:
 Am 13.05.2015 um 20:59 schrieb Stefan Priebe:

 Am 13.05.2015 um 20:51 schrieb Stefan Weil:
 Hi,

 I just noticed this patch because my provider told me that my KVM based
 server
 needs a reboot because of a CVE (see this German news:
 http://www.heise.de/newsticker/meldung/Venom-Schwachstelle-Aus-Hypervisor-ausbrechen-und-VMs-ausspionieren-2649614.html)


 Isn't a live migration to a fixed version enough instead of a reboot?

 Stefan

 Good point. A live migration would be sufficient - if there are no bugs
 in QEMU's live migration.

 just migrating all our customer machines and wanted to be sure that live 
 migration is enough.

Just to confirm: If Qemu is started with -nodefaults and there is no fdc 
configuration the system is not affected by this CVE?

Thanks,
Peter



Re: [Qemu-devel] [Qemu-stable] [PATCH] fdc: force the fifo access to be in bounds of the allocated buffer

2015-05-13 Thread Peter Lieven
Am 13.05.2015 um 22:03 schrieb John Snow:

 On 05/13/2015 04:02 PM, Peter Lieven wrote:
 Am 13.05.2015 um 21:52 schrieb Markus Armbruster:
 Peter Lieven p...@kamp.de writes:

 Am 13.05.2015 um 21:09 schrieb Stefan Priebe:
 Am 13.05.2015 um 21:05 schrieb Stefan Weil:
 Am 13.05.2015 um 20:59 schrieb Stefan Priebe:
 Am 13.05.2015 um 20:51 schrieb Stefan Weil:
 Hi,

 I just noticed this patch because my provider told me that my KVM based
 server
 needs a reboot because of a CVE (see this German news:
 http://www.heise.de/newsticker/meldung/Venom-Schwachstelle-Aus-Hypervisor-ausbrechen-und-VMs-ausspionieren-2649614.html)

 Isn't a live migration to a fixed version enough instead of a reboot?

 Stefan
 Good point. A live migration would be sufficient - if there are no bugs
 in QEMU's live migration.
 just migrating all our customer machines and wanted to be sure that
 live migration is enough.
 Just to confirm: If Qemu is started with -nodefaults and there is no
 fdc configuration the system is not affected by this CVE?
 Not true.  The FD controller is still there.  It has no drives attached
 then, but is vulnerable all the same.
 Are you sure? With -nodefaults the hmp command 'info block' returns nothing 
 and
 the guest sees no floppy drive.

 Without -nodefaults I indeed see floppy0 and I have /dev/fd0 in the guest 
 respectively.

 Peter

 It's not the *drive* that is the problem, it is the *controller*.

Okay, and indeed in the qtree I see the isa-fdc.

Thanks for sorting that out.
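(For anyone checking their own setup: the HMP command 'info qtree' lists the full device tree, and at the time of this thread the default x86 machine types instantiated an isa-fdc device even with -nodefaults and no floppy drive configured; the exact device name may differ per machine type.)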

Thank you,
Peter




Re: [Qemu-devel] [Qemu-stable] [PATCH] fdc: force the fifo access to be in bounds of the allocated buffer

2015-05-13 Thread Peter Lieven
Am 13.05.2015 um 21:52 schrieb Markus Armbruster:
 Peter Lieven p...@kamp.de writes:

 Am 13.05.2015 um 21:09 schrieb Stefan Priebe:
 Am 13.05.2015 um 21:05 schrieb Stefan Weil:
 Am 13.05.2015 um 20:59 schrieb Stefan Priebe:
 Am 13.05.2015 um 20:51 schrieb Stefan Weil:
 Hi,

 I just noticed this patch because my provider told me that my KVM based
 server
 needs a reboot because of a CVE (see this German news:
 http://www.heise.de/newsticker/meldung/Venom-Schwachstelle-Aus-Hypervisor-ausbrechen-und-VMs-ausspionieren-2649614.html)

 Isn't a live migration to a fixed version enough instead of a reboot?

 Stefan
 Good point. A live migration would be sufficient - if there are no bugs
 in QEMU's live migration.
 just migrating all our customer machines and wanted to be sure that
 live migration is enough.
 Just to confirm: If Qemu is started with -nodefaults and there is no
 fdc configuration the system is not affected by this CVE?
 Not true.  The FD controller is still there.  It has no drives attached
 then, but is vulnerable all the same.

Are you sure? With -nodefaults the hmp command 'info block' returns nothing and
the guest sees no floppy drive.

Without -nodefaults I indeed see floppy0 and I have /dev/fd0 in the guest 
respectively.

Peter



Re: [Qemu-devel] [PATCH for-2.4 07/10] block/iscsi: bump libiscsi requirement to 1.10.0

2015-04-17 Thread Peter Lieven
Am 16.04.2015 um 15:20 schrieb Paolo Bonzini:

 On 16/04/2015 14:58, Peter Lieven wrote:
 On 16/04/2015 14:18, Peter Lieven wrote:
 We need this to support SCSI_STATUS_TASK_SET_FULL.
 Any reason apart from the missing constant?
 No, but I wanted to avoid starting to check for constants that were
 added shortly after this.
 You can't check with #ifdef for a constant in an enum.
 But you can #define it if libiscsi version is 1.10.

There is no macro to check for that. I took the easy way and
hardcoded the value. It's an official standard, so there is
no chance it will change.
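To spell out why the #ifdef check does not work: the SCSI status codes in libiscsi's headers are enum values, not preprocessor macros, so they are invisible to #ifdef. A small sketch of mine (not taken from the tree) showing why the pre-1.10 code path had to hardcode 0x28:

/* SCSI_STATUS_TASK_SET_FULL is an enum value, so the preprocessor never
 * sees it and this condition is always false: */
#ifdef SCSI_STATUS_TASK_SET_FULL
    int task_set_full = (status == SCSI_STATUS_TASK_SET_FULL);
#else
    int task_set_full = (status == 0x28);  /* value hardcoded from the SCSI spec */
#endif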

Peter



Re: [Qemu-devel] [PATCH for-2.4 05/10] block/iscsi: optimize WRITE10/16 if cache.writeback is not set

2015-04-17 Thread Peter Lieven
Am 16.04.2015 um 15:17 schrieb Paolo Bonzini:

 On 16/04/2015 15:02, Peter Lieven wrote:
 Also, I think it is iscsi_co_generic_cb that should set
 force_next_flush, so that it is only set on failure.  Not really for the
 optimization value, but because it's clearer.
 I don't get what you mean by "it should only be set on failure".
 My approach would be to add a flag to the iTask that the command
 requires to set force_flush after successful completion. Is that
 what you mean?
 Set on success.  Lack of sleep.

I have sent a v2 following your suggestions.

Peter




[Qemu-devel] [PATCH for-2.4 09/10] block/iscsi: bump year in copyright notice

2015-04-16 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 3d0ffeb..04c1309 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2,7 +2,7 @@
  * QEMU Block driver for iSCSI images
  *
  * Copyright (c) 2010-2011 Ronnie Sahlberg ronniesahlb...@gmail.com
- * Copyright (c) 2012-2014 Peter Lieven p...@kamp.de
+ * Copyright (c) 2012-2015 Peter Lieven p...@kamp.de
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 06/10] block/iscsi: increase retry count

2015-04-16 Thread Peter Lieven
The idea is that a command is retried in a BUSY condition for up to
approx. 60 seconds before it is failed. This should
be far higher than any command timeout in the guest.

Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 7fb04d7..a4902ea 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -102,7 +102,7 @@ typedef struct IscsiAIOCB {
 #define NOP_INTERVAL 5000
 #define MAX_NOP_FAILURES 3
 #define ISCSI_CMD_RETRIES ARRAY_SIZE(iscsi_retry_times)
-static const unsigned iscsi_retry_times[] = {8, 32, 128, 512, 2048};
+static const unsigned iscsi_retry_times[] = {8, 32, 128, 512, 2048, 8192, 
32768};
 
 /* this threshold is a trade-off knob to choose between
  * the potential additional overhead of an extra GET_LBA_STATUS request
-- 
1.9.1
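For reference: if exp_random() draws delays whose mean is the listed value in milliseconds (that is my reading of how the retry delay is computed in the hunk quoted earlier in this series), the expected cumulative backoff over all seven retries is about 8 + 32 + 128 + 512 + 2048 + 8192 + 32768 = 43,688 ms, i.e. roughly 44 seconds of pure waiting; together with the per-attempt round trips that lands in the region of the ~60 seconds the commit message aims for.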




Re: [Qemu-devel] [PATCH for-2.4 05/10] block/iscsi: optimize WRITE10/16 if cache.writeback is not set

2015-04-16 Thread Peter Lieven

Am 16.04.2015 um 14:42 schrieb Paolo Bonzini:


On 16/04/2015 14:18, Peter Lieven wrote:

SCSI allows telling the target not to return from a write command
before the data is written to the disk. Use this so-called FUA
bit, if it is supported, to optimize WRITE commands when writeback is
not allowed.

In this case qemu always issues a WRITE followed by a FLUSH, which
costs two round trips. If we set the FUA bit we can skip the
following FLUSH.

Signed-off-by: Peter Lieven p...@kamp.de
---
  block/iscsi.c | 15 +--
  1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 237faa1..7fb04d7 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -68,6 +68,7 @@ typedef struct IscsiLun {
  bool lbprz;
  bool dpofua;
  bool has_write_same;
+bool force_next_flush;
  } IscsiLun;
  
  typedef struct IscsiTask {

@@ -370,6 +371,7 @@ static int coroutine_fn iscsi_co_writev(BlockDriverState *bs,
     struct IscsiTask iTask;
     uint64_t lba;
     uint32_t num_sectors;
+    int fua = iscsilun->dpofua && !bs->enable_write_cache;
 
     if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
         return -EINVAL;
@@ -388,12 +390,12 @@ retry:
     if (iscsilun->use_16_for_rw) {
         iTask.task = iscsi_write16_task(iscsilun->iscsi, iscsilun->lun, lba,
                                         NULL, num_sectors * iscsilun->block_size,
-                                        iscsilun->block_size, 0, 0, 0, 0, 0,
+                                        iscsilun->block_size, 0, 0, fua, 0, 0,
                                         iscsi_co_generic_cb, &iTask);
     } else {
         iTask.task = iscsi_write10_task(iscsilun->iscsi, iscsilun->lun, lba,
                                         NULL, num_sectors * iscsilun->block_size,
-                                        iscsilun->block_size, 0, 0, 0, 0, 0,
+                                        iscsilun->block_size, 0, 0, fua, 0, 0,
                                         iscsi_co_generic_cb, &iTask);
     }
     if (iTask.task == NULL) {
@@ -621,6 +623,11 @@ static int coroutine_fn iscsi_co_flush(BlockDriverState *bs)
         return 0;
     }
 
+    if (iscsilun->dpofua && !bs->enable_write_cache &&
+        !iscsilun->force_next_flush) {
+        return 0;
+    }
+
     iscsi_co_init_iscsitask(iscsilun, &iTask);
 
 retry:
@@ -648,6 +655,8 @@ retry:
         return -EIO;
     }
 
+    iscsilun->force_next_flush = false;

You still need a flush if you do WRITE SAME, WRITE+FUA, WRITE+FUA.
Also, since bs->enable_write_cache can be toggled arbitrarily, I would
prefer to set force_next_flush on all non-FUA writes, including those
done with bs->enable_write_cache.


Good idea. So we can avoid flushes if there was never a write before.




  return 0;
  }
  
@@ -969,6 +978,8 @@ retry:

  iscsi_allocationmap_set(iscsilun, sector_num, nb_sectors);
  }
  
+    iscsilun->force_next_flush = true;

Also, I think it is iscsi_co_generic_cb that should set
force_next_flush, so that it is only set on failure.  Not really for the
optimization value, but because it's clearer.


I don't get what you mean by it should only be set on failure.
My approach would be to add a flag to the iTask indicating that the command
requires force_flush to be set after successful completion. Is that
what you mean?



I think that if you do these changes, iscsi_co_flush can just check
if (!iscsilun->force_next_flush).

But still---nice approach. :)


Thanks :-)

Peter
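
A minimal sketch of the direction discussed above, condensed from the fields
already visible in the diffs (write_completed_cb and flush_needed are
illustrative helpers, not the driver's real functions): the submitter marks the
iTask when its write is not FUA, the completion callback transfers that mark to
the LUN only on success, and the flush path becomes a no-op while the mark is
clear.

#include <stdbool.h>

typedef struct IscsiLun {
    bool dpofua;
    bool force_next_flush;
} IscsiLun;

typedef struct IscsiTask {
    IscsiLun *iscsilun;
    int status;               /* 0 == SCSI_STATUS_GOOD */
    bool force_next_flush;    /* set by the submitter for non-FUA writes */
} IscsiTask;

/* Completion path: only a successful non-FUA write makes a later flush necessary. */
static void write_completed_cb(IscsiTask *iTask)
{
    if (iTask->status == 0) {
        iTask->iscsilun->force_next_flush |= iTask->force_next_flush;
    }
}

/* Flush path: skip SYNCHRONIZE CACHE unless an unflushed non-FUA write went through. */
static bool flush_needed(IscsiLun *iscsilun)
{
    if (!iscsilun->force_next_flush) {
        return false;
    }
    iscsilun->force_next_flush = false;
    return true;
}

int main(void)
{
    IscsiLun lun = { .dpofua = true, .force_next_flush = false };
    IscsiTask task = { .iscsilun = &lun, .status = 0, .force_next_flush = true };

    write_completed_cb(&task);           /* a non-FUA write just completed */
    return flush_needed(&lun) ? 0 : 1;   /* exactly one flush is now required */
}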



[Qemu-devel] [PATCH for-2.4 01/10] block/iscsi: do not forget to logout from target

2015-04-16 Thread Peter Lieven
We actually were always impolitely dropping the connection and
not cleanly logging out.

CC: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index ba33290..be8af46 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1501,6 +1501,9 @@ out:
 
     if (ret) {
         if (iscsi != NULL) {
+            if (iscsi_is_logged_in(iscsi)) {
+                iscsi_logout_sync(iscsi);
+            }
             iscsi_destroy_context(iscsi);
         }
         memset(iscsilun, 0, sizeof(IscsiLun));
@@ -1514,6 +1517,9 @@ static void iscsi_close(BlockDriverState *bs)
     struct iscsi_context *iscsi = iscsilun->iscsi;
 
     iscsi_detach_aio_context(bs);
+    if (iscsi_is_logged_in(iscsi)) {
+        iscsi_logout_sync(iscsi);
+    }
     iscsi_destroy_context(iscsi);
     g_free(iscsilun->zeroblock);
     g_free(iscsilun->allocationmap);
-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 04/10] block/iscsi: store DPOFUA bit from the modesense command

2015-04-16 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 221c9fc..237faa1 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -66,6 +66,7 @@ typedef struct IscsiLun {
 bool write_protected;
 bool lbpme;
 bool lbprz;
+bool dpofua;
 bool has_write_same;
 } IscsiLun;
 
@@ -1258,6 +1259,7 @@ static void iscsi_modesense_sync(IscsiLun *iscsilun)
     struct scsi_task *task;
     struct scsi_mode_sense *ms = NULL;
     iscsilun->write_protected = false;
+    iscsilun->dpofua = false;
 
     task = iscsi_modesense6_sync(iscsilun->iscsi, iscsilun->lun,
                                  1, SCSI_MODESENSE_PC_CURRENT,
@@ -1279,6 +1281,7 @@ static void iscsi_modesense_sync(IscsiLun *iscsilun)
         goto out;
     }
     iscsilun->write_protected = ms->device_specific_parameter & 0x80;
+    iscsilun->dpofua          = ms->device_specific_parameter & 0x10;
 
 out:
 if (task) {
-- 
1.9.1
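
For context, both flags read above live in the device-specific parameter byte
of the MODE SENSE parameter header: bit 7 is WP and bit 4 is DPOFUA. A tiny
stand-alone sketch of the extraction (the byte value below is made up for
illustration):

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical device-specific parameter byte returned by MODE SENSE(6). */
    unsigned char device_specific_parameter = 0x90;

    bool write_protected = device_specific_parameter & 0x80;  /* WP, bit 7 */
    bool dpofua          = device_specific_parameter & 0x10;  /* DPOFUA, bit 4 */

    printf("WP=%d DPOFUA=%d\n", write_protected, dpofua);
    return 0;
}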




Re: [Qemu-devel] [PATCH for-2.4 07/10] block/iscsi: bump libiscsi requirement to 1.10.0

2015-04-16 Thread Peter Lieven

On 16.04.2015 at 14:33, Paolo Bonzini wrote:


On 16/04/2015 14:18, Peter Lieven wrote:

We need this to support SCSI_STATUS_TASK_SET_FULL.

Any reason apart from the missing constant?


No, but I wanted to avoid having to start checking for constants that were
added shortly after this one.
You can't check for a constant in an enum with #ifdef.
Libiscsi 1.10 was released in September 2013.

Peter




Paolo


Signed-off-by: Peter Lieven p...@kamp.de
---
  configure | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 6969f6f..f73b8d0 100755
--- a/configure
+++ b/configure
@@ -3630,15 +3630,15 @@ if compile_prog   ; then
  fi
  
  ##

-# Do we have libiscsi >= 1.9.0
+# Do we have libiscsi >= 1.10.0
 if test "$libiscsi" != "no" ; then
-  if $pkg_config --atleast-version=1.9.0 libiscsi; then
+  if $pkg_config --atleast-version=1.10.0 libiscsi; then
     libiscsi="yes"
     libiscsi_cflags=$($pkg_config --cflags libiscsi)
     libiscsi_libs=$($pkg_config --libs libiscsi)
   else
     if test "$libiscsi" = "yes" ; then
-      feature_not_found "libiscsi" "Install libiscsi >= 1.9.0"
+      feature_not_found "libiscsi" "Install libiscsi >= 1.10.0"
     fi
     libiscsi="no"
   fi




--

Kind regards

Peter Lieven

...

  KAMP Netzwerkdienste GmbH
  Vestische Str. 89-91 | 46117 Oberhausen
  Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
  p...@kamp.de | http://www.kamp.de

  Geschäftsführer: Heiner Lante | Michael Lante
  Amtsgericht Duisburg | HRB Nr. 12154
  USt-Id-Nr.: DE 120607556

...
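
To illustrate the remark above: #ifdef only sees preprocessor macros, so an
enumerator such as SCSI_STATUS_TASK_SET_FULL cannot be probed that way. A
self-contained sketch (the enum below merely stands in for the libiscsi header;
the values are the standard SCSI status codes):

#include <stdio.h>

/* Stand-in for the library header: the status is an enumerator, not a macro. */
enum scsi_status {
    SCSI_STATUS_GOOD          = 0x00,
    SCSI_STATUS_BUSY          = 0x08,
    SCSI_STATUS_TASK_SET_FULL = 0x28,
};

/* The preprocessor knows nothing about enums, so this test always fails,
 * even though the compiler sees the enumerator just fine. */
#ifdef SCSI_STATUS_TASK_SET_FULL
#define HAVE_TASK_SET_FULL 1
#else
#define HAVE_TASK_SET_FULL 0
#endif

int main(void)
{
    printf("visible to the preprocessor: %d\n", HAVE_TASK_SET_FULL);      /* 0 */
    printf("visible to the compiler:     0x%x\n", SCSI_STATUS_TASK_SET_FULL);
    return 0;
}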




[Qemu-devel] [PATCH for-2.4 08/10] block/iscsi: handle SCSI_STATUS_TASK_SET_FULL

2015-04-16 Thread Peter Lieven
A target may issue a SCSI_STATUS_TASK_SET_FULL status
if there is already more than one busy command queued.

Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index a4902ea..3d0ffeb 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -185,10 +185,10 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
             iTask->do_retry = 1;
             goto out;
         }
-        if (status == SCSI_STATUS_BUSY) {
+        if (status == SCSI_STATUS_BUSY || status == SCSI_STATUS_TASK_SET_FULL) {
             unsigned retry_time =
                 exp_random(iscsi_retry_times[iTask->retries - 1]);
-            error_report("iSCSI Busy (retry #%u in %u ms): %s",
+            error_report("iSCSI Busy/TaskSetFull (retry #%u in %u ms): %s",
                          iTask->retries, retry_time,
                          iscsi_get_error(iscsi));
             aio_timer_init(iTask->iscsilun->aio_context,
-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 07/10] block/iscsi: bump libiscsi requirement to 1.10.0

2015-04-16 Thread Peter Lieven
We need this to support SCSI_STATUS_TASK_SET_FULL.

Signed-off-by: Peter Lieven p...@kamp.de
---
 configure | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 6969f6f..f73b8d0 100755
--- a/configure
+++ b/configure
@@ -3630,15 +3630,15 @@ if compile_prog   ; then
 fi
 
 ##
-# Do we have libiscsi >= 1.9.0
+# Do we have libiscsi >= 1.10.0
 if test "$libiscsi" != "no" ; then
-  if $pkg_config --atleast-version=1.9.0 libiscsi; then
+  if $pkg_config --atleast-version=1.10.0 libiscsi; then
     libiscsi="yes"
     libiscsi_cflags=$($pkg_config --cflags libiscsi)
     libiscsi_libs=$($pkg_config --libs libiscsi)
   else
     if test "$libiscsi" = "yes" ; then
-      feature_not_found "libiscsi" "Install libiscsi >= 1.9.0"
+      feature_not_found "libiscsi" "Install libiscsi >= 1.10.0"
     fi
     libiscsi="no"
   fi
-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 10/10] block/iscsi: use the allocationmap also if cache.direct=on

2015-04-16 Thread Peter Lieven
The allocation map is only a hint. The driver always
double-checks that blocks marked unallocated in the cache are
still unallocated before taking the fast path and returning zeroes.
Using the allocationmap is therefore migration safe and can
also be enabled with cache.direct=on.

Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 04c1309..0737354 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1495,7 +1495,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
         iscsilun->bl.opt_unmap_gran * iscsilun->block_size <= 16 * 1024 * 1024) {
         iscsilun->cluster_sectors = (iscsilun->bl.opt_unmap_gran *
                                      iscsilun->block_size) >> BDRV_SECTOR_BITS;
-        if (iscsilun->lbprz && !(bs->open_flags & BDRV_O_NOCACHE)) {
+        if (iscsilun->lbprz) {
             iscsilun->allocationmap = iscsi_allocationmap_init(iscsilun);
             if (iscsilun->allocationmap == NULL) {
                 ret = -ENOMEM;
-- 
1.9.1
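
A simplified stand-in for the read fast path described above (the helper names
are illustrative, not the driver's actual API): the map answer is only trusted
after GET LBA STATUS confirms it, which is what keeps the hint safe with
cache.direct=on and across migration.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stubs standing in for the cache lookup and the target queries (assumptions). */
static bool map_says_allocated(int64_t sector, int n)        { (void)sector; (void)n; return false; }
static bool lba_status_says_allocated(int64_t sector, int n) { (void)sector; (void)n; return false; }
static void read_from_target(uint8_t *buf, int64_t sector, int n) { (void)sector; memset(buf, 0xab, (size_t)n * 512); }

/* The map is only a hint: an "unallocated" answer is re-verified with
 * GET LBA STATUS before zeroes are returned instead of issuing a READ. */
static void read_sectors(uint8_t *buf, int64_t sector, int n)
{
    if (!map_says_allocated(sector, n) && !lba_status_says_allocated(sector, n)) {
        memset(buf, 0, (size_t)n * 512);   /* confirmed unallocated: zeroes */
        return;
    }
    read_from_target(buf, sector, n);      /* otherwise do the real READ */
}

int main(void)
{
    uint8_t buf[512];
    read_sectors(buf, 0, 1);
    printf("first byte: 0x%02x\n", buf[0]);   /* 0x00: served from the fast path */
    return 0;
}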




[Qemu-devel] [PATCH for-2.4 03/10] block/iscsi: rename iscsi_write_protected and let it return void

2015-04-16 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 6cf7e99..221c9fc 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1253,11 +1253,11 @@ static void iscsi_attach_aio_context(BlockDriverState *bs,
                                           iscsi_timed_set_events, iscsilun);
 }
 }
 
-static bool iscsi_is_write_protected(IscsiLun *iscsilun)
+static void iscsi_modesense_sync(IscsiLun *iscsilun)
 {
     struct scsi_task *task;
     struct scsi_mode_sense *ms = NULL;
-    bool wrprotected = false;
+    iscsilun->write_protected = false;
 
     task = iscsi_modesense6_sync(iscsilun->iscsi, iscsilun->lun,
                                  1, SCSI_MODESENSE_PC_CURRENT,
@@ -1278,13 +1278,12 @@ static bool iscsi_is_write_protected(IscsiLun *iscsilun)
                      iscsi_get_error(iscsilun->iscsi));
         goto out;
     }
-    wrprotected = ms->device_specific_parameter & 0x80;
+    iscsilun->write_protected = ms->device_specific_parameter & 0x80;
 
 out:
 if (task) {
 scsi_free_scsi_task(task);
 }
-return wrprotected;
 }
 
 /*
@@ -1403,7 +1402,8 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
     scsi_free_scsi_task(task);
     task = NULL;
 
-    iscsilun->write_protected = iscsi_is_write_protected(iscsilun);
+    iscsi_modesense_sync(iscsilun);
+
     /* Check the write protect flag of the LUN if we want to write */
     if (iscsilun->type == TYPE_DISK && (flags & BDRV_O_RDWR) &&
         iscsilun->write_protected) {
-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 02/10] block/iscsi: change all iscsilun properties from uint8_t to bool

2015-04-16 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index be8af46..6cf7e99 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -57,9 +57,6 @@ typedef struct IscsiLun {
 int events;
 QEMUTimer *nop_timer;
 QEMUTimer *event_timer;
-uint8_t lbpme;
-uint8_t lbprz;
-uint8_t has_write_same;
 struct scsi_inquiry_logical_block_provisioning lbp;
 struct scsi_inquiry_block_limits bl;
 unsigned char *zeroblock;
@@ -67,6 +64,9 @@ typedef struct IscsiLun {
 int cluster_sectors;
 bool use_16_for_rw;
 bool write_protected;
+bool lbpme;
+bool lbprz;
+bool has_write_same;
 } IscsiLun;
 
 typedef struct IscsiTask {
@@ -460,7 +460,7 @@ static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,
     *pnum = nb_sectors;
 
     /* LUN does not support logical block provisioning */
-    if (iscsilun->lbpme == 0) {
+    if (!iscsilun->lbpme) {
         goto out;
     }
 
@@ -1121,8 +1121,8 @@ static void iscsi_readcapacity_sync(IscsiLun *iscsilun, Error **errp)
         } else {
             iscsilun->block_size = rc16->block_length;
             iscsilun->num_blocks = rc16->returned_lba + 1;
-            iscsilun->lbpme = rc16->lbpme;
-            iscsilun->lbprz = rc16->lbprz;
+            iscsilun->lbpme = !!rc16->lbpme;
+            iscsilun->lbprz = !!rc16->lbprz;
             iscsilun->use_16_for_rw = (rc16->returned_lba > 0xffffffff);
         }
     }
@@ -1655,7 +1655,7 @@ out:
 static int iscsi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
     IscsiLun *iscsilun = bs->opaque;
-    bdi->unallocated_blocks_are_zero = !!iscsilun->lbprz;
+    bdi->unallocated_blocks_are_zero = iscsilun->lbprz;
     bdi->can_write_zeroes_with_unmap = iscsilun->lbprz && iscsilun->lbp.lbpws;
     bdi->cluster_size = iscsilun->cluster_sectors * BDRV_SECTOR_SIZE;
     return 0;
-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 00/10] various improvements for the iSCSI driver

2015-04-16 Thread Peter Lieven
Peter Lieven (10):
  block/iscsi: do not forget to logout from target
  block/iscsi: change all iscsilun properties from uint8_t to bool
  block/iscsi: rename iscsi_write_protected and let it return void
  block/iscsi: store DPOFUA bit from the modesense command
  block/iscsi: optimize WRITE10/16 if cache.writeback is not set
  block/iscsi: increase retry count
  block/iscsi: bump libiscsi requirement to 1.10.0
  block/iscsi: handle SCSI_STATUS_TASK_SET_FULL
  block/iscsi: bump year in copyright notice
  block/iscsi: use the allocationmap also if cache.direct=on

 block/iscsi.c | 58 +++---
 configure |  6 +++---
 2 files changed, 42 insertions(+), 22 deletions(-)

-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 05/10] block/iscsi: optimize WRITE10/16 if cache.writeback is not set

2015-04-16 Thread Peter Lieven
SCSI allows telling the target not to return from a write command
until the data is written to the disk. Use this so-called FUA
bit, if it is supported, to optimize WRITE commands when writeback
is not allowed.

In this case qemu always issues a WRITE followed by a FLUSH, which
costs two round trips. If we set the FUA bit we can omit the
following FLUSH.

Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 237faa1..7fb04d7 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -68,6 +68,7 @@ typedef struct IscsiLun {
 bool lbprz;
 bool dpofua;
 bool has_write_same;
+bool force_next_flush;
 } IscsiLun;
 
 typedef struct IscsiTask {
@@ -370,6 +371,7 @@ static int coroutine_fn iscsi_co_writev(BlockDriverState *bs,
     struct IscsiTask iTask;
     uint64_t lba;
     uint32_t num_sectors;
+    int fua = iscsilun->dpofua && !bs->enable_write_cache;
 
     if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
         return -EINVAL;
@@ -388,12 +390,12 @@ retry:
     if (iscsilun->use_16_for_rw) {
         iTask.task = iscsi_write16_task(iscsilun->iscsi, iscsilun->lun, lba,
                                         NULL, num_sectors * iscsilun->block_size,
-                                        iscsilun->block_size, 0, 0, 0, 0, 0,
+                                        iscsilun->block_size, 0, 0, fua, 0, 0,
                                         iscsi_co_generic_cb, &iTask);
     } else {
         iTask.task = iscsi_write10_task(iscsilun->iscsi, iscsilun->lun, lba,
                                         NULL, num_sectors * iscsilun->block_size,
-                                        iscsilun->block_size, 0, 0, 0, 0, 0,
+                                        iscsilun->block_size, 0, 0, fua, 0, 0,
                                         iscsi_co_generic_cb, &iTask);
     }
     if (iTask.task == NULL) {
@@ -621,6 +623,11 @@ static int coroutine_fn iscsi_co_flush(BlockDriverState *bs)
         return 0;
     }
 
+    if (iscsilun->dpofua && !bs->enable_write_cache &&
+        !iscsilun->force_next_flush) {
+        return 0;
+    }
+
     iscsi_co_init_iscsitask(iscsilun, &iTask);
 
 retry:
@@ -648,6 +655,8 @@ retry:
 return -EIO;
 }
 
+    iscsilun->force_next_flush = false;
+
 return 0;
 }
 
@@ -969,6 +978,8 @@ retry:
 iscsi_allocationmap_set(iscsilun, sector_num, nb_sectors);
 }
 
+    iscsilun->force_next_flush = true;
+
 return 0;
 }
 
-- 
1.9.1
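
To make the round-trip saving concrete, a minimal sketch of the decision this
patch adds; issue_write16 and issue_flush are illustrative placeholders rather
than the real libiscsi calls:

#include <stdbool.h>
#include <stdio.h>

static void issue_write16(bool fua) { printf("WRITE(16), FUA=%d\n", fua); }
static void issue_flush(void)       { printf("SYNCHRONIZE CACHE(10)\n"); }

static void write_then_maybe_flush(bool dpofua, bool enable_write_cache)
{
    /* FUA is usable only if the target reports DPOFUA and the guest expects
     * writethrough semantics (cache.writeback off). */
    bool fua = dpofua && !enable_write_cache;

    issue_write16(fua);
    if (!enable_write_cache && !fua) {
        issue_flush();                     /* the extra round trip FUA avoids */
    }
}

int main(void)
{
    write_then_maybe_flush(true,  false);  /* 1 round trip: WRITE + FUA */
    write_then_maybe_flush(false, false);  /* 2 round trips: WRITE, then FLUSH */
    return 0;
}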




[Qemu-devel] [PATCH for-2.4 V2 8/9] block/iscsi: bump year in copyright notice

2015-04-16 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 328907b..8364f97 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2,7 +2,7 @@
  * QEMU Block driver for iSCSI images
  *
  * Copyright (c) 2010-2011 Ronnie Sahlberg ronniesahlb...@gmail.com
- * Copyright (c) 2012-2014 Peter Lieven p...@kamp.de
+ * Copyright (c) 2012-2015 Peter Lieven p...@kamp.de
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 V2 0/9] various improvements for the iSCSI driver

2015-04-16 Thread Peter Lieven
v1->v2: - removed the requirement for libiscsi 1.10.0 [Paolo]
        - reworked the force_next_flush logic [Paolo]

Peter Lieven (9):
  block/iscsi: do not forget to logout from target
  block/iscsi: change all iscsilun properties from uint8_t to bool
  block/iscsi: rename iscsi_write_protected and let it return void
  block/iscsi: store DPOFUA bit from the modesense command
  block/iscsi: optimize WRITE10/16 if cache.writeback is not set
  block/iscsi: increase retry count
  block/iscsi: handle SCSI_STATUS_TASK_SET_FULL
  block/iscsi: bump year in copyright notice
  block/iscsi: use the allocationmap also if cache.direct=on

 block/iscsi.c | 64 ---
 1 file changed, 44 insertions(+), 20 deletions(-)

-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 V2 2/9] block/iscsi: change all iscsilun properties from uint8_t to bool

2015-04-16 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index be8af46..6cf7e99 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -57,9 +57,6 @@ typedef struct IscsiLun {
 int events;
 QEMUTimer *nop_timer;
 QEMUTimer *event_timer;
-uint8_t lbpme;
-uint8_t lbprz;
-uint8_t has_write_same;
 struct scsi_inquiry_logical_block_provisioning lbp;
 struct scsi_inquiry_block_limits bl;
 unsigned char *zeroblock;
@@ -67,6 +64,9 @@ typedef struct IscsiLun {
 int cluster_sectors;
 bool use_16_for_rw;
 bool write_protected;
+bool lbpme;
+bool lbprz;
+bool has_write_same;
 } IscsiLun;
 
 typedef struct IscsiTask {
@@ -460,7 +460,7 @@ static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,
     *pnum = nb_sectors;
 
     /* LUN does not support logical block provisioning */
-    if (iscsilun->lbpme == 0) {
+    if (!iscsilun->lbpme) {
         goto out;
     }
 
@@ -1121,8 +1121,8 @@ static void iscsi_readcapacity_sync(IscsiLun *iscsilun, Error **errp)
         } else {
             iscsilun->block_size = rc16->block_length;
             iscsilun->num_blocks = rc16->returned_lba + 1;
-            iscsilun->lbpme = rc16->lbpme;
-            iscsilun->lbprz = rc16->lbprz;
+            iscsilun->lbpme = !!rc16->lbpme;
+            iscsilun->lbprz = !!rc16->lbprz;
             iscsilun->use_16_for_rw = (rc16->returned_lba > 0xffffffff);
         }
     }
@@ -1655,7 +1655,7 @@ out:
 static int iscsi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
     IscsiLun *iscsilun = bs->opaque;
-    bdi->unallocated_blocks_are_zero = !!iscsilun->lbprz;
+    bdi->unallocated_blocks_are_zero = iscsilun->lbprz;
     bdi->can_write_zeroes_with_unmap = iscsilun->lbprz && iscsilun->lbp.lbpws;
     bdi->cluster_size = iscsilun->cluster_sectors * BDRV_SECTOR_SIZE;
     return 0;
-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 V2 4/9] block/iscsi: store DPOFUA bit from the modesense command

2015-04-16 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 221c9fc..237faa1 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -66,6 +66,7 @@ typedef struct IscsiLun {
 bool write_protected;
 bool lbpme;
 bool lbprz;
+bool dpofua;
 bool has_write_same;
 } IscsiLun;
 
@@ -1258,6 +1259,7 @@ static void iscsi_modesense_sync(IscsiLun *iscsilun)
     struct scsi_task *task;
     struct scsi_mode_sense *ms = NULL;
     iscsilun->write_protected = false;
+    iscsilun->dpofua = false;
 
     task = iscsi_modesense6_sync(iscsilun->iscsi, iscsilun->lun,
                                  1, SCSI_MODESENSE_PC_CURRENT,
@@ -1279,6 +1281,7 @@ static void iscsi_modesense_sync(IscsiLun *iscsilun)
         goto out;
     }
     iscsilun->write_protected = ms->device_specific_parameter & 0x80;
+    iscsilun->dpofua          = ms->device_specific_parameter & 0x10;
 
 out:
 if (task) {
-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 V2 3/9] block/iscsi: rename iscsi_write_protected and let it return void

2015-04-16 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 6cf7e99..221c9fc 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1253,11 +1253,11 @@ static void iscsi_attach_aio_context(BlockDriverState *bs,
                                           iscsi_timed_set_events, iscsilun);
 }
 }
 
-static bool iscsi_is_write_protected(IscsiLun *iscsilun)
+static void iscsi_modesense_sync(IscsiLun *iscsilun)
 {
     struct scsi_task *task;
     struct scsi_mode_sense *ms = NULL;
-    bool wrprotected = false;
+    iscsilun->write_protected = false;
 
     task = iscsi_modesense6_sync(iscsilun->iscsi, iscsilun->lun,
                                  1, SCSI_MODESENSE_PC_CURRENT,
@@ -1278,13 +1278,12 @@ static bool iscsi_is_write_protected(IscsiLun *iscsilun)
                      iscsi_get_error(iscsilun->iscsi));
         goto out;
     }
-    wrprotected = ms->device_specific_parameter & 0x80;
+    iscsilun->write_protected = ms->device_specific_parameter & 0x80;
 
 out:
 if (task) {
 scsi_free_scsi_task(task);
 }
-return wrprotected;
 }
 
 /*
@@ -1403,7 +1402,8 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
     scsi_free_scsi_task(task);
     task = NULL;
 
-    iscsilun->write_protected = iscsi_is_write_protected(iscsilun);
+    iscsi_modesense_sync(iscsilun);
+
     /* Check the write protect flag of the LUN if we want to write */
     if (iscsilun->type == TYPE_DISK && (flags & BDRV_O_RDWR) &&
         iscsilun->write_protected) {
-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 V2 5/9] block/iscsi: optimize WRITE10/16 if cache.writeback is not set

2015-04-16 Thread Peter Lieven
SCSI allows telling the target not to return from a write command
until the data is written to the disk. Use this so-called FUA
bit, if it is supported, to optimize WRITE commands when writeback
is not allowed.

In this case qemu always issues a WRITE followed by a FLUSH, which
costs two round trips. If we set the FUA bit we can omit the
following FLUSH.

Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 237faa1..600 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -68,6 +68,7 @@ typedef struct IscsiLun {
 bool lbprz;
 bool dpofua;
 bool has_write_same;
+bool force_next_flush;
 } IscsiLun;
 
 typedef struct IscsiTask {
@@ -80,6 +81,7 @@ typedef struct IscsiTask {
 QEMUBH *bh;
 IscsiLun *iscsilun;
 QEMUTimer retry_timer;
+bool force_next_flush;
 } IscsiTask;
 
 typedef struct IscsiAIOCB {
@@ -200,6 +202,8 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
             }
         }
         error_report("iSCSI Failure: %s", iscsi_get_error(iscsi));
+    } else {
+        iTask->iscsilun->force_next_flush |= iTask->force_next_flush;
     }
 
 out:
@@ -370,6 +374,7 @@ static int coroutine_fn iscsi_co_writev(BlockDriverState *bs,
     struct IscsiTask iTask;
     uint64_t lba;
     uint32_t num_sectors;
+    int fua;
 
     if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
         return -EINVAL;
@@ -385,15 +390,17 @@ static int coroutine_fn iscsi_co_writev(BlockDriverState *bs,
     num_sectors = sector_qemu2lun(nb_sectors, iscsilun);
     iscsi_co_init_iscsitask(iscsilun, &iTask);
 retry:
+    fua = iscsilun->dpofua && !bs->enable_write_cache;
+    iTask.force_next_flush = !fua;
     if (iscsilun->use_16_for_rw) {
         iTask.task = iscsi_write16_task(iscsilun->iscsi, iscsilun->lun, lba,
                                         NULL, num_sectors * iscsilun->block_size,
-                                        iscsilun->block_size, 0, 0, 0, 0, 0,
+                                        iscsilun->block_size, 0, 0, fua, 0, 0,
                                         iscsi_co_generic_cb, &iTask);
     } else {
         iTask.task = iscsi_write10_task(iscsilun->iscsi, iscsilun->lun, lba,
                                         NULL, num_sectors * iscsilun->block_size,
-                                        iscsilun->block_size, 0, 0, 0, 0, 0,
+                                        iscsilun->block_size, 0, 0, fua, 0, 0,
                                         iscsi_co_generic_cb, &iTask);
     }
     if (iTask.task == NULL) {
@@ -621,8 +628,12 @@ static int coroutine_fn iscsi_co_flush(BlockDriverState *bs)
         return 0;
     }
 
-    iscsi_co_init_iscsitask(iscsilun, &iTask);
+    if (!iscsilun->force_next_flush) {
+        return 0;
+    }
+    iscsilun->force_next_flush = false;
 
+    iscsi_co_init_iscsitask(iscsilun, &iTask);
 retry:
 if (iscsi_synchronizecache10_task(iscsilun-iscsi, iscsilun-lun, 0, 0, 0,
   0, iscsi_co_generic_cb, iTask) == NULL) 
{
@@ -918,6 +929,7 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
     }
 
     iscsi_co_init_iscsitask(iscsilun, &iTask);
+    iTask.force_next_flush = true;
 retry:
     if (use_16_for_ws) {
         iTask.task = iscsi_writesame16_task(iscsilun->iscsi, iscsilun->lun, lba,
-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 V2 7/9] block/iscsi: handle SCSI_STATUS_TASK_SET_FULL

2015-04-16 Thread Peter Lieven
A target may issue a SCSI_STATUS_TASK_SET_FULL status
if there is already more than one busy command queued.

Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 5999f74..328907b 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -186,10 +186,13 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
             iTask->do_retry = 1;
             goto out;
         }
-        if (status == SCSI_STATUS_BUSY) {
+        /* status 0x28 is SCSI_TASK_SET_FULL. It was first introduced
+         * in libiscsi 1.10.0. Hardcode this value here to avoid
+         * the need to bump the libiscsi requirement to 1.10.0 */
+        if (status == SCSI_STATUS_BUSY || status == 0x28) {
             unsigned retry_time =
                 exp_random(iscsi_retry_times[iTask->retries - 1]);
-            error_report("iSCSI Busy (retry #%u in %u ms): %s",
+            error_report("iSCSI Busy/TaskSetFull (retry #%u in %u ms): %s",
                          iTask->retries, retry_time,
                          iscsi_get_error(iscsi));
             aio_timer_init(iTask->iscsilun->aio_context,
-- 
1.9.1




[Qemu-devel] [PATCH for-2.4 V2 1/9] block/iscsi: do not forget to logout from target

2015-04-16 Thread Peter Lieven
We actually were always impolitely dropping the connection and
not cleanly logging out.

CC: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index ba33290..be8af46 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1501,6 +1501,9 @@ out:
 
     if (ret) {
         if (iscsi != NULL) {
+            if (iscsi_is_logged_in(iscsi)) {
+                iscsi_logout_sync(iscsi);
+            }
             iscsi_destroy_context(iscsi);
         }
         memset(iscsilun, 0, sizeof(IscsiLun));
@@ -1514,6 +1517,9 @@ static void iscsi_close(BlockDriverState *bs)
     struct iscsi_context *iscsi = iscsilun->iscsi;
 
     iscsi_detach_aio_context(bs);
+    if (iscsi_is_logged_in(iscsi)) {
+        iscsi_logout_sync(iscsi);
+    }
     iscsi_destroy_context(iscsi);
     g_free(iscsilun->zeroblock);
     g_free(iscsilun->allocationmap);
-- 
1.9.1



