Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process

2015-02-18 Thread Jason J. Herne

On 02/05/2015 11:26 AM, Jason J. Herne wrote:

Hello Eric,

it has been quite a while since we have looked at this :). I'm
revisiting this problem and I find that it still exists. Here is an
archive link of our previous discussions on this:

https://www.redhat.com/archives/libvir-list/2014-July/thread.html#00809

Basically, when I do a migration and run out of disk space I do not see
a meaningful error message without this patch. Early on in our previous
discussions you asked me how I had gotten through the
qemuDomainSaveMemory function without hitting this code:

if (virFileWrapperFdClose(wrapperFd)  0)
 goto cleanup;

If qemuMigrationToFile fails we go directly to cleanup and do not pass
through the virFileWrapperFdClose call. For reference:

 /* Perform the migration */
 if (qemuMigrationToFile(driver, vm, fd, offset, path,
 qemuCompressProgramName(compressed),
 bypassSecurityDriver,
 asyncJob)  0)
 goto cleanup;

This is why my patch adds the additional call to virFileWrapperFdClose.
Perhaps, if we cannot tolerate calling it twice, I can adjust the code
such that we only call it once.

For some odd reason, you seem to see messages that I do not (take look
at my final message in the thread linked above). I was not able to see
your goodbye world \n message.

I would be interested to know what you see when you run my test case
without my patch. If you feel like trying it just create a 50MB disk
image and loop mount it:

   mount -o loop /usr/local/var/lib/libvirt/qemu/save.img
/usr/local/var/lib/libvirt/qemu/save

Then:
 virsh managedsave guestname

And report what you saw on the command line and the libvirtd console
and/or log depending on how you run the daemon. I suspect (since you saw
goodbye world) that you would also see the out of disk space message
ok. If I'm correct, then the question becomes... why don't I see it too? :)

Thanks for your time.



Polite Ping? :)

--
-- Jason J. Herne (jjhe...@linux.vnet.ibm.com)

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process

2015-02-05 Thread Jason J. Herne

Hello Eric,

it has been quite a while since we have looked at this :). I'm 
revisiting this problem and I find that it still exists. Here is an 
archive link of our previous discussions on this:


https://www.redhat.com/archives/libvir-list/2014-July/thread.html#00809

Basically, when I do a migration and run out of disk space I do not see 
a meaningful error message without this patch. Early on in our previous 
discussions you asked me how I had gotten through the 
qemuDomainSaveMemory function without hitting this code:


if (virFileWrapperFdClose(wrapperFd)  0)
goto cleanup;

If qemuMigrationToFile fails we go directly to cleanup and do not pass 
through the virFileWrapperFdClose call. For reference:


/* Perform the migration */
if (qemuMigrationToFile(driver, vm, fd, offset, path,
qemuCompressProgramName(compressed),
bypassSecurityDriver,
asyncJob)  0)
goto cleanup;

This is why my patch adds the additional call to virFileWrapperFdClose. 
Perhaps, if we cannot tolerate calling it twice, I can adjust the code 
such that we only call it once.


For some odd reason, you seem to see messages that I do not (take look 
at my final message in the thread linked above). I was not able to see 
your goodbye world \n message.


I would be interested to know what you see when you run my test case 
without my patch. If you feel like trying it just create a 50MB disk 
image and loop mount it:


  mount -o loop /usr/local/var/lib/libvirt/qemu/save.img 
/usr/local/var/lib/libvirt/qemu/save


Then:
virsh managedsave guestname

And report what you saw on the command line and the libvirtd console 
and/or log depending on how you run the daemon. I suspect (since you saw 
goodbye world) that you would also see the out of disk space message 
ok. If I'm correct, then the question becomes... why don't I see it too? :)


Thanks for your time.

--
-- Jason J. Herne (jjhe...@linux.vnet.ibm.com)

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process

2014-07-30 Thread Jason Herne

Hi Eric,

Thanks for continuing work on this.

To repeat my test:
Make a small disk image (50MB) and mount it
at /usr/local/var/lib/libvirt/qemu/save/.
Then attempt to managed save a guest whose memory is larger, say 256MB.
The image will quickly fill and you should see the Unexpected Error
message but nothing more.

I was unable to get any meaningful error info from iohelper. I'll try your
goodbye world patch and see what I get. Things might get interesting if
my result is different :).

   - Jason J. Herne
   z/VM CP Development
   IBM Corporation, Endicott, NY
   Email - jjhe...@us.ibm.com
   Phone 607-429-5136 or tie-line 620-5136


|
| From:  |
|
  
--|
  |Eric Blake ebl...@redhat.com   
 |
  
--|
|
| To:|
|
  
--|
  |Jason Herne/Endicott/IBM@IBMUS, libvir-list@redhat.com,  
 |
  
--|
|
| Date:  |
|
  
--|
  |07/16/2014 07:57 PM  
 |
  
--|
|
| Subject:   |
|
  
--|
  |Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process
 |
  
--|





On 07/16/2014 05:20 PM, Eric Blake wrote:

 But if I then rework the iohelper patch:

 diff --git i/src/util/iohelper.c w/src/util/iohelper.c
 index 8a3c377..efb1366 100644
 --- i/src/util/iohelper.c
 +++ w/src/util/iohelper.c
 @@ -301,6 +301,7 @@ main(int argc, char **argv)
  exit(EXIT_FAILURE);
  }

 +fprintf(stderr, _(goodbye world\n)); goto error;
  if (fd  0 || runIO(path, fd, oflags, length)  0)
  goto error;


 the error is now:

 error: Failed to save domain testvm1 to /tmp/save
 error: operation failed: domain save job: unexpectedly failed

and with your patch, I see:

error: Failed to save domain testvm1 to /tmp/save
error: internal error: Child process (LIBVIRT_LOG_OUTPUTS=1:stderr
/home/eblake/libvirt/src/libvirt_iohelper /tmp/save 0 1) unexpected exit
status 1: goodbye world
/home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save

on the console, but this longer spew in libvirt's log:

2014-07-16 23:34:23.855+: 25406: error :
qemuMigrationUpdateJobStatus:1788 : operation failed: domain save job:
unexpectedly failed
2014-07-16 23:34:23.857+: 25406: error : virCommandWait:2423 :
internal error: Child process (LIBVIRT_LOG_OUTPUTS=1:stderr
/home/eblake/libvirt/src/libvirt_iohelper /tmp/save 0 1) unexpected exit
status 1: goodbye world
/home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save

2014-07-16 23:34:23.857+: 25406: warning : virFileWrapperFdClose:326
: iohelper reports: goodbye world
/home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save


so the act of closing the wrapperfd is losing the earlier error message
from being reported to the user (seems okay in this case, but might not
always be), AND logging the stderr contents twice (once via the error
reported to the user, and again due to a VIR_WARN).


 So the problem is that we have _two_ possible sources of errors (qemu
 reporting failure because it can't write to iohelper, and iohelper
 reporting an error from whatever other reason, such as disk full).  If
 qemu finishes, we have only the iohelper message and properly report it;
 but if we have both failures, then the qemu error takes priority, and in
 this case it is lower priority.  There are also cases where qemu will
 error out but iohelper succeeds (such as if qemu refuses to migrate
 because the guest

Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process

2014-07-30 Thread Jason Herne

When I use this patch:

diff --git a/src/util/iohelper.c b/src/util/iohelper.c
index 8a3c377..255365c 100644
--- a/src/util/iohelper.c
+++ b/src/util/iohelper.c
@@ -307,6 +307,7 @@ main(int argc, char **argv)
 if (delete)
 unlink(path);

+fprintf(stderr, _(goodbye world \n)); goto error;
 return 0;

  error:

The output I see on the command line is:
  # virsh managedsave kvm1
  error: Failed to save domain kvm1 state
  error: operation failed: domain save job: unexpectedly failed

libvirtd outputs the following:
  2014-07-30 14:39:39.765+: 5975: error :
qemuMigrationUpdateJobStatus:1788 : operation failed: domain save job:
unexpectedly failed

I completed my testing with the following commit from master
  commit 221b1828f9bf9d1d59d2d4b318249de9f7ec7bb8
  Author: Roman Bogorodskiy bogorods...@gmail.com
  Date:   Thu Jul 24 19:52:11 2014 +0400

We appear to be getting different results. Can you try running out of disk
space during a managed save and reporting what you get? Perhaps there is no
code problem here after all? A quick look at my libvirtd.conf shows nothing
obvious. I'm just using the default config file.--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH] Fix reporting of i/o errors by iohelper process

2014-07-16 Thread Jason J. Herne
From: Jason J. Herne jjhe...@us.ibm.com

libvirt_iohelper is a helper process that is exec'ed and used to handle I/O
during a Qemu managed save operation. Due to a missing call to
virFileWrapperFdClose, all I/O error messages reported by iohelper are lost.

This patch adds a call to virFileWrapperFdClose to the cleanup phase of
qemuDomainSaveMemory.

This patch also modifies virFileWrapperFdClose such that errors are only
reported when the length of the err_msg buffer is  0. Before now, the
existence of the buffer would trigger error reporting in virFileWrapperFdClose.

Signed-off-by: Jason J. Herne jjhe...@us.ibm.com
---
 src/qemu/qemu_driver.c | 1 +
 src/util/virfile.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index ecccf6c..8d78805 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -3015,6 +3015,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver,
 
  cleanup:
 VIR_FORCE_CLOSE(fd);
+virFileWrapperFdClose(wrapperFd);
 virFileWrapperFdFree(wrapperFd);
 VIR_FREE(xml);
 
diff --git a/src/util/virfile.c b/src/util/virfile.c
index 463064c..813b4f5 100644
--- a/src/util/virfile.c
+++ b/src/util/virfile.c
@@ -322,7 +322,7 @@ virFileWrapperFdClose(virFileWrapperFdPtr wfd)
 return 0;
 
 ret = virCommandWait(wfd-cmd, NULL);
-if (wfd-err_msg)
+if (wfd-err_msg  strlen(wfd-err_msg))
 VIR_WARN(iohelper reports: %s, wfd-err_msg);
 
 return ret;
-- 
1.8.3.2

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process

2014-07-16 Thread Eric Blake
On 07/16/2014 08:12 AM, Jason J. Herne wrote:
 From: Jason J. Herne jjhe...@us.ibm.com
 
 libvirt_iohelper is a helper process that is exec'ed and used to handle I/O
 during a Qemu managed save operation. Due to a missing call to
 virFileWrapperFdClose, all I/O error messages reported by iohelper are lost.
 
 This patch adds a call to virFileWrapperFdClose to the cleanup phase of
 qemuDomainSaveMemory.
 
 This patch also modifies virFileWrapperFdClose such that errors are only
 reported when the length of the err_msg buffer is  0. Before now, the
 existence of the buffer would trigger error reporting in 
 virFileWrapperFdClose.
 
 Signed-off-by: Jason J. Herne jjhe...@us.ibm.com
 ---
  src/qemu/qemu_driver.c | 1 +
  src/util/virfile.c | 2 +-
  2 files changed, 2 insertions(+), 1 deletion(-)

Nice! Thanks for persevering on this one.  Congrats on your first
libvirt patch.

 
 diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
 index ecccf6c..8d78805 100644
 --- a/src/qemu/qemu_driver.c
 +++ b/src/qemu/qemu_driver.c
 @@ -3015,6 +3015,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver,
  
   cleanup:
  VIR_FORCE_CLOSE(fd);
 +virFileWrapperFdClose(wrapperFd);
  virFileWrapperFdFree(wrapperFd);
  VIR_FREE(xml);
  
 diff --git a/src/util/virfile.c b/src/util/virfile.c
 index 463064c..813b4f5 100644
 --- a/src/util/virfile.c
 +++ b/src/util/virfile.c
 @@ -322,7 +322,7 @@ virFileWrapperFdClose(virFileWrapperFdPtr wfd)
  return 0;
  
  ret = virCommandWait(wfd-cmd, NULL);
 -if (wfd-err_msg)
 +if (wfd-err_msg  strlen(wfd-err_msg))

Micro-optimization: more efficient when written:

if (wfd-err_msg  *wfd-err_msg)

since then you don't have to waste time of crawling the entire string.

ACK based on looking at the code, and I'll push with that modification
once I've also tested it (or reply again if something turns up).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process

2014-07-16 Thread Eric Blake
On 07/16/2014 08:30 AM, Eric Blake wrote:
 On 07/16/2014 08:12 AM, Jason J. Herne wrote:
 From: Jason J. Herne jjhe...@us.ibm.com

 libvirt_iohelper is a helper process that is exec'ed and used to handle I/O
 during a Qemu managed save operation. Due to a missing call to
 virFileWrapperFdClose, all I/O error messages reported by iohelper are lost.

 This patch adds a call to virFileWrapperFdClose to the cleanup phase of
 qemuDomainSaveMemory.

 This patch also modifies virFileWrapperFdClose such that errors are only
 reported when the length of the err_msg buffer is  0. Before now, the
 existence of the buffer would trigger error reporting in 
 virFileWrapperFdClose.

 Signed-off-by: Jason J. Herne jjhe...@us.ibm.com
 ---
  src/qemu/qemu_driver.c | 1 +
  src/util/virfile.c | 2 +-
  2 files changed, 2 insertions(+), 1 deletion(-)
 
 Nice! Thanks for persevering on this one.  Congrats on your first
 libvirt patch.

Sadly, the more I look at this, the more I wonder how it can fix anything.

 

 diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
 index ecccf6c..8d78805 100644
 --- a/src/qemu/qemu_driver.c
 +++ b/src/qemu/qemu_driver.c
 @@ -3015,6 +3015,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver,
  
   cleanup:
  VIR_FORCE_CLOSE(fd);
 +virFileWrapperFdClose(wrapperFd);
  virFileWrapperFdFree(wrapperFd);
  VIR_FREE(xml);

But qemuDomainSaveMemory() already has a call to virFileWrapperFdClose()
earlier on; worse, the current implementation of virFileWrapperFdClose()
is not designed to be called twice if an error occurred (rather, if
there is an error, two calls will end up logging the error twice, when
once would have been sufficient).  How are you getting to a point where
the cleanup label is reached without going through the earlier close?

Is there some easy setup you used in testing this patch, so that I can
reproduce a scenario where iohelper will reliably fail?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process

2014-07-16 Thread Eric Blake
On 07/16/2014 03:56 PM, Eric Blake wrote:
 On 07/16/2014 08:30 AM, Eric Blake wrote:
 On 07/16/2014 08:12 AM, Jason J. Herne wrote:
 From: Jason J. Herne jjhe...@us.ibm.com

 libvirt_iohelper is a helper process that is exec'ed and used to handle I/O
 during a Qemu managed save operation. Due to a missing call to
 virFileWrapperFdClose, all I/O error messages reported by iohelper are lost.


 +++ b/src/qemu/qemu_driver.c
 @@ -3015,6 +3015,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver,
  
   cleanup:
  VIR_FORCE_CLOSE(fd);
 +virFileWrapperFdClose(wrapperFd);
  virFileWrapperFdFree(wrapperFd);
  VIR_FREE(xml);
 
 But qemuDomainSaveMemory() already has a call to virFileWrapperFdClose()
 earlier on; worse, the current implementation of virFileWrapperFdClose()
 is not designed to be called twice if an error occurred (rather, if
 there is an error, two calls will end up logging the error twice, when
 once would have been sufficient).  How are you getting to a point where
 the cleanup label is reached without going through the earlier close?
 
 Is there some easy setup you used in testing this patch, so that I can
 reproduce a scenario where iohelper will reliably fail?

Okay, looking into this a bit more:

I made the following change:

diff --git i/src/util/iohelper.c w/src/util/iohelper.c
index 8a3c377..dfb45e1 100644
--- i/src/util/iohelper.c
+++ w/src/util/iohelper.c
@@ -307,6 +307,7 @@ main(int argc, char **argv)
 if (delete)
 unlink(path);

+fprintf(stderr, _(goodbye world\n)); goto error;
 return 0;

  error:

then tried to do a 'virsh save testvm1 /tmp/save'.  Even without your
patch, I get a very nice error:

error: Failed to save domain testvm1 to /tmp/save
error: internal error: Child process (LIBVIRT_LOG_OUTPUTS=1:stderr
/home/eblake/libvirt/src/libvirt_iohelper /tmp/save 0 1) unexpected exit
status 1: goodbye world
/home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save

But if I then rework the iohelper patch:

diff --git i/src/util/iohelper.c w/src/util/iohelper.c
index 8a3c377..efb1366 100644
--- i/src/util/iohelper.c
+++ w/src/util/iohelper.c
@@ -301,6 +301,7 @@ main(int argc, char **argv)
 exit(EXIT_FAILURE);
 }

+fprintf(stderr, _(goodbye world\n)); goto error;
 if (fd  0 || runIO(path, fd, oflags, length)  0)
 goto error;


the error is now:

error: Failed to save domain testvm1 to /tmp/save
error: operation failed: domain save job: unexpectedly failed

So the problem is that we have _two_ possible sources of errors (qemu
reporting failure because it can't write to iohelper, and iohelper
reporting an error from whatever other reason, such as disk full).  If
qemu finishes, we have only the iohelper message and properly report it;
but if we have both failures, then the qemu error takes priority, and in
this case it is lower priority.  There are also cases where qemu will
error out but iohelper succeeds (such as if qemu refuses to migrate
because the guest has hostdev passthrough).

So I _think_ what we want to do is always check BOTH places for error;
if only one of the two fails, use that message.  If both fail, then I
don't know whether it is possible to have a heuristic for which failure
message is more meaningful, or whether it is better to report both
errors (even though it will often be the case that one error was a
knock-on effect from the other).  But I'm a bit stuck on the best way to
implement that.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process

2014-07-16 Thread Eric Blake
On 07/16/2014 05:20 PM, Eric Blake wrote:

 But if I then rework the iohelper patch:
 
 diff --git i/src/util/iohelper.c w/src/util/iohelper.c
 index 8a3c377..efb1366 100644
 --- i/src/util/iohelper.c
 +++ w/src/util/iohelper.c
 @@ -301,6 +301,7 @@ main(int argc, char **argv)
  exit(EXIT_FAILURE);
  }
 
 +fprintf(stderr, _(goodbye world\n)); goto error;
  if (fd  0 || runIO(path, fd, oflags, length)  0)
  goto error;
 
 
 the error is now:
 
 error: Failed to save domain testvm1 to /tmp/save
 error: operation failed: domain save job: unexpectedly failed

and with your patch, I see:

error: Failed to save domain testvm1 to /tmp/save
error: internal error: Child process (LIBVIRT_LOG_OUTPUTS=1:stderr
/home/eblake/libvirt/src/libvirt_iohelper /tmp/save 0 1) unexpected exit
status 1: goodbye world
/home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save

on the console, but this longer spew in libvirt's log:

2014-07-16 23:34:23.855+: 25406: error :
qemuMigrationUpdateJobStatus:1788 : operation failed: domain save job:
unexpectedly failed
2014-07-16 23:34:23.857+: 25406: error : virCommandWait:2423 :
internal error: Child process (LIBVIRT_LOG_OUTPUTS=1:stderr
/home/eblake/libvirt/src/libvirt_iohelper /tmp/save 0 1) unexpected exit
status 1: goodbye world
/home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save

2014-07-16 23:34:23.857+: 25406: warning : virFileWrapperFdClose:326
: iohelper reports: goodbye world
/home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save


so the act of closing the wrapperfd is losing the earlier error message
from being reported to the user (seems okay in this case, but might not
always be), AND logging the stderr contents twice (once via the error
reported to the user, and again due to a VIR_WARN).

 
 So the problem is that we have _two_ possible sources of errors (qemu
 reporting failure because it can't write to iohelper, and iohelper
 reporting an error from whatever other reason, such as disk full).  If
 qemu finishes, we have only the iohelper message and properly report it;
 but if we have both failures, then the qemu error takes priority, and in
 this case it is lower priority.  There are also cases where qemu will
 error out but iohelper succeeds (such as if qemu refuses to migrate
 because the guest has hostdev passthrough).
 
 So I _think_ what we want to do is always check BOTH places for error;
 if only one of the two fails, use that message.  If both fail, then I
 don't know whether it is possible to have a heuristic for which failure
 message is more meaningful, or whether it is better to report both
 errors (even though it will often be the case that one error was a
 knock-on effect from the other).  But I'm a bit stuck on the best way to
 implement that.

I'm still thinking about the best solution

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list