Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process
On 02/05/2015 11:26 AM, Jason J. Herne wrote: Hello Eric, it has been quite a while since we have looked at this :). I'm revisiting this problem and I find that it still exists. Here is an archive link of our previous discussions on this: https://www.redhat.com/archives/libvir-list/2014-July/thread.html#00809 Basically, when I do a migration and run out of disk space I do not see a meaningful error message without this patch. Early on in our previous discussions you asked me how I had gotten through the qemuDomainSaveMemory function without hitting this code: if (virFileWrapperFdClose(wrapperFd) 0) goto cleanup; If qemuMigrationToFile fails we go directly to cleanup and do not pass through the virFileWrapperFdClose call. For reference: /* Perform the migration */ if (qemuMigrationToFile(driver, vm, fd, offset, path, qemuCompressProgramName(compressed), bypassSecurityDriver, asyncJob) 0) goto cleanup; This is why my patch adds the additional call to virFileWrapperFdClose. Perhaps, if we cannot tolerate calling it twice, I can adjust the code such that we only call it once. For some odd reason, you seem to see messages that I do not (take look at my final message in the thread linked above). I was not able to see your goodbye world \n message. I would be interested to know what you see when you run my test case without my patch. If you feel like trying it just create a 50MB disk image and loop mount it: mount -o loop /usr/local/var/lib/libvirt/qemu/save.img /usr/local/var/lib/libvirt/qemu/save Then: virsh managedsave guestname And report what you saw on the command line and the libvirtd console and/or log depending on how you run the daemon. I suspect (since you saw goodbye world) that you would also see the out of disk space message ok. If I'm correct, then the question becomes... why don't I see it too? :) Thanks for your time. Polite Ping? :) -- -- Jason J. Herne (jjhe...@linux.vnet.ibm.com) -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process
Hello Eric, it has been quite a while since we have looked at this :). I'm revisiting this problem and I find that it still exists. Here is an archive link of our previous discussions on this: https://www.redhat.com/archives/libvir-list/2014-July/thread.html#00809 Basically, when I do a migration and run out of disk space I do not see a meaningful error message without this patch. Early on in our previous discussions you asked me how I had gotten through the qemuDomainSaveMemory function without hitting this code: if (virFileWrapperFdClose(wrapperFd) 0) goto cleanup; If qemuMigrationToFile fails we go directly to cleanup and do not pass through the virFileWrapperFdClose call. For reference: /* Perform the migration */ if (qemuMigrationToFile(driver, vm, fd, offset, path, qemuCompressProgramName(compressed), bypassSecurityDriver, asyncJob) 0) goto cleanup; This is why my patch adds the additional call to virFileWrapperFdClose. Perhaps, if we cannot tolerate calling it twice, I can adjust the code such that we only call it once. For some odd reason, you seem to see messages that I do not (take look at my final message in the thread linked above). I was not able to see your goodbye world \n message. I would be interested to know what you see when you run my test case without my patch. If you feel like trying it just create a 50MB disk image and loop mount it: mount -o loop /usr/local/var/lib/libvirt/qemu/save.img /usr/local/var/lib/libvirt/qemu/save Then: virsh managedsave guestname And report what you saw on the command line and the libvirtd console and/or log depending on how you run the daemon. I suspect (since you saw goodbye world) that you would also see the out of disk space message ok. If I'm correct, then the question becomes... why don't I see it too? :) Thanks for your time. -- -- Jason J. Herne (jjhe...@linux.vnet.ibm.com) -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process
Hi Eric, Thanks for continuing work on this. To repeat my test: Make a small disk image (50MB) and mount it at /usr/local/var/lib/libvirt/qemu/save/. Then attempt to managed save a guest whose memory is larger, say 256MB. The image will quickly fill and you should see the Unexpected Error message but nothing more. I was unable to get any meaningful error info from iohelper. I'll try your goodbye world patch and see what I get. Things might get interesting if my result is different :). - Jason J. Herne z/VM CP Development IBM Corporation, Endicott, NY Email - jjhe...@us.ibm.com Phone 607-429-5136 or tie-line 620-5136 | | From: | | --| |Eric Blake ebl...@redhat.com | --| | | To:| | --| |Jason Herne/Endicott/IBM@IBMUS, libvir-list@redhat.com, | --| | | Date: | | --| |07/16/2014 07:57 PM | --| | | Subject: | | --| |Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process | --| On 07/16/2014 05:20 PM, Eric Blake wrote: But if I then rework the iohelper patch: diff --git i/src/util/iohelper.c w/src/util/iohelper.c index 8a3c377..efb1366 100644 --- i/src/util/iohelper.c +++ w/src/util/iohelper.c @@ -301,6 +301,7 @@ main(int argc, char **argv) exit(EXIT_FAILURE); } +fprintf(stderr, _(goodbye world\n)); goto error; if (fd 0 || runIO(path, fd, oflags, length) 0) goto error; the error is now: error: Failed to save domain testvm1 to /tmp/save error: operation failed: domain save job: unexpectedly failed and with your patch, I see: error: Failed to save domain testvm1 to /tmp/save error: internal error: Child process (LIBVIRT_LOG_OUTPUTS=1:stderr /home/eblake/libvirt/src/libvirt_iohelper /tmp/save 0 1) unexpected exit status 1: goodbye world /home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save on the console, but this longer spew in libvirt's log: 2014-07-16 23:34:23.855+: 25406: error : qemuMigrationUpdateJobStatus:1788 : operation failed: domain save job: unexpectedly failed 2014-07-16 23:34:23.857+: 25406: error : virCommandWait:2423 : internal error: Child process (LIBVIRT_LOG_OUTPUTS=1:stderr /home/eblake/libvirt/src/libvirt_iohelper /tmp/save 0 1) unexpected exit status 1: goodbye world /home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save 2014-07-16 23:34:23.857+: 25406: warning : virFileWrapperFdClose:326 : iohelper reports: goodbye world /home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save so the act of closing the wrapperfd is losing the earlier error message from being reported to the user (seems okay in this case, but might not always be), AND logging the stderr contents twice (once via the error reported to the user, and again due to a VIR_WARN). So the problem is that we have _two_ possible sources of errors (qemu reporting failure because it can't write to iohelper, and iohelper reporting an error from whatever other reason, such as disk full). If qemu finishes, we have only the iohelper message and properly report it; but if we have both failures, then the qemu error takes priority, and in this case it is lower priority. There are also cases where qemu will error out but iohelper succeeds (such as if qemu refuses to migrate because the guest
Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process
When I use this patch: diff --git a/src/util/iohelper.c b/src/util/iohelper.c index 8a3c377..255365c 100644 --- a/src/util/iohelper.c +++ b/src/util/iohelper.c @@ -307,6 +307,7 @@ main(int argc, char **argv) if (delete) unlink(path); +fprintf(stderr, _(goodbye world \n)); goto error; return 0; error: The output I see on the command line is: # virsh managedsave kvm1 error: Failed to save domain kvm1 state error: operation failed: domain save job: unexpectedly failed libvirtd outputs the following: 2014-07-30 14:39:39.765+: 5975: error : qemuMigrationUpdateJobStatus:1788 : operation failed: domain save job: unexpectedly failed I completed my testing with the following commit from master commit 221b1828f9bf9d1d59d2d4b318249de9f7ec7bb8 Author: Roman Bogorodskiy bogorods...@gmail.com Date: Thu Jul 24 19:52:11 2014 +0400 We appear to be getting different results. Can you try running out of disk space during a managed save and reporting what you get? Perhaps there is no code problem here after all? A quick look at my libvirtd.conf shows nothing obvious. I'm just using the default config file.-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH] Fix reporting of i/o errors by iohelper process
From: Jason J. Herne jjhe...@us.ibm.com libvirt_iohelper is a helper process that is exec'ed and used to handle I/O during a Qemu managed save operation. Due to a missing call to virFileWrapperFdClose, all I/O error messages reported by iohelper are lost. This patch adds a call to virFileWrapperFdClose to the cleanup phase of qemuDomainSaveMemory. This patch also modifies virFileWrapperFdClose such that errors are only reported when the length of the err_msg buffer is 0. Before now, the existence of the buffer would trigger error reporting in virFileWrapperFdClose. Signed-off-by: Jason J. Herne jjhe...@us.ibm.com --- src/qemu/qemu_driver.c | 1 + src/util/virfile.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index ecccf6c..8d78805 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -3015,6 +3015,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver, cleanup: VIR_FORCE_CLOSE(fd); +virFileWrapperFdClose(wrapperFd); virFileWrapperFdFree(wrapperFd); VIR_FREE(xml); diff --git a/src/util/virfile.c b/src/util/virfile.c index 463064c..813b4f5 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -322,7 +322,7 @@ virFileWrapperFdClose(virFileWrapperFdPtr wfd) return 0; ret = virCommandWait(wfd-cmd, NULL); -if (wfd-err_msg) +if (wfd-err_msg strlen(wfd-err_msg)) VIR_WARN(iohelper reports: %s, wfd-err_msg); return ret; -- 1.8.3.2 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process
On 07/16/2014 08:12 AM, Jason J. Herne wrote: From: Jason J. Herne jjhe...@us.ibm.com libvirt_iohelper is a helper process that is exec'ed and used to handle I/O during a Qemu managed save operation. Due to a missing call to virFileWrapperFdClose, all I/O error messages reported by iohelper are lost. This patch adds a call to virFileWrapperFdClose to the cleanup phase of qemuDomainSaveMemory. This patch also modifies virFileWrapperFdClose such that errors are only reported when the length of the err_msg buffer is 0. Before now, the existence of the buffer would trigger error reporting in virFileWrapperFdClose. Signed-off-by: Jason J. Herne jjhe...@us.ibm.com --- src/qemu/qemu_driver.c | 1 + src/util/virfile.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) Nice! Thanks for persevering on this one. Congrats on your first libvirt patch. diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index ecccf6c..8d78805 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -3015,6 +3015,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver, cleanup: VIR_FORCE_CLOSE(fd); +virFileWrapperFdClose(wrapperFd); virFileWrapperFdFree(wrapperFd); VIR_FREE(xml); diff --git a/src/util/virfile.c b/src/util/virfile.c index 463064c..813b4f5 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -322,7 +322,7 @@ virFileWrapperFdClose(virFileWrapperFdPtr wfd) return 0; ret = virCommandWait(wfd-cmd, NULL); -if (wfd-err_msg) +if (wfd-err_msg strlen(wfd-err_msg)) Micro-optimization: more efficient when written: if (wfd-err_msg *wfd-err_msg) since then you don't have to waste time of crawling the entire string. ACK based on looking at the code, and I'll push with that modification once I've also tested it (or reply again if something turns up). -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process
On 07/16/2014 08:30 AM, Eric Blake wrote: On 07/16/2014 08:12 AM, Jason J. Herne wrote: From: Jason J. Herne jjhe...@us.ibm.com libvirt_iohelper is a helper process that is exec'ed and used to handle I/O during a Qemu managed save operation. Due to a missing call to virFileWrapperFdClose, all I/O error messages reported by iohelper are lost. This patch adds a call to virFileWrapperFdClose to the cleanup phase of qemuDomainSaveMemory. This patch also modifies virFileWrapperFdClose such that errors are only reported when the length of the err_msg buffer is 0. Before now, the existence of the buffer would trigger error reporting in virFileWrapperFdClose. Signed-off-by: Jason J. Herne jjhe...@us.ibm.com --- src/qemu/qemu_driver.c | 1 + src/util/virfile.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) Nice! Thanks for persevering on this one. Congrats on your first libvirt patch. Sadly, the more I look at this, the more I wonder how it can fix anything. diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index ecccf6c..8d78805 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -3015,6 +3015,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver, cleanup: VIR_FORCE_CLOSE(fd); +virFileWrapperFdClose(wrapperFd); virFileWrapperFdFree(wrapperFd); VIR_FREE(xml); But qemuDomainSaveMemory() already has a call to virFileWrapperFdClose() earlier on; worse, the current implementation of virFileWrapperFdClose() is not designed to be called twice if an error occurred (rather, if there is an error, two calls will end up logging the error twice, when once would have been sufficient). How are you getting to a point where the cleanup label is reached without going through the earlier close? Is there some easy setup you used in testing this patch, so that I can reproduce a scenario where iohelper will reliably fail? -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process
On 07/16/2014 03:56 PM, Eric Blake wrote: On 07/16/2014 08:30 AM, Eric Blake wrote: On 07/16/2014 08:12 AM, Jason J. Herne wrote: From: Jason J. Herne jjhe...@us.ibm.com libvirt_iohelper is a helper process that is exec'ed and used to handle I/O during a Qemu managed save operation. Due to a missing call to virFileWrapperFdClose, all I/O error messages reported by iohelper are lost. +++ b/src/qemu/qemu_driver.c @@ -3015,6 +3015,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver, cleanup: VIR_FORCE_CLOSE(fd); +virFileWrapperFdClose(wrapperFd); virFileWrapperFdFree(wrapperFd); VIR_FREE(xml); But qemuDomainSaveMemory() already has a call to virFileWrapperFdClose() earlier on; worse, the current implementation of virFileWrapperFdClose() is not designed to be called twice if an error occurred (rather, if there is an error, two calls will end up logging the error twice, when once would have been sufficient). How are you getting to a point where the cleanup label is reached without going through the earlier close? Is there some easy setup you used in testing this patch, so that I can reproduce a scenario where iohelper will reliably fail? Okay, looking into this a bit more: I made the following change: diff --git i/src/util/iohelper.c w/src/util/iohelper.c index 8a3c377..dfb45e1 100644 --- i/src/util/iohelper.c +++ w/src/util/iohelper.c @@ -307,6 +307,7 @@ main(int argc, char **argv) if (delete) unlink(path); +fprintf(stderr, _(goodbye world\n)); goto error; return 0; error: then tried to do a 'virsh save testvm1 /tmp/save'. Even without your patch, I get a very nice error: error: Failed to save domain testvm1 to /tmp/save error: internal error: Child process (LIBVIRT_LOG_OUTPUTS=1:stderr /home/eblake/libvirt/src/libvirt_iohelper /tmp/save 0 1) unexpected exit status 1: goodbye world /home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save But if I then rework the iohelper patch: diff --git i/src/util/iohelper.c w/src/util/iohelper.c index 8a3c377..efb1366 100644 --- i/src/util/iohelper.c +++ w/src/util/iohelper.c @@ -301,6 +301,7 @@ main(int argc, char **argv) exit(EXIT_FAILURE); } +fprintf(stderr, _(goodbye world\n)); goto error; if (fd 0 || runIO(path, fd, oflags, length) 0) goto error; the error is now: error: Failed to save domain testvm1 to /tmp/save error: operation failed: domain save job: unexpectedly failed So the problem is that we have _two_ possible sources of errors (qemu reporting failure because it can't write to iohelper, and iohelper reporting an error from whatever other reason, such as disk full). If qemu finishes, we have only the iohelper message and properly report it; but if we have both failures, then the qemu error takes priority, and in this case it is lower priority. There are also cases where qemu will error out but iohelper succeeds (such as if qemu refuses to migrate because the guest has hostdev passthrough). So I _think_ what we want to do is always check BOTH places for error; if only one of the two fails, use that message. If both fail, then I don't know whether it is possible to have a heuristic for which failure message is more meaningful, or whether it is better to report both errors (even though it will often be the case that one error was a knock-on effect from the other). But I'm a bit stuck on the best way to implement that. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Fix reporting of i/o errors by iohelper process
On 07/16/2014 05:20 PM, Eric Blake wrote: But if I then rework the iohelper patch: diff --git i/src/util/iohelper.c w/src/util/iohelper.c index 8a3c377..efb1366 100644 --- i/src/util/iohelper.c +++ w/src/util/iohelper.c @@ -301,6 +301,7 @@ main(int argc, char **argv) exit(EXIT_FAILURE); } +fprintf(stderr, _(goodbye world\n)); goto error; if (fd 0 || runIO(path, fd, oflags, length) 0) goto error; the error is now: error: Failed to save domain testvm1 to /tmp/save error: operation failed: domain save job: unexpectedly failed and with your patch, I see: error: Failed to save domain testvm1 to /tmp/save error: internal error: Child process (LIBVIRT_LOG_OUTPUTS=1:stderr /home/eblake/libvirt/src/libvirt_iohelper /tmp/save 0 1) unexpected exit status 1: goodbye world /home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save on the console, but this longer spew in libvirt's log: 2014-07-16 23:34:23.855+: 25406: error : qemuMigrationUpdateJobStatus:1788 : operation failed: domain save job: unexpectedly failed 2014-07-16 23:34:23.857+: 25406: error : virCommandWait:2423 : internal error: Child process (LIBVIRT_LOG_OUTPUTS=1:stderr /home/eblake/libvirt/src/libvirt_iohelper /tmp/save 0 1) unexpected exit status 1: goodbye world /home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save 2014-07-16 23:34:23.857+: 25406: warning : virFileWrapperFdClose:326 : iohelper reports: goodbye world /home/eblake/libvirt/src/libvirt_iohelper: unknown failure with /tmp/save so the act of closing the wrapperfd is losing the earlier error message from being reported to the user (seems okay in this case, but might not always be), AND logging the stderr contents twice (once via the error reported to the user, and again due to a VIR_WARN). So the problem is that we have _two_ possible sources of errors (qemu reporting failure because it can't write to iohelper, and iohelper reporting an error from whatever other reason, such as disk full). If qemu finishes, we have only the iohelper message and properly report it; but if we have both failures, then the qemu error takes priority, and in this case it is lower priority. There are also cases where qemu will error out but iohelper succeeds (such as if qemu refuses to migrate because the guest has hostdev passthrough). So I _think_ what we want to do is always check BOTH places for error; if only one of the two fails, use that message. If both fail, then I don't know whether it is possible to have a heuristic for which failure message is more meaningful, or whether it is better to report both errors (even though it will often be the case that one error was a knock-on effect from the other). But I'm a bit stuck on the best way to implement that. I'm still thinking about the best solution -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list