Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-21 Thread Ben Taylor

 Daniel P. Berrange [EMAIL PROTECTED] wrote: 
 On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote:
 Content-Description: message body text
  bdrv_flush is declared to return void, but this is wrong because it
  means that the implementations have nowhere to report their errors.
  Indeed, the implementations generally ignore errors.
  
  This patch corrects this by making it return int (implicitly, either 0
  or -errno, as for other similar functions).  All of the
  implementations and callers are adjusted too.
  
  
  While looking at this I was surprised to see this:
  
  static void scsi_write_complete(void * opaque, int ret)
  ...
  if (ret) {
  fprintf(stderr, scsi-disc: IO write error\n);
  exit(1);
  }
  
  Surely that is overkill ?
 
 Yes, any i/o errors I'd expect to be propagated back up to the guest
 OS as the most appropriate IDE / SCSI error code.
 
  Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on
  error (the return value from write) whereas the other block read/write
  methods return errno values.  This is a mistake, surely ?  -1 would be
  -EPERM.  If any of the callers did anything with these return values
  you'd get incorrect error indications.
  
  
  Finally, it would perhaps be best if the block device emulators wrote
  to the qemu console to complain if they give write errors.  Otherwise
  the errno value and other important information will be lost, which
  makes debugging hard.
 
 If by 'qemu console' you mean stderr, then fine, but please don't
 spew log messages to the  monitor console, because that'll make it
 very hard to interact with reliably from management tools. 

Would it make sense to have a log messages screen associated with
the monitor (like Ctrl-Alt-7) to deal with those sorts of things?

Ben




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-21 Thread Daniel P. Berrange
On Thu, Feb 21, 2008 at 12:19:22PM -0500, Ben Taylor wrote:
 
  Daniel P. Berrange [EMAIL PROTECTED] wrote: 
  On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote:
  Content-Description: message body text
   bdrv_flush is declared to return void, but this is wrong because it
   means that the implementations have nowhere to report their errors.
   Indeed, the implementations generally ignore errors.
   
   This patch corrects this by making it return int (implicitly, either 0
   or -errno, as for other similar functions).  All of the
   implementations and callers are adjusted too.
   
   
   While looking at this I was surprised to see this:
   
   static void scsi_write_complete(void * opaque, int ret)
   ...
 if (ret) {
 fprintf(stderr, scsi-disc: IO write error\n);
 exit(1);
 }
   
   Surely that is overkill ?
  
  Yes, any i/o errors I'd expect to be propagated back up to the guest
  OS as the most appropriate IDE / SCSI error code.
  
   Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on
   error (the return value from write) whereas the other block read/write
   methods return errno values.  This is a mistake, surely ?  -1 would be
   -EPERM.  If any of the callers did anything with these return values
   you'd get incorrect error indications.
   
   
   Finally, it would perhaps be best if the block device emulators wrote
   to the qemu console to complain if they give write errors.  Otherwise
   the errno value and other important information will be lost, which
   makes debugging hard.
  
  If by 'qemu console' you mean stderr, then fine, but please don't
  spew log messages to the  monitor console, because that'll make it
  very hard to interact with reliably from management tools. 
 
 Would it make sense to have a log messages screen associated with
 the monitor (like Ctrl-Alt-7) to deal with those sorts of things?

Why invent a new special QEMU log screen, when stderr works just fine. If an
app wants to capture log messages they just capture stderr and persist it.
We already capture stderr for exactly this reason in libvirt when managing
QEMU instances.

Dan,
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-   Perl modules: http://search.cpan.org/~danberr/  -=|
|=-   Projects: http://freshmeat.net/~danielpb/   -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-21 Thread risc
On Thu, Feb 21, 2008 at 05:24:10PM +, Daniel P. Berrange wrote:
 On Thu, Feb 21, 2008 at 12:19:22PM -0500, Ben Taylor wrote:
  
   Daniel P. Berrange [EMAIL PROTECTED] wrote: 
   On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote:
   Content-Description: message body text
bdrv_flush is declared to return void, but this is wrong because it
means that the implementations have nowhere to report their errors.
Indeed, the implementations generally ignore errors.

This patch corrects this by making it return int (implicitly, either 0
or -errno, as for other similar functions).  All of the
implementations and callers are adjusted too.


While looking at this I was surprised to see this:

static void scsi_write_complete(void * opaque, int ret)
...
if (ret) {
fprintf(stderr, scsi-disc: IO write error\n);
exit(1);
}

Surely that is overkill ?
   
   Yes, any i/o errors I'd expect to be propagated back up to the guest
   OS as the most appropriate IDE / SCSI error code.
   
Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on
error (the return value from write) whereas the other block read/write
methods return errno values.  This is a mistake, surely ?  -1 would be
-EPERM.  If any of the callers did anything with these return values
you'd get incorrect error indications.


Finally, it would perhaps be best if the block device emulators wrote
to the qemu console to complain if they give write errors.  Otherwise
the errno value and other important information will be lost, which
makes debugging hard.
   
   If by 'qemu console' you mean stderr, then fine, but please don't
   spew log messages to the  monitor console, because that'll make it
   very hard to interact with reliably from management tools. 
  
  Would it make sense to have a log messages screen associated with
  the monitor (like Ctrl-Alt-7) to deal with those sorts of things?
 
 Why invent a new special QEMU log screen, when stderr works just fine. If an
 app wants to capture log messages they just capture stderr and persist it.
 We already capture stderr for exactly this reason in libvirt when managing
 QEMU instances.
 
 Dan,
 -- 
 |=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
 |=-   Perl modules: http://search.cpan.org/~danberr/  -=|
 |=-   Projects: http://freshmeat.net/~danielpb/   -=|
 |=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 
 

Please excuse my intrusion in this thread, but I'm a user of the new
ncurses user interface. when ssh'd in, running qemu, I don't believe 
having messages pop out of stderr and over the current screen contents is 
the appropriate behavior, as it sounds to me like it would cause redraw 
defects in the normal text console (via ncurses)

Julia Longtin [EMAIL PROTECTED]





Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-21 Thread Daniel P. Berrange
On Thu, Feb 21, 2008 at 11:28:23AM -0600, [EMAIL PROTECTED] wrote:
 On Thu, Feb 21, 2008 at 05:24:10PM +, Daniel P. Berrange wrote:
  On Thu, Feb 21, 2008 at 12:19:22PM -0500, Ben Taylor wrote:
 Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on
 error (the return value from write) whereas the other block read/write
 methods return errno values.  This is a mistake, surely ?  -1 would be
 -EPERM.  If any of the callers did anything with these return values
 you'd get incorrect error indications.
 
 
 Finally, it would perhaps be best if the block device emulators wrote
 to the qemu console to complain if they give write errors.  Otherwise
 the errno value and other important information will be lost, which
 makes debugging hard.

If by 'qemu console' you mean stderr, then fine, but please don't
spew log messages to the  monitor console, because that'll make it
very hard to interact with reliably from management tools. 
   
   Would it make sense to have a log messages screen associated with
   the monitor (like Ctrl-Alt-7) to deal with those sorts of things?
  
  Why invent a new special QEMU log screen, when stderr works just fine. If an
  app wants to capture log messages they just capture stderr and persist it.
  We already capture stderr for exactly this reason in libvirt when managing
  QEMU instances.
 
 Please excuse my intrusion in this thread, but I'm a user of the new
 ncurses user interface. when ssh'd in, running qemu, I don't believe 
 having messages pop out of stderr and over the current screen contents is 
 the appropriate behavior, as it sounds to me like it would cause redraw 
 defects in the normal text console (via ncurses)

So just redirect stderr to a logfile, or /dev/null, etc, etc

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-   Perl modules: http://search.cpan.org/~danberr/  -=|
|=-   Projects: http://freshmeat.net/~danielpb/   -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Daniel P. Berrange
On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote:
Content-Description: message body text
 bdrv_flush is declared to return void, but this is wrong because it
 means that the implementations have nowhere to report their errors.
 Indeed, the implementations generally ignore errors.
 
 This patch corrects this by making it return int (implicitly, either 0
 or -errno, as for other similar functions).  All of the
 implementations and callers are adjusted too.
 
 
 While looking at this I was surprised to see this:
 
 static void scsi_write_complete(void * opaque, int ret)
 ...
   if (ret) {
   fprintf(stderr, scsi-disc: IO write error\n);
   exit(1);
   }
 
 Surely that is overkill ?

Yes, any i/o errors I'd expect to be propagated back up to the guest
OS as the most appropriate IDE / SCSI error code.

 Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on
 error (the return value from write) whereas the other block read/write
 methods return errno values.  This is a mistake, surely ?  -1 would be
 -EPERM.  If any of the callers did anything with these return values
 you'd get incorrect error indications.
 
 
 Finally, it would perhaps be best if the block device emulators wrote
 to the qemu console to complain if they give write errors.  Otherwise
 the errno value and other important information will be lost, which
 makes debugging hard.

If by 'qemu console' you mean stderr, then fine, but please don't
spew log messages to the  monitor console, because that'll make it
very hard to interact with reliably from management tools. 

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-   Perl modules: http://search.cpan.org/~danberr/  -=|
|=-   Projects: http://freshmeat.net/~danielpb/   -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Ian Jackson
Daniel P. Berrange writes (Re: [Qemu-devel] [PATCH] bdrv_flush error 
handling):
 On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote:
  Finally, it would perhaps be best if the block device emulators wrote
  to the qemu console to complain if they give write errors.  Otherwise
  the errno value and other important information will be lost, which
  makes debugging hard.
 
 If by 'qemu console' you mean stderr, then fine, but please don't
 spew log messages to the  monitor console, because that'll make it
 very hard to interact with reliably from management tools. 

Is it permitted, then, to generally print to qemu's stderr from inside
a device model ?  This seems to be done quite a bit during
initialisation, and quite a bit in various rather less common kinds of
device, but not routinely in the main emulations.

Perhaps it would be better to invent a `logprintf' function to make it
easier to redirect these kind of messages later if that turns out to
be desirable ?

Ian.




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Paul Brook
  Finally, it would perhaps be best if the block device emulators wrote
  to the qemu console to complain if they give write errors.  Otherwise
  the errno value and other important information will be lost, which
  makes debugging hard.

 If by 'qemu console' you mean stderr, then fine, but please don't
 spew log messages to the  monitor console, because that'll make it
 very hard to interact with reliably from management tools.

Actually I think a better default would be for qemu to die right there and 
then.  If the host is getting IO errors then something is badly wrong, and 
you're probably screwed anyway.

Paul




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Avi Kivity

Paul Brook wrote:

Finally, it would perhaps be best if the block device emulators wrote
to the qemu console to complain if they give write errors.  Otherwise
the errno value and other important information will be lost, which
makes debugging hard.
  

If by 'qemu console' you mean stderr, then fine, but please don't
spew log messages to the  monitor console, because that'll make it
very hard to interact with reliably from management tools.



Actually I think a better default would be for qemu to die right there and 
then.  If the host is getting IO errors then something is badly wrong, and 
you're probably screwed anyway.
  


If you're passing through a real disk or volume, this denies the guest 
the possibility of recover y (for example, if it is running a RAID setup 
over multiple volumes, or if you are testing the guest's ability to 
recover from errors).


For non-raw formats, you can pass through errors on data, but it is 
impossible to recover on metadata errors, so dying on I/O error should 
be fine.


--
Any sufficiently difficult bug is indistinguishable from a feature.





Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Ian Jackson
Avi Kivity writes (Re: [Qemu-devel] [PATCH] bdrv_flush error handling):
 For non-raw formats, you can pass through errors on data, but it is 
 impossible to recover on metadata errors, so dying on I/O error should 
 be fine.

You mean metadata write errors I assume.  I don't see why a metadata
read error is any worse than any other read error.

The guest will often prefer to be told that the volume was broken and
then to be denied the ability to continue accessing it.  Who knows
what the guest thinks of the device ?  Perhaps it's an unimportant
peripheral, or simulated network block device, used only for data
interchange.

Write errors for non-raw formats can easily be caused by a disk full
situation on the host.  Killing the guest hard is unfriendly for that
situation.

Ian.




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Ian Jackson
Paul Brook writes (Re: [Qemu-devel] [PATCH] bdrv_flush error handling):
 Disk full is a fundamentally unfriendly situation to be in. There is no good 
 answer. Reporting errors back to the host has its own set of problems. Many 
 guest OS effectively just lock up when this occurs.

I think that's fine, surely ?  A locked up guest isn't very nice but
it's better than a guest shot dead.

Ian.




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Paul Brook
 Write errors for non-raw formats can easily be caused by a disk full
 situation on the host.  Killing the guest hard is unfriendly for that
 situation.

Disk full is a fundamentally unfriendly situation to be in. There is no good 
answer. Reporting errors back to the host has its own set of problems. Many 
guest OS effectively just lock up when this occurs.

Paul




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Paul Brook
 Is savevm-upon-disk-full a realistic prospect?

Not particularly useful. If you're run out of disk space, chances are that 
savevm will also fail.

Paul




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Jamie Lokier
Ian Jackson wrote:
 Paul Brook writes (Re: [Qemu-devel] [PATCH] bdrv_flush error handling):
  Disk full is a fundamentally unfriendly situation to be in. There is no 
  good 
  answer. Reporting errors back to the host has its own set of problems. Many 
  guest OS effectively just lock up when this occurs.
 
 I think that's fine, surely ?  A locked up guest isn't very nice but
 it's better than a guest shot dead.

Well, a guest which receives an IDE write error might do things like
mark parts of the device bad, to avoiding writing to those parts.  If
the guest is running software RAID, for example, it will radically
change its behaviour in response to those errors.

Sometimes that's what you want, but sometimes it is really unwanted.
If the host runs out of disk space, ideally you might want to suspend
the guest until you can free up host disk space (or move to another
host), then resume the guest, perhaps manually.

Is savevm-upon-disk-full a realistic prospect?

-- Jamie




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Jamie Lokier
Paul Brook wrote:
  Is savevm-upon-disk-full a realistic prospect?
 
 Not particularly useful. If you're run out of disk space, chances are that 
 savevm will also fail.

I'm imagining (a) that the savevm space is preallocated, or (b) is on
a different disk.

-- Jamie




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Daniel P. Berrange
On Wed, Feb 20, 2008 at 04:31:56PM +, Jamie Lokier wrote:
 Ian Jackson wrote:
  Paul Brook writes (Re: [Qemu-devel] [PATCH] bdrv_flush error handling):
   Disk full is a fundamentally unfriendly situation to be in. There is no 
   good 
   answer. Reporting errors back to the host has its own set of problems. 
   Many 
   guest OS effectively just lock up when this occurs.
  
  I think that's fine, surely ?  A locked up guest isn't very nice but
  it's better than a guest shot dead.
 
 Well, a guest which receives an IDE write error might do things like
 mark parts of the device bad, to avoiding writing to those parts.  If
 the guest is running software RAID, for example, it will radically
 change its behaviour in response to those errors.
 
 Sometimes that's what you want, but sometimes it is really unwanted.
 If the host runs out of disk space, ideally you might want to suspend
 the guest until you can free up host disk space (or move to another
 host), then resume the guest, perhaps manually.
 
 Is savevm-upon-disk-full a realistic prospect?

In the 'out of disk space' scenario you wouldn't need to save the guest - merely
stop its CPU. This gives the host admin the opportunity to hot-add storage
to the host  then resume execution, or to kill the VM, or to free enough
space to save the VM to disk / live migrate it to another host. 

Shooting it dead on any I/O error doesn't give the host admin any choices
at all

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-   Perl modules: http://search.cpan.org/~danberr/  -=|
|=-   Projects: http://freshmeat.net/~danielpb/   -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Anthony Liguori

Daniel P. Berrange wrote:

On Wed, Feb 20, 2008 at 04:31:56PM +, Jamie Lokier wrote:
  

Ian Jackson wrote:


Paul Brook writes (Re: [Qemu-devel] [PATCH] bdrv_flush error handling):
  
Disk full is a fundamentally unfriendly situation to be in. There is no good 
answer. Reporting errors back to the host has its own set of problems. Many 
guest OS effectively just lock up when this occurs.


I think that's fine, surely ?  A locked up guest isn't very nice but
it's better than a guest shot dead.
  

Well, a guest which receives an IDE write error might do things like
mark parts of the device bad, to avoiding writing to those parts.  If
the guest is running software RAID, for example, it will radically
change its behaviour in response to those errors.

Sometimes that's what you want, but sometimes it is really unwanted.
If the host runs out of disk space, ideally you might want to suspend
the guest until you can free up host disk space (or move to another
host), then resume the guest, perhaps manually.

Is savevm-upon-disk-full a realistic prospect?



In the 'out of disk space' scenario you wouldn't need to save the guest - merely
stop its CPU. This gives the host admin the opportunity to hot-add storage
to the host  then resume execution, or to kill the VM, or to free enough
space to save the VM to disk / live migrate it to another host. 
  


I agree.  Stopping the CPUs and spitting out a big fat warning message 
would be the best thing to do.  For the average user, this would give 
the opportunity to free up some space if possible.


Regards,

Anthony Liguori


Shooting it dead on any I/O error doesn't give the host admin any choices
at all

Dan.
  






Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Rick Vernam
On Wednesday 20 February 2008 01:01:33 pm Anthony Liguori wrote:
 Daniel P. Berrange wrote:
  On Wed, Feb 20, 2008 at 04:31:56PM +, Jamie Lokier wrote:
  Ian Jackson wrote:
  Paul Brook writes (Re: [Qemu-devel] [PATCH] bdrv_flush error 
handling):
  Disk full is a fundamentally unfriendly situation to be in. There is
  no good answer. Reporting errors back to the host has its own set of
  problems. Many guest OS effectively just lock up when this occurs.
 
  I think that's fine, surely ?  A locked up guest isn't very nice but
  it's better than a guest shot dead.
 
  Well, a guest which receives an IDE write error might do things like
  mark parts of the device bad, to avoiding writing to those parts.  If
  the guest is running software RAID, for example, it will radically
  change its behaviour in response to those errors.
 
  Sometimes that's what you want, but sometimes it is really unwanted.
  If the host runs out of disk space, ideally you might want to suspend
  the guest until you can free up host disk space (or move to another
  host), then resume the guest, perhaps manually.
 
  Is savevm-upon-disk-full a realistic prospect?
 
  In the 'out of disk space' scenario you wouldn't need to save the guest -
  merely stop its CPU. This gives the host admin the opportunity to hot-add
  storage to the host  then resume execution, or to kill the VM, or to
  free enough space to save the VM to disk / live migrate it to another
  host.

 I agree.  Stopping the CPUs and spitting out a big fat warning message
 would be the best thing to do.  For the average user, this would give
 the opportunity to free up some space if possible.

agreed.


 Regards,

 Anthony Liguori

  Shooting it dead on any I/O error doesn't give the host admin any choices
  at all
 
  Dan.






Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Thomas Irlet
Sometimes the guest intentionally seeks the error. Example?

TrueCrypt 5.0 supports encryption of the full system disk. To get the real
size of the disk, the truecrypt driver queries the number of blocks of the
disk, but then tries to read after the last block, until it gets an error.
Qemu hangs in this operation. So, for me, the blockdriver has to give the
errors back to the guest (in this case, reading behind eof). In no cases,
qemu should die because of that.


Tom

PS: I had to patch truecrypt because of this qemu feature.