Peter Xu <pet...@redhat.com> writes:

> On Thu, Jul 06, 2023 at 10:50:34AM -0300, Fabiano Rosas wrote:
>> Peter Xu <pet...@redhat.com> writes:
>> 
>> > On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <pet...@redhat.com> writes:
>> >> 
>> >> > Provide an explicit reason for qemu_file_shutdown()s, which can be
>> >> > displayed in query-migrate when used.
>> >> >
>> >> 
>> >> Can we consider this to cover the TODO:
>> >> 
>> >>  * TODO: convert to propagate Error objects instead of squashing
>> >>  * to a fixed errno value
>> >> 
>> >> or would that need something fancier?
>> >
>> > The TODO seems to say we want to allow qemu_file_shutdown() to report an
>> > Error* when anything wrong happened (e.g. shutdown() failed)?  While this
>> > patch was trying to store a specific error string so when query migration
>> > later it'll show up to the user.  If so, IMHO they're two things.
>> >
>> 
>> Ok, just making sure.
>> 
>> >> 
>> >> > This will make e.g. migrate-pause to display explicit error descriptions,
>> >> > from:
>> >> >
>> >> > "error-desc": "Channel error: Input/output error"
>> >> >
>> >> > To:
>> >> >
>> >> > "error-desc": "Channel is explicitly shutdown by the user"
>> >> >
>> >> > in query-migrate.
>> >> >
>> >> > Signed-off-by: Peter Xu <pet...@redhat.com>
>> >> > ---
>> >> >  migration/qemu-file.c | 5 ++++-
>> >> >  1 file changed, 4 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> >> > index 419b4092e7..ff605027de 100644
>> >> > --- a/migration/qemu-file.c
>> >> > +++ b/migration/qemu-file.c
>> >> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
>> >> >       *      --> guest crash!
>> >> >       */
>> >> >      if (!f->last_error) {
>> >> > -        qemu_file_set_error(f, -EIO);
>> >> > +        Error *err = NULL;
>> >> > +
>> >> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
>> >> 
>> >> It is good that we can grep this message. However, I'm confused about
>> >> who the "user" is meant to be here and how are they implicated in this
>> >> error.
>> >
>> > Ah, here the user is who sends the "migrate-pause" command, according to
>> > the example of the commit message.
>> >
>> 
>> That's where I'm confused. There are 15 callsites for
>> qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
>> missing the logical step that links migrate-pause with this
>> error_setg().
>> Are you assuming that the race described will only happen
>> with migrate-pause and the other invocations would have set an error
>> already?
>
> It's not a race, but I think you're right. I thought it was always the case

I'm talking about the race with another thread checking f->last_error
and this thread setting it. Described in commit f5816b5c86ed
("migration: Fix race on qemu_file_shutdown()").

> to shut but actually not: we do shutdown() also in a few places where we
> don't really fail, either for COLO or for completion of migration.  With
> the 1st patch, it'll even show in query-migrate.  Thanks for spotting it -
> I could have done better.
>

The idea is that we avoid doing IO after the file has been shut down, so
we preload this -EIO error. We could just alter the message to "Channel
has been explicitly shutdown" or "Tried to do IO after channel
shutdown". Either would still be better than the generic EIO message.

But up to you.
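
To make the "preload an error" idea concrete, here is a rough standalone
sketch. It does not use the real QEMUFile or Error APIs: SketchFile and the
sketch_file_*() helpers are hypothetical stand-ins invented for
illustration, and the message string is just one of the wordings suggested
above.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QEMUFile; NOT the real QEMU structure. */
typedef struct {
    int last_error;              /* 0, or a negative errno value */
    const char *last_error_desc; /* optional human-readable reason */
} SketchFile;

/* Record only the first error; later errors do not overwrite it. */
static void sketch_file_set_error(SketchFile *f, int err, const char *desc)
{
    if (f->last_error == 0) {
        f->last_error = err;
        f->last_error_desc = desc;
    }
}

/*
 * Shutdown preloads -EIO together with a descriptive reason, so that any
 * later IO attempt fails early instead of touching the channel.
 */
static void sketch_file_shutdown(SketchFile *f)
{
    sketch_file_set_error(f, -EIO, "Tried to do IO after channel shutdown");
}

/* IO paths check the stored error first and bail out if one is set. */
static int sketch_file_write(SketchFile *f, const void *buf, size_t len)
{
    (void)buf; /* a real implementation would queue or send the data */
    if (f->last_error) {
        return f->last_error;
    }
    return (int)len;
}

/* What a status query could show: the stored reason, else strerror(). */
static const char *sketch_file_error_desc(const SketchFile *f)
{
    if (!f->last_error) {
        return "no error";
    }
    return f->last_error_desc ? f->last_error_desc
                              : strerror(-f->last_error);
}
```

With something along these lines, a write after sketch_file_shutdown()
returns -EIO, and the description reported is the stored reason rather
than the generic strerror() text. Since the message is attached at
shutdown time regardless of the caller, it should not blame the user.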
