On Thu, Jul 06, 2023 at 02:33:42PM -0300, Fabiano Rosas wrote:
> Peter Xu <pet...@redhat.com> writes:
> 
> > On Thu, Jul 06, 2023 at 10:50:34AM -0300, Fabiano Rosas wrote:
> >> Peter Xu <pet...@redhat.com> writes:
> >> 
> >> > On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
> >> >> Peter Xu <pet...@redhat.com> writes:
> >> >> 
> >> >> > Provide an explicit reason for qemu_file_shutdown()s, which can be
> >> >> > displayed in query-migrate when used.
> >> >> >
> >> >> 
> >> >> Can we consider this to cover the TODO:
> >> >> 
> >> >>  * TODO: convert to propagate Error objects instead of squashing
> >> >>  * to a fixed errno value
> >> >> 
> >> >> or would that need something fancier?
> >> >
> >> > The TODO seems to say we want to allow qemu_file_shutdown() to report an
> >> > Error* when anything wrong happened (e.g. shutdown() failed)?  While this
> >> > patch was trying to store a specific error string so when query migration
> >> > later it'll show up to the user.  If so, IMHO they're two things.
> >> >
> >> 
> >> Ok, just making sure.
> >> 
> >> >> 
> >> >> > This will make e.g. migrate-pause display explicit error descriptions,
> >> >> > from:
> >> >> >
> >> >> > "error-desc": "Channel error: Input/output error"
> >> >> >
> >> >> > To:
> >> >> >
> >> >> > "error-desc": "Channel is explicitly shutdown by the user"
> >> >> >
> >> >> > in query-migrate.
> >> >> >
> >> >> > Signed-off-by: Peter Xu <pet...@redhat.com>
> >> >> > ---
> >> >> >  migration/qemu-file.c | 5 ++++-
> >> >> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >> >> >
> >> >> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> >> >> > index 419b4092e7..ff605027de 100644
> >> >> > --- a/migration/qemu-file.c
> >> >> > +++ b/migration/qemu-file.c
> >> >> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
> >> >> >       *      --> guest crash!
> >> >> >       */
> >> >> >      if (!f->last_error) {
> >> >> > -        qemu_file_set_error(f, -EIO);
> >> >> > +        Error *err = NULL;
> >> >> > +
> >> >> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
> >> >> 
> >> >> It is good that we can grep this message. However, I'm confused about
> >> >> who the "user" is meant to be here and how are they implicated in this
> >> >> error.
> >> >
> >> > Ah, here the user is who sends the "migrate-pause" command, according to
> >> > the example of the commit message.
> >> >
> >> 
> >> That's where I'm confused. There are 15 callsites for
> >> qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
> >> missing the logical step that links migrate-pause with this
> >> error_setg().
> >> Are you assuming that the race described will only happen
> >> with migrate-pause and the other invocations would have set an error
> >> already?
> >
> > It's not a race, but I think you're right. I thought it was always the case
> 
> I'm talking about the race with another thread checking f->last_error
> and this thread setting it. Described in commit f5816b5c86ed
> ("migration: Fix race on qemu_file_shutdown()").

I don't quite follow your point yet, sorry.  I thought f5816b5c86ed closed
that race.  What's still missing?

> 
> > to shut but actually not: we do shutdown() also in a few places where we
> > don't really fail, either for COLO or for completion of migration.  With
> > the 1st patch, it'll even show in query-migrate.  Thanks for spotting it -
> > I could have done better.
> >
> 
> The idea is that we avoid doing IO after the file has been shutdown, so
> we preload this -EIO error. We could just alter the message to "Channel
> has been explicitly shutdown" or "Tried to do IO after channel
> shutdown". It would still be better than the generic EIO message.

My point is that I'm afraid (that's what I understood after you pointed it
out, but maybe I just misread what you said..) we'll call
qemu_file_shutdown() even in normal paths, so we can see an error pop up in
query-migrate even if nothing went wrong. I think that's unwanted.

We can still improve that msg by only setting that specific error in e.g.
qmp_migrate_pause|cancel() or paths where we know we want to set the error,
but I'd rather drop this patch for now so the rest of the patches can be
reviewed and merged first; that'll be a cosmetic change.
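
To illustrate the idea of only attaching the message on paths where the
shutdown is known to be user-initiated, here is a rough standalone sketch.
It is not QEMU code: MockFile, mock_file_shutdown(),
mock_file_shutdown_reason() and mock_query_error_desc() are hypothetical
stand-ins invented for this example, and the real change would have to go
through the existing QEMUFile error helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMUFile; only the fields needed here. */
typedef struct MockFile {
    int last_error;          /* 0, or a negative errno value */
    const char *error_desc;  /* NULL, or a user-visible description */
    int shut;                /* nonzero once the channel was shut down */
} MockFile;

/*
 * Generic shutdown, as used on normal paths (e.g. completion): preload
 * -EIO so later IO attempts fail, but attach no description.
 */
void mock_file_shutdown(MockFile *f)
{
    assert(f != NULL);
    f->shut = 1;
    if (!f->last_error) {
        f->last_error = -EIO;
    }
}

/*
 * Shutdown for callers that know why they shut the channel (assumed here
 * to be the pause/cancel commands): also record the reason, unless an
 * earlier error is already stored.
 */
void mock_file_shutdown_reason(MockFile *f, const char *reason)
{
    assert(f != NULL);
    f->shut = 1;
    if (!f->last_error) {
        f->last_error = -EIO;
        f->error_desc = reason;
    }
}

/* What a query would show: only an explicitly recorded description. */
const char *mock_query_error_desc(const MockFile *f)
{
    assert(f != NULL);
    return f->error_desc;
}
```

With this split, a normal-path mock_file_shutdown() leaves nothing to show
in the query, while the pause/cancel path reports its explicit reason, and
an error stored earlier is never overwritten.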

-- 
Peter Xu

