* Daniel P. Berrangé (berra...@redhat.com) wrote:
> On Tue, Feb 13, 2018 at 03:09:12PM +0000, Dr. David Alan Gilbert wrote:
> > * Thomas Huth (th...@redhat.com) wrote:
> > > We are currently facing some migration failures on s390x when running
> > > certain avocado tests, e.g. when running the test
> > > type_specific.io-github-autotest-qemu.migrate.with_reboot.exec.gzip_exec.
> > > This test is using 'migrate -d "exec:nc localhost 5200"' for the migration.
> > > The problem is detected at the receiving side, where the migration stream
> > > apparently ends too early. However, the cause for the problem is the
> > > sending side: After writing the migration stream into the pipe to netcat,
> > > the source QEMU calls qio_channel_command_close() which closes the pipe
> > > and immediately (!) kills the child process afterwards. So if the
> > > sending netcat did not read the final bytes from the pipe yet, or
> > > if it did not manage to send out all its buffers yet, it is killed
> > > before the whole migration stream is passed to the destination side.
> > 
> > Thanks for tracking that down!
> > 
> > > To ease the situation at least a little, we should give the child
> > > process a few more time slices before we kill it with
> > > SIGTERM and then with SIGKILL. With this change, the avocado test now
> > > succeeds here in 10 out of 10 runs.
> > > 
> > > Signed-off-by: Thomas Huth <th...@redhat.com>
> > > ---
> > >  io/channel-command.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/io/channel-command.c b/io/channel-command.c
> > > index 319c5ed..f64db3e 100644
> > > --- a/io/channel-command.c
> > > +++ b/io/channel-command.c
> > > @@ -177,11 +177,11 @@ static int qio_channel_command_abort(QIOChannelCommand *ioc,
> > >              return -1;
> > >          }
> > >      } else if (ret == 0) {
> > > -        if (step == 0) {
> > > +        if (step == 4) {
> > >              kill(ioc->pid, SIGTERM);
> > > -        } else if (step == 1) {
> > > +        } else if (step == 8) {
> > >              kill(ioc->pid, SIGKILL);
> > > -        } else {
> > > +        } else if (step >= 9) {
> > 
> > Hmm.  This seems pretty arbitrary; if I understand correctly you're
> > saying it'll get a SIGTERM after 4 (arbitrary) * 10ms (arbitrary).
> > 
> > Who is to say that's enough for a scp or gzip or the like?
> 
> We could conceivably implement the  qio_channel_shutdown() operation
> for the QIOChannelCommand class. It would merely close the FD to the
> child process, but leave it running. That would give it time to read
> any data still in the pipe from QEMU IIUC.

Yeh that's better; although when would we call shutdown or close on it?
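
To make the idea concrete, here is a rough standalone sketch in plain POSIX C.
It is not QEMU code and it is not the patch: send_then_wait() is a made-up
helper, and the slow child merely stands in for a netcat that has not drained
the pipe yet. It only shows the ordering being discussed: write, close our end
of the pipe so the child sees EOF, then reap the child without sending it any
signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not part of QEMU).
 * Writes len bytes to a forked child through a pipe, closes the write
 * end, and waits for the child to exit on its own. No signal is sent.
 * Returns 0 if the child read exactly len bytes, 1 if it read a
 * different amount, -1 on error.
 */
int send_then_wait(const char *data, size_t len)
{
    int fds[2];
    pid_t pid;
    size_t off = 0;
    int status;
    int failed = 0;

    assert(data != NULL);

    if (pipe(fds) < 0) {
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: stand-in for the exec'd command (e.g. a slow netcat). */
        char buf[256];
        size_t total = 0;
        ssize_t n;
        struct timespec delay = { 0, 100 * 1000 * 1000 }; /* 100 ms */

        close(fds[1]);
        nanosleep(&delay, NULL);   /* data sits unread in the pipe meanwhile */
        while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
            total += (size_t)n;
        }
        _exit(total == len ? 0 : 1);
    }

    /* Parent: stand-in for the sending side. */
    close(fds[0]);
    while (off < len) {
        ssize_t n = write(fds[1], data + off, len - off);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            failed = 1;
            break;
        }
        off += (size_t)n;
    }

    /* Closing the write end gives the child EOF once it has read everything. */
    close(fds[1]);

    /* Blocking wait, no SIGTERM/SIGKILL: the child finishes on its own. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }

    if (failed || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Calling send_then_wait("abc", 3) should return 0, since the child still gets
all three bytes despite its delay. The blocking waitpid() is the weak spot of
the sketch: a child that never exits would hang the caller, so a real
implementation would presumably still need some timeout or a later fallback to
signals.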

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
