On Tue, Sep 16, 2025 at 04:01:57PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 16.09.25 11:28, Daniel P. Berrangé wrote:
> > On Mon, Sep 15, 2025 at 04:18:58PM -0400, Peter Xu wrote:
> > > On Mon, Sep 15, 2025 at 10:30:57PM +0300, Vladimir Sementsov-Ogievskiy 
> > > wrote:
> > > > qemu_file_set_blocking() is a wrapper on qio_channel_set_blocking(),
> > > > so let's passthrough the errp.
> > > This looks all reasonable in general.
> > > 
> > > Said that, using error_abort in migration code normally are not suggested
> > > because it's too strong.
> > Note, that prior to this series, the existing qemu_socket_set_nonblock
> > method that migration is calling will assert on failure. This series
> > removes that assert and propagates it back to the callers to let them
> > decide what to do. Ideally they would gracefully handle it, but if
> > they assert that is no worse than current behaviour.
> > 
> 
> In details, prior to series:
> 
> posix + set_nonblock -> crash on failure
> 
> other variants (posix/win32 + set_block, win32 + set_nonblock) -> ignore 
> failure

Correct, but IIUC that's for sockets only.

Major channel types that migration cares the most should also include file
now.  qio_channel_file_set_blocking() also doesn't assert but return a
failure.

> 
> > > I did check all of below should be on the incoming side which is not as
> > > severe (because killing dest qemu before switchover is normally
> > > benign). Still, can we switch all below users to error_warn (including the
> > > one below that may want to error_report_err(), IMHO a warn report is fine
> > > even for such error)?
> > IMHO ignoring a failure to change the blocking flag status is not
> > a warnnig, it is unrecoverable for the migration operation. It
> > should be possible to propagate the error in some way, but it will
> > potentially require changes across multiple migration methods to
> > handle this.

In most cases I agree.  But still, using error_abort doesn't mean to fail
migration, but to crash the VM.  We still at least doesn't want to do it on
src..

Meanwhile, this could violate things like newly introduced exit-on-error,
but I agree we used to ignore those, so even if it fails before and didn't
crash, we could have ignored those errors.. and not reportable to libvirt.

The ideal way to do is to always fail either src/dst when set blocking
failed for sure, but yes, it's slightly involved on some paths this patch
touched.

So.. I think we can go with this patch, with a sincere wish that it'll
simply almost never fail.  But then, let's mention that in the commit
message, (1) this patch only asserts on the dest qemu and only before
switchover (hence src can still fallback), never src, (2) state the facts
that it so far is a slight violation to exit-on-error, but it's extremely
unlikely to happen anyway (NOTE: this is not a programming error that
normal assertions would do, so it falls into exit-on-error category).

Thanks,

-- 
Peter Xu


Reply via email to