On Fri, Jun 06, 2025 at 10:37:28AM -0500, JAEHOON KIM wrote:
> 
> On 6/6/2025 10:12 AM, Steven Sistare wrote:
> > On 6/6/2025 11:06 AM, JAEHOON KIM wrote:
> > > On 6/6/2025 9:14 AM, Steven Sistare wrote:
> > > > On 6/6/2025 9:53 AM, Daniel P. Berrangé wrote:
> > > > > On Thu, Jun 05, 2025 at 06:08:08PM -0500, Jaehoon Kim wrote:
> > > > > > When the source VM attempts to connect to the destination VM's Unix
> > > > > > domain socket(cpr.sock) during CPR transfer, the socket
> > > > > > file might not
> > > > > > yet be exist if the destination side hasn't completed the bind
> > > > > > operation. This can lead to connection failures when
> > > > > > running tests with
> > > > > > the qtest framework.
> > > > > 
> > > > > This sounds like a flawed test impl to me - whatever is initiating
> > > > > the cpr operation on the source has done so prematurely - it should
> > > > > ensure the dest is ready before starting the operation.
> > > > > 
> > > > > > To address this, add cpr_validate_socket_path(), which wait for the
> > > > > > socket file to appear. This avoids intermittent qtest
> > > > > > failures caused by
> > > > > > early connection attempts.
> > > > > 
> > > > > IMHO it is dubious to special case cpr in this way.
> > > > 
> > > > I agree with Daniel, and unfortunately it is not just a test issue;
> > > > every management framework that supports cpr-transfer must add logic to
> > > > wait for the cpr socket to appear in the target before proceeding.
> > > > 
> > > > This is analogous to waiting for the monitor socket to appear before
> > > > connecting to it.
> > > > 
> > > > - Steve
> > > 
> > > Thank you very much for your valuable review and feedback.
> > > 
> > > Just to clarify, the added cpr_validate_socket_path() function is
> > > not limited to the test framework.
> > > It is part of the actual CPR implementation and is intended to
> > > ensure correct behavior in all cases, including outside of tests.
> > > 
> > > I mentioned the qtest failure simply as an example where this issue
> > > became apparent.
> > 
> > Yes, I understand that you understand :)
> > Are you willing to move your fix to the qtest?
> > 
> > - Steve
> 
> Thank you for your question and feedback.
> 
> I agree that the issue could be addressed within the qtest framework to
> improve stability.
> 
> However, this socket readiness check is a fundamental part of CPR transfer
> process,
> and I believe that incorporating cpr_validate_socket_path() directly into
> the CPR implementation helps ensure more reliable behavior
> across all environments - not only during testing.
> 
> Just from my perspective, adding this logic to the CPR code does not
> introduce significant overhead or side effects.
> I would appreciate if you could share more details about your concerns, so I
> can better address them.

Requiring a busy wait like this is a sign of a design problem.

There needs to be a way to setup the incoming socket listener
without resorting to busy waiting - that's showing a lack of
synchronization.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply via email to