On Fri, Jun 06, 2025 at 10:37:28AM -0500, JAEHOON KIM wrote: > > On 6/6/2025 10:12 AM, Steven Sistare wrote: > > On 6/6/2025 11:06 AM, JAEHOON KIM wrote: > > > On 6/6/2025 9:14 AM, Steven Sistare wrote: > > > > On 6/6/2025 9:53 AM, Daniel P. Berrangé wrote: > > > > > On Thu, Jun 05, 2025 at 06:08:08PM -0500, Jaehoon Kim wrote: > > > > > > When the source VM attempts to connect to the destination VM's Unix > > > > > > domain socket(cpr.sock) during CPR transfer, the socket > > > > > > file might not > > > > > > yet be exist if the destination side hasn't completed the bind > > > > > > operation. This can lead to connection failures when > > > > > > running tests with > > > > > > the qtest framework. > > > > > > > > > > This sounds like a flawed test impl to me - whatever is initiating > > > > > the cpr operation on the source has done so prematurely - it should > > > > > ensure the dest is ready before starting the operation. > > > > > > > > > > > To address this, add cpr_validate_socket_path(), which wait for the > > > > > > socket file to appear. This avoids intermittent qtest > > > > > > failures caused by > > > > > > early connection attempts. > > > > > > > > > > IMHO it is dubious to special case cpr in this way. > > > > > > > > I agree with Daniel, and unfortunately it is not just a test issue; > > > > every management framework that supports cpr-transfer must add logic to > > > > wait for the cpr socket to appear in the target before proceeding. > > > > > > > > This is analogous to waiting for the monitor socket to appear before > > > > connecting to it. > > > > > > > > - Steve > > > > > > Thank you very much for your valuable review and feedback. > > > > > > Just to clarify, the added cpr_validate_socket_path() function is > > > not limited to the test framework. > > > It is part of the actual CPR implementation and is intended to > > > ensure correct behavior in all cases, including outside of tests. > > > > > > I mentioned the qtest failure simply as an example where this issue > > > became apparent. > > > > Yes, I understand that you understand :) > > Are you willing to move your fix to the qtest? > > > > - Steve > > Thank you for your question and feedback. > > I agree that the issue could be addressed within the qtest framework to > improve stability. > > However, this socket readiness check is a fundamental part of CPR transfer > process, > and I believe that incorporating cpr_validate_socket_path() directly into > the CPR implementation helps ensure more reliable behavior > across all environments - not only during testing. > > Just from my perspective, adding this logic to the CPR code does not > introduce significant overhead or side effects. > I would appreciate if you could share more details about your concerns, so I > can better address them.
Requiring a busy wait like this is a sign of a design problem. There needs to be a way to setup the incoming socket listener without resorting to busy waiting - that's showing a lack of synchronization. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|