* Peter Xu (pet...@redhat.com) wrote:
> On Thu, Sep 22, 2022 at 05:41:30PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > On Thu, Sep 22, 2022 at 03:49:38PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (pet...@redhat.com) wrote:
> > > > > When starting ram saving procedure (especially at the completion phase),
> > > > > always set last_seen_block to non-NULL to make sure we can always correctly
> > > > > detect the case where "we've migrated all the dirty pages".
> > > > > 
> > > > > Then we'll guarantee both last_seen_block and pss.block will be valid
> > > > > always before the loop starts.
> > > > > 
> > > > > See the comment in the code for some details.
> > > > > 
> > > > > Signed-off-by: Peter Xu <pet...@redhat.com>
> > > > 
> > > > Yeh I guess it can currently only happen during restart?
> > > 
> > > There're only two places to clear last_seen_block:
> > > 
> > > ram_state_reset[2683]          rs->last_seen_block = NULL;
> > > ram_postcopy_send_discard_bitmap[2876] rs->last_seen_block = NULL;
> > > 
> > > Where for the reset case:
> > > 
> > > ram_state_init[2994]           ram_state_reset(*rsp);
> > > ram_state_resume_prepare[3110] ram_state_reset(rs);
> > > ram_save_iterate[3271]         ram_state_reset(rs);
> > > 
> > > So I think it can at least happen in two places, either (1) postcopy just
> > > started (assuming postcopy happens to start when all dirty pages were
> > > already migrated?), or (2) postcopy recovering from a failure.
> > 
> > Oh, (1) is a more general problem then; yeh.
> > 
> > > In my case I triggered this deadloop when I was debugging the other bug
> > > fixed by the next patch where it was postcopy recovery (on tls), but only
> > > once..  So currently I'm still not 100% sure whether this is the same
> > > problem, but logically it could trigger.
> > > 
> > > I also remember I used to hit very rare deadloops before too, maybe they're
> > > the same thing because I did test recovery a lot.
> > 
> > Note; 'deadlock' not 'deadloop'.
> 
> (Oops I somehow forgot there's still this series pending..)
> 
> Here it's not about a lock, or maybe I should add a space ("dead loop")?

So the normal phrases I'm used to are:
  'deadlock' - two threads waiting for each other
  'livelock' - two threads spinning for each other
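
For contrast, the single-threaded case discussed earlier in this thread
(a scan that only stops once it gets back to last_seen_block) needs no
second thread at all. A rough sketch, not QEMU's actual code: struct
block, scan_until_wrapped() and demo() are made-up names, and max_steps
is only there so the example terminates instead of spinning forever.

```c
#include <stddef.h>

/* Hypothetical stand-in for a RAM block on a circular list. */
struct block {
    struct block *next;
};

/*
 * Walk the ring from 'start' and stop once the cursor reaches 'last_seen'.
 * Returns the number of steps taken, or -1 if 'max_steps' ran out, which
 * stands in for "this loop would never have terminated".
 */
int scan_until_wrapped(struct block *start, struct block *last_seen,
                       int max_steps)
{
    struct block *cur = start;
    int steps;

    for (steps = 1; steps <= max_steps; steps++) {
        cur = cur->next;
        if (cur == last_seen) {
            return steps;   /* wrapped around: everything was visited */
        }
    }
    return -1;              /* exit condition never became true */
}

/* Build a three-block ring and scan it with a valid or a NULL marker. */
int demo(int use_null_marker)
{
    struct block b[3];

    b[0].next = &b[1];
    b[1].next = &b[2];
    b[2].next = &b[0];

    return scan_until_wrapped(&b[0], use_null_marker ? NULL : &b[0], 100);
}
```

With a valid marker the scan wraps and stops; with a NULL marker no
block on the ring ever compares equal to it, so only the step limit
ends the loop. Nothing is waiting on or spinning for another thread.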

Dave

> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

