* Peter Xu (pet...@redhat.com) wrote:
> On Thu, Sep 22, 2022 at 05:41:30PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > On Thu, Sep 22, 2022 at 03:49:38PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (pet...@redhat.com) wrote:
> > > > > When starting ram saving procedure (especially at the completion
> > > > > phase),
> > > > > always set last_seen_block to non-NULL to make sure we can always
> > > > > correctly
> > > > > detect the case where "we've migrated all the dirty pages".
> > > > >
> > > > > Then we'll guarantee both last_seen_block and pss.block will be valid
> > > > > always before the loop starts.
> > > > >
> > > > > See the comment in the code for some details.
> > > > >
> > > > > Signed-off-by: Peter Xu <pet...@redhat.com>
> > > >
> > > > Yeh I guess it can currently only happen during restart?
> > >
> > > There're only two places to clear last_seen_block:
> > >
> > > ram_state_reset[2683]                   rs->last_seen_block = NULL;
> > > ram_postcopy_send_discard_bitmap[2876]  rs->last_seen_block = NULL;
> > >
> > > Where for the reset case:
> > >
> > > ram_state_init[2994]            ram_state_reset(*rsp);
> > > ram_state_resume_prepare[3110]  ram_state_reset(rs);
> > > ram_save_iterate[3271]          ram_state_reset(rs);
> > >
> > > So I think it can at least happen in two places, either (1) postcopy just
> > > started (assume when postcopy starts accidentally when all dirty pages
> > > were
> > > migrated?), or (2) postcopy recover from failure.
> >
> > Oh, (1) is a more general problem then; yeh.
> >
> > > In my case I triggered this deadloop when I was debugging the other bug
> > > fixed by the next patch where it was postcopy recovery (on tls), but only
> > > once.. So currently I'm still not 100% sure whether this is the same
> > > problem, but logically it could trigger.
> > >
> > > I also remember I used to hit very rare deadloops before too, maybe
> > > they're
> > > the same thing because I did test recovery a lot.
> >
> > Note; 'deadlock' not 'deadloop'.
>
> (Oops I somehow forgot there's still this series pending..)
>
> Here it's not about a lock, or maybe I should add a space ("dead loop")?

So the normal phrases I'm used to are:
  'deadlock' - two threads waiting for each other
  'livelock' - two threads spinning for each other

Dave

> --
> Peter Xu
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK