Am 04.11.2010 14:18, Anders Blomdell wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >>> Am 04.11.2010 10:26, Jan Kiszka wrote: >>>> Am 04.11.2010 10:16, Gilles Chanteperdrix wrote: >>>>> Jan Kiszka wrote: >>>>>> Take a step back and look at the root cause for this issue again. >>>>>> Unlocked >>>>>> >>>>>> if need-resched >>>>>> __xnpod_schedule >>>>>> >>>>>> is inherently racy and will always be (not only for the remote >>>>>> reschedule case BTW). >>>>> Ok, let us examine what may happen with this code if we only set the >>>>> XNRESCHED bit on the local cpu. First, other bits than XNRESCHED do not >>>>> matter, because they can not change under our feet. So, we have two >>>>> cases for this race: >>>>> 1- we see the XNRESCHED bit, but it has been cleared once nklock is >>>>> locked in __xnpod_schedule. >>>>> 2- we do not see the XNRESCHED bit, but it get set right after we test it. >>>>> >>>>> 1 is not a problem. >>>> Yes, as long as we remove the debug check from the scheduler code (or >>>> fix it somehow). The scheduling code already catches this race. >>>> >>>>> 2 is not a problem, because anything which sets the XNRESCHED (it may >>>>> only be an interrupt in fact) bit will cause xnpod_schedule to be called >>>>> right after that. >>>>> >>>>> So no, no race here provided that we only set the XNRESCHED bit on the >>>>> local cpu. >>>>> >>>>> So we either have to accept this and remove the >>>>>> debugging check from the scheduler or push the check back to >>>>>> __xnpod_schedule where it once came from. When this it cleaned up, we >>>>>> can look into the remote resched protocol again. >>>>> The problem of the debug check is that it checks whether the scheduler >>>>> state is modified without the XNRESCHED bit being set. And this is the >>>>> problem, because yes, in that case, we have a race: the scheduler state >>>>> may be modified before the XNRESCHED bit is set by an IPI. >>>>> >>>>> If we want to fix the debug check, we have to have a special bit, on in >>>>> the sched->status flag, only for the purpose of debugging. Or remove the >>>>> debug check. >>>> Exactly my point. Is there any benefit in keeping the debug check? The >>>> code to make it work may end up as "complex" as the logic it verifies, >>>> at least that's my current feeling. >>>> >>> This would be the radical approach of removing the check (and cleaning >>> up some bits). If it's acceptable, I would split it up properly. >> >> This debug check saved our asses when debugging SMP issues, and I >> suspect it may help debugging skin issues. So, I think we should try and >> keep it. >> >> >> At first sight, here you are more breaking things than cleaning them. > Still, it has the SMP record for my test program, still runs with ftrace > on (after 2 hours, where it previously failed after maximum 23 minutes).
My version was indeed still buggy, I'm reworking it ATM. > > If I get the gist of Jan's changes, they are (using the IPI to transfer > one bit of information: your cpu needs to reschedule): > > xnsched_set_resched: > - setbits((__sched__)->status, XNRESCHED); > > xnpod_schedule_handler: > + xnsched_set_resched(sched); > > If you (we?) decide to keep the debug checks, under what circumstances > would the current check trigger (in laymans language, that I'll be able > to understand)? That's actually what /me is wondering as well. I do not see yet how you can reliably detect a missed reschedule reliably (that was the purpose of the debug check) given the racy nature between signaling resched and processing the resched hints. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux _______________________________________________ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core