Am 06.11.2010 23:49, Philippe Gerum wrote:
> On Sat, 2010-11-06 at 21:37 +0100, Gilles Chanteperdrix wrote:
>> Anders Blomdell wrote:
>>> Gilles Chanteperdrix wrote:
>>>> Anders Blomdell wrote:
>>>>> Gilles Chanteperdrix wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> Am 05.11.2010 00:24, Gilles Chanteperdrix wrote:
>>>>>>>> Jan Kiszka wrote:
>>>>>>>>> Am 04.11.2010 23:06, Gilles Chanteperdrix wrote:
>>>>>>>>>> Jan Kiszka wrote:
>>>>>>>>>>>>> At first sight, here you are more breaking things than cleaning
>>>>>>>>>>>>> them.
>>>>>>>>>>>> Still, it has the SMP record for my test program, still runs with
>>>>>>>>>>>> ftrace
>>>>>>>>>>>> on (after 2 hours, where it previously failed after maximum 23
>>>>>>>>>>>> minutes).
>>>>>>>>>>> My version was indeed still buggy, I'm reworking it ATM.
>>>>>>>>>>>
>>>>>>>>>>>> If I get the gist of Jan's changes, they are (using the IPI to
>>>>>>>>>>>> transfer
>>>>>>>>>>>> one bit of information: your cpu needs to reschedule):
>>>>>>>>>>>>
>>>>>>>>>>>> xnsched_set_resched:
>>>>>>>>>>>> - setbits((__sched__)->status, XNRESCHED);
>>>>>>>>>>>>
>>>>>>>>>>>> xnpod_schedule_handler:
>>>>>>>>>>>> + xnsched_set_resched(sched);
>>>>>>>>>>>>
>>>>>>>>>>>> If you (we?) decide to keep the debug checks, under what
>>>>>>>>>>>> circumstances
>>>>>>>>>>>> would the current check trigger (in laymans language, that I'll be
>>>>>>>>>>>> able
>>>>>>>>>>>> to understand)?
>>>>>>>>>>> That's actually what /me is wondering as well. I do not see yet how
>>>>>>>>>>> you
>>>>>>>>>>> can reliably detect a missed reschedule reliably (that was the
>>>>>>>>>>> purpose
>>>>>>>>>>> of the debug check) given the racy nature between signaling resched
>>>>>>>>>>> and
>>>>>>>>>>> processing the resched hints.
>>>>>>>>>> The purpose of the debugging change is to detect a change of the
>>>>>>>>>> scheduler state which was not followed by setting the XNRESCHED bit.
>>>>>>>>> But that is nucleus business, nothing skins can screw up (as long as
>>>>>>>>> they do not misuse APIs).
>>>>>>>> Yes, but it happens that we modify the nucleus from time to time.
>>>>>>>>
>>>>>>>>>> Getting it to work is relatively simple: we add a "scheduler change
>>>>>>>>>> set
>>>>>>>>>> remotely" bit to the sched structure which is NOT in the status bit,
>>>>>>>>>> set
>>>>>>>>>> this bit when changing a remote sched (under nklock). In the debug
>>>>>>>>>> check
>>>>>>>>>> code, if the scheduler state changed, and the XNRESCHED bit is not
>>>>>>>>>> set,
>>>>>>>>>> only consider this a but if this new bit is not set. All this is
>>>>>>>>>> compiled out if the debug is not enabled.
>>>>>>>>> I still see no benefit in this check. Where to you want to place the
>>>>>>>>> bit
>>>>>>>>> set? Aren't that just the same locations where
>>>>>>>>> xnsched_set_[self_]resched already is today?
>>>>>>>> Well no, that would be another bit in the sched structure which would
>>>>>>>> allow us to manipulate the status bits from the local cpu. That
>>>>>>>> supplementary bit would only be changed from a distant CPU, and serve
>>>>>>>> to
>>>>>>>> detect the race which causes the false positive. The resched bits are
>>>>>>>> set on the local cpu to get xnpod_schedule to trigger a rescheduling on
>>>>>>>> the distance cpu. That bit would be set on the remote cpu's sched. Only
>>>>>>>> when debugging is enabled.
>>>>>>>>
>>>>>>>>> But maybe you can provide some motivating bug scenarios, real ones of
>>>>>>>>> the past or realistic ones of the future.
>>>>>>>> Of course. The bug is anything which changes the scheduler state but
>>>>>>>> does not set the XNRESCHED bit. This happened when we started the SMP
>>>>>>>> port. New scheduling policies would be good candidates for a revival of
>>>>>>>> this bug.
>>>>>>>>
>>>>>>> You don't gain any worthwhile check if you cannot make the
>>>>>>> instrumentation required for a stable detection simpler than the proper
>>>>>>> problem solution itself. And this is what I'm still skeptical of.
>>>>>> The solution is simple, but finding the problem without the
>>>>>> instrumentation is way harder than with the instrumentation, so the
>>>>>> instrumentation is worth something.
>>>>>>
>>>>>> Reproducing the false positive is surprisingly easy with a simple
>>>>>> dual-cpu semaphore ping-pong test. So, here is the (tested) patch,
>>>>>> using a ridiculous long variable name to illustrate what I was
>>>>>> thinking about:
>>>>>>
>>>>>> diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
>>>>>> index 8888cf4..454b8e8 100644
>>>>>> --- a/include/nucleus/sched.h
>>>>>> +++ b/include/nucleus/sched.h
>>>>>> @@ -108,6 +108,9 @@ typedef struct xnsched {
>>>>>> struct xnthread *gktarget;
>>>>>> #endif
>>>>>>
>>>>>> +#ifdef CONFIG_XENO_OPT_DEBUG_NUCLEUS
>>>>>> + int debug_resched_from_remote;
>>>>>> +#endif
>>>>>> } xnsched_t;
>>>>>>
>>>>>> union xnsched_policy_param;
>>>>>> @@ -185,6 +188,8 @@ static inline int xnsched_resched_p(struct xnsched
>>>>>> *sched)
>>>>>> xnsched_t *current_sched = xnpod_current_sched(); \
>>>>>> __setbits(current_sched->status, XNRESCHED); \
>>>>>> if (current_sched != (__sched__)) { \
>>>>>> + if (XENO_DEBUG(NUCLEUS)) \
>>>>>> + __sched__->debug_resched_from_remote = 1; \
>>>>>> xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \
>>>>>> } \
>>>>>> } while (0)
>>>>>> diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
>>>>>> index 4cb707a..50b0f49 100644
>>>>>> --- a/ksrc/nucleus/pod.c
>>>>>> +++ b/ksrc/nucleus/pod.c
>>>>>> @@ -2177,6 +2177,10 @@ static inline int __xnpod_test_resched(struct
>>>>>> xnsched *sched)
>>>>>> xnarch_cpus_clear(sched->resched);
>>>>>> }
>>>>>> #endif
>>>>>> + if (XENO_DEBUG(NUCLEUS) && sched->debug_resched_from_remote) {
>>>>>> + sched->debug_resched_from_remote = 0;
>>>>>> + resched = 1;
>>>>>> + }
>>>>>> clrbits(sched->status, XNRESCHED);
>>>>>> return resched;
>>>>>> }
>>>>>>
>>>>>>
>>>>>> I am still uncertain.
>>>>> Will only work if all is done under nklock, otherwise two almost
>>>>> simultaneous xnsched_resched_p from different cpus, might lead to one of
>>>>> the ipi wakeups sees the 0 written due to handling the first ipi
>>>>> interrupt.
>>>> This is a patch artifact, the function modified are xnsched_set_resched
>>>> and xnpod_test_resched, and both are run with the nklock locked.
>>>>
>>>
>>> Isn't this a possible scenario?
>>>
>>> CPU A CPU B CPU C
>>> take nklock
>>> remote = 1
>>> send ipi #1
>>> release nklock
>>> take nklock handle ipi
>>> remote = 1 ack ipi #1
>>> send ipi #2
>>> release nklock
>>> take nklock
>>> if remote (==1)
>>> remote = 0
>>> reseched = 1
>>> relese nklock
>>> handle ipi
>>> ack ipi #2
>>> take nklock
>>> if remote (==0)
>>> OOPS!
>>
>> No problem here, since handling the first IPI has taken into account the
>> two scheduler state changes. So, no OOPS. The second IPI is spurious.
>>
>> Anyway, after some thoughts, I think we are going to try and make the
>> current situation work instead of going back to the old way.
>>
>> You can find the patch which attempts to do so here:
>> http://sisyphus.hd.free.fr/~gilles/sched_status.txt
>
> Ack. At last, this addresses the real issues without asking for
> regression funkiness: fix the lack of barrier before testing XNSCHED inCheck the kernel, we actually need it on both sides. Wherever the final barriers will be, we should leave a comment behind why they are there. Could be picked up from kernel/smp.c. > the xnpod_schedule pre-test, and stop sched->status trashing due to > XNINIRQ/XNHTICK/XNRPICK ops done un-synced on nklock. > > In short, this patch looks like moving the local-only flags where they > belong, i.e. anywhere you want but *outside* of the status with remotely > accessed bits. XNRPICK seems to be handled differently, but it makes > sense to group it with other RPI data as you did, so fine with me. I just hope we finally converge over a solution. Looks like all possibilities have been explored now. A few more comments on this one: It probably makes sense to group the status bits accordingly (both their values and definitions) and briefly document on which status field they are supposed to be applied. I do not understand the split logic - or some bits are simply not yet migrated. XNHDEFER, XNSWLOCK, XNKCOUT are all local-only as well, no? Then better put them in the _local_ status field, that's more consistent (and would help if we once wanted to optimize their cache line usage). The naming is unfortunate: status vs. lstatus. This is asking for confusion and typos. They must be better distinguishable, e.g. local_status. Or we need accessors that have debug checks built in, catching wrong bits for their target fields. Good catch of the RPI breakage, Gilles! Jan
signature.asc
Description: OpenPGP digital signature
_______________________________________________ Xenomai-core mailing list [email protected] https://mail.gna.org/listinfo/xenomai-core
