On 10/10/2012 03:09 PM, Philippe Gerum wrote:
> On 10/10/2012 03:01 PM, Gilles Chanteperdrix wrote:
>> On 10/10/2012 02:57 PM, Philippe Gerum wrote:
>>> On 10/10/2012 02:55 PM, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 02:41 PM, Philippe Gerum wrote:
>>>>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>>>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>>>>>> I think the reason why the first patch fails is that in
>>>>>>>>>>>>>>>>>>>> rtcan_virt, we have
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>> ---> rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Are you not afraid of side effects with the second patch,
>>>>>>>>>>>>>>>>>>>> since it changes the overall behaviour?
>>>>>>>>>>>>>>>>>>>> Would you not prefer a modification local to rtcan_virt?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the
>>>>>>>>>>>>>>>>>>> second patch, we might want to try and reduce the overhead
>>>>>>>>>>>>>>>>>>> of xnpod_unlock_sched.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock?
>>>>>>>>>>>>>>>>>> That's a clear bug. Your patch is aligning rtcan to the
>>>>>>>>>>>>>>>>>> pattern we are also using in RTnet. We just need to make
>>>>>>>>>>>>>>>>>> sure (haven't looked at the full context yet) that sock
>>>>>>>>>>>>>>>>>> remains valid even after dropping the lock(s).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The second patch's idea was to lock the scheduler while
>>>>>>>>>>>>>>>>> spinlocks are held, so that posting a semaphore while
>>>>>>>>>>>>>>>>> holding a spin lock is no longer a bug.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well, that is what the Linux kernel does.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> but I think we have this pattern
>>>>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer; if you do:
>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes
>>>>>>>>>>>>>>> up another thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue
>>>>>>>>>>>>>>> we are talking about.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the
>>>>>>>>>>>>>> encapsulated code, executing it atomically as rescheduling is
>>>>>>>>>>>>>> postponed until the end of the block.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>>>>>
>>>>>>>>>>>> Err... absolutely right.
>>>>>>>>>>>>
>>>>>>>>>>>> The good news is: we don't need to worry about that kind of
>>>>>>>>>>>> locking. In rtcan_raw_recvmsg, the socket is locked via the RTDM
>>>>>>>>>>>> context as we are in a handler. So it won't disappear when we
>>>>>>>>>>>> drop the lock, and your first patch is fine.
>>>>>>>>>>>
>>>>>>>>>>> Which one? The first one does not seem to work because the rtdm
>>>>>>>>>>> locks seem to be nested. The second one would probably need to
>>>>>>>>>>> find a way to reduce the overhead of xnpod_unlock_sched(). What
>>>>>>>>>>> can be done, however, is adding a call to
>>>>>>>>>>> xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>>>>>
>>>>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan
>>>>>>>>>> wasn't used much on SMP so far. That looks indeed unresolvable
>>>>>>>>>> without a semantic change to rtdm_lock/unlock.
>>>>>>>>>>
>>>>>>>>>> But then we really need something as light-weight as
>>>>>>>>>> preempt_enable/disable.
>>>>>>>>>>
>>>>>>>>> This is not as lightweight as it might be, given that we pair a
>>>>>>>>> flag and a counter to achieve this (which saves one data reference
>>>>>>>>> in xnpod_schedule() though), but this is a start:
>>>>>>>>>
>>>>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>>>>>
>>>>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in
>>>>>>>>> stone in the nucleus API to manipulate the sched locking counter
>>>>>>>>> from a context where the nucleus lock is already held, so that RTDM
>>>>>>>>> can rely on this for RTDM_EXECUTE_ATOMICALLY().
>>>>>>>>
>>>>>>>>
>>>>>>>> The problem with xnpod_unlock_sched, from my point of view, is this
>>>>>>>> section of code:
>>>>>>>>
>>>>>>>> xnsched_set_self_resched(curr->sched);
>>>>>>>> xnpod_schedule();
>>>>>>>>
>>>>>>>> It means that we will go to the full-blown __xnpod_schedule when
>>>>>>>> unlocking the top-most spinlock.
>>>>>>>>
>>>>>>>> I guess what I meant is that we should have a scheduler bit that is
>>>>>>>> simply tested in xnpod_schedule, but then we lose the ability for
>>>>>>>> threads with the scheduler locked to suspend themselves. So, maybe
>>>>>>>> we should define another xnpod_lock/unlock pair, maybe something
>>>>>>>> like xnpod_preempt_disable()/xnpod_preempt_enable().
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS
>>>>>>> parlance.
>>>>>>
>>>>>> Yes, but the reason why we have to go to __xnpod_schedule in
>>>>>> xnpod_unlock_sched() is to allow a thread to be suspended with the
>>>>>> scheduler locked (you added that at some point for the vxworks
>>>>>> emulator, if I remember correctly). But without that need, we can put
>>>>>> the XNLOCK bit in the sched->status structure, and simply test that
>>>>>> bit with all the others in xnpod_schedule(), and forget about the
>>>>>> need to call xnsched_set_self_resched() in xnpod_unlock_sched().
>>>>>> That is all I meant.
>>>>>>
>>>>>
>>>>> You may have to reschedule after an IRQ has interrupted the lock owner
>>>>> inside the locked section, in which case you have to make sure to
>>>>> reschedule when dropping the sched lock.
>>>>
>>>> Well, in that case the resched bit will have been set already by the
>>>> IRQ handler calling xnpod_resume_thread or another service; you do not
>>>> have to force it.
>>>>
>>>
>>> Yes, and this is why you have to call __xnpod_schedule() regardless.
>>>
>> No, xnpod_schedule is enough; it will only call __xnpod_schedule() if
>> the resched bit is set.
>>
>
> We are not discussing the same issue, I'm afraid. We can't optimize the
> schedlock path via a scheduler flag since we have to care about lock
> nesting. Since the only sane option there is to hold such a counter in
> the current thread context, the optimization is 100% in the way we
> access this information.
>
And yes, I do understand that the ability for a thread to schedule out
while holding the scheduler lock is part of the problem, but this
ability is key to having a consistent scheduler locking scheme.

-- 
Philippe.

_______________________________________________
Xenomai mailing list
Xenomai@xenomai.org
http://www.xenomai.org/mailman/listinfo/xenomai