> On 23 Mar 2021, at 19:26, Julien Grall <jul...@xen.org> wrote:
> 
> 
> 
> On 23/03/2021 17:06, Luca Fancellu wrote:
>> Hi all,
> 
> Hi,
> 
> Please avoid top-posting when replying to a comment. It makes the thread 
> more difficult to follow.
> 
>> I have an update: after changing the lock introduced by the series from 
>> spinlock_t to raw_spinlock_t, switching the lock/unlock calls to the raw_* 
>> versions, and keeping the BUG_ON(…) (which is now possible because the raw_* 
>> implementation disables interrupts on preempt_rt), the kernel boots 
>> correctly. So it seems that the BUG_ON(…) is needed and the unmask function 
>> should run with interrupts disabled. Does anyone know why this change worked?
> 
> Do you mean why no-one spotted the issue before? If so, AFAIK, on vanilla 
> Linux, spin_lock is still just a wrapper around raw_spinlock. IOW there is no 
> option to replace it with an RT spinlock.
> 
> So if you don't apply the RT patches, you would not be able to trigger the 
> issue.
> 
> As to the fix itself, I think using raw_spinlock_t is the correct thing to do 
> because the lock is also used in interrupt context (even with RT enabled).
> 
> Would you be able to send a patch?

Yes, I’ll send a patch soon.

> 
>>> On 23 Mar 2021, at 15:39, Luca Fancellu <luca.fance...@arm.com> wrote:
>>> 
>>> Hi Jason,
>>> 
>>> Thanks for your hints. Unfortunately it does not seem to be an init 
>>> problem: with the same init configuration I tried 5.10.23 (preempt_rt) 
>>> without Juergen's patches but with the BUG_ON removed, and it boots without 
>>> problems. So it seems that applying the series does something (on a 
>>> preempt_rt kernel), and we are trying to figure out what.
>>> 
>>> 
>>>> On 23 Mar 2021, at 12:36, Jason Andryuk <jandr...@gmail.com> wrote:
>>>> 
>>>> On Mon, Mar 22, 2021 at 3:09 PM Luca Fancellu <luca.fance...@arm.com> 
>>>> wrote:
>>>>> 
>>>>> Hi Juergen,
>>>>> 
>>>>> Yes, you are right, it was my mistake. As you said, to remove the 
>>>>> BUG_ON(…) this series 
>>>>> (https://patchwork.kernel.org/project/xen-devel/cover/20210306161833.4552-1-jgr...@suse.com/)
>>>>> is needed. Since I’m using yocto, I’m able to build a preempt_rt kernel 
>>>>> up to 5.10.23, so I’m applying the series on top of that version and 
>>>>> then removing the BUG_ON(…).
>>>>> 
>>>>> What was not expected is that the Dom0 kernel now gets stuck at the 
>>>>> “Setting domain 0 name, domid and JSON config…” step and the system seems 
>>>>> unresponsive. It looks like a deadlock, but looking into the series we 
>>>>> can’t spot anything, and the series was also tested by others from the 
>>>>> community.
> 
> The deadlock is expected. When you enable RT spinlocks, interrupts will 
> not be disabled even when you call spin_lock_irqsave().
> 
> As the lock is also used in interrupt context (e.g. with interrupts masked), 
> this will lead to a deadlock because the lock can be held with interrupts 
> unmasked.
> 
> This is quite a common error, as developers are not yet used to testing on 
> RT. I remember finding a few other instances like that when I worked on RT a 
> couple of years ago.
> 
> For future reference, I think CONFIG_PROVE_LOCKING=y could help you to detect 
> (potential) deadlocks.
> 
> Cheers,
> 
> -- 
> Julien Grall

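To make the deadlock reasoning quoted above concrete, here is a toy userspace 
model in C. It is only a sketch and not kernel code: every name in it 
(lock_kind, cpu_state, model_lock_irqsave, irq_would_deadlock, 
scenario_deadlocks, check_model) is made up for illustration. It assumes, as 
described in the thread, that on preempt_rt the irqsave variant of a 
spinlock_t leaves interrupts enabled, while the raw_* variant really masks 
them.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: which lock flavour protects the shared data. */
enum lock_kind {
    LOCK_RAW,      /* models raw_spinlock_t: irqsave really masks interrupts */
    LOCK_SLEEPING  /* models spinlock_t on preempt_rt: interrupts stay enabled */
};

/* Minimal per-CPU state needed for the argument. */
struct cpu_state {
    bool irqs_enabled;
    bool lock_held;
};

/* Models taking the lock with the *_irqsave variant. */
static void model_lock_irqsave(struct cpu_state *cpu, enum lock_kind kind)
{
    if (kind == LOCK_RAW)
        cpu->irqs_enabled = false; /* only the raw lock masks interrupts */
    cpu->lock_held = true;
}

/*
 * Models an interrupt whose handler needs the same lock.
 * Returns true if the handler would run while the interrupted code
 * still holds the lock, i.e. a self-deadlock on this CPU.
 */
static bool irq_would_deadlock(const struct cpu_state *cpu)
{
    if (!cpu->irqs_enabled)
        return false; /* interrupt stays pending until after unlock */
    return cpu->lock_held;
}

/* Takes the lock, then lets an interrupt arrive before the unlock. */
static bool scenario_deadlocks(enum lock_kind kind)
{
    struct cpu_state cpu = { .irqs_enabled = true, .lock_held = false };

    model_lock_irqsave(&cpu, kind);
    return irq_would_deadlock(&cpu);
}

/* Usage: the two outcomes discussed in the thread. */
static void check_model(void)
{
    assert(scenario_deadlocks(LOCK_SLEEPING)); /* RT spinlock: can deadlock */
    assert(!scenario_deadlocks(LOCK_RAW));     /* raw spinlock: cannot */
}
```

The model deliberately ignores other CPUs and scheduling. It only captures 
the single-CPU case from the explanation: a lock that is also taken in 
interrupt context must never be held with interrupts unmasked.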