Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

Sean Anderson Tue, 18 Aug 2020 03:27:23 -0700


On 8/18/20 5:00 AM, Heinrich Schuchardt wrote:
> On 11.08.20 12:32, Sean Anderson wrote:
>> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
>>> On 11.08.20 08:20, Rick Chen wrote:
>>>> Hi Heinrich
>>>>
>>>>> From: Heinrich Schuchardt [mailto:[email protected]]
>>>>> Sent: Tuesday, August 11, 2020 11:57 AM
>>>>> To: Rick Jian-Zhi Chen(陳建志)
>>>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel 
>>>>> Schwierzeck; [email protected]; Heinrich Schuchardt
>>>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>>>>
>>>>> At least on the Kendryte K210:
>>>>>
>>>>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr=
>>>>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000
>>>>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr=
>>>>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000
>>>>> gd->arch.ipi[1].arg1= 0x0000000000000000
>>>>>
>>>>> We should not jump to 0x0 to handle an interrupt.
>>>>
>>>> Can you explain why K210 be affected by it ?
>>>
>>> I have been running U-Boot on the MaixDuino.
>>>
>>> Without this patch secondary_hart_loop() is reached only once. With the
>>> patch it is reached several thousand times.
>>
>> Hm, interesting. To me, this is a symptom of something else going
>> terribly wrong. I originally had this check in place so that it would
>> be easier to detect these sorts of errors. I don't think this is the
>> correct fix, however. We should really try and find the root cause of
>> the bug.
>>
>>> I would not expect NULL to contain any code that should be executed by
>>> the secondary hart. See doc/board/sipeed/maix.rst:
>>>
>>> Address    Size      Description
>>> ========== ========= ===========
>>> 0x00000000 0x1000    debug
>>> 0x00001000 0x1000    rom
>>> 0x02000000 0xC000    clint
>>>
>>>>
>>>>>
>>>>> Signed-off-by: Heinrich Schuchardt <[email protected]>
>>>>> ---
>>>>>  arch/riscv/lib/smp.c | 2 ++
>>>>>  1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index 
>>>>> ac22136314..d725fa32e8 100644
>>>>> --- a/arch/riscv/lib/smp.c
>>>>> +++ b/arch/riscv/lib/smp.c
>>>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>>>>                 return;
>>>>>         }
>>>>>
>>>>> +       if (!smp_function)
>>>>> +               return;
>>>>>         smp_function(hart, gd->arch.ipi[hart].arg0, 
>>>>> gd->arch.ipi[hart].arg1);  }
>>>>
>>>> I remember Sean add this check in
>>>> [v10,14/21] riscv: Clean up IPI initialization code
>>>> . And I ask him to remove.
>>>> https://patchwork.ozlabs.org/project/uboot/patch/[email protected]/
>>>
>>> Your comment was: "Please remove the sanity check. If zero address is
>>> the intended jump point, then system will work abnormal."
>>>
>>> The only place where gd->arch.ipi[hart].addr is set is:
>>>
>>> arch/riscv/lib/smp.c:53 send_ipi_many():
>>> gd->arch.ipi[reg].addr = ipi->addr;
>>>
>>> send_ipi_many() is only called in smp_call_function().
>>>
>>> So the line
>>>
>>> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
>>>
>>> can only work if smp_function() has been called before this point at any
>>> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().
>>
>> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
>> in [2] as part of a larger revert. The actual revert-of-revert is in
>> [3], though it really should be split out into its own patch. The
>> original commit making these changes is [4].
> 
> Patch series [1] makes not difference. You still have
> 
> gd->arch.ipi[0].addr= 0x0000000000000000
> gd->arch.ipi[1].addr= 0x0000000000000000
> 
> and the secondary hart jumps to NULL and never returns.


Hm, then there is a code path where an IPI gets triggered by something
other than opensbi.

> 
>>
>> Note that the situation before [4] was that the IPI got initialized by
>> the first hart to call any IPI function. If that hart was not the boot
>> hart, Bad Things started to happen (e.g. race conditions, memory
>> corruption, etc). In that patch, I moved the initialization to its own
>> function so we would not have any race conditions and instead have
>> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
>> course, when everything is working properly, we should get neither of
>> these.
>>
>> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
>> [2] 
>> https://patchwork.ozlabs.org/project/uboot/patch/[email protected]/
>> [3] 
>> https://patchwork.ozlabs.org/project/uboot/patch/[email protected]/
>> [4] 
>> https://patchwork.ozlabs.org/project/uboot/patch/[email protected]/
>>
>>> Why do we call handle_ipi() on the secondary hart at all?
>>
>> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
>> you're getting at.
>>
> 
> I cannot see anything to be done by a secondary hart in case of a
> software interrupt.

Isn't it supposed to run the function which the boot hart sent to it?

--Sean

Re: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()

Reply via email to