On 8/18/20 5:00 AM, Heinrich Schuchardt wrote: > On 11.08.20 12:32, Sean Anderson wrote: >> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote: >>> On 11.08.20 08:20, Rick Chen wrote: >>>> Hi Heinrich >>>> >>>>> From: Heinrich Schuchardt [mailto:[email protected]] >>>>> Sent: Tuesday, August 11, 2020 11:57 AM >>>>> To: Rick Jian-Zhi Chen(陳建志) >>>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel >>>>> Schwierzeck; [email protected]; Heinrich Schuchardt >>>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi() >>>>> >>>>> At least on the Kendryte K210: >>>>> >>>>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr= >>>>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000 >>>>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr= >>>>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000 >>>>> gd->arch.ipi[1].arg1= 0x0000000000000000 >>>>> >>>>> We should not jump to 0x0 to handle an interrupt. >>>> >>>> Can you explain why K210 be affected by it ? >>> >>> I have been running U-Boot on the MaixDuino. >>> >>> Without this patch secondary_hart_loop() is reached only once. With the >>> patch it is reached several thousand times. >> >> Hm, interesting. To me, this is a symptom of something else going >> terribly wrong. I originally had this check in place so that it would >> be easier to detect these sorts of errors. I don't think this is the >> correct fix, however. We should really try and find the root cause of >> the bug. >> >>> I would not expect NULL to contain any code that should be executed by >>> the secondary hart. See doc/board/sipeed/maix.rst: >>> >>> Address Size Description >>> ========== ========= =========== >>> 0x00000000 0x1000 debug >>> 0x00001000 0x1000 rom >>> 0x02000000 0xC000 clint >>> >>>> >>>>> >>>>> Signed-off-by: Heinrich Schuchardt <[email protected]> >>>>> --- >>>>> arch/riscv/lib/smp.c | 2 ++ >>>>> 1 file changed, 2 insertions(+) >>>>> >>>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index >>>>> ac22136314..d725fa32e8 100644 >>>>> --- a/arch/riscv/lib/smp.c >>>>> +++ b/arch/riscv/lib/smp.c >>>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart) >>>>> return; >>>>> } >>>>> >>>>> + if (!smp_function) >>>>> + return; >>>>> smp_function(hart, gd->arch.ipi[hart].arg0, >>>>> gd->arch.ipi[hart].arg1); } >>>> >>>> I remember Sean add this check in >>>> [v10,14/21] riscv: Clean up IPI initialization code >>>> . And I ask him to remove. >>>> https://patchwork.ozlabs.org/project/uboot/patch/[email protected]/ >>> >>> Your comment was: "Please remove the sanity check. If zero address is >>> the intended jump point, then system will work abnormal." >>> >>> The only place where gd->arch.ipi[hart].addr is set is: >>> >>> arch/riscv/lib/smp.c:53 send_ipi_many(): >>> gd->arch.ipi[reg].addr = ipi->addr; >>> >>> send_ipi_many() is only called in smp_call_function(). >>> >>> So the line >>> >>> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1); >>> >>> can only work if smp_function() has been called before this point at any >>> time. The start code only calls it in spl_secondary_hart_stack_gd_setup(). >> >> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi >> in [2] as part of a larger revert. The actual revert-of-revert is in >> [3], though it really should be split out into its own patch. The >> original commit making these changes is [4]. > > Patch series [1] makes not difference. You still have > > gd->arch.ipi[0].addr= 0x0000000000000000 > gd->arch.ipi[1].addr= 0x0000000000000000 > > and the secondary hart jumps to NULL and never returns.
Hm, then there is a code path where an IPI gets triggered by something other than opensbi. > >> >> Note that the situation before [4] was that the IPI got initialized by >> the first hart to call any IPI function. If that hart was not the boot >> hart, Bad Things started to happen (e.g. race conditions, memory >> corruption, etc). In that patch, I moved the initialization to its own >> function so we would not have any race conditions and instead have >> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of >> course, when everything is working properly, we should get neither of >> these. >> >> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047 >> [2] >> https://patchwork.ozlabs.org/project/uboot/patch/[email protected]/ >> [3] >> https://patchwork.ozlabs.org/project/uboot/patch/[email protected]/ >> [4] >> https://patchwork.ozlabs.org/project/uboot/patch/[email protected]/ >> >>> Why do we call handle_ipi() on the secondary hart at all? >> >> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what >> you're getting at. >> > > I cannot see anything to be done by a secondary hart in case of a > software interrupt. Isn't it supposed to run the function which the boot hart sent to it? --Sean

