On 11/27/17 10:06, Andrew Turner wrote:
On 26 Nov 2017, at 17:07, Nathan Whitehorn <nwhiteh...@freebsd.org> wrote:



On 11/26/17 01:46, Andrew Turner wrote:
On 25 Nov 2017, at 23:41, Nathan Whitehorn <nwhiteh...@freebsd.org> wrote:
Author: nwhitehorn
Date: Sat Nov 25 23:41:05 2017
New Revision: 326218
URL: https://svnweb.freebsd.org/changeset/base/326218

Log:
  Remove some, but not all, assumptions that the BSP is CPU 0 and that CPUs
  are numbered densely from there to n_cpus.

  MFC after:    1 month

Modified:
  head/sys/kern/kern_clock.c
  head/sys/kern/kern_clocksource.c
  head/sys/kern/kern_shutdown.c
  head/sys/kern/kern_timeout.c
  head/sys/kern/sched_ule.c
  head/sys/kern/subr_pcpu.c

...
Modified: head/sys/kern/subr_pcpu.c
==============================================================================
--- head/sys/kern/subr_pcpu.c   Sat Nov 25 23:23:24 2017        (r326217)
+++ head/sys/kern/subr_pcpu.c   Sat Nov 25 23:41:05 2017        (r326218)
@@ -279,6 +279,8 @@ pcpu_destroy(struct pcpu *pcpu)
struct pcpu *
pcpu_find(u_int cpuid)
{
+       KASSERT(cpuid_to_pcpu[cpuid] != NULL,
+           ("Getting uninitialized PCPU %d", cpuid));

        return (cpuid_to_pcpu[cpuid]);
}
This breaks on one arm64 simulator I have where the device tree lists 8 cpus, 
but only 2 are enabled in the simulation. The ofw_cpu device nodes for these 
are cpu0 and cpu4 so the call to cpu_find in ofw_cpu_attach will hit this 
KASSERT when the CPU has not been enabled.

Andrew

cpu0: <Open Firmware CPU> on cpulist0
cpu1: <Open Firmware CPU> on cpulist0
panic: Getting uninitialized PCPU 1
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
         pc = 0xffff0000005f03e8  lr = 0xffff000000087048
         sp = 0xffff0000000105a0  fp = 0xffff0000000107b0
That's unfortunate. Removing the new KASSERT is probably not the right option, 
since all it is doing is indicating a pre-existing bug. Specifically, it is 
preventing ofw_cpu from using a NULL pointer for CPU 4, which I suppose it was 
previously happily doing (it would only get dereferenced if you had cpufreq 
support, hence no previous panic).
I would expect cpu4 to have a non-NULL value as its cpu pointer will have been 
set. It’s cpu1 that is the issue as it’s in the device tree, but doesn’t exist 
in “hardware”. Because of this it fails when we try to enable it so free it’s 
pcpu data clear the pointer.

Yes, I phrased that badly. As you say, the CPU ID that ofw_cpu has is "1", which doesn't exist. To fix this, we need to change the CPU ID that ofw_cpu has to match what the kernel thinks it is using.


A super quick-and-dirty fix would be to be fail attach on 
CPU_ABSENT(device_get_unit(dev)), which restores the previous buggy behavior 
but without the panic. If you do not actually use ofw_cpu for anything, we 
could also just disable it temporarily on your platform (it's off for some PPC 
systems, you will note, for similar reasons).
We used to use it to find CPUs to start, however this is no longer the case.

OK, so disabling it works fine for now.


A real fix is somewhat complex, since the driver relies on being able to get a 
mapping from platform-specific numbers in the device tree to an entry in pcpu, 
which intrinsically relies on reverse-engineering the platform's mapping 
between some kind of hardware CPU ID and the kernel CPU numbering. I can't 
think of a way to do that internally. We could make some kind of platform 
macro, but the whole concept is even somewhat dubious at the moment since a 
number of IBM systems with SMT don't even have a 1:1 relationship between CPUs 
as FreeBSD defines them and device tree nodes, so it's possible we need a 
serious rearchitecture of the driver.
On arm64 we use the ID from the device tree to assign the cpuid so there is a 
1:1 mapping between the two. In most cases both are identical, the only case 
that’s not correct is when we boot from a non-zero CPU, and I know of only one 
SoC where this is true.

The "ID from the device tree" being the value of "reg"? Or something else? I see a lot of variation in the encoding in sys/dts and there does not seem to be an obvious single answer. This is making me think that the only workable solution is something like a new ofw_machdep function that gets the phandle for a given CPU ID.
-Nathan


Andrew



_______________________________________________
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Reply via email to