> Date: Sat, 1 Aug 2020 18:23:08 +1000
> From: Jonathan Matthew <jonat...@d14n.org>
> Cc: tech@openbsd.org
> Content-Type: text/plain; charset=us-ascii
> Content-Disposition: inline
> 
> On Wed, Jul 29, 2020 at 08:29:31PM +1000, Jonathan Matthew wrote:
> > On Wed, Jul 29, 2020 at 10:06:14AM +0200, Mark Kettenis wrote:
> > > > Date: Wed, 29 Jul 2020 10:38:55 +1000
> > > > From: Jonathan Matthew <jonat...@d14n.org>
> > > > 
> > > > On Tue, Jul 28, 2020 at 07:30:36PM +0200, Mark Kettenis wrote:
> > > > > > Date: Tue, 28 Jul 2020 21:42:46 +1000
> > > > > > From: Jonathan Matthew <jonat...@d14n.org>
> > > > > > 
> > > > > > On Tue, Jul 28, 2020 at 11:12:21AM +0200, Mark Kettenis wrote:
> > > > > > > > Date: Tue, 28 Jul 2020 13:46:34 +1000
> > > > > > > > From: Jonathan Matthew <jonat...@d14n.org>
> > > > > > > > 
> > > > > > > > On Mon, Jul 27, 2020 at 05:16:47PM +0200, Mark Kettenis wrote:
> > > > > > > > > > Date: Mon, 27 Jul 2020 17:02:41 +0200 (CEST)
> > > > > > > > > > From: Mark Kettenis <mark.kette...@xs4all.nl>
> > > > > > > > > > 
> > > > > > > > > > Recent ACPI versions have deprecated "Processor()" nodes in 
> > > > > > > > > > favout of
> > > > > > > > > > "Device()" nodes with a _HID() method that returns 
> > > > > > > > > > "ACPI0007".  This
> > > > > > > > > > diff tries to support machines with firmware that 
> > > > > > > > > > implements this.  If
> > > > > > > > > > you see something like:
> > > > > > > > > > 
> > > > > > > > > >   "ACPI0007" at acpi0 not configured
> > > > > > > > > > 
> > > > > > > > > > please try the following diff and report back with an 
> > > > > > > > > > updated dmesg.
> > > > > > > > > > 
> > > > > > > > > > Cheers,
> > > > > > > > > > 
> > > > > > > > > > Mark
> > > > > > > > > 
> > > > > > > > > And now with the right diff...
> > > > > > > > 
> > > > > > > > On a dell r6415, it looks like this:
> > > > > > > > 
> > > > > > > > acpicpu0 at acpi0copyvalue: 6: C1(@1 halt!)
> > > > > > > > all the way up to
> > > > > > > > acpicpu127 at acpi0copyvalue: 6: no cpu matching ACPI ID 127
> > > > > > > > 
> > > > > > > > which I guess means aml_copyvalue() needs to learn how to copy 
> > > > > > > > AML_OBJTYPE_DEVICE.
> > > > > > > 
> > > > > > > Yes.  It is not immediately obvious how this should work.  Do we 
> > > > > > > need
> > > > > > > to copy the aml_node pointer or not?  We don't do that for
> > > > > > > AML_OBJTYPE_PROCESSOR and AML_OBJTYPE_POWERRSRC types which are
> > > > > > > similar to AML_OBJTYPE_DEVICE.  But AML_OBJTYPE_DEVICE object 
> > > > > > > don't
> > > > > > > carry any additional information.  So we end up with just an empty
> > > > > > > case to avoid the warning.
> > > > > > > 
> > > > > > > Does this work on the Dell machines?
> > > > > > 
> > > > > > We've seen crashes in pool_cache_get() in various places after all 
> > > > > > the acpicpus
> > > > > > attach, which we haven't seen before on these machines, so I think 
> > > > > > it's
> > > > > > corrupting memory somehow.
> > > > > 
> > > > > Does that happen with only the acpicpu(4) diff?
> > > > 
> > > > Yes.  Looking at this a bit more, in the case where aml_evalnode() can't
> > > > copy the result value, it leaves it uninitialised, which means we'll 
> > > > call
> > > > aml_freevalue(&res) where res is stack junk.  memset(&res, 0, 
> > > > sizeof(res))
> > > > seems to fix it.
> > > 
> > > Eh, where exactly?
> > 
> > I had it just before the call to aml_evalnode(), but that can't be it,
> > since aml_evalnode() does the same thing.
> 
> Much better theory: the acpicpu_sc array has MAXCPUS elements, but on this
> system (and all R6415s, as far as I can tell) we have more acpicpu devices
> than that.  I suppose we should just make acpicpu_match fail if cf->cf_unit
> is >= MAXCPUS as we do with the actual cpu devices.

Yes, that makes more sense.

There are a couple of problems with this approach:

1. The set of "active" CPUs might not actually be covered by the first
   MAXCPUS ACPI0007 devices that appear in the ACPI device tree.

2. There are machines that have both Processor() nodes and ACPI0007
   Device() nodes.  See for example the machine from the "no output on
   glass console after switching to serial" post from sthen@ on tech@.

To address #1 we probably should check for a matching CPU early in
acpicpu_attach() and only add it to the acpicpu_sc[] array if there is
a match.  The current code is actually broken since the array is
indexed by acpicpu(4) dv_unit in places but cpu_number() in other
places.

To address #2 we could move the check for Processor() nodes later and
skip it if any acpicpu(4) devices already attached.

Cheers,

Mark

P.S. I don't have amd64 hardware with ACPI0007 devices, so if you want
     to take this diff, that is fine with me.  If not, let me know and
     I'll try to come up with a suitable diff.


> Index: acpicpu.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/acpi/acpicpu.c,v
> retrieving revision 1.85
> diff -u -p -r1.85 acpicpu.c
> --- acpicpu.c 27 May 2020 05:02:21 -0000      1.85
> +++ acpicpu.c 1 Aug 2020 08:18:49 -0000
> @@ -186,6 +186,11 @@ struct cfdriver acpicpu_cd = {
>       NULL, "acpicpu", DV_DULL
>  };
>  
> +const char *acpicpu_hids[] = {
> +     "ACPI0007",
> +     NULL
> +};
> +
>  extern int setperf_prio;
>  
>  struct acpicpu_softc *acpicpu_sc[MAXCPUS];
> @@ -650,6 +655,12 @@ acpicpu_match(struct device *parent, voi
>       struct acpi_attach_args *aa = aux;
>       struct cfdata           *cf = match;
>  
> +     if (cf->cf_unit >= MAXCPUS)
> +             return (0);
> +
> +     if (acpi_matchhids(aa, acpicpu_hids, cf->cf_driver->cd_name))
> +             return (1);
> +
>       /* sanity */
>       if (aa->aaa_name == NULL ||
>           strcmp(aa->aaa_name, cf->cf_driver->cd_name) != 0 ||
> @@ -665,6 +676,7 @@ acpicpu_attach(struct device *parent, st
>       struct acpicpu_softc    *sc = (struct acpicpu_softc *)self;
>       struct acpi_attach_args *aa = aux;
>       struct aml_value        res;
> +     int64_t                 uid;
>       int                     i;
>       uint32_t                status = 0;
>       CPU_INFO_ITERATOR       cii;
> @@ -675,6 +687,10 @@ acpicpu_attach(struct device *parent, st
>       acpicpu_sc[sc->sc_dev.dv_unit] = sc;
>  
>       SLIST_INIT(&sc->sc_cstates);
> +
> +     if (aml_evalinteger(sc->sc_acpi, sc->sc_devnode,
> +         "_UID", 0, NULL, &uid) == 0)
> +             sc->sc_cpu = uid;
>  
>       if (aml_evalnode(sc->sc_acpi, sc->sc_devnode, 0, NULL, &res) == 0) {
>               if (res.type == AML_OBJTYPE_PROCESSOR) {
> 

Reply via email to