Fascinating. Your prtconf output doesn't even have that property, which reinforces the assertion that the MADT just doesn't exist on that system. I'd look around in the BIOS on that system for settings related to ACPI or multiprocessor support. Who makes that motherboard? The only way I can think of that the MADT wouldn't be found is if it's somehow corrupt -- I can't believe it was just omitted, unless, like I said, there's a BIOS option to do that (I don't remember ever having seen such an option though).

 --S

Quoting Sunay Tripathi, who wrote the following on Sat, 17 Jul 2010:

It shows only 8 CPUs. Attached is the prtconf -v output. If you want, I can
let you get on the system. I can run Linux as well, but it might take
a bit of time to download the ISO and burn the DVD etc.

Cheers,
Sunay

On 07/17/10 01:33 PM, Seth Goldberg wrote:

It's very weird that all other tables would be detected EXCEPT the MADT.
The fact that boot-ncpus was set to 16 already indicates that SOMETHING
was around that gave the kernel information about available CPUs to
start. Look at prtconf -v at the cpu_apicid_array property, which is
generated very early in boot. Can you boot another OS, like a Linux Live
CD and use acpidump there to capture the set of ACPI tables?
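
For anyone following along, here is a minimal Python sketch of what to look
for once the MADT has been captured. It is not from the thread: the function
name count_madt_cpus is made up for illustration, and it assumes the raw
binary table (signature "APIC") has already been saved to a file. The entry
layouts (type 0 Processor Local APIC, type 9 Processor Local x2APIC, enabled
flag in bit 0) follow the ACPI specification. It also checks the table
checksum, since a corrupt MADT is one of the possibilities raised here.

```python
import struct

ACPI_HEADER_LEN = 36   # standard ACPI table header
MADT_FIXED_LEN = 8     # Local APIC address (4 bytes) + flags (4 bytes)


def count_madt_cpus(table: bytes) -> dict:
    """Summarize processor entries in a raw MADT ("APIC") table.

    Hypothetical helper for illustration; returns the APIC IDs the
    firmware marks as enabled or disabled, plus a checksum verdict.
    """
    if len(table) < ACPI_HEADER_LEN + MADT_FIXED_LEN or table[:4] != b"APIC":
        raise ValueError("not a MADT: signature is %r" % table[:4])

    # Bytes 4..7 of the header hold the total table length (little-endian).
    (length,) = struct.unpack_from("<I", table, 4)
    length = min(length, len(table))

    # A valid ACPI table sums to zero modulo 256 over its whole length.
    checksum_ok = sum(table[:length]) % 256 == 0

    enabled, disabled = [], []
    off = ACPI_HEADER_LEN + MADT_FIXED_LEN
    while off + 2 <= length:
        etype, elen = table[off], table[off + 1]
        if elen < 2 or off + elen > length:
            break  # malformed entry; stop rather than loop forever
        if etype == 0 and elen >= 8:
            # Processor Local APIC: ACPI processor ID, APIC ID, flags
            apic_id = table[off + 3]
            (flags,) = struct.unpack_from("<I", table, off + 4)
        elif etype == 9 and elen >= 16:
            # Processor Local x2APIC: reserved(2), x2APIC ID, flags, UID
            apic_id, flags = struct.unpack_from("<II", table, off + 4)
        else:
            off += elen  # I/O APICs, overrides, NMIs, etc. are skipped
            continue
        (enabled if flags & 1 else disabled).append(apic_id)
        off += elen

    return {"enabled": enabled, "disabled": disabled,
            "checksum_ok": checksum_ok}
```

On a Linux live CD the raw table can usually be read as root from
/sys/firmware/acpi/tables/APIC, so something like
count_madt_cpus(open("/sys/firmware/acpi/tables/APIC", "rb").read())
should show how many CPUs the BIOS is advertising as enabled.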

--S

Quoting Sunay Tripathi, who wrote the following on Sat, 17 Jul 2010:

On 07/16/10 06:33 PM, Seth Goldberg wrote:

Bart/Seth,

Yup. I double checked. It's enabled. BTW, disabling it doesn't
change anything. Do we keep any per-thread data struct or
something? I want to see if it's one of our bugs or something
weird with the processor/BIOS (BTW, it's a Supermicro machine;
I think in the past Solaris used to work pretty easily on them).

What is the value of the boot-ncpus property? I would also dump the ACPI
MADT to see what the BIOS is telling the OS wrt # of CPUs it can start.
If you didn't get any warnings in the log or during boot that Solaris
couldn't start CPUs, then it's likely that Solaris started all the CPUs
that the BIOS specified.

--S

I checked a bunch of other machines with 5600 CPUs and it seems like they
are all failing because ACPI fails to initialize (can't get the MADT
table). I see these in /var/adm/messages:

Jul 17 12:50:46 orion1 acpidev: [ID 230966 kern.warning] WARNING: acpidev: failed to get MADT table in acpidev_walk_apic().
Jul 17 12:50:46 orion1 unix: [ID 933431 kern.warning] WARNING: cpupm_init: processor 0: unable to get ACPI handle
Jul 17 12:50:46 orion1 unix: [ID 494986 kern.info] NOTICE: CPU power management will not function.
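
To repeat this check across several machines, a small Python sketch along
these lines could flag hosts whose logs carry the same warnings. It is only
an illustration: scan_messages and the PATTERNS keys are made-up names, and
it assumes each message sits on one line in the Solaris syslog format shown
above (unwrapped, as it would be in the file itself).

```python
import re
from collections import defaultdict

# "Mon DD HH:MM:SS host tag: [ID nnn facility.level] message"
SYSLOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"(?P<tag>[^:\s]+):\s+\[ID \d+ (?P<prio>[\w.]+)\]\s+(?P<msg>.*)$"
)

# Substrings taken from the three messages quoted in this thread.
PATTERNS = {
    "madt_missing": "failed to get MADT table",
    "no_acpi_handle": "unable to get ACPI handle",
    "cpupm_disabled": "CPU power management will not function",
}


def scan_messages(lines):
    """Return {host: sorted list of symptom keys} for matching log lines."""
    hits = defaultdict(set)
    for line in lines:
        m = SYSLOG_RE.match(line.strip())
        if not m:
            continue  # not in the expected syslog layout; ignore
        for key, needle in PATTERNS.items():
            if needle in m.group("msg"):
                hits[m.group("host")].add(key)
    return {host: sorted(keys) for host, keys in hits.items()}
```

Feeding it the lines of /var/adm/messages from each box, for example
scan_messages(open("/var/adm/messages")), gives a quick per-host summary of
which of the three symptoms appear.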

The interesting part is that max_ncpus=16 and ncpus=8. So why is it that
we end up skipping half the CPUs in start_other_cpus() even if we can't
get the MADT handle? I have attached the /var/adm/messages, acpidump, summary
and DATA files.

Any workarounds for the time being?

Cheers,
Sunay




