And just confirming that this indeed worked. Thanks, Seth! For more
context: the newer BIOSes that support the Intel 5500 series processors
(especially the 6-core ones, which got me ensnared in this in the
first place) have an "ACPI APIC" option (under Advanced -> ACPI Settings
in the BIOS). If that option is disabled, hyperthreading appears to be
disabled even though the BIOS has it enabled.

The fix went into b142, so if you are running b142 or later you should
be fine. For builds before b142, use the workaround. Alternatively, you
can disable the ACPI APIC option in the BIOS (the machine will still
boot, just without hyperthreading), add 'set idle_cpu_no_deep_c=1' to
your /etc/system, reboot, then re-enable the APIC option and get all
your threads :)

Cheers,
Sunay

On 07/17/10 03:54 PM, Seth Goldberg wrote:

For the record, Sunay and I had some offline exchanges, and there WAS a
BIOS option for the MADT and it was disabled. When he enabled it, he hit
bug 6949969. The workaround, booting with kmdb (-kd) and setting
idle_cpu_no_deep_c/W 1, should allow the system to boot.

--S

Quoting Sunay Tripathi, who wrote the following on Sat, 17 Jul 2010:

It shows only 8 CPUs. Attached is the prtconf -v output. If you want, I
can give you access to the system. I can run Linux as well, but it might
take a bit of time to download the ISO, burn the DVD, etc.

Cheers,
Sunay


On 07/17/10 01:33 PM, Seth Goldberg wrote:

It's very weird that all other tables would be detected EXCEPT the MADT.
The fact that boot-ncpus was set to 16 already indicates that SOMETHING
was around that gave the kernel information about available CPUs to
start. Look at the cpu_apicid_array property in the prtconf -v output;
that property is generated very early in boot. Can you boot another OS,
like a Linux Live CD, and use acpidump there to capture the set of ACPI tables?

--S

Quoting Sunay Tripathi, who wrote the following on Sat, 17 Jul 2010:

On 07/16/10 06:33 PM, Seth Goldberg wrote:

Bart/Seth,

Yup, I double-checked. It's enabled. BTW, disabling it doesn't
change anything. Do we keep any per-thread data structure or
something? I want to see whether it's one of our bugs or something
weird with the processor/BIOS (BTW, it's a Supermicro machine; I
think Solaris has worked pretty easily on them in the past).

What is the value of the boot-ncpus property? I would also dump the
ACPI MADT to see what the BIOS is telling the OS wrt # of CPUs it can
start. If you didn't get any warnings in the log or during boot that
Solaris couldn't start CPUs, then it's likely that Solaris started all
the CPUs that the BIOS specified.

--S

I checked a bunch of other machines with 5600 CPUs, and it seems they
are all failing because ACPI fails to initialize (it can't get the
MADT table). I see these in /var/adm/messages:

Jul 17 12:50:46 orion1 acpidev: [ID 230966 kern.warning] WARNING: acpidev: failed to get MADT table in acpidev_walk_apic().
Jul 17 12:50:46 orion1 unix: [ID 933431 kern.warning] WARNING: cpupm_init: processor 0: unable to get ACPI handle
Jul 17 12:50:46 orion1 unix: [ID 494986 kern.info] NOTICE: CPU power management will not function.

The interesting part is that max_ncpus=16 and ncpus=8. So why do we
end up skipping half the CPUs in start_other_cpus() when we can't get
the MADT handle? I have attached the /var/adm/messages, acpidump,
summary, and DATA files.

Any workarounds for the time being?

Cheers,
Sunay

_______________________________________________
on-discuss mailing list
on-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/on-discuss
