This is run on a simulated serengeti machine.
In the meantime, I have tried two things:
1/ I wrote my own sbdp_cpu_poweroff() function, which is correctly
recognized. The execution *seems* to go well, but I nevertheless
suspect that it is not correct. Using `psradm -v -a -n` I can verify
which cpus were already online and which weren't. When I run this
command a first time after my module has powered a processor off, all
seems ok. The cpus that were powered off are put back online.
BUT when I run the psradm command a second time (without reloading the
module), I get an error saying that lpl_topo_verify() failed.
Based on the error code, the problem seems to be
LPL_TOPO_LPL_BAD_NCPU, which is triggered in
http://src.opensolaris.org/source/xref/onnv/aside/usr/src/uts/common/os/lgrp.c
on lines 2185 and 2244.
The lpl_topo_verify() function was executed from the following stack:
lgrp_config
cpu_add_active_internal
cpu_online
p_online_internal
p_online
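
As an illustration only of what a "bad ncpu" style check amounts to: the
toy model below is not Solaris code and has not been checked line by line
against lgrp.c. It assumes the verifier compares a cached CPU count with
the number of CPUs found by walking a circular list. All names (toy_cpu,
toy_lpl, toy_add, toy_verify) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; NOT the real cpu_t / lpl_t from Solaris. */
struct toy_cpu {
	int id;
	struct toy_cpu *next;	/* circular list of CPUs in the group */
};

struct toy_lpl {
	int ncpu;		/* cached count of CPUs in the group */
	struct toy_cpu *cpus;	/* any element of the list, NULL if empty */
};

enum { TOY_OK = 0, TOY_BAD_NCPU = -1 };

/* Link a CPU into the circular list and bump the cached count. */
static void
toy_add(struct toy_lpl *lpl, struct toy_cpu *cpu)
{
	assert(lpl != NULL && cpu != NULL);
	if (lpl->cpus == NULL) {
		cpu->next = cpu;
		lpl->cpus = cpu;
	} else {
		cpu->next = lpl->cpus->next;
		lpl->cpus->next = cpu;
	}
	lpl->ncpu++;
}

/* Walk the list once and compare the result with the cached count. */
static int
toy_verify(const struct toy_lpl *lpl)
{
	int n = 0;
	const struct toy_cpu *c = lpl->cpus;

	if (c != NULL) {
		do {
			n++;
			c = c->next;
		} while (c != lpl->cpus);
	}
	return (n == lpl->ncpu ? TOY_OK : TOY_BAD_NCPU);
}
```

In this model, any path that changes the count without changing the list
(or the reverse) makes toy_verify() fail on the next call, even if the
first operation after the change appeared to succeed.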
I am not sure what causes the problem. I have looked through the code of
the original sbdp_cpu_poweroff() for clues to something that adds or
removes a function, but I can't find anything. The mp_cpu_quiesce()
function does a CPUSET_DEL call, which might be related, but since
mp_cpu_quiesce() returns void, I can't tell whether something went
wrong there.
Here is my sbdp_cpu_poweroff alternative. Remember that this is on a
simulated system, so I'm not interested in hardware aspects such as
RAM mappings. That's why I left them out; I believe they are not
related to the problem.
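int
sbdp_cpu_poweroff(struct cpu *cp)
{
	ASSERT(MUTEX_HELD(&cpu_lock));

	/* Pause the other CPUs while this one is quiesced. */
	promsafe_pause_cpus();
	mp_cpu_quiesce(cp);

	/* Plain assignment: replaces all existing flag bits with these three. */
	cp->cpu_flags = CPU_OFFLINE | CPU_QUIESCED | CPU_POWEROFF;
	CPU_SIGNATURE(OS_SIG, SIGST_DETACHED, SIGSUBST_NULL, cp->cpu_id);

	/* Resume the paused CPUs. */
	start_cpus();
	return (0);
}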
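
One detail of the function above that is easy to overlook is that
cpu_flags is set with a plain assignment rather than an OR. The sketch
below only illustrates the difference between the two; the flag names and
values are made up for the example (the real ones live in sys/cpuvar.h),
and whether the dropped bits have anything to do with the
lpl_topo_verify() failure is not established here.

```c
#include <assert.h>

/* Hypothetical flag bits, chosen for illustration only. */
#define	TOY_READY	0x01u
#define	TOY_EXISTS	0x02u
#define	TOY_OFFLINE	0x04u
#define	TOY_QUIESCED	0x08u
#define	TOY_POWEROFF	0x10u

/* Assignment: whatever was set before is discarded. */
static unsigned
toy_assign(unsigned flags)
{
	(void) flags;
	return (TOY_OFFLINE | TOY_QUIESCED | TOY_POWEROFF);
}

/* OR: previously set bits survive alongside the new ones. */
static unsigned
toy_or(unsigned flags)
{
	return (flags | TOY_OFFLINE | TOY_QUIESCED | TOY_POWEROFF);
}
```

Starting from TOY_READY | TOY_EXISTS, toy_assign() yields 0x1c (both
original bits gone) while toy_or() yields 0x1f (both kept).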
The second thing I tried was:
2/ I loaded the sbdp module and, from my own module, called
cpu_poweroff() on a CPU after taking it offline with cpu_offline(). When
cpu_poweroff() is executed I get the following error:
-----------------
SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x3cf9fb19.0x16be5f58 (0x1889489670)
PLATFORM: SUNW,Sun-Fire, CSN: -, HOSTNAME: sarek.network.sim
SOURCE: SunOS, REV: 5.11 snv_57
DESC: Errors have been detected that require a reboot to ensure system
integrity. See http://www.sun.com/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved
ereport.io.sch.pbm.s-ma ena=188934075000801 detector=[ version=0 scheme="dev"
device-path="/[EMAIL PROTECTED],0/[EMAIL PROTECTED],600000" ] pci-status=22a0
pci-command=140
pbm-csr=e001f pbm-afsr=fff0000000000000 pbm-afar=0 errant-slot=0 pbm-valog=0
ereport.io.pci.ma ena=188934075000801 detector=[ version=0 scheme="dev"
device-path="/[EMAIL PROTECTED],0/[EMAIL PROTECTED],600000" ] pci-status=22a0
pci-command=140 pci-pa=
0
ereport.io.sch.pbm.s-ma ena=188934075000801 detector=[ version=0 scheme="dev"
device-path="/[EMAIL PROTECTED],0/[EMAIL PROTECTED],600000" ] pci-status=22a0
pci-command=140
pbm-csr=e001f pbm-afsr=fff0000000000000 pbm-afar=0 errant-slot=0 pbm-valog=0
ereport.io.pci.ma ena=188934075000801 detector=[ version=0 scheme="dev"
device-path="/[EMAIL PROTECTED],0/[EMAIL PROTECTED],600000" ] pci-status=22a0
pci-command=140 pci-pa=
0
ereport.io.sch.pbm.s-ma ena=188934075000801 detector=[ version=0 scheme="dev"
device-path="/[EMAIL PROTECTED],0/[EMAIL PROTECTED],700000" ] pci-status=22a0
pci-command=140
pbm-csr=e001f pbm-afsr=fff0000000000000 pbm-afar=0 errant-slot=0 pbm-valog=0
ereport.io.pci.ma ena=188934075000801 detector=[ version=0 scheme="dev"
device-path="/[EMAIL PROTECTED],0/[EMAIL PROTECTED],700000" ] pci-status=22a0
pci-command=140 pci-pa=
0
ereport.io.sch.pbm.s-ma ena=188934075000801 detector=[ version=0 scheme="dev"
device-path="/[EMAIL PROTECTED],0/[EMAIL PROTECTED],700000" ] pci-status=22a0
pci-command=140
pbm-csr=e001f pbm-afsr=fff0000000000000 pbm-afar=0 errant-slot=0 pbm-valog=0
ereport.io.pci.ma ena=188934075000801 detector=[ version=0 scheme="dev"
device-path="/[EMAIL PROTECTED],0/[EMAIL PROTECTED],700000" ] pci-status=22a0
pci-command=140 pci-pa=
0
ereport.cpu.ultraSPARC-IIIplus.to ena=188934075000801 detector=[ version=1
scheme="cpu" cpuid=2 cpumask=55 serial="0" ] afsr=100000000000 afar-status=1
afar=1fff0900000 pc=13eff58 tl=0 tt=32 privileged=1 multiple=0
panic[cpu2]/thread=2a1003bfcc0: TO Error(s)
000002a1003e71b0 SUNW,UltraSPARC-III+:cpu_deferred_error+5b4 (601000,
203000, 0, 100000000000, 1fff0900000, 100000000000)
%l0-3: 00000300015ee8e0 000000007fcffc00 0000000002603000 0000080c00000040
%l4-7: 0000080c00000000 0000000000000002 00000000ecc1ecc1 ecc1ecc100000000
000002a1003e7c50 unix:ktl0+48 (84, 800000, 1083e00, 84000, 1881800, 8000)
%l0-3: 0000000000000002 0000000000001400 0000004400001601 00000000012306bc
%l4-7: 000003000127b990 0000000000000001 000000000000000d 000002a1003e7d00
000002a1003e7da0 SUNW,UltraSPARC-III+:scrub_dcache+34 (2a1001276e0, 0,
40, 701b5e58, 13eff4c, d0)
%l0-3: 0000030001f0a000 0000000000000001 0000000000000084 00000300015ed4a8
%l4-7: 0000000001924a98 0000000000000084 00000000000007ff 0000000000000800
000002a1003e7e50 unix:xc_serv+8c (2, 1, 0, e, 10, 1837c90)
%l0-3: 0000000001837800 000000000000001a 0000000000000002 000000000000001c
%l4-7: 0000000000000000 0000000000000004 0000000000000000 0000000000000004
000002a1003e7f50 unix:current_thread+168 (300015ea000, 300015ea000, 2,
2f, 1059a00, 182c000)
%l0-3: 00000000010076e4 000002a1003bf0d1 000000000000000d 000000007009cbf8
%l4-7: 0000000000000004 0000000000000000 0000000000000000 000002a1003bf980
000002a1003bfa20 unix:idle+88 (3000175b900, 0, 300015ea000,
ffffffffffffffff, 3, 182ac00)
%l0-3: 00000300010c4540 000000000000000b ffffffffffffffff 00000300010c4540
%l4-7: ffffffffffffffff 00000000018ac840 0000000001059a00 000000000182c000
panic: entering debugger (continue to save dump)
Type 'go' to resume
debugger entered.
{2} ok
-----------------------------
Can someone help me with these problems?
Thank you very much,
Thomas
On 5/18/07, Artem Kachitchkine <[EMAIL PROTECTED]> wrote:
> When I execute cpu_poweroff() which in turn calls this
> plat_cpu_poweroff() function, it returns with ENOTSUP, so it seems the
> sbdp_cpu_poweroff function does not exist.
What kind of machine are you running this on?
> However, using the source browser, there seems to be a
> sbdp_cpu_poweroff() function in uts/sun4u/serengeti/io/sbdp_cpu.c. If
> I understand correctly, this is part of some sbdp module for
> serengeti, which does not seem to be loaded and isn't in my /kernel
> directory either.
It is in the platform-specific directory:
$ nm /platform/SUNW,Sun-Fire/kernel/misc/sparcv9/sbdp | grep sbdp_cpu_poweroff
[142] | 8596| 532|FUNC |GLOB |0 |1 |sbdp_cpu_poweroff
The module that depends on it:
$ ldd /platform/sun4u/kernel/misc/sparcv9/sbd
misc/sbdp => (file not found)
sbd module is loaded from ssm_open()
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/sun4u/serengeti/io/ssm.c#732
734 if (modload("misc", "sbd") == -1) {
which is called whenever ssm device is opened by cfgadm.
> what does the acronym sbdp stand for?
Don't know, but perhaps what the module name says: "System Board DR v%I%".
Is there a particular problem you're trying to solve? DR is one of the
most complicated parts of the system and probably one of the ugliest.
-Artem.
_______________________________________________
opensolaris-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code