Public bug reported:
I've run into two scenarios on this 4-core system where cores are
offlined and can't be brought back up.
In both cases, all cores but cpu0 appear to be taken offline, and the only way
to restore them is to access the control system and power the system
down and back up manually. Additionally, I've noticed that the power
off/on sequence must be done twice before it takes effect.
The first scenario (both come from certification testing) is the cpu_offlining
script. It attempts to offline cores 1 through N, skipping core 0. However,
once it brings the cores down (with errors for each one), it is
impossible to bring them back up: writes to the file
/sys/devices/system/cpu/cpuN/online fail with messages about being
unable to write to the file. For more details, see bug #1182637.
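The offline/online cycle described above can be sketched roughly as follows. This is not the cpu_offlining script itself, only a minimal illustration of the same idea. The function names and the sysfs_root parameter are hypothetical and exist so the logic can be tried against a scratch directory.

```python
import os


def set_cpu_online(cpu, online, sysfs_root="/sys/devices/system/cpu"):
    """Write 1 or 0 to cpuN/online.

    Returns None on success, or the OSError raised by the write.
    sysfs_root is a hypothetical parameter so a scratch directory
    can stand in for the real sysfs tree.
    """
    path = os.path.join(sysfs_root, f"cpu{cpu}", "online")
    try:
        with open(path, "w") as f:
            f.write("1" if online else "0")
        return None
    except OSError as exc:
        return exc


def cycle_cores(cpus, sysfs_root="/sys/devices/system/cpu"):
    """Offline and then re-online each core, collecting errors per core."""
    failures = {}
    for cpu in cpus:
        for state in (False, True):
            err = set_cpu_online(cpu, state, sysfs_root)
            if err is not None:
                step = "online" if state else "offline"
                failures.setdefault(cpu, []).append((step, str(err)))
    return failures
```

On the affected system this would need root and would be called as cycle_cores(range(1, 4)), skipping core 0. Given the behaviour reported here, I would expect every core to show up in the returned failures, but I have not run this sketch on that hardware.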
The second scenario is the CPU stress test. Ordinarily, the tool 'stress'
is installed by checkbox and run like this:
stress --cpu NUM_OF_CORES --vm (RAM/4) --timeout 7200
I discovered that even a small, quick run like this will kill the cores:
stress --cpu 1 --vm 4 --timeout 90
To demonstrate, here is the output of /proc/cpuinfo before and after the quick
stress run I just described:
ubuntu@c18:~$ cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 0 (v7l)
BogoMIPS : 2183.63
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc09
CPU revision : 0
processor : 1
model name : ARMv7 Processor rev 0 (v7l)
BogoMIPS : 2191.03
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc09
CPU revision : 0
processor : 2
model name : ARMv7 Processor rev 0 (v7l)
BogoMIPS : 2191.03
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc09
CPU revision : 0
processor : 3
model name : ARMv7 Processor rev 0 (v7l)
BogoMIPS : 2191.03
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc09
CPU revision : 0
Hardware : Highbank
Revision : 0000
Serial : 0000000000000000
ubuntu@c18:~$ stress --cpu 1 --vm 4 --timeout 90
stress: info: [2612] dispatching hogs: 1 cpu, 0 io, 4 vm, 0 hdd
stress: info: [2612] successful run completed in 91s
ubuntu@c18:~$ cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 0 (v7l)
BogoMIPS : 2183.63
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc09
CPU revision : 0
Hardware : Highbank
Revision : 0000
Serial : 0000000000000000
ubuntu@c18:~$
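The before/after comparison above can also be done mechanically. The sketch below assumes, as the transcript suggests for this kernel, that offline cores drop out of the /proc/cpuinfo listing. Both function names are hypothetical.

```python
import re


def listed_processors(cpuinfo_text):
    """Return the sorted processor indices found in /proc/cpuinfo text."""
    pattern = re.compile(r"^processor\s*:\s*(\d+)\s*$", re.MULTILINE)
    return sorted(int(m.group(1)) for m in pattern.finditer(cpuinfo_text))


def missing_after(before_text, after_text):
    """Return processors present in the first capture but not the second."""
    before = set(listed_processors(before_text))
    after = set(listed_processors(after_text))
    return sorted(before - after)
```

Fed the two captures shown above, missing_after would report processors 1, 2 and 3 as gone after the stress run.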
Also, note this from dmesg:
[ 61.915519] init: bootchart post-stop process (1281) terminated with status 2
[ 180.674675] CPU1: shutdown
[ 180.695118] CPU2: shutdown
[ 180.722629] CPU3: shutdown
[ 181.770108] CPU1: failed to come online
[ 182.785723] CPU2: failed to come online
[ 183.801385] CPU3: failed to come online
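For anyone triaging similar logs, here is a small sketch that pulls the hotplug events out of dmesg text. It only recognises the two message forms quoted above ("CPUn: shutdown" and "CPUn: failed to come online"). The function names are hypothetical, and other kernels may word these messages differently.

```python
import re

# Matches only the two message forms quoted in this report.
EVENT = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+CPU(\d+): (shutdown|failed to come online)\s*$"
)


def hotplug_events(dmesg_text):
    """Return (timestamp, cpu, event) tuples for recognised hotplug lines."""
    events = []
    for line in dmesg_text.splitlines():
        m = EVENT.match(line.strip())
        if m:
            events.append((float(m.group(1)), int(m.group(2)), m.group(3)))
    return events


def failed_cpus(events):
    """Return the sorted CPUs that logged a failed attempt to come online."""
    return sorted({cpu for _, cpu, what in events if what == "failed to come online"})
```

Run against the excerpt above, this would list CPUs 1, 2 and 3 as having failed to come back online about one second after being shut down.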
Additionally, I believe that on a previous attempt the system offlined
those cores on its own, without me doing anything at all, but I have no
proof of that.
ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: linux-image-3.8.0-22-generic 3.8.0-22.33
ProcVersionSignature: Ubuntu 3.8.0-22.33-generic 3.8.11
Uname: Linux 3.8.0-22-generic armv7l
AlsaDevices:
total 0
crw-rw---T 1 root audio 116, 1 May 30 00:13 seq
crw-rw---T 1 root audio 116, 33 May 30 00:13 timer
AlsaVersion: Advanced Linux Sound Architecture Driver Version k3.8.0-22-generic.
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.9.2-0ubuntu8
Architecture: armhf
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq',
'/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Date: Thu May 30 00:15:38 2013
HibernationDevice: RESUME=UUID=92423a77-1db5-4768-bbac-d0d80f3bdfbd
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize
libusb: -99
MarkForUpload: True
PciMultimedia:
ProcEnviron:
TERM=screen
PATH=(custom, no user)
LANG=C
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: console=ttyAMA0 nosplash
RelatedPackageVersions:
linux-restricted-modules-3.8.0-22-generic N/A
linux-backports-modules-3.8.0-22-generic N/A
linux-firmware 1.106
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Tags: apport-bug arm armhf raring server
--
https://bugs.launchpad.net/bugs/1185669
Title:
CPU cores offline and can't be brought back up on ARM Server card