Hello everyone,
(I'm posting this here in addition to [EMAIL PROTECTED], as the sles-e mailing list
seems a bit deserted.)
I have experienced a few kernel oopses and am wondering where to start looking
for a solution. Maybe someone can give me some pointers:
I have three identical (HW/SW) servers running SLES9-x86_64 SP2 and Oracle RAC.
On two of these I have had kernel oopses recently.
The machines are HP ProLiant DL585 G1 with 12 GB RAM. For more info see below.
Oopses on dezulcamrc01:
The first oops occurred (non-fatal) at 16:00:06. The second occurred at 18:00:04
and probably hung the machine, so for that one I only have the final console
output (nothing was written to the syslog after that moment):
Final console-output:
----------------------------------------------------------------------
dezulcamrc01:~ #
Message from [EMAIL PROTECTED] at Mon Dec 25 16:00:08 2006 ...
dezulcamrc01 kernel: Oops: 0000 [1] SMP
Message from [EMAIL PROTECTED] at Mon Dec 25 16:00:08 2006 ...
dezulcamrc01 kernel: CR2: 0000003f80496680
Message from [EMAIL PROTECTED] at Mon Dec 25 18:00:04 2006 ...
dezulcamrc01 kernel: Oops: 0000 [2] SMP
----------------------------------------------------------------------
The first oops:
----------------------------------------------------------------------
Dec 25 16:00:06 dezulcamrc01 kernel: Badness in kobject_get at lib/kobject.c:457
Dec 25 16:00:06 dezulcamrc01 kernel:
Dec 25 16:00:06 dezulcamrc01 kernel: Call Trace:<ffffffff8022fab6>{kobject_get+54} <ffffffff80195ae5>{do_open+581}
Dec 25 16:00:06 dezulcamrc01 kernel: <ffffffff80195d3f>{blkdev_open+47} <ffffffff80189bb6>{dentry_open_it+262}
Dec 25 16:00:06 dezulcamrc01 kernel: <ffffffff80189d91>{filp_open+113} <ffffffff80189e3f>{sys_open+159}
Dec 25 16:00:06 dezulcamrc01 kernel: <ffffffff801107d4>{system_call+124}
Dec 25 16:00:08 dezulcamrc01 kernel: Badness in kobject_get at lib/kobject.c:457
Dec 25 16:00:08 dezulcamrc01 kernel:
Dec 25 16:00:08 dezulcamrc01 kernel: Call Trace:<ffffffff8022fab6>{kobject_get+54} <ffffffff80195ae5>{do_open+581}
Dec 25 16:00:08 dezulcamrc01 kernel: <ffffffff80195d3f>{blkdev_open+47} <ffffffff80189bb6>{dentry_open_it+262}
Dec 25 16:00:08 dezulcamrc01 kernel: <ffffffff80189d91>{filp_open+113} <ffffffff80189e3f>{sys_open+159}
Dec 25 16:00:08 dezulcamrc01 kernel: <ffffffff801107d4>{system_call+124}
Dec 25 16:00:08 dezulcamrc01 kernel: Unable to handle kernel paging request at 0000003f80496680 RIP:
Dec 25 16:00:08 dezulcamrc01 kernel: <ffffffff8016d67d>{kfree+77}
Dec 25 16:00:08 dezulcamrc01 kernel: PML4 26ffae067 PGD 0
Dec 25 16:00:08 dezulcamrc01 kernel: Oops: 0000 [1] SMP
Dec 25 16:00:08 dezulcamrc01 kernel: CPU 1
Dec 25 16:00:08 dezulcamrc01 kernel: Pid: 13041, comm: pvdisplay Tainted: P U (2.6.5-7.201-smp SLES9_SP2_BRANCH-200508250620450000)
Dec 25 16:00:08 dezulcamrc01 kernel: RIP: 0010:[<ffffffff8016d67d>] <ffffffff8016d67d>{kfree+77}
Dec 25 16:00:08 dezulcamrc01 kernel: RSP: 0018:00000102309a5e68 EFLAGS: 00010016
Dec 25 16:00:08 dezulcamrc01 kernel: RAX: 0000003fffffc000 RBX: 00000102fc4e9710 RCX: 000000000000001a
Dec 25 16:00:08 dezulcamrc01 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000001000
Dec 25 16:00:08 dezulcamrc01 kernel: RBP: 0000000000001000 R08: 0000000000000000 R09: 0000000000000000
Dec 25 16:00:08 dezulcamrc01 kernel: R10: 0000000000000006 R11: 0000000000000000 R12: 00000102fc4e9740
Dec 25 16:00:08 dezulcamrc01 kernel: R13: 00000102fc4e9740 R14: 0000000000000000 R15: 0000000000000000
Dec 25 16:00:08 dezulcamrc01 kernel: FS: 0000002a95a994c0(0000) GS:ffffffff80562f00(0000) knlGS:00000000556c2800
Dec 25 16:00:08 dezulcamrc01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 25 16:00:08 dezulcamrc01 kernel: CR2: 0000003f80496680 CR3: 0000000006449000 CR4: 00000000000006e0
Dec 25 16:00:08 dezulcamrc01 kernel: Process pvdisplay (pid: 13041, threadinfo 00000102309a4000, task 0000010103ae2b00)
Dec 25 16:00:08 dezulcamrc01 kernel: Stack: 0000000000000206 00000102fc4e9710 00000102fc4e9740 ffffffff8022f9ea
Dec 25 16:00:08 dezulcamrc01 kernel: 000001004adf4138 000001004adf4080 ffffffffa0029480 000001004adf4098
Dec 25 16:00:08 dezulcamrc01 kernel: 00000102ffcdd600 ffffffff801954bb
Dec 25 16:00:08 dezulcamrc01 kernel: Call Trace:<ffffffff8022f9ea>{kobject_cleanup+74} <ffffffff801954bb>{blkdev_put+299}
Dec 25 16:00:08 dezulcamrc01 kernel: <ffffffff8018d9e2>{__fput+98} <ffffffff8018970e>{filp_close+126}
Dec 25 16:00:08 dezulcamrc01 kernel: <ffffffff80189815>{sys_close+229} <ffffffff801107d4>{system_call+124}
Dec 25 16:00:08 dezulcamrc01 kernel:
Dec 25 16:00:08 dezulcamrc01 kernel:
Dec 25 16:00:08 dezulcamrc01 kernel: Code: 48 0f b6 80 80 a6 49 80 48 8b 0c c5 80 a7 49 80 48 b8 ff ff
Dec 25 16:00:08 dezulcamrc01 kernel: RIP <ffffffff8016d67d>{kfree+77} RSP <00000102309a5e68>
Dec 25 16:00:08 dezulcamrc01 kernel: CR2: 0000003f80496680
----------------------------------------------------------------------
Some more details:
# uname -a
Linux dezulcamrc01 2.6.5-7.201-smp #1 SMP Thu Aug 25 06:20:45 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux
# more /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 37
model name : AMD Opteron (tm) Processor 852
stepping : 1
cpu MHz : 2399.968
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow pni
bogomips : 4718.59
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 37
model name : AMD Opteron (tm) Processor 852
stepping : 1
cpu MHz : 2399.968
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow pni
bogomips : 3538.94
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
Oopses on dezulcamrc02:
Two weeks ago I had more oopses (52 altogether) on another of the three
machines. They are not necessarily related to the ones above, and looked like this:
Dec 11 18:00:07 dezulcamrc02 kernel: Unable to handle kernel paging request at 00000000000152c0 RIP:
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffff80174264>{blk_queue_bounce+20}
Dec 11 18:00:07 dezulcamrc02 kernel: PML4 dd28b067 PGD 16558067 PMD 0
Dec 11 18:00:07 dezulcamrc02 kernel: Oops: 0000 [1] SMP
Dec 11 18:00:07 dezulcamrc02 kernel: CPU 0
Dec 11 18:00:07 dezulcamrc02 kernel: Pid: 21177, comm: oracle Tainted: P U (2.6.5-7.201-smp SLES9_SP2_BRANCH-200508250620450000)
Dec 11 18:00:07 dezulcamrc02 kernel: RIP: 0010:[<ffffffff80174264>] <ffffffff80174264>{blk_queue_bounce+20}
Dec 11 18:00:07 dezulcamrc02 kernel: RSP: 0018:000001000cc819d8 EFLAGS: 00010216
Dec 11 18:00:07 dezulcamrc02 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Dec 11 18:00:07 dezulcamrc02 kernel: RDX: 00000102fbbf0780 RSI: 000001000cc81a38 RDI: 0000000000015000
Dec 11 18:00:07 dezulcamrc02 kernel: RBP: 0000000000015000 R08: 000001017d4d1070 R09: 000001017d4d12c0
Dec 11 18:00:07 dezulcamrc02 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
Dec 11 18:00:07 dezulcamrc02 kernel: R13: 0000000000015000 R14: 0000000000000008 R15: 000001000cc81a38
Dec 11 18:00:07 dezulcamrc02 kernel: FS: 0000002a977ef020(0000) GS:ffffffff80562e80(0000) knlGS:00000000576d7bb0
Dec 11 18:00:07 dezulcamrc02 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 11 18:00:07 dezulcamrc02 kernel: CR2: 00000000000152c0 CR3: 0000000000101000 CR4: 00000000000006e0
Dec 11 18:00:07 dezulcamrc02 kernel: Process oracle (pid: 21177, threadinfo 000001000cc80000, task 000001017d2214d0)
Dec 11 18:00:07 dezulcamrc02 kernel: Stack: 000001017d4d1008 ffffff00000c8008 0000000000000000 0000000000000000
Dec 11 18:00:07 dezulcamrc02 kernel: 0000000000015000 0000000000000000 0000010027516540 0000000000000008
Dec 11 18:00:07 dezulcamrc02 kernel: 0000000000000400 ffffffff8028603a
Dec 11 18:00:07 dezulcamrc02 kernel: Call Trace:<ffffffff8028603a>{__make_request+74} <ffffffffa01db714>{:emcp:PowerPlatformBottomDispatch+180}
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffff8013ceb0>{autoremove_wake_function+0} <ffffffffa01dd754>{:emcp:PowerTopDispatch+612}
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffffa01dd92f>{:emcp:emcp_pseudo_mrf+79} <ffffffff80284aba>{generic_make_request+394}
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffff80192cad>{__bio_add_page+157} <ffffffff80284be0>{submit_bio+272}
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffff801b2ebf>{dio_bio_add_page+31} <ffffffff801b322b>{dio_bio_submit+107}
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffff801b43e0>{__blockdev_direct_IO+2736} <ffffffff80195045>{blkdev_direct_IO+69}
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffff80194c20>{blkdev_get_blocks+0} <ffffffff80165d6a>{generic_file_direct_IO+154}
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffff80165fc4>{__generic_file_aio_read+228} <ffffffff8016624b>{generic_file_read+187}
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffffa04c7151>{:raw:raw_open+209} <ffffffff801969b2>{chrdev_open+418}
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffff8016cb90>{file_ra_state_init+32} <ffffffff8013ceb0>{autoremove_wake_function+0}
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffff8018d234>{vfs_read+244} <ffffffff8018d38c>{sys_pread64+236}
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffff801107d4>{system_call+124}
Dec 11 18:00:07 dezulcamrc02 kernel:
Dec 11 18:00:07 dezulcamrc02 kernel: Code: f6 87 c0 02 00 00 01 75 23 48 8b 05 dc 19 3c 00 48 39 87 b8
Dec 11 18:00:07 dezulcamrc02 kernel: RIP <ffffffff80174264>{blk_queue_bounce+20} RSP <000001000cc819d8>
Dec 11 18:00:07 dezulcamrc02 kernel: CR2: 00000000000152c0
Dec 11 18:00:07 dezulcamrc02 kernel: <1>Unable to handle kernel NULL pointer dereference at 0000000000000474 RIP:
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffff80287f90>{show_partition+112}
Dec 11 18:00:07 dezulcamrc02 kernel: PML4 2f23e8067 PGD 2f2fa2067 PMD 0
Dec 11 18:00:07 dezulcamrc02 kernel: Oops: 0000 [2] SMP
Dec 11 18:00:07 dezulcamrc02 kernel: CPU 1
Dec 11 18:00:07 dezulcamrc02 kernel: Pid: 20276, comm: mlragent Tainted: P U (2.6.5-7.201-smp SLES9_SP2_BRANCH-200508250620450000)
Dec 11 18:00:07 dezulcamrc02 kernel: RIP: 0010:[<ffffffff80287f90>] <ffffffff80287f90>{show_partition+112}
Dec 11 18:00:07 dezulcamrc02 kernel: RSP: 0018:000001025878de28 EFLAGS: 00010287
Dec 11 18:00:07 dezulcamrc02 kernel: RAX: 00000000000004ec RBX: 00000100dd5cf900 RCX: 00000000000004ec
Dec 11 18:00:07 dezulcamrc02 kernel: RDX: 0000000000000424 RSI: 0000000000000424 RDI: 00000100dd5cf900
Dec 11 18:00:07 dezulcamrc02 kernel: RBP: 0000000000000424 R08: 00000000ffffffff R09: 0000000000000006
Dec 11 18:00:07 dezulcamrc02 kernel: R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
Dec 11 18:00:07 dezulcamrc02 kernel: R13: 00000100dd5cf900 R14: 00000000000003fc R15: 00000100dd5cf928
Dec 11 18:00:07 dezulcamrc02 kernel: FS: 0000002a977ef020(0000) GS:ffffffff80562f00(005b) knlGS:000000005bfa8bb0
Dec 11 18:00:07 dezulcamrc02 kernel: CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
Dec 11 18:00:07 dezulcamrc02 kernel: CR2: 0000000000000474 CR3: 0000000006449000 CR4: 00000000000006e0
Dec 11 18:00:07 dezulcamrc02 kernel: Process mlragent (pid: 20276, threadinfo 000001025878c000, task 000001001f995640)
Dec 11 18:00:07 dezulcamrc02 kernel: Stack: 00000000327a6473 0000000000000212 0000000000000212 000001001f995640
Dec 11 18:00:07 dezulcamrc02 kernel: 00000100dd5cf900 0000000000000424 0000000000000000 000000000000016e
Dec 11 18:00:07 dezulcamrc02 kernel: 00000000000003fc ffffffff801aea83
Dec 11 18:00:07 dezulcamrc02 kernel: Call Trace:<ffffffff801aea83>{seq_read+451} <ffffffff8018d234>{vfs_read+244}
Dec 11 18:00:07 dezulcamrc02 kernel: <ffffffff8018d48d>{sys_read+157} <ffffffff80124fe1>{cstar_do_call+27}
Dec 11 18:00:07 dezulcamrc02 kernel:
Dec 11 18:00:07 dezulcamrc02 kernel:
Dec 11 18:00:07 dezulcamrc02 kernel: Code: 48 8b 55 50 48 85 d2 0f 84 c3 00 00 00 83 7d 08 01 75 0d 8b
Dec 11 18:00:07 dezulcamrc02 kernel: RIP <ffffffff80287f90>{show_partition+112} RSP <000001025878de28>
Dec 11 18:00:07 dezulcamrc02 kernel: CR2: 0000000000000474
...
... 49 more oopses removed
...
Dec 11 18:40:05 dezulcamrc02 kernel: <1>Unable to handle kernel NULL pointer dereference at 0000000000000474 RIP:
Dec 11 18:40:05 dezulcamrc02 kernel: <ffffffff80287f90>{show_partition+112}
Dec 11 18:40:05 dezulcamrc02 kernel: PML4 3e51a067 PGD 518dd067 PMD 0
Dec 11 18:40:05 dezulcamrc02 kernel: Oops: 0000 [52] SMP
Dec 11 18:40:05 dezulcamrc02 kernel: CPU 0
Dec 11 18:40:05 dezulcamrc02 kernel: Pid: 32002, comm: grep Tainted: P U (2.6.5-7.201-smp SLES9_SP2_BRANCH-200508250620450000)
Dec 11 18:40:05 dezulcamrc02 kernel: RIP: 0010:[<ffffffff80287f90>] <ffffffff80287f90>{show_partition+112}
Dec 11 18:40:05 dezulcamrc02 kernel: RSP: 0018:00000100e1dc7e28 EFLAGS: 00010287
Dec 11 18:40:05 dezulcamrc02 kernel: RAX: 00000000000004ec RBX: 00000101a1f30280 RCX: 00000000000004ec
Dec 11 18:40:05 dezulcamrc02 kernel: RDX: 0000000000000424 RSI: 0000000000000424 RDI: 00000101a1f30280
Dec 11 18:40:05 dezulcamrc02 kernel: RBP: 0000000000000424 R08: 00000000ffffffff R09: 0000000000000006
Dec 11 18:40:05 dezulcamrc02 kernel: R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
Dec 11 18:40:05 dezulcamrc02 kernel: R13: 00000101a1f30280 R14: 0000000000008000 R15: 00000101a1f302a8
Dec 11 18:40:05 dezulcamrc02 kernel: FS: 0000002a9588e700(0000) GS:ffffffff80562e80(0000) knlGS:0000000055ea1bb0
Dec 11 18:40:05 dezulcamrc02 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 11 18:40:05 dezulcamrc02 kernel: CR2: 0000000000000474 CR3: 0000000000101000 CR4: 00000000000006e0
Dec 11 18:40:05 dezulcamrc02 kernel: Process grep (pid: 32002, threadinfo 00000100e1dc6000, task 00000102f38253e0)
Dec 11 18:40:05 dezulcamrc02 kernel: Stack: 30630000327a6473 ffffff0031703164 0000000000000206 ffffffff801971da
Dec 11 18:40:05 dezulcamrc02 kernel: 00000101a1f30280 0000000000000424 0000000000000000 0000000000000572
Dec 11 18:40:05 dezulcamrc02 kernel: 0000000000008000 ffffffff801aea83
Dec 11 18:40:05 dezulcamrc02 kernel: Call Trace:<ffffffff801971da>{cp_new_stat+234} <ffffffff801aea83>{seq_read+451}
Dec 11 18:40:05 dezulcamrc02 kernel: <ffffffff8018d234>{vfs_read+244} <ffffffff8018d48d>{sys_read+157}
Dec 11 18:40:05 dezulcamrc02 kernel: <ffffffff801107d4>{system_call+124}
Dec 11 18:40:05 dezulcamrc02 kernel:
Dec 11 18:40:05 dezulcamrc02 kernel: Code: 48 8b 55 50 48 85 d2 0f 84 c3 00 00 00 83 7d 08 01 75 0d 8b
Dec 11 18:40:05 dezulcamrc02 kernel: RIP <ffffffff80287f90>{show_partition+112} RSP <00000100e1dc7e28>
Dec 11 18:40:05 dezulcamrc02 kernel: CR2: 0000000000000474
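With 52 oopses in the log and only three shown, a tally by faulting function and by process may be more useful than the full dump. The following is a minimal sketch of one way to produce it, not a tested tool. The function name summarize_oopses and the regular expressions are assumptions made for this illustration, and it expects the original syslog with one kernel message per line, not the wrapped form a mail client produces.

```python
import re
from collections import Counter

# "Oops: 0000 [52] SMP" marks the start of one oops report.
OOPS_RE = re.compile(r"kernel:\s*Oops: [0-9a-f]{4} \[(\d+)\]")
# "Pid: 32002, comm: grep Tainted: ..." names the process that faulted.
COMM_RE = re.compile(r"kernel:\s*Pid: \d+, comm: (\S+)")
# The closing "RIP <addr>{symbol+offset} RSP <addr>" line names the faulting
# function. The earlier "RIP: 0010:[...]" line is deliberately not matched,
# so each oops is counted once.
RIP_RE = re.compile(r"kernel:\s*RIP <[0-9a-f]+>\{([^+}]+)\+\d+\}")


def summarize_oopses(lines):
    """Return (oops count, Counter of faulting functions, Counter of processes)."""
    total = 0
    by_function = Counter()
    by_process = Counter()
    for line in lines:
        if OOPS_RE.search(line):
            total += 1
        match = COMM_RE.search(line)
        if match:
            by_process[match.group(1)] += 1
        match = RIP_RE.search(line)
        if match:
            by_function[match.group(1)] += 1
    return total, by_function, by_process
```

It could be run over the syslog file by passing the open file object, for example summarize_oopses(open("/var/log/messages")), where the path is whatever file the local syslog configuration writes kernel messages to.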
Afterwards I ran memtest86+ on this server (dezulcamrc02) for 5 days, and it
reported no errors.
Any ideas?
Thanks,
Kai
--
Kai Groshert
Technischer Consultant / Technical Consultant
ITH2 Competence Center Unix
PIKS Porsche-Information-Kommunikation-Services GmbH