PROBLEM: BUG: null pointer deref., segfaults

2007-12-30 Thread Erno Kovacs
[1.] One line summary of the problem:

using dd on a broken hdd causes kernel NULL pointer dereference


[2.] Full description of the problem/report:

I have a broken hdd (unreadable sector). While dd-ing it onto another hdd of the
same size, I get kernel-level errors. First there is a NULL pointer dereference,
then a few minutes later a BUG in mm/slab.c. After that iostat shows no more I/O,
and dd shows up as a kernel-space process ([dd] in ps). The system becomes
unstable, processes segfault, and even reboot does not work.

Dmesg output:
BUG: unable to handle kernel NULL pointer dereference at virtual address 002a
 printing eip:
c016b8ce
*pde = 
Oops: 0002 [#1]
Modules linked in: ac battery dm_crypt dm_snapshot dm_mirror dm_mod joydev tsdev usbhid sd_mod fan rtc evdev thermal processor button psmouse serio_raw pcspkr via_agp agpgart i2c_viapro ehci_hcd hpt366 uhci_hcd i2c_core usbcore sata_sil
CPU: 0
EIP: 0060:[c016b8ce] Not tainted VLI
EFLAGS: 00010213   (2.6.23.12 #1)
EIP is at drop_buffers+0x5a/0xd5
eax: 002e   ebx: d6e300f0   ecx: 0026   edx: d6e30118
esi: c126e140   edi: d6e300f0   ebp: d6e300f0   esp: c149bdd8
ds: 007b   es: 007b   fs:   gs:   ss: 0068
Process kswapd0 (pid: 163, ti=c149a000 task=dff6f030 task.ti=c149a000)
Stack:  c1274860 c1117260 df1152fc c149be00 c126e140  0001
   c149bf84 c016b987  c126e140 df1152fc c0141f72 043e 0011
    c149bf14 0120  0004 0004 0001 c12dc440
Call Trace:
 [c016b987] try_to_free_buffers+0x3e/0x6c
 [c0141f72] shrink_page_list+0x414/0x4fc
 [c01415a7] isolate_lru_pages+0x44/0x17f
 [c0142128] shrink_inactive_list+0xce/0x265
 [c0117f98] check_preempt_curr_fair+0x52/0x56
 [c014237d] shrink_zone+0xbe/0xe2
 [c014278d] kswapd+0x251/0x3b8
 [c0127a69] autoremove_wake_function+0x0/0x35
 [c014253c] kswapd+0x0/0x3b8
 [c0127913] kthread+0x36/0x5b
 [c01278dd] kthread+0x0/0x5b
 [c010468f] kernel_thread_helper+0x7/0x10
 ===
Code: 14 8b 02 83 e0 06 0b 42 34 74 07 31 c0 e9 8c 00 00 00 8b 52 04 39 fa 75 d5 89 fb 8b 4b 28 8d 53 28 8b 6b 04 39 d1 74 53 8b 42 04 89 41 04 89 08 89 52 04 83 7b 30 00 89 53 28 75 29 c7 44 24 0c
EIP: [c016b8ce] drop_buffers+0x5a/0xd5 SS:ESP 0068:c149bdd8
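
As a reading aid for the oops above: on x86 the number printed after "Oops:" is the page-fault error code. Its low three bits say whether the page was present, whether the access was a write, and whether the CPU was in user mode. The sketch below decodes only those three bits; decode_oops_code is an illustrative helper name, not an existing tool.

```shell
#!/bin/sh
# Decode bits 0-2 of an x86 page-fault error code, given as the hex
# string printed after "Oops:" in the log (for example 0002).
decode_oops_code() {
    code=$((0x$1))

    # bit 0: 0 = page not present, 1 = protection fault
    if [ $((code & 1)) -ne 0 ]; then cause="protection fault"; else cause="page not present"; fi
    # bit 1: 0 = read access, 1 = write access
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    # bit 2: 0 = kernel mode, 1 = user mode
    if [ $((code & 4)) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi

    echo "$access, $cause, $mode"
}
```

For this report, `decode_oops_code 0002` prints `write, page not present, kernel mode`, which fits a kernel write through a near-NULL pointer (the faulting address is 002a).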

then later 

[ cut here ]
kernel BUG at mm/slab.c:2983!
invalid opcode:  [#2]
Modules linked in: ac battery dm_crypt dm_snapshot dm_mirror dm_mod joydev tsdev usbhid sd_mod fan rtc evdev thermal processor button psmouse serio_raw pcspkr via_agp agpgart i2c_viapro ehci_hcd hpt366 uhci_hcd i2c_core usbcore sata_sil
CPU: 0
EIP: 0060:[c01501fc] Tainted: G  D VLI
EFLAGS: 00010046   (2.6.23.12 #1)
EIP is at cache_alloc_refill+0xe0/0x3ec
eax: 0043   ebx: 0020   ecx: dffe7da0   edx: dffe7da0
esi: d4686000   edi: dffedd20   ebp: dffe9200   esp: de49dcc8
ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
Process dd (pid: 1888, ti=de49c000 task=df7dd030 task.ti=de49c000)
Stack: 0043 8050 dffe7da0  001a c0262880 d67d79e0 c013d8d3
   c016b777 c1126900 dffe7da0 0282 8050 c01500e2 c1126900 
    1000 c016b75d c1126900 c016bda5 0001 c1126900 f000
Call Trace:
 [c0262880] submit_bio+0xa5/0xac
 [c013d8d3] mempool_alloc+0x1c/0x93
 [c016b777] alloc_buffer_head+0x2a/0x2e
 [c01500e2] kmem_cache_alloc+0x2b/0x65
 [c016b75d] alloc_buffer_head+0x10/0x2e
 [c016bda5] alloc_page_buffers+0x2d/0xbb
 [c016be43] create_empty_buffers+0x10/0x6b
 [c016dce5] block_read_full_page+0x40/0x296
 [c017014f] blkdev_get_block+0x0/0x42
 [c014054f] __do_page_cache_readahead+0x16c/0x1bf
 [c0140714] ondemand_readahead+0x48/0xf1
 [c013bbf8] do_generic_mapping_read+0x10d/0x3c8
 [c013d2ca] generic_file_aio_read+0x12d/0x159
 [c013b5b7] file_read_actor+0x0/0xca
 [c01528a4] do_sync_read+0xc6/0x109
 [c0139bbe] handle_IRQ_event+0x1a/0x3f
 [c0127a69] autoremove_wake_function+0x0/0x35
 [c012d12a] clockevents_program_event+0x9c/0xa3
 [c01527de] do_sync_read+0x0/0x109
 [c01530bb] vfs_read+0xa6/0x128
 [c01533db] sys_read+0x41/0x67
 [c0103afe] sysenter_past_esp+0x5f/0x85
 ===
Code: be 00 00 00 8b 37 39 fe 75 15 8b 77 10 8d 47 10 c7 47 30 01 00 00 00 39 c6 0f 84 9a 00 00 00 8b 54 24 08 8b 42 1c 39 46 10 72 2d 0f 0b eb fe 8b 44 24 08 8b 5e 14 8b 4d 00 8b 50 10 8b 04 24 0f
EIP: [c01501fc] cache_alloc_refill+0xe0/0x3ec SS:ESP 0068:de49dcc8



[3.] Keywords (i.e., modules, networking, kernel):

dd null pointer dereference segfault


[4.] Kernel version (from /proc/version):

Linux version 2.6.23.12 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 Thu Dec 27 13:22:18 CET 2007


[5.] Output of Oops.. message (if applicable) with symbolic information 
 resolved (see Documentation/oops-tracing.txt)

sorry:)


[6.] A small shell script or example program which triggers the
 problem (if possible)

dd if=/dev/sdd of=/dev/sdb bs=65536
where sdd is the 

Re: PROBLEM: BUG: null pointer deref., segfaults

2007-12-30 Thread Matthew Wilcox

From the backtrace, this doesn't seem to be a scsi or ide problem.
It might be a block-layer bug, or a VM problem.  I've cc'd the VM people
to see what they think.
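
One way to check this against the report is to list the function names in each Call Trace and look at which subsystems they belong to. The following is a minimal sketch, assuming the oops text has been saved to a file; trace_symbols and oops.txt are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Print the function name from every call-trace line of the form
#   " [c016b987] try_to_free_buffers+0x3e/0x6c"
# read from standard input. Lines that do not match are ignored.
trace_symbols() {
    sed -n 's/^ *\[[0-9a-f]*] *\([A-Za-z_][A-Za-z0-9_]*\)+0x.*$/\1/p'
}

# Usage (oops.txt is a hypothetical file holding the dmesg output):
#   trace_symbols < oops.txt | sort -u
```

On the two traces in this report the output consists of buffer-head and page-reclaim functions (try_to_free_buffers, shrink_page_list, alloc_buffer_head, block_read_full_page) plus submit_bio. No SCSI or IDE driver functions show up.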

On Mon, Dec 31, 2007 at 12:06:00AM +0100, Erno Kovacs wrote:
 [1.] One line summary of the problem:
 
 using dd on a broken hdd causes kernel NULL pointer dereference
 
 
 [2.] Full description of the problem/report:
 
 I have a broken hdd (unreadable sector). While dd-ing it onto another hdd of the
 same size, I get kernel-level errors. First there is a NULL pointer dereference,
 then a few minutes later a BUG in mm/slab.c. After that iostat shows no more I/O,
 and dd shows up as a kernel-space process ([dd] in ps). The system becomes
 unstable, processes segfault, and even reboot does not work.
 
 Dmesg output:
 BUG: unable to handle kernel NULL pointer dereference at virtual address 002a
  printing eip:
 c016b8ce
 *pde = 
 Oops: 0002 [#1]
 Modules linked in: ac battery dm_crypt dm_snapshot dm_mirror dm_mod joydev tsdev usbhid sd_mod fan rtc evdev thermal processor button psmouse serio_raw pcspkr via_agp agpgart i2c_viapro ehci_hcd hpt366 uhci_hcd i2c_core usbcore sata_sil
 CPU:0
 EIP:0060:[c016b8ce]Not tainted VLI
 EFLAGS: 00010213   (2.6.23.12 #1)
 EIP is at drop_buffers+0x5a/0xd5
 eax: 002e   ebx: d6e300f0   ecx: 0026   edx: d6e30118
 esi: c126e140   edi: d6e300f0   ebp: d6e300f0   esp: c149bdd8
 ds: 007b   es: 007b   fs:   gs:   ss: 0068
 Process kswapd0 (pid: 163, ti=c149a000 task=dff6f030 task.ti=c149a000)
 Stack:  c1274860 c1117260 df1152fc c149be00 c126e140  0001
c149bf84 c016b987  c126e140 df1152fc c0141f72 043e 0011
 c149bf14 0120  0004 0004 0001 c12dc440
 Call Trace:
  [c016b987] try_to_free_buffers+0x3e/0x6c
  [c0141f72] shrink_page_list+0x414/0x4fc
  [c01415a7] isolate_lru_pages+0x44/0x17f
  [c0142128] shrink_inactive_list+0xce/0x265
  [c0117f98] check_preempt_curr_fair+0x52/0x56
  [c014237d] shrink_zone+0xbe/0xe2
  [c014278d] kswapd+0x251/0x3b8
  [c0127a69] autoremove_wake_function+0x0/0x35
  [c014253c] kswapd+0x0/0x3b8
  [c0127913] kthread+0x36/0x5b
  [c01278dd] kthread+0x0/0x5b
  [c010468f] kernel_thread_helper+0x7/0x10
  ===
 Code: 14 8b 02 83 e0 06 0b 42 34 74 07 31 c0 e9 8c 00 00 00 8b 52 04 39 fa 75 d5 89 fb 8b 4b 28 8d 53 28 8b 6b 04 39 d1 74 53 8b 42 04 89 41 04 89 08 89 52 04 83 7b 30 00 89 53 28 75 29 c7 44 24 0c
 EIP: [c016b8ce] drop_buffers+0x5a/0xd5 SS:ESP 0068:c149bdd8
 
 then later 
 
 [ cut here ]
 kernel BUG at mm/slab.c:2983!
 invalid opcode:  [#2]
 Modules linked in: ac battery dm_crypt dm_snapshot dm_mirror dm_mod joydev tsdev usbhid sd_mod fan rtc evdev thermal processor button psmouse serio_raw pcspkr via_agp agpgart i2c_viapro ehci_hcd hpt366 uhci_hcd i2c_core usbcore sata_sil
 CPU:0
 EIP:0060:[c01501fc]Tainted: G  D VLI
 EFLAGS: 00010046   (2.6.23.12 #1)
 EIP is at cache_alloc_refill+0xe0/0x3ec
 eax: 0043   ebx: 0020   ecx: dffe7da0   edx: dffe7da0
 esi: d4686000   edi: dffedd20   ebp: dffe9200   esp: de49dcc8
 ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
 Process dd (pid: 1888, ti=de49c000 task=df7dd030 task.ti=de49c000)
 Stack: 0043 8050 dffe7da0  001a c0262880 d67d79e0 c013d8d3
c016b777 c1126900 dffe7da0 0282 8050 c01500e2 c1126900 
 1000 c016b75d c1126900 c016bda5 0001 c1126900 f000
 Call Trace:
  [c0262880] submit_bio+0xa5/0xac
  [c013d8d3] mempool_alloc+0x1c/0x93
  [c016b777] alloc_buffer_head+0x2a/0x2e
  [c01500e2] kmem_cache_alloc+0x2b/0x65
  [c016b75d] alloc_buffer_head+0x10/0x2e
  [c016bda5] alloc_page_buffers+0x2d/0xbb
  [c016be43] create_empty_buffers+0x10/0x6b
  [c016dce5] block_read_full_page+0x40/0x296
  [c017014f] blkdev_get_block+0x0/0x42
  [c014054f] __do_page_cache_readahead+0x16c/0x1bf
  [c0140714] ondemand_readahead+0x48/0xf1
  [c013bbf8] do_generic_mapping_read+0x10d/0x3c8
  [c013d2ca] generic_file_aio_read+0x12d/0x159
  [c013b5b7] file_read_actor+0x0/0xca
  [c01528a4] do_sync_read+0xc6/0x109
  [c0139bbe] handle_IRQ_event+0x1a/0x3f
  [c0127a69] autoremove_wake_function+0x0/0x35
  [c012d12a] clockevents_program_event+0x9c/0xa3
  [c01527de] do_sync_read+0x0/0x109
  [c01530bb] vfs_read+0xa6/0x128
  [c01533db] sys_read+0x41/0x67
  [c0103afe] sysenter_past_esp+0x5f/0x85
  ===
 Code: be 00 00 00 8b 37 39 fe 75 15 8b 77 10 8d 47 10 c7 47 30 01 00 00 00 39 c6 0f 84 9a 00 00 00 8b 54 24 08 8b 42 1c 39 46 10 72 2d 0f 0b eb fe 8b 44 24 08 8b 5e 14 8b 4d 00 8b 50 10 8b 04 24 0f
 EIP: [c01501fc] cache_alloc_refill+0xe0/0x3ec SS:ESP 0068:de49dcc8
 
 
 
 [3.] Keywords (i.e., modules, networking, kernel):
 
 dd null pointer dereference segfault
 
 
 [4.] Kernel version (from /proc/version):
 
 Linux version 2.6.23.12 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115