PROBLEM: BUG: null pointer deref., segfaults
[1.] One line summary of the problem:
using dd on a broken HDD causes a kernel NULL pointer dereference

[2.] Full description of the problem/report:
I have a broken HDD (it has an unreadable sector). While dd-ing it onto another HDD of the same size, I get a kernel-level error. The first time it is a NULL pointer dereference; a few minutes later it is a BUG in mm/slab.c. After that, iostat shows no more I/O, and dd turns into what looks like a kernel-space process (ps shows it as [dd]). The system becomes unstable, processes segfault, and even reboot does not work.

Dmesg output:

BUG: unable to handle kernel NULL pointer dereference at virtual address 002a
printing eip: c016b8ce
*pde =
Oops: 0002 [#1]
Modules linked in: ac battery dm_crypt dm_snapshot dm_mirror dm_mod joydev tsdev usbhid sd_mod fan rtc evdev thermal processor button psmouse serio_raw pcspkr via_agp agpgart i2c_viapro ehci_hcd hpt366 uhci_hcd i2c_core usbcore sata_sil
CPU:0
EIP:0060:[c016b8ce] Not tainted VLI
EFLAGS: 00010213 (2.6.23.12 #1)
EIP is at drop_buffers+0x5a/0xd5
eax: 002e ebx: d6e300f0 ecx: 0026 edx: d6e30118
esi: c126e140 edi: d6e300f0 ebp: d6e300f0 esp: c149bdd8
ds: 007b es: 007b fs: gs: ss: 0068
Process kswapd0 (pid: 163, ti=c149a000 task=dff6f030 task.ti=c149a000)
Stack: c1274860 c1117260 df1152fc c149be00 c126e140 0001 c149bf84 c016b987
       c126e140 df1152fc c0141f72 043e 0011 c149bf14 0120 0004
       0004 0001 c12dc440
Call Trace:
 [c016b987] try_to_free_buffers+0x3e/0x6c
 [c0141f72] shrink_page_list+0x414/0x4fc
 [c01415a7] isolate_lru_pages+0x44/0x17f
 [c0142128] shrink_inactive_list+0xce/0x265
 [c0117f98] check_preempt_curr_fair+0x52/0x56
 [c014237d] shrink_zone+0xbe/0xe2
 [c014278d] kswapd+0x251/0x3b8
 [c0127a69] autoremove_wake_function+0x0/0x35
 [c014253c] kswapd+0x0/0x3b8
 [c0127913] kthread+0x36/0x5b
 [c01278dd] kthread+0x0/0x5b
 [c010468f] kernel_thread_helper+0x7/0x10
===
Code: 14 8b 02 83 e0 06 0b 42 34 74 07 31 c0 e9 8c 00 00 00 8b 52 04 39 fa 75 d5 89 fb 8b 4b 28 8d 53 28 8b 6b 04 39 d1 74 53 8b 42 04 89 41 04 89 08 89 52 04 83 7b 30 00 89 53 28 75 29 c7 44 24 0c
EIP: [c016b8ce] drop_buffers+0x5a/0xd5 SS:ESP 0068:c149bdd8

then later:

[ cut here ]
kernel BUG at mm/slab.c:2983!
invalid opcode: [#2]
Modules linked in: ac battery dm_crypt dm_snapshot dm_mirror dm_mod joydev tsdev usbhid sd_mod fan rtc evdev thermal processor button psmouse serio_raw pcspkr via_agp agpgart i2c_viapro ehci_hcd hpt366 uhci_hcd i2c_core usbcore sata_sil
CPU:0
EIP:0060:[c01501fc] Tainted: G D VLI
EFLAGS: 00010046 (2.6.23.12 #1)
EIP is at cache_alloc_refill+0xe0/0x3ec
eax: 0043 ebx: 0020 ecx: dffe7da0 edx: dffe7da0
esi: d4686000 edi: dffedd20 ebp: dffe9200 esp: de49dcc8
ds: 007b es: 007b fs: gs: 0033 ss: 0068
Process dd (pid: 1888, ti=de49c000 task=df7dd030 task.ti=de49c000)
Stack: 0043 8050 dffe7da0 001a c0262880 d67d79e0 c013d8d3 c016b777
       c1126900 dffe7da0 0282 8050 c01500e2 c1126900 1000 c016b75d
       c1126900 c016bda5 0001 c1126900 f000
Call Trace:
 [c0262880] submit_bio+0xa5/0xac
 [c013d8d3] mempool_alloc+0x1c/0x93
 [c016b777] alloc_buffer_head+0x2a/0x2e
 [c01500e2] kmem_cache_alloc+0x2b/0x65
 [c016b75d] alloc_buffer_head+0x10/0x2e
 [c016bda5] alloc_page_buffers+0x2d/0xbb
 [c016be43] create_empty_buffers+0x10/0x6b
 [c016dce5] block_read_full_page+0x40/0x296
 [c017014f] blkdev_get_block+0x0/0x42
 [c014054f] __do_page_cache_readahead+0x16c/0x1bf
 [c0140714] ondemand_readahead+0x48/0xf1
 [c013bbf8] do_generic_mapping_read+0x10d/0x3c8
 [c013d2ca] generic_file_aio_read+0x12d/0x159
 [c013b5b7] file_read_actor+0x0/0xca
 [c01528a4] do_sync_read+0xc6/0x109
 [c0139bbe] handle_IRQ_event+0x1a/0x3f
 [c0127a69] autoremove_wake_function+0x0/0x35
 [c012d12a] clockevents_program_event+0x9c/0xa3
 [c01527de] do_sync_read+0x0/0x109
 [c01530bb] vfs_read+0xa6/0x128
 [c01533db] sys_read+0x41/0x67
 [c0103afe] sysenter_past_esp+0x5f/0x85
===
Code: be 00 00 00 8b 37 39 fe 75 15 8b 77 10 8d 47 10 c7 47 30 01 00 00 00 39 c6 0f 84 9a 00 00 00 8b 54 24 08 8b 42 1c 39 46 10 72 2d 0f 0b eb fe 8b 44 24 08 8b 5e 14 8b 4d 00 8b 50 10 8b 04 24 0f
EIP: [c01501fc] cache_alloc_refill+0xe0/0x3ec SS:ESP 0068:de49dcc8

[3.] Keywords (i.e., modules, networking, kernel):
dd null pointer dereference segfault

[4.] Kernel version (from /proc/version):
Linux version 2.6.23.12 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 Thu Dec 27 13:22:18 CET 2007

[5.] Output of Oops.. message (if applicable) with symbolic information resolved (see Documentation/oops-tracing.txt)
sorry :)

[6.] A small shell script or example program which triggers the problem (if possible)
    dd if=/dev/sdd of=/dev/sdb bs=65536
where sdd is the
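In the dumps above, each EIP and call-trace entry uses the kallsyms notation symbol+offset/size. On a page fault, the number after "Oops:" is the x86 page-fault error code. The Python sketch below is only a reading aid. The helper names function_bounds and decode_pf_error are hypothetical. It also assumes the copied hex values are intact, which is not true of every register in this paste. The sketch converts an entry into the address range of its function and decodes the low three bits of the error code.

```python
import re

# kallsyms-style annotation printed after each address, e.g.
# "drop_buffers+0x5a/0xd5": symbol name, offset into it, function size.
SYM_RE = re.compile(r"^(?P<name>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)$")


def function_bounds(addr_hex, annotation):
    """Return (name, start, end) of the function containing addr_hex.

    start = address - offset, end = start + size (end is exclusive).
    """
    m = SYM_RE.match(annotation)
    if m is None:
        raise ValueError(f"not a symbol+0xoff/0xsize annotation: {annotation!r}")
    start = int(addr_hex, 16) - int(m.group("off"), 16)
    end = start + int(m.group("size"), 16)
    return m.group("name"), start, end


def decode_pf_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


if __name__ == "__main__":
    # Entries copied from the two oopses above.
    for addr, ann in [
        ("c016b8ce", "drop_buffers+0x5a/0xd5"),
        ("c016b987", "try_to_free_buffers+0x3e/0x6c"),
        ("c01501fc", "cache_alloc_refill+0xe0/0x3ec"),
    ]:
        name, start, end = function_bounds(addr, ann)
        print(f"{name}: {start:08x}-{end:08x}")
    # "Oops: 0002" from the first dump.
    print(decode_pf_error(0x0002))
```

Applied to the first oops, the sketch places drop_buffers at c016b874-c016b949 and the start of try_to_free_buffers at c016b949. The error code 0002 decodes as a kernel-mode write to a page that was not present. That is consistent with a write through a bad pointer near the reported address 002a.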
Re: PROBLEM: BUG: null pointer deref., segfaults
From the backtrace, this doesn't seem to be a scsi or ide problem. It might be a block-layer bug, or a VM problem. I've cc'd the VM people to see what they think.

On Mon, Dec 31, 2007 at 12:06:00AM +0100, Erno Kovacs wrote:
> [1.] One line summary of the problem: using dd on a broken hdd causes kernel NULL pointer dereference
> [...]
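The reply's reading, that no SCSI or IDE code appears in the backtraces, can be checked mechanically. The Python sketch below maps a subset of the frames from each call trace to the source file that defines the function, then counts frames per top-level directory. The DEFINED_IN table is an assumption: it was written from memory of the 2.6.23 tree, not taken from this thread. Check it against the actual source before relying on it.

```python
from collections import Counter

# ASSUMPTION: which source file defines each function, written from memory
# of the Linux 2.6.23 tree (not taken from the oops); verify before use.
DEFINED_IN = {
    "drop_buffers": "fs/buffer.c",
    "try_to_free_buffers": "fs/buffer.c",
    "alloc_buffer_head": "fs/buffer.c",
    "alloc_page_buffers": "fs/buffer.c",
    "create_empty_buffers": "fs/buffer.c",
    "block_read_full_page": "fs/buffer.c",
    "vfs_read": "fs/read_write.c",
    "sys_read": "fs/read_write.c",
    "shrink_page_list": "mm/vmscan.c",
    "shrink_inactive_list": "mm/vmscan.c",
    "shrink_zone": "mm/vmscan.c",
    "kswapd": "mm/vmscan.c",
    "cache_alloc_refill": "mm/slab.c",
    "kmem_cache_alloc": "mm/slab.c",
    "mempool_alloc": "mm/mempool.c",
    "__do_page_cache_readahead": "mm/readahead.c",
    "ondemand_readahead": "mm/readahead.c",
    "do_generic_mapping_read": "mm/filemap.c",
    "generic_file_aio_read": "mm/filemap.c",
    "submit_bio": "block/ll_rw_blk.c",
}

# A subset of the frames from each oops, in the order they appear
# (EIP function first, then call-trace entries).
KSWAPD_TRACE = ["drop_buffers", "try_to_free_buffers", "shrink_page_list",
                "shrink_inactive_list", "shrink_zone", "kswapd"]
DD_TRACE = ["cache_alloc_refill", "submit_bio", "mempool_alloc",
            "kmem_cache_alloc", "alloc_buffer_head", "alloc_page_buffers",
            "create_empty_buffers", "block_read_full_page",
            "__do_page_cache_readahead", "ondemand_readahead",
            "do_generic_mapping_read", "generic_file_aio_read",
            "vfs_read", "sys_read"]


def subsystems(trace):
    """Count frames per top-level source directory (mm, fs, block, ...)."""
    return Counter(DEFINED_IN[fn].split("/")[0] for fn in trace)


def touches_drivers(trace):
    """True if any frame is defined under drivers/ (SCSI, IDE, libata, ...)."""
    return any(DEFINED_IN[fn].startswith("drivers/") for fn in trace)


if __name__ == "__main__":
    for label, trace in [("kswapd0", KSWAPD_TRACE), ("dd", DD_TRACE)]:
        print(label, dict(subsystems(trace)), "drivers:", touches_drivers(trace))
```

With this table, the kswapd0 trace stays within mm/ and fs/, and the dd trace within mm/, fs/ and block/. Neither touches drivers/, which matches the suggestion that this is a block-layer or VM problem rather than a SCSI or IDE one.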