Bug#631187: Kernel panics when removing external hard drive

2011-08-11 Thread Alexander Kurtz
found 631187 3.0.0-1
thanks

Hi,

FYI: The problem still occurs with Linux 3.0.0-1.

Best regards

Alexander Kurtz




Bug#631187: Kernel panics when removing external hard drive

2011-07-13 Thread Alexander Kurtz
On Wed, 2011-07-13 at 22:03 +1000, Linh Nguyen wrote:
 Hello Alexander,
 
 How are you? I came across your post 
 http://lists.debian.org/debian-kernel/2011/06/msg00580.html detailing 
 an issue similar to the one I am experiencing.
 
 Every time I unmount a portable HDD (normal USB sticks are fine), I get 
 a kernel panic with the "power/level is deprecated; use power/control 
 instead" error message.
 
 Despite my extensive googling, I've not been able to find a solution. I 
 was wondering whether or not you have solved your issue. Cheers. :)
 
 
 Sincerely,
 
 L

Sorry, I've got no solution either. Since this is kind of a low-priority
bug for me, I'm fine with manually unmounting (using umount or some GUI)
my external drive before removing it. My current plan is to wait for 3.0
and then maybe do a git bisect if it's not fixed by then. However, you
should check out the Debian bug report[1], the Ubuntu bug report[2] and
the upstream bug report[3], maybe you'll find something there.

Best regards

Alexander Kurtz

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631187
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/793796
[3] https://bugzilla.kernel.org/show_bug.cgi?id=38842





Bug#631187: Kernel panics when removing external hard drive

2011-07-09 Thread Alexander Kurtz
On Fri, 2011-07-08 at 04:30 +0100, Ben Hutchings wrote:
 Alexander, please test the new package version.

I just tested 2.6.39-3 from sid and 3.0.0~rc6-1~experimental.1 from
experimental. Unfortunately both reliably panic when safely removing my
external hard drive. 2.6.38-5 (still) works fine. Seems like it's time
for me to do a git bisect, or do you have any other ideas?

Best regards

Alexander Kurtz





Bug#631187: Kernel panics when removing external hard drive

2011-07-09 Thread Jonathan Nieder
forwarded 631187 https://bugzilla.kernel.org/show_bug.cgi?id=38842
quit

Hi,

Alexander Kurtz wrote:

 I just tested 2.6.39-3 from sid and 3.0.0~rc6-1~experimental.1 from
 experimental. Unfortunately both reliably panic when safely removing my
 external hard drive. 2.6.38-5 (still) works fine. Seems like it's time
 for me to do a git bisect, or do you have any other ideas?

I'd suggest attaching the full dmesg from 3.0.0~rc6 and any other
relevant information to https://bugzilla.kernel.org/show_bug.cgi?id=38842
first.  Maybe someone upstream will have ideas.

Thanks again.
Jonathan



-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20110709171451.GA3341@elie



Bug#631187: Kernel panics when removing external hard drive

2011-07-07 Thread Ben Hutchings
On Tue, 2011-07-05 at 17:51 -0500, Jonathan Nieder wrote:
 Hi,
 
 Alexander Kurtz wrote:
[...]
  [ 1491.696825] Code: 40 74 35 83 7e 44 01 74 04 a8 40 74 2b 83 e0 11 ff c8 
  0f 95 c0 83 e0 01 48 05 fc 00 00 00 ff 4c 87 04 f6 46 41 04 74 10 48 8b 02 
  [ 1491.696825]  8b 40 48 48 85 c0 74 04 41 58 ff e0 59 c3 48 8d be 80 00 00 
  [ 1491.696825] RIP  [8118b2e3] elv_completed_request+0x38/0x47
 
 Disassembly, for convenience (following the hints from
 Documentation/oops-tracing.txt):
[...]

There is a byte missing between the two lines (in fact, the very byte
which RIP points to), and you are mixing decimal and hexadecimal
offsets.

In fact RIP is pointing into the second half of this test:

    if ((rq->cmd_flags & REQ_SORTED) &&
        e->ops->elevator_completed_req_fn)

and e->ops was NULL.

This might be fixed by:

commit 0769e21bf4b5cf48878c1ca819276e80465b39e7
Author: James Bottomley james.bottom...@hansenpartnership.com
Date:   Wed May 25 15:52:14 2011 -0500

Fix oops caused by queue refcounting failure

commit e73e079bf128d68284efedeba1fbbc18d78610f9 upstream.

which was included in stable version 2.6.39.2 and our package version
2.6.39-3.

Alexander, please test the new package version.

Ben.

-- 
Ben Hutchings
The two most common things in the universe are hydrogen and stupidity.




Bug#631187: Kernel panics when removing external hard drive

2011-07-07 Thread Jonathan Nieder
Hi Ben,

Ben Hutchings wrote:

 There is a byte missing between the two lines (in fact, the very byte
 which RIP points to), and you are mixing decimal and hexadecimal
 offsets.

 In fact RIP is pointing into the second half of this test:

    if ((rq->cmd_flags & REQ_SORTED) &&
        e->ops->elevator_completed_req_fn)

 and e->ops was NULL.

Ah, that makes sense.

 This might be fixed by:

 commit 0769e21bf4b5cf48878c1ca819276e80465b39e7
 Author: James Bottomley james.bottom...@hansenpartnership.com
 Date:   Wed May 25 15:52:14 2011 -0500

 Fix oops caused by queue refcounting failure

 commit e73e079bf128d68284efedeba1fbbc18d78610f9 upstream.

As does that.  Thanks for explaining.



Archive: http://lists.debian.org/20110708040250.GB2559@elie



Bug#631187: Kernel panics when removing external hard drive

2011-07-05 Thread Jonathan Nieder
Hi,

Alexander Kurtz wrote:
 On Wed, 2011-06-22 at 03:40 +0100, Ben Hutchings wrote:

 The panic message shows there was an earlier kernel warning; please can
 you provide that.

 Thanks to netconsole (a really great tool!) I was able to do so. The
 attached kernel log starts right before I plug the drive in.
 Surprisingly the kernel didn't crash the first time, but after trying
 again, everything went as expected (see lines 17 and 35).

Sorry for the long silence.  Let's see:

 [ 1421.182657] sd 7:0:0:0: [sdc] Attached SCSI disk
 [ 1454.865926] WARNING! power/level is deprecated; use power/control instead

Seems harmless enough.

 [ 1478.728383] sd 8:0:0:0: [sdc] Attached SCSI disk
 [ 1491.693027] BUG: unable to handle kernel NULL pointer dereference at 0048
 [ 1491.693229] IP: [8118b2e3] elv_completed_request+0x38/0x47

The panic.

[...]
 [ 1491.696825] Code: 40 74 35 83 7e 44 01 74 04 a8 40 74 2b 83 e0 11 ff c8 0f 
 95 c0 83 e0 01 48 05 fc 00 00 00 ff 4c 87 04 f6 46 41 04 74 10 48 8b 02 
 [ 1491.696825]  8b 40 48 48 85 c0 74 04 41 58 ff e0 59 c3 48 8d be 80 00 00 
 [ 1491.696825] RIP  [8118b2e3] elv_completed_request+0x38/0x47

Disassembly, for convenience (following the hints from
Documentation/oops-tracing.txt):

| +0:  rex je 0x6008b8 <str+56>
| +3:  cmpl   $0x1,0x44(%rsi)
| +7:  je     0x60088d <str+13>
| +9:  test   $0x40,%al
| +11: je     0x6008b8 <str+56>
| +13: and    $0x11,%eax
| +16: dec    %eax
| +18: setne  %al
| +21: and    $0x1,%eax
| +24: add    $0xfc,%rax
| +30: decl   0x4(%rdi,%rax,4)
| +34: testb  $0x4,0x41(%rsi)
| +38: je     0x6008b8 <str+56>
| +40: mov    (%rdx),%rax
| +43: cmp    %ah,0x40(%rdx)
| +46: rex.W
| +47: test   %rax,%rax
| +50: je     0x6008b8 <str+56>
| +52: pop    %r8
| +54: jmpq   *%rax
| +56: pop    %rcx
| +57: retq
| +58: lea    0x80(%rsi),%rdi

So offset 0x38 is the jump in

    if ((rq->cmd_flags & REQ_SORTED) &&

As for why that involves an access to the address 0x48: well, that
is beyond my depth.  rq->cmd_flags was already accessed in the check

    if (blk_account_rq(rq))

Maybe the actual cause of the fault is some different instruction and
the instruction pointer is not to be trusted (?).  I suppose if I were
in this situation, I'd sprinkle block/elevator.c::elv_completed_request
with printk calls to be able to witness exactly what happens.

Sorry for the trouble, and hope that helps.
Jonathan



Archive: http://lists.debian.org/20110705225129.GA8701@elie



Bug#631187: Kernel panics when removing external hard drive

2011-06-22 Thread Alexander Kurtz
On Wed, 2011-06-22 at 03:40 +0100, Ben Hutchings wrote:
 Which version of GNOME is this?

2.30.2. Apart from the newer kernel, this is a pure Squeeze system.

 The panic message shows there was an earlier kernel warning; please can
 you provide that.

Thanks to netconsole (a really great tool!) I was able to do so. The
attached kernel log starts right before I plug the drive in.
Surprisingly the kernel didn't crash the first time, but after trying
again, everything went as expected (see lines 17 and 35). Please note
that I replaced the drive's serial number.

Best regards

Alexander Kurtz
[ 1420.016231] usb 1-3: new high speed USB device number 6 using ehci_hcd
[ 1420.150838] usb 1-3: New USB device found, idVendor=1058, idProduct=1010
[ 1420.150867] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1420.150891] usb 1-3: Product: External HDD
[ 1420.150900] usb 1-3: Manufacturer: Western Digital 
[ 1420.150914] usb 1-3: SerialNumber: XX
[ 1420.152513] scsi7 : usb-storage 1-3:1.0
[ 1421.154225] scsi 7:0:0:0: Direct-Access WD   2500BEV External 1.75 PQ: 0 ANSI: 4
[ 1421.158259] sd 7:0:0:0: [sdc] 488397168 512-byte logical blocks: (250 GB/232 GiB)
[ 1421.159053] sd 7:0:0:0: [sdc] Write Protect is off
[ 1421.159069] sd 7:0:0:0: [sdc] Mode Sense: 23 00 00 00
[ 1421.159080] sd 7:0:0:0: [sdc] Assuming drive cache: write through
[ 1421.161796] sd 7:0:0:0: [sdc] Assuming drive cache: write through
[ 1421.179973]  sdc: sdc1
[ 1421.182628] sd 7:0:0:0: [sdc] Assuming drive cache: write through
[ 1421.182657] sd 7:0:0:0: [sdc] Attached SCSI disk
[ 1454.865926] WARNING! power/level is deprecated; use power/control instead
[ 1454.944178] usb 1-3: USB disconnect, device number 6
[ 1477.564219] usb 1-2: new high speed USB device number 7 using ehci_hcd
[ 1477.698789] usb 1-2: New USB device found, idVendor=1058, idProduct=1010
[ 1477.698817] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1477.698841] usb 1-2: Product: External HDD
[ 1477.698850] usb 1-2: Manufacturer: Western Digital 
[ 1477.698867] usb 1-2: SerialNumber: XX
[ 1477.700552] scsi8 : usb-storage 1-2:1.0
[ 1478.702244] scsi 8:0:0:0: Direct-Access WD   2500BEV External 1.75 PQ: 0 ANSI: 4
[ 1478.705375] sd 8:0:0:0: [sdc] 488397168 512-byte logical blocks: (250 GB/232 GiB)
[ 1478.705994] sd 8:0:0:0: [sdc] Write Protect is off
[ 1478.706023] sd 8:0:0:0: [sdc] Mode Sense: 23 00 00 00
[ 1478.706035] sd 8:0:0:0: [sdc] Assuming drive cache: write through
[ 1478.708338] sd 8:0:0:0: [sdc] Assuming drive cache: write through
[ 1478.725489]  sdc: sdc1
[ 1478.728353] sd 8:0:0:0: [sdc] Assuming drive cache: write through
[ 1478.728383] sd 8:0:0:0: [sdc] Attached SCSI disk
[ 1491.693027] BUG: unable to handle kernel NULL pointer dereference at 0048
[ 1491.693229] IP: [8118b2e3] elv_completed_request+0x38/0x47
[ 1491.693380] PGD 1b7f16067 
[ 1491.693435] Buffer I/O error on device sdc1, logical block 61048968
[ 1491.693448] Buffer I/O error on device sdc1, logical block 61048968
[ 1491.693486] Buffer I/O error on device sdc1, logical block 61048992
[ 1491.693494] Buffer I/O error on device sdc1, logical block 61048992
[ 1491.693510] Buffer I/O error on device sdc1, logical block 61048998
[ 1491.693517] Buffer I/O error on device sdc1, logical block 61048998
[ 1491.693554] Buffer I/O error on device sdc1, logical block 61048999
[ 1491.693567] Buffer I/O error on device sdc1, logical block 0
[ 1491.693578] Buffer I/O error on device sdc1, logical block 0
[ 1491.693590] Buffer I/O error on device sdc1, logical block 256
[ 1491.694599] PUD 1b7f23067 PMD 0 
[ 1491.694689] Oops:  [#1] SMP 
[ 1491.694777] last sysfs file: /sys/devices/pci:00/:00:12.2/usb1/1-2/power/autosuspend
[ 1491.694945] CPU 1 
[ 1491.694991] Modules linked in: netconsole configfs parport_pc ppdev lp 
parport bridge stp bnep rfcomm bluetooth powernow_k8 mperf cpufreq_stats 
cpufreq_userspace cpufreq_powersave cpufreq_conservative binfmt_misc fuse 
snd_hda_codec_hdmi joydev snd_hda_codec_conexant radeon arc4 ecb ttm 
drm_kms_helper snd_hda_intel thinkpad_acpi rtl8192ce drm snd_hda_codec 
rtl8192c_common snd_hwdep i2c_algo_bit rtlwifi snd_pcm snd_seq snd_seq_device 
mac80211 snd_timer snd cfg80211 i2c_piix4 shpchp soundcore tpm_tis tpm psmouse 
tpm_bios snd_page_alloc wmi nvram k10temp rfkill pcspkr pci_hotplug i2c_core 
evdev serio_raw battery video ac edac_core power_supply edac_mce_amd button 
processor ext4 mbcache jbd2 crc16 sha256_generic cryptd aes_x86_64 aes_generic 
cbc dm_crypt dm_mod raid10 raid456 async_raid6_recov async_pq raid6_pq 
async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod sd_mod 
usb_storage crc_t10dif uas ahci libahci ohci_hcd libata ehci_hcd r8169 thermal 
scsi_mod usbcore mii thermal_sys [last unloaded: configfs]
[ 1491.696825] 
[ 1491.696825] Pid: 10, comm: ksoftirqd/1 Tainted: GW2.6.39-2-amd64 
#1 

Bug#631187: Kernel panics when removing external hard drive

2011-06-21 Thread Ben Hutchings
On Tue, 2011-06-21 at 11:08 +0200, Alexander Kurtz wrote:
 Package: linux-2.6
 Version: 2.6.39-1
 Severity: serious
 
 Hi,
 
 I've got a pretty normal Debian Squeeze AMD64 system with the current
 kernel from Wheezy. Since 2.6.39-1 I experience this bug:
 
  1. I plug in an external USB hard drive with an NTFS file system on
     its first partition.
  2. The drive gets automatically mounted using the fuse-based NTFS
     driver (ntfs-3g).
  3. I right-click on the icon representing the drive on the GNOME
     desktop and select "Safely Remove Drive".

Which version of GNOME is this?

  4. The kernel panics, see attached screenshot.
[...]

The panic message shows there was an earlier kernel warning; please can
you provide that.

Ben.

-- 
Ben Hutchings
I'm always amazed by the number of people who take up solipsism because
they heard someone else explain it. - E*Borg on alt.fan.pratchett

