Re: btrfs-related kernel oops due to media error

2012-01-12 Thread Niels de Carpentier
 Hi,

 One of my disks, partitioned into a single btrfs partition, is showing
 media errors. The problem is that these errors lead to a kernel panic in
 btrfs, which makes the filesystem unusable until reboot, and therefore
 it is very hard for me to do a full backup of the data prior to changing
 the disk.
 My current kernel is 3.2.0-8-generic from Ubuntu/precise (based on Linux
 3.2-final), but I quickly tested and got the same error with an older 3.1
 kernel (and I can probably reproduce it with a vanilla kernel if
 necessary).
 I assume that the filesystem should not panic even in case of a media
 error... Is there any procedure I can follow, or a patch I could apply, to
 salvage my data while ignoring media errors?

I don't know about btrfs, but writing the sector with hdparm
--write-sector will usually cause it to be remapped. You can use dd or
another tool to read the entire disk to find out if there are more bad
sectors.

Niels
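
The "read the entire disk" step suggested above can be sketched in Python. This is a minimal sketch, not a replacement for dd or badblocks: the function name find_unreadable_sectors, the 512-byte sector size, the chunk size, and the injectable pread parameter (there only so the logic can be exercised without a failing disk) are all assumptions made for illustration. It reads the device in large chunks and, when a chunk fails or comes back short, re-reads that chunk sector by sector to list the ones that cannot be read.

```python
import os


def find_unreadable_sectors(path, sector_size=512, chunk_sectors=2048,
                            pread=os.pread):
    """Return the sector numbers of `path` that cannot be read back.

    `path` may be a block device or a regular file. `pread` is
    injectable purely for testing (an assumption of this sketch).
    """

    def readable(fd, first_sector, count):
        # A read counts as good only if it returns every requested byte.
        want = count * sector_size
        try:
            return len(pread(fd, want, first_sector * sector_size)) == want
        except OSError:
            return False

    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        total = os.lseek(fd, 0, os.SEEK_END) // sector_size
        first = 0
        while first < total:
            count = min(chunk_sectors, total - first)
            if not readable(fd, first, count):
                # Narrow the failure down to individual sectors.
                for sector in range(first, first + count):
                    if not readable(fd, sector, 1):
                        bad.append(sector)
            first += count
    finally:
        os.close(fd)
    return bad
```

Against a real device this would be called as find_unreadable_sectors("/dev/sdd") with read permission on the device. Reads here go through the page cache, so the kernel may fetch whole pages and report the neighbours of a bad sector as unreadable too. Each sector found this way is a candidate for hdparm --write-sector, which destroys the contents of that sector.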


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-related kernel oops due to media error

2012-01-12 Thread Vincent Vanackere



On Tue, Jan 10, 2012 at 00:01, Niels de Carpentier <ni...@decarpentier.com> wrote:


 Hi,

 One of my disks, partitioned into a single btrfs partition, is showing
 media errors. The problem is that these errors lead to a kernel panic in
 btrfs, which makes the filesystem unusable until reboot, and therefore
 it is very hard for me to do a full backup of the data prior to changing
 the disk.
 My current kernel is 3.2.0-8-generic from Ubuntu/precise (based on Linux
 3.2-final), but I quickly tested and got the same error with an older 3.1
 kernel (and I can probably reproduce it with a vanilla kernel if
 necessary).
 I assume that the filesystem should not panic even in case of a media
 error... Is there any procedure I can follow, or a patch I could apply, to
 salvage my data while ignoring media errors?

   I don't know about btrfs, but writing the sector with hdparm
   --write-sector will usually cause it to be remapped. You can use dd or
   another tool to read the entire disk to find out if there are more bad
   sectors.

   Niels


Thank you for the hint!
I'll probably try this, but since I've already managed to make a copy of
all my interesting data, I think I'll keep the disk in the same state
(with the bad sectors not remapped) for a few days, hoping the btrfs
developers are interested in fixing this bug... Who will trust a
filesystem that oopses on media failure? ;-)


Vincent



[v3.2-4874-ge4e1118 OOPS] btrfs-related kernel oops due to media error

2012-01-10 Thread Vincent Vanackere
[Note: this is a resend of a mail I sent to linux-btrfs earlier, this
time tested with the latest git kernel.]


Hi,

One of my disks, partitioned into a single btrfs partition, is showing
media errors. The problem is that these errors lead to a kernel panic in
btrfs, which makes the filesystem unusable until reboot, and therefore
it is very hard for me to do a full backup of the data prior to changing
the disk.
My current kernel is a vanilla kernel at the current tip (output from git
describe is v3.2-4874-ge4e1118).
I assume that the filesystem should not panic even in case of a media
error... Is there any procedure I can follow, or a patch I could apply, to
salvage my data while ignoring media errors?


Logs and the oops are at the end of this mail; please let me know if more
information is needed.


Best regards,

Vincent

---

[ 3210.717304] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 3210.717309] ata6.00: BMDMA stat 0x24
[ 3210.717312] ata6.00: failed command: READ DMA EXT
[ 3210.717318] ata6.00: cmd 25/00:08:5f:dc:2f/00:00:70:00:00/e0 tag 0 dma 4096 in
[ 3210.717320]  res 51/40:00:61:dc:2f/40:00:70:00:00/e0 Emask 0x9 (media error)
[ 3210.717323] ata6.00: status: { DRDY ERR }
[ 3210.717325] ata6.00: error: { UNC }
[ 3210.732234] ata6.00: configured for UDMA/133
[ 3210.732248] sd 5:0:0:0: [sdd] Unhandled sense code
[ 3210.732250] sd 5:0:0:0: [sdd]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3210.732254] sd 5:0:0:0: [sdd]  Sense Key : Medium Error [current] [descriptor]
[ 3210.732259] Descriptor sense data with sense descriptors (in hex):
[ 3210.732261] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 3210.732270] 70 2f dc 61
[ 3210.732274] sd 5:0:0:0: [sdd]  Add. Sense: Unrecovered read error - auto reallocate failed
[ 3210.732278] sd 5:0:0:0: [sdd] CDB: Read(10): 28 00 70 2f dc 5f 00 00 08 00
[ 3210.732287] end_request: I/O error, dev sdd, sector 1882184801
[ 3210.732305] ata6: EH complete
[ 3210.732322] BUG: unable to handle kernel NULL pointer dereference at   (null)
[ 3210.732373] IP: [a017f129] extent_range_uptodate+0x59/0xe0 [btrfs]
[ 3210.732426] PGD 21e9b7067 PUD 21e9b6067 PMD 0
[ 3210.732455] Oops:  [#1] SMP
[ 3210.732475] CPU 3
[ 3210.732486] Modules linked in: ip6table_filter ip6_tables ipt_MASQUERADE bnep iptable_nat nf_nat rfcomm bluetooth nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm parport_pc ppdev nfsd nfs lockd fscache binfmt_misc auth_rpcgss nfs_acl sunrpc dm_crypt snd_usb_audio snd_usbmidi_lib joydev snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore snd_page_alloc psmouse serio_raw cdc_acm lp parport btrfs zlib_deflate libcrc32c hid_logitech ff_memless usbhid hid i915 drm_kms_helper drm r8169 i2c_algo_bit video pata_jmicron
[ 3210.732870]
[ 3210.732880] Pid: 3856, comm: btrfs-endio-met Not tainted 3.2.0-custom #2 Gigabyte Technology Co., Ltd. G33-DS3R/G33-DS3R
[ 3210.732933] RIP: 0010:[a017f129]  [a017f129] extent_range_uptodate+0x59/0xe0 [btrfs]
[ 3210.732989] RSP: 0018:880006f3fde0  EFLAGS: 00010246
[ 3210.733014] RAX:  RBX: 00df57385000 RCX:
[ 3210.733047] RDX: 0001 RSI: 0df57385 RDI:
[ 3210.733079] RBP: 880006f3fe00 R08:  R09: 88008bce5200
[ 3210.733111] R10: 8800299f9010 R11: 1000 R12: 8802190f4030
[ 3210.733143] R13: 00df573853ff R14: 880006f3fe98 R15: 880143263d88
[ 3210.733175] FS:  () GS:88022fd8() knlGS:
[ 3210.733212] CS:  0010 DS:  ES:  CR0: 8005003b
[ 3210.733238] CR2:  CR3: 00021f35a000 CR4: 000406e0
[ 3210.733270] DR0:  DR1:  DR2:
[ 3210.733302] DR3:  DR6: 0ff0 DR7: 0400
[ 3210.74] Process btrfs-endio-met (pid: 3856, threadinfo 880006f3e000, task 8801fa8d8000)
[ 3210.733374] Stack:
[ 3210.733385]   8800298dd838 8801f9cc9840 88021ee05000
[ 3210.733423]  880006f3fe30 a01581f9 880143263d80 8800298dd860
[ 3210.733461]  880143263d80 880143263d98 880006f3fee0 a0187fef
[ 3210.733499] Call Trace:
[ 3210.733524]  [a01581f9] end_workqueue_fn+0x119/0x140 [btrfs]
[ 3210.733567]  [a0187fef] worker_loop+0x16f/0x5d0 [btrfs]
[ 3210.733608]  [a0187e80] ? btrfs_queue_worker+0x310/0x310 [btrfs]
[ 3210.733643]  [8106fa93] kthread+0x93/0xa0
[ 3210.733668]  [8162caa4] kernel_thread_helper+0x4/0x10
[ 3210.733697]  [8106fa00] ? 
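
The sector number reported by end_request in the log above can be cross-checked against the raw SCSI CDB and the ATA result taskfile printed just before it. The Python sketch below does this decoding. The helper names decode_read10 and decode_taskfile_lba48 are hypothetical, and the field order assumed for the libata "cmd"/"res" dump (status or command, then feature/nsect/lbal/lbam/lbah, then the matching high-order bytes, then device) is my reading of that format rather than something stated in this thread.

```python
def decode_read10(cdb_hex):
    """Split a SCSI READ(10) CDB into (start LBA, number of blocks)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2..5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, count


def decode_taskfile_lba48(taskfile):
    """Extract the 48-bit LBA from a libata 'cmd' or 'res' taskfile dump."""
    _head, low, high, _device = taskfile.split("/")
    # low  = feature-or-error : nsect : lbal : lbam : lbah
    # high = the corresponding high-order ("hob") bytes, same order
    lbal, lbam, lbah = (int(x, 16) for x in low.split(":")[2:5])
    hob_lbal, hob_lbam, hob_lbah = (int(x, 16) for x in high.split(":")[2:5])
    return (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
            | lbah << 16 | lbam << 8 | lbal)


start, count = decode_read10("28 00 70 2f dc 5f 00 00 08 00")
failed = decode_taskfile_lba48("51/40:00:61:dc:2f/40:00:70:00:00/e0")
print(start, count, failed)  # 1882184799 8 1882184801
```

With the values from the log, the request covers 8 sectors starting at 1882184799 (4096 bytes, matching "dma 4096 in"), and the drive reports the failure at 1882184801, the third sector of the request. That is the same number as in the end_request line and in the sense data bytes 70 2f dc 61.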

btrfs-related kernel oops due to media error

2012-01-09 Thread Vincent Vanackere

Hi,

One of my disks, partitioned into a single btrfs partition, is showing
media errors. The problem is that these errors lead to a kernel panic in
btrfs, which makes the filesystem unusable until reboot, and therefore
it is very hard for me to do a full backup of the data prior to changing
the disk.
My current kernel is 3.2.0-8-generic from Ubuntu/precise (based on Linux
3.2-final), but I quickly tested and got the same error with an older 3.1
kernel (and I can probably reproduce it with a vanilla kernel if necessary).
I assume that the filesystem should not panic even in case of a media
error... Is there any procedure I can follow, or a patch I could apply, to
salvage my data while ignoring media errors?


Logs and the oops are at the end of this mail; please let me know if more
information is needed.


Best regards,

Vincent

---

   [  129.241636] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
   [  129.241640] ata6.00: BMDMA stat 0x24
   [  129.241643] ata6.00: failed command: READ DMA EXT
   [  129.241649] ata6.00: cmd 25/00:08:5f:dc:2f/00:00:70:00:00/e0 tag 0 dma 4096 in
   [  129.241651]  res 51/40:00:61:dc:2f/40:00:70:00:00/e0 Emask 0x9 (media error)
   [  129.241654] ata6.00: status: { DRDY ERR }
   [  129.241656] ata6.00: error: { UNC }
   [  129.256243] ata6.00: configured for UDMA/133
   [  129.256261] ata6: EH complete
   [  131.640911] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
   [  131.640915] ata6.00: BMDMA stat 0x24
   [  131.640918] ata6.00: failed command: READ DMA EXT
   [  131.640922] ata6.00: cmd 25/00:08:5f:dc:2f/00:00:70:00:00/e0 tag 0 dma 4096 in
   [  131.640923]  res 51/40:00:61:dc:2f/40:00:70:00:00/e0 Emask 0x9 (media error)
   [  131.640926] ata6.00: status: { DRDY ERR }
   [  131.640927] ata6.00: error: { UNC }
   [  131.656244] ata6.00: configured for UDMA/133
   [  131.656260] ata6: EH complete
   [  134.317351] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
   [  134.317355] ata6.00: BMDMA stat 0x24
   [  134.317359] ata6.00: failed command: READ DMA EXT
   [  134.317365] ata6.00: cmd 25/00:08:5f:dc:2f/00:00:70:00:00/e0 tag 0 dma 4096 in
   [  134.317366]  res 51/40:00:61:dc:2f/40:00:70:00:00/e0 Emask 0x9 (media error)
   [  134.317369] ata6.00: status: { DRDY ERR }
   [  134.317371] ata6.00: error: { UNC }
   [  134.332234] ata6.00: configured for UDMA/133
   [  134.332248] ata6: EH complete
   [  136.894260] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
   [  136.894264] ata6.00: BMDMA stat 0x24
   [  136.894268] ata6.00: failed command: READ DMA EXT
   [  136.894274] ata6.00: cmd 25/00:08:5f:dc:2f/00:00:70:00:00/e0 tag 0 dma 4096 in
   [  136.894275]  res 51/40:00:61:dc:2f/40:00:70:00:00/e0 Emask 0x9 (media error)
   [  136.894278] ata6.00: status: { DRDY ERR }
   [  136.894280] ata6.00: error: { UNC }
   [  136.924255] ata6.00: configured for UDMA/133
   [  136.924269] ata6: EH complete
   [  139.437990] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
   [  139.437994] ata6.00: BMDMA stat 0x24
   [  139.437998] ata6.00: failed command: READ DMA EXT
   [  139.438004] ata6.00: cmd 25/00:08:5f:dc:2f/00:00:70:00:00/e0 tag 0 dma 4096 in
   [  139.438005]  res 51/40:00:61:dc:2f/40:00:70:00:00/e0 Emask 0x9 (media error)
   [  139.438008] ata6.00: status: { DRDY ERR }
   [  139.438010] ata6.00: error: { UNC }
   [  139.468239] ata6.00: configured for UDMA/133
   [  139.468253] ata6: EH complete
   [  141.937488] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
   [  141.937493] ata6.00: BMDMA stat 0x24
   [  141.937497] ata6.00: failed command: READ DMA EXT
   [  141.937503] ata6.00: cmd 25/00:08:5f:dc:2f/00:00:70:00:00/e0 tag 0 dma 4096 in
   [  141.937504]  res 51/40:00:61:dc:2f/40:00:70:00:00/e0 Emask 0x9 (media error)
   [  141.937507] ata6.00: status: { DRDY ERR }
   [  141.937509] ata6.00: error: { UNC }
   [  141.952236] ata6.00: configured for UDMA/133
   [  141.952253] sd 5:0:0:0: [sdd] Unhandled sense code
   [  141.952256] sd 5:0:0:0: [sdd]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
   [  141.952260] sd 5:0:0:0: [sdd]  Sense Key : Medium Error [current] [descriptor]
   [  141.952264] Descriptor sense data with sense descriptors (in hex):
   [  141.952266] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
   [  141.952275] 70 2f dc 61
   [  141.952279] sd 5:0:0:0: [sdd]  Add. Sense: Unrecovered read error - auto reallocate failed
   [  141.952284] sd 5:0:0:0: [sdd] CDB: Read(10): 28 00 70 2f dc 5f 00 00 08 00
   [  141.952293] end_request: I/O error, dev sdd, sector 1882184801
   [  141.952313] ata6: EH complete
   [  141.952335] BUG: unable to handle kernel NULL pointer dereference at   (null)
   [  141.952383] IP: [a018e439] extent_range_uptodate+0x59/0xe0 [btrfs]
   [  141.952440] PGD 21caae067 PUD