Re: Hardware failure or btrfs issue?

2013-07-02 Thread Hugo Mills
On Mon, Jul 01, 2013 at 11:56:30PM +0100, Peter Chant wrote:
 Sirs,
 
 my recently slowing file system is now going read only after trying
 a defrag or other operation.  I'm wondering whether this is the
 result of a hardware failure or a btrfs or some other issue.  Output
 of dmesg:

[snip]
 [  127.862825] btrfs: corrupt leaf, bad key order:
 block=2837196627968,root=1, slot=121
[snip]

   This is usually an indication that you have bad hardware -- I'd
suggest testing RAM, PSU, CPU in that order. I'm not sure what, if
anything, can be done to fix the error on the disk right now.
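
For readers unfamiliar with the message: a btrfs key is an (objectid, type, offset) triple, and the keys in a leaf are expected to be in strictly ascending order. The following is a minimal sketch of that kind of check, not the kernel's code. The function names, the sample keys and the slot-numbering convention are all made up for illustration. It also shows how a single flipped bit, of the sort bad RAM can produce, is enough to break the ordering.

```python
from typing import List, Optional, Tuple

# A key modelled as (objectid, type, offset); Python compares tuples
# element by element, which gives the lexicographic order we want.
Key = Tuple[int, int, int]


def first_bad_slot(keys: List[Key]) -> Optional[int]:
    """Return the first slot whose key is not strictly greater than the
    key before it, or None if the whole leaf is in order.
    (Illustrative convention; the kernel may number the slot differently.)"""
    for slot in range(1, len(keys)):
        if keys[slot - 1] >= keys[slot]:
            return slot
    return None


def flip_bit(value: int, bit: int) -> int:
    """Simulate a single-bit memory error in an integer field."""
    return value ^ (1 << bit)


# Hypothetical leaf contents, already sorted.
leaf: List[Key] = [(256, 1, 0), (256, 12, 256), (257, 1, 0), (258, 1, 0)]

# Flip bit 8 of the objectid in slot 2: 257 becomes 1.
damaged = list(leaf)
objectid, key_type, offset = damaged[2]
damaged[2] = (flip_bit(objectid, 8), key_type, offset)

print(first_bad_slot(leaf))     # None: keys are in order
print(first_bad_slot(damaged))  # 2: key in slot 2 now sorts before slot 1
```

If the flip happens in memory before the block is checksummed and written, the checksum covers the damaged contents, which would explain why the problem only shows up later as a key-order error.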

 Not that I've done anything other than a cursory check but it looks
 like the read only data is fine.

   Might be a good idea to use that to refresh your backups, just in
case my prediction about the fixability is correct.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- How deep will this sub go? Oh,  she'll go all the way to ---   
the bottom if we don't stop her.




Re: Hardware failure or btrfs issue?

2013-07-02 Thread Peter Chant

On 07/02/2013 08:29 AM, Hugo Mills wrote:
This is usually an indication that you have bad hardware -- I'd 
suggest testing RAM, PSU, CPU in that order. I'm not sure what, if 
anything, can be done to fix the error on the disk right now. 


Thanks, appreciated.

Hmm.  I've got one stick of ram out of the machine due to testing as I 
had some freezes last week.
If it were one of the RAM, PSU and CPU then I'm unsure why this IO issue 
only surfaces on the HDD and not the SSD.  I ordered a new HDD last 
night, before reading your post.  If it's not the disk I'll go raid1.  If
it is the disk then I'll probably find out.



Not that I've done anything other than a cursory check but it looks
like the read only data is fine.

Might be a good idea to use that to refresh your backups, just in
case my prediction about the fixability is correct.


Well, first option is to drop in the new disk, freshly format it and 
copy the data across (not add it as a second disk).  If that fails, the last
backup was on Wednesday.  I've not done much of note since then apart from
trying to fix the disk issues.
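
When copying from a filesystem that may be damaged, it can be worth checking the copy against the source before trusting it. Below is a minimal sketch, assuming both trees are mounted and readable as ordinary directories. The function names are hypothetical, and it compares regular files only (no symlinks, ownership or timestamps).

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(src_root, dst_root):
    """Return a list of (relative_path, problem) for every regular file
    under src_root that is missing, unreadable or different in dst_root."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files are compared.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst):
                problems.append((rel, "missing"))
                continue
            try:
                same = file_digest(src) == file_digest(dst)
            except OSError as exc:
                # A failing disk may return I/O errors on read.
                problems.append((rel, "unreadable: %s" % exc))
                continue
            if not same:
                problems.append((rel, "differs"))
    return problems
```

Calling compare_trees() with the old mount point and the new one returns an empty list when every regular file matches. Note that this only proves the copy is faithful to what the old disk returns now, not that the old data was undamaged.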



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Hardware failure or btrfs issue?

2013-07-02 Thread Hugo Mills
On Tue, Jul 02, 2013 at 06:36:48PM +0100, Peter Chant wrote:
 On 07/02/2013 08:29 AM, Hugo Mills wrote:
 This is usually an indication that you have bad hardware -- I'd
 suggest testing RAM, PSU, CPU in that order. I'm not sure what, if
 anything, can be done to fix the error on the disk right now.
 
 Thanks, appreciated.
 
 Hmm.  I've got one stick of ram out of the machine due to testing as
 I had some freezes last week.

   So the damage probably happened then, if that stick is bad.
Filesystems have this irritating habit of remembering things done to
them across reboots. :)

   Hugo.

 If it were one of the RAM, PSU and CPU then I'm unsure why this IO
 issue only surfaces on the HDD and not the SSD.  I ordered a new HDD
 last night, before reading your post.  If it's not the disk I'll go
 raid1.  If it is the disk then I'll probably find out.
 
 Not that I've done anything other than a cursory check but it looks
 like the read only data is fine.
 Might be a good idea to use that to refresh your backups, just in
 case my prediction about the fixability is correct.
 
 Well, first option is to drop in the new disk, freshly format it and
 copy the data across (not add it as a second disk).  If that fails,
 the last backup was on Wednesday.  I've not done much of note since
 then apart from trying to fix the disk issues.
 
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- The glass is neither half-full nor half-empty; it is twice as ---  
large as it needs to be. 




Re: Hardware failure or btrfs issue?

2013-07-02 Thread Peter Chant

On 07/02/2013 06:48 PM, Hugo Mills wrote:
So the damage probably happened then, if that stick is bad. 
Filesystems have this irritating habit of remembering things done to 
them across reboots. :) Hugo.


The last thing I did before the defrag was to delete 48 hours' worth of
hourly snapshots.  I was wondering if the numerous snapshots were what
was making defrag so painfully slow.  Not that I know anything about
btrfs internals, but I suspect that is a major enough operation to trip
over any random corruption, if there was any.  I think I'll restrict
snapshots to once or twice a day at most, unless hourly snapshots really
should cause no issue.
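
As an illustration of thinning hourly snapshots down to one or two a day, here is a minimal sketch that works purely on timestamps. It does not call any btrfs tool. The function name and the bucketing rule (split each day into equal parts and keep the earliest snapshot in each part) are assumptions made for the example.

```python
from datetime import datetime, timedelta


def thin_snapshots(timestamps, per_day=2):
    """Keep at most `per_day` snapshots per calendar day: the day is split
    into `per_day` equal buckets and the earliest snapshot in each bucket
    is kept. Returns (keep, drop) as two sorted lists."""
    keep, drop, seen = [], [], set()
    for stamp in sorted(timestamps):
        bucket = (stamp.date(), stamp.hour * per_day // 24)
        if bucket in seen:
            drop.append(stamp)
        else:
            seen.add(bucket)
            keep.append(stamp)
    return keep, drop


# 48 hypothetical hourly snapshots starting at midnight.
start = datetime(2013, 6, 29, 0, 0)
hourly = [start + timedelta(hours=h) for h in range(48)]

keep, drop = thin_snapshots(hourly, per_day=2)
print(len(keep), len(drop))  # 4 44
```

The timestamps in `drop` would be the candidates for deletion; how they map to snapshot names depends on the naming scheme in use.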


Pete



Hardware failure or btrfs issue?

2013-07-01 Thread Peter Chant

Sirs,

my recently slowing file system is now going read only after trying a 
defrag or other operation.  I'm wondering whether this is the result of 
a hardware failure or a btrfs or some other issue.  Output of dmesg:


[  127.750401] DR0:  DR1:  DR2:
[  127.750494] DR3:  DR6: 0ff0 DR7: 0400
[  127.750590] Process btrfs-cleaner (pid: 1346, threadinfo 8800687ec000, task 88006d742a00)
[  127.750704] Stack:
[  127.750733]  880068024c38 88006a9a0438 8800687ede48 880069928800
[  127.750850]  88006d742a00 88006d742a00 88006d742a00
[  127.750968]  8800687edeb8 812b8c29 880069928800
[  127.751085] Call Trace:
[  127.751122]  [812b8c29] cleaner_kthread+0xa9/0x120
[  127.751200]  [812b8b80] ? write_dev_flush.part.107+0xc0/0xc0
[  127.751289]  [81069450] kthread+0xc0/0xd0
[  127.751354]  [81069390] ? kthread_create_on_node+0x130/0x130
[  127.751444]  [816976dc] ret_from_fork+0x7c/0xb0
[  127.751516]  [81069390] ? kthread_create_on_node+0x130/0x130
[  127.751602] Code: 44 28 3f 85 c0 7f 83 31 d2 31 f6 4c 89 ff e8 f7 c5 fe ff eb 84 0f 1f 44 00 00 48 83 c4 18 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 66 66 66 66 90 48
[  127.752207] RIP  [812c1611] btrfs_clean_old_snapshots+0x131/0x140
[  127.752305]  RSP 8800687ede38
[  127.752371] ---[ end trace cc41fa39a41b468e ]---
[  127.862825] btrfs: corrupt leaf, bad key order: block=2837196627968,root=1, slot=121
[  127.862938] ------------[ cut here ]------------
[  127.863009] WARNING: at fs/btrfs/super.c:255 __btrfs_abort_transaction+0xdf/0x100()
[  127.863110] Hardware name: System Product Name
[  127.863171] btrfs: Transaction aborted
[  127.863222] Modules linked in: usblp pl2303 usbserial hid_generic 
usbhid hid usb_storage lp ppdev parport_pc parport snd_hda_codec_via 
sp5100_tco acpi_cpufreq mperf freq_table kvm_amd kvm evdev radeon ttm 
drm_kms_helper psmouse drm serio_raw agpgart i2c_algo_bit microcode 
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc i2c_piix4 
snd_timer snd atl1e ohci_hcd via_rhine i2c_core shpchp soundcore 
ehci_pci ehci_hcd mii wmi k10temp asus_atk0110 processor thermal_sys 
hwmon button

[  127.864073] Pid: 1347, comm: btrfs-transacti Tainted: G D  3.9.3 #1
[  127.864167] Call Trace:
[  127.864204]  [8104614f] warn_slowpath_common+0x7f/0xc0
[  127.864285]  [81046246] warn_slowpath_fmt+0x46/0x50
[  127.864370]  [812962ef] __btrfs_abort_transaction+0xdf/0x100
[  127.864460]  [812a71f2] __btrfs_free_extent+0x242/0x870
[  127.864543]  [813046bc] ? btrfs_merge_delayed_refs+0x1fc/0x3c0
[  127.870518]  [812ab59b] run_clustered_refs+0x50b/0xc40
[  127.876503]  [81303813] ? find_ref_head+0x83/0xf0
[  127.882501]  [812af6b0] btrfs_run_delayed_refs+0xe0/0x570
[  127.882503]  [812bfb9a] btrfs_commit_transaction+0xea/0xad0
[  127.882505]  [81069b90] ? finish_wait+0x80/0x80
[  127.882513]  [812b8605] transaction_kthread+0x1a5/0x220
[  127.882517]  [812b8460] ? btree_readpage_end_io_hook+0x2a0/0x2a0
[  127.882520]  [81069450] kthread+0xc0/0xd0
[  127.882521]  [81069390] ? kthread_create_on_node+0x130/0x130
[  127.882523]  [816976dc] ret_from_fork+0x7c/0xb0
[  127.882524]  [81069390] ? kthread_create_on_node+0x130/0x130
[  127.882525] ---[ end trace cc41fa39a41b468f ]---
[  127.882527] BTRFS error (device sdb) in __btrfs_free_extent:5394: IO failure
[  127.882528] btrfs: run_one_delayed_ref returned -5
[  127.882529] BTRFS error (device sdb) in btrfs_run_delayed_refs:2565: IO failure
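
For anyone wanting to pull the key facts out of a log like the one above, here is a minimal sketch. It assumes the "corrupt leaf" message has the shape shown in this log. The regular expression and function names are made up for the example, and other kernel versions may word the message differently. The -5 in "returned -5" is a negated errno value, which on Linux is EIO.

```python
import errno
import re

# Matches the "corrupt leaf" line as it appears in this particular log.
CORRUPT_LEAF = re.compile(
    r"corrupt leaf, bad key order:\s*block=(\d+),\s*root=(\d+),\s*slot=(\d+)"
)


def parse_corrupt_leaf(text):
    """Return a list of {block, root, slot} dicts found in dmesg text.
    Whitespace is collapsed first so that mail-wrapped lines still match."""
    flat = " ".join(text.split())
    return [
        {"block": int(block), "root": int(root), "slot": int(slot)}
        for block, root, slot in CORRUPT_LEAF.findall(flat)
    ]


def errno_name(return_code):
    """Translate a negated kernel return code such as -5 to its errno name."""
    return errno.errorcode.get(abs(return_code), "unknown")


sample = """[  127.862825] btrfs: corrupt leaf, bad key order:
block=2837196627968,root=1, slot=121
[  127.882528] btrfs: run_one_delayed_ref returned -5"""

print(parse_corrupt_leaf(sample))
print(errno_name(-5))  # EIO
```

The "IO failure" wording in the BTRFS error lines corresponds to that EIO code, so those lines do not by themselves show that the disk returned a read or write error.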


Not that I've done anything other than a cursory check but it looks like 
the read only data is fine.


Pete
