Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-30 Thread Hugo Mills
On Fri, Aug 30, 2013 at 09:44:28AM -0500, Eric Sandeen wrote:
 On 8/29/13 3:19 PM, Chris Murphy wrote:
  
  On Aug 29, 2013, at 1:53 PM, Hugo Mills h...@carfax.org.uk wrote:
  
  On Thu, Aug 29, 2013 at 01:44:54PM -0600, Chris Murphy wrote:
 
  Certainly, if known for sure it won't be more than 30 seconds?
 
Mmm... it'll depend on the setting of the commit period, which up
  until a couple of weeks ago was always 30s, but someone posted a patch
  to give it a config knob…
  
  
  
  Proceeding will roll back the file system to a previous state, and
  may cause the loss of successfully written data since the last commit
  period (30 seconds by default). Proceed? (Y/N)
 
 Is it just loss of data, or might this also result in a filesystem with 
 inconsistent metadata, which then requires a fsck?

   No, the metadata is always consistent (well, in theory, barring bugs
and out-of-band corruption).

 Above sounds like it's just reverting to a previous (consistent) state.  Is 
 that correct?

   Yes, it's dropping the log of accepted-but-uncommitted work. This
is a Bad Thing in the sense that something that's reached the log is
reported to the application as being successfully written. If the
application critically relies on that (e.g. databases), then we've
discarded durability from ACID. (Can you guess I've been marking
Databases resit exam papers this morning? :) )

   Hugo.

 -Eric
 
 p.s. fwiw when the xfs_repair zero-log option -L is used, we say:
 
 ALERT: The filesystem has valuable metadata changes in a log which is being
 destroyed because the -L option was used.

   That's a reasonable wording too.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- We teach people management skills by examining characters in ---   
Shakespeare.  You could look at Claudius's crisis
   management techniques, for example.   




Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-30 Thread Eric Sandeen
On 8/29/13 3:19 PM, Chris Murphy wrote:
 
 On Aug 29, 2013, at 1:53 PM, Hugo Mills h...@carfax.org.uk wrote:
 
 On Thu, Aug 29, 2013 at 01:44:54PM -0600, Chris Murphy wrote:

 Certainly, if known for sure it won't be more than 30 seconds?

   Mmm... it'll depend on the setting of the commit period, which up
 until a couple of weeks ago was always 30s, but someone posted a patch
 to give it a config knob…
 
 
 
 Proceeding will roll back the file system to a previous state, and
 may cause the loss of successfully written data since the last commit
 period (30 seconds by default). Proceed? (Y/N)

Is it just loss of data, or might this also result in a filesystem with 
inconsistent metadata, which then requires a fsck?

Above sounds like it's just reverting to a previous (consistent) state.  Is 
that correct?

-Eric

p.s. fwiw when the xfs_repair zero-log option -L is used, we say:

ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-29 Thread Zach Brown
If those fail, then look in dmesg for errors relating to the log
 tree -- if that's corrupt and can't be read (or causes a crash), use
 btrfs-zero-log.

In a bit of a tangent:

btrfs-zero-log throws away data that fsync/sync could have previously
claimed was stable on disk.

Given how often this is thrown around as a solution to a broken
partition, should the tool jump up and down and make it clear that it's
about to roll the file system back?  This seems like relevant
information.

Right now, as far as I can tell, it's completely undocumented and
silent.

- z


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-29 Thread Chris Murphy

On Aug 29, 2013, at 11:35 AM, Zach Brown z...@redhat.com wrote:

   If those fail, then look in dmesg for errors relating to the log
 tree -- if that's corrupt and can't be read (or causes a crash), use
 btrfs-zero-log.
 
 In a bit of a tangent:
 
 btrfs-zero-log throws away data that fsync/sync could have previously
 claimed was stable on disk.
 
 Given how often this is thrown around as a solution to a broken
 partition, should the tool jump up and down and make it clear that it's
 about to roll the file system back?  This seems like relevant
 information.
 
 Right now, as far as I can tell, it's completely undocumented and
 silent.

Yes, I think it helps remove some burden on the list answering questions about 
a tool that doesn't have any documentation, to have a warning.

How much longer will btrfs-zero-log be needed? If whatever it's doing isn't 
obviated by future improvements to btrfsck, and this sort of big hammer 
approach is still needed in some worst-case scenarios, then it probably hurts 
no one to flag the user with essentially how you described it. I think 
documentation is a greater burden to create, and less likely to be consulted.

Proceeding will roll back the file system to a previous state, and may cause 
the loss of successfully written data. Proceed? (Y/N)

Alternative language could include a suggestion or reminder of what should be 
tried before proceeding, if applicable.



Chris Murphy


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-29 Thread Hugo Mills
On Thu, Aug 29, 2013 at 01:37:51PM -0600, Chris Murphy wrote:
 
 On Aug 29, 2013, at 11:35 AM, Zach Brown z...@redhat.com wrote:
 
If those fail, then look in dmesg for errors relating to the log
  tree -- if that's corrupt and can't be read (or causes a crash), use
  btrfs-zero-log.
  
  In a bit of a tangent:
  
  btrfs-zero-log throws away data that fsync/sync could have previously
  claimed was stable on disk.
  
  Given how often this is thrown around as a solution to a broken
  partition, should the tool jump up and down and make it clear that it's
  about to roll the file system back?  This seems like relevant
  information.
  
  Right now, as far as I can tell, it's completely undocumented and
  silent.
 
 Yes, I think it helps remove some burden on the list answering questions 
 about a tool that doesn't have any documentation, to have a warning.
 
 How much longer will btrfs-zero-log be needed? If whatever it's doing isn't 
 obviated by future improvements to btrfsck, and this sort of big hammer 
 approach is still needed in some worst-case scenarios, then it probably hurts 
 no one to flag the user with essentially how you described it. I think 
 documentation is a greater burden to create, and less likely to be consulted.
 
 Proceeding will roll back the file system to a previous state, and may cause 
 the loss of successfully written data. Proceed? (Y/N)

   ... the loss of up to the last 30 seconds of successfully written data.

   Give the user enough information to make a sensible decision.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- emacs:  Eighty Megabytes And Constantly Swapping. ---




Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-29 Thread Chris Murphy

On Aug 29, 2013, at 1:40 PM, Hugo Mills h...@carfax.org.uk wrote:

 On Thu, Aug 29, 2013 at 01:37:51PM -0600, Chris Murphy wrote:
 
 Proceeding will roll back the file system to a previous state, and may 
 cause the loss of successfully written data. Proceed? (Y/N)
 
   ... the loss of up to the last 30 seconds of successfully written data.
 
   Give the user enough information to make a sensible decision.

Certainly, if known for sure it won't be more than 30 seconds?



Chris Murphy


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-29 Thread Hugo Mills
On Thu, Aug 29, 2013 at 01:44:54PM -0600, Chris Murphy wrote:
 
 On Aug 29, 2013, at 1:40 PM, Hugo Mills h...@carfax.org.uk wrote:
 
  On Thu, Aug 29, 2013 at 01:37:51PM -0600, Chris Murphy wrote:
  
  Proceeding will roll back the file system to a previous state, and may 
  cause the loss of successfully written data. Proceed? (Y/N)
  
... the loss of up to the last 30 seconds of successfully written data.
  
Give the user enough information to make a sensible decision.
 
 Certainly, if known for sure it won't be more than 30 seconds?

   Mmm... it'll depend on the setting of the commit period, which up
until a couple of weeks ago was always 30s, but someone posted a patch
to give it a config knob...

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- emacs:  Eighty Megabytes And Constantly Swapping. ---




Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-29 Thread Chris Murphy

On Aug 29, 2013, at 1:53 PM, Hugo Mills h...@carfax.org.uk wrote:

 On Thu, Aug 29, 2013 at 01:44:54PM -0600, Chris Murphy wrote:
 
 Certainly, if known for sure it won't be more than 30 seconds?
 
   Mmm... it'll depend on the setting of the commit period, which up
 until a couple of weeks ago was always 30s, but someone posted a patch
 to give it a config knob…



Proceeding will roll back the file system to a previous state, and may cause 
the loss of successfully written data since the last commit period (30 seconds 
by default). Proceed? (Y/N)
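
A minimal sketch of how a prompt like the one proposed above could gate a destructive step. It is written in Python for illustration only; btrfs-zero-log contains no such code in this thread, and the names `WARNING` and `confirm_zero_log` are hypothetical. The 30-second default is the commit period mentioned earlier in the thread.

```python
# Hypothetical sketch: this is NOT btrfs-zero-log's actual code.
# The wording of the warning is the text proposed in this thread.
WARNING = (
    "Proceeding will roll back the file system to a previous state, and\n"
    "may cause the loss of successfully written data since the last commit\n"
    "period ({period} seconds by default). Proceed? (Y/N) "
)


def confirm_zero_log(read_answer=input, commit_period=30):
    """Return True only if the user explicitly answers Y or y.

    read_answer is any callable that takes the prompt text and returns
    the user's reply; it defaults to the built-in input().
    """
    answer = read_answer(WARNING.format(period=commit_period))
    return answer.strip().lower() == "y"
```

Anything other than an explicit Y (including an empty reply) is treated as a refusal, so the log is never dropped by accident. A caller would check `confirm_zero_log()` before touching the device and exit without changes when it returns False.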


Chris Murphy


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-29 Thread Chris Murphy

On Aug 29, 2013, at 2:19 PM, Chris Murphy li...@colorremedies.com wrote:

 
 On Aug 29, 2013, at 1:53 PM, Hugo Mills h...@carfax.org.uk wrote:
 
 On Thu, Aug 29, 2013 at 01:44:54PM -0600, Chris Murphy wrote:
 
 Certainly, if known for sure it won't be more than 30 seconds?
 
  Mmm... it'll depend on the setting of the commit period, which up
 until a couple of weeks ago was always 30s, but someone posted a patch
 to give it a config knob…
 
 
 
 Proceeding will roll back the file system to a previous state, and may cause 
 the loss of successfully written data since the last commit period (30 
 seconds by default). Proceed? (Y/N)

And an important side question is whether may is the correct word, or if it 
should be will cause loss of…


Chris Murphy


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-26 Thread Chris Murphy

On Aug 26, 2013, at 11:41 AM, Nick Lee em...@nickle.es wrote:

 There was a discussion on IRC a few days ago that the problem with the tree 
 root's block was likely the result of either an issue with the disk itself, 
 or the chunk tree/logical mappings. I ran the chunk recover, looked over the 
 errors it found, and hit write. (If it failed, I was going to run something 
 like photorec, with loss of organization as a side effect.)
 
 I can write something more clear after my flight lands tomorrow if you want.

I'm just curious about when to use various techniques: -o recovery, btrfsck, 
chunk-recover, zero log.


Chris Murphy


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-26 Thread Hugo Mills
On Mon, Aug 26, 2013 at 01:10:54PM -0600, Chris Murphy wrote:
 
 On Aug 26, 2013, at 11:41 AM, Nick Lee em...@nickle.es wrote:
 
  There was a discussion on IRC a few days ago that the problem with the tree 
  root's block was likely the result of either an issue with the disk itself, 
  or the chunk tree/logical mappings. I ran the chunk recover, looked over 
  the errors it found, and hit write. (If it failed, I was going to run 
  something like photorec, with loss of organization as a side effect.)
  
  I can write something more clear after my flight lands tomorrow if you want.

 I'm just curious about when to use various techniques: -o recovery,
 btrfsck, chunk-recover, zero log.

   Let's assume that you don't have a physical device failure (which
is a different set of tools -- mount -odegraded, btrfs dev del
missing).

   First thing to do is to take a btrfs-image -c9 -t4 of the
filesystem, and keep a copy of the output to show josef. :)

   Then start with -orecovery and -oro,recovery for pretty much
anything.

   If those fail, then look in dmesg for errors relating to the log
tree -- if that's corrupt and can't be read (or causes a crash), use
btrfs-zero-log.

   If there's problems with the chunk tree -- the only one I've seen
recently was reporting something like can't map address -- then
chunk-recover may be of use.

   After that, btrfsck is probably the next thing to try. If options
-s1, -s2, -s3 have any success, then btrfs-select-super will help by
replacing the superblock with one that works. If that's not going to
be useful, fall back to btrfsck --repair.

   Finally, btrfsck --repair --init-extent-tree may be necessary if
there's a damaged extent tree. Finally, if you've got corruption in
the checksums, there's --init-csum-tree.
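
The ordering above can be written down as a small checklist. This is a sketch in Python, not a recovery tool: it never runs anything, the command strings are copied from this message with device arguments left out, and the names `RECOVERY_STEPS`, `print_checklist` and `steps_before` are made up for illustration.

```python
# Escalation order as described in the message above. Each entry is
# (when to try it, command as named in the thread). Nothing is executed.
RECOVERY_STEPS = [
    ("always first, keep the output", "btrfs-image -c9 -t4"),
    ("pretty much anything", "mount -o recovery"),
    ("pretty much anything", "mount -o ro,recovery"),
    ("log tree corrupt or crashing (check dmesg)", "btrfs-zero-log"),
    ("chunk tree problems, e.g. can't map address", "chunk-recover"),
    ("next thing to try", "btrfsck -s1 / -s2 / -s3, then btrfs-select-super"),
    ("superblock selection did not help", "btrfsck --repair"),
    ("damaged extent tree", "btrfsck --repair --init-extent-tree"),
    ("corruption in the checksums", "btrfsck --init-csum-tree"),
]


def print_checklist(steps=RECOVERY_STEPS):
    """Print the steps in order, mildest first."""
    for number, (when, command) in enumerate(steps, start=1):
        print(f"{number}. {command}  [{when}]")


def steps_before(command_prefix, steps=RECOVERY_STEPS):
    """Return the commands the thread suggests trying before this one."""
    earlier = []
    for _, command in steps:
        if command.startswith(command_prefix):
            return earlier
        earlier.append(command)
    raise ValueError(f"unknown step: {command_prefix}")
```

For example, `steps_before("btrfs-zero-log")` lists the image and the two recovery mounts, which is the point made elsewhere in the thread: zeroing the log comes after the non-destructive options.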

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Try everything once,  except incest and folk-dancing. ---  




Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-26 Thread Chris Murphy

On Aug 26, 2013, at 1:31 PM, Hugo Mills h...@carfax.org.uk wrote:
 
   Let's assume that you don't have a physical device failure (which
 is a different set of tools -- mount -odegraded, btrfs dev del
 missing).
 
   First thing to do is to take a btrfs-image -c9 -t4 of the
 filesystem, and keep a copy of the output to show josef. :)
 
   Then start with -orecovery and -oro,recovery for pretty much
 anything.
 
   If those fail, then look in dmesg for errors relating to the log
 tree -- if that's corrupt and can't be read (or causes a crash), use
 btrfs-zero-log.
 
   If there's problems with the chunk tree -- the only one I've seen
 recently was reporting something like can't map address -- then
 chunk-recover may be of use.
 
   After that, btrfsck is probably the next thing to try. If options
 -s1, -s2, -s3 have any success, then btrfs-select-super will help by
 replacing the superblock with one that works. If that's not going to
 be useful, fall back to btrfsck --repair.
 
   Finally, btrfsck --repair --init-extent-tree may be necessary if
 there's a damaged extent tree. Finally, if you've got corruption in
 the checksums, there's --init-csum-tree.

This is helpful. Thanks.

Chris Murphy


Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-22 Thread Nicholas Lee
Hi list! I recently butchered my filesystem, and I was wondering if anyone 
knows how to help.

Problem: My filesystem is screwed up, and I can't mount it at all right now. In 
the logs, the problem begins around 45s. 

Background: I'm running a 6x4TB RAID5 array using md. I have a few virtual 
machines using said array, and one of them is a btrfs storage server. I ran 
into some issues where FS errors would cause the host system to enter a 
read-only state, which led to the corruption of the guest's file system. Adding 
insult to injury, this occurred right as I began the initial backup to another 
system with rsync. :(

What I've done so far: I've made a few attempts to mount the partition in 
read-only recovery mode to no avail. I created another virtual disk and 
installed ArchBang to use as a recovery environment. I also tried running 
find-root, but I got no console output after two days of it running, and it 
just sat there burning CPU time. 

I would be seriously grateful to anyone that can figure out a way to mount/fix 
this partition, and I'd be willing to send some bitcoin over to anyone that can 
help me get the partition mounted and/or repaired!


dmesg's output:

[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.9.8-1-ARCH (tobias@testing-i686) (gcc version 
4.8.1 (GCC) ) #1 SMP PREEMPT Fri Jun 28 07:43:59 CEST 2013
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x7fffdfff] usable
[0.00] BIOS-e820: [mem 0x7fffe000-0x7fff] reserved
[0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
[0.00] Notice: NX (Execute Disable) protection cannot be enabled: 
non-PAE kernel!
[0.00] SMBIOS 2.4 present.
[0.00] DMI: Bochs Bochs, BIOS Bochs 01/01/2011
[0.00] Hypervisor detected: KVM
[0.00] e820: update [mem 0x-0x0fff] usable == reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x7fffe max_arch_pfn = 0x10
[0.00] MTRR default type: write-back
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 008000 mask FF8000 uncachable
[0.00]   1 disabled
[0.00]   2 disabled
[0.00]   3 disabled
[0.00]   4 disabled
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] PAT not supported by CPU.
[0.00] found SMP MP-table at [mem 0x000fdaa0-0x000fdaaf] mapped at 
[c00fdaa0]
[0.00] Scanning 1 areas for low memory corruption
[0.00] initial memory mapped: [mem 0x-0x00bf]
[0.00] Base memory trampoline at [c009b000] 9b000 size 16384
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00]  [mem 0x-0x000f] page 4k
[0.00] init_memory_mapping: [mem 0x3700-0x373f]
[0.00]  [mem 0x3700-0x373f] page 2M
[0.00] init_memory_mapping: [mem 0x3000-0x36ff]
[0.00]  [mem 0x3000-0x36ff] page 2M
[0.00] init_memory_mapping: [mem 0x0010-0x2fff]
[0.00]  [mem 0x0010-0x003f] page 4k
[0.00]  [mem 0x0040-0x2fff] page 2M
[0.00] init_memory_mapping: [mem 0x3740-0x377fdfff]
[0.00]  [mem 0x3740-0x377fdfff] page 4k
[0.00] BRK [0x0083e000, 0x0083efff] PGTABLE
[0.00] RAMDISK: [mem 0x7eff7000-0x7ffdcfff]
[0.00] Allocated new RAMDISK: [mem 0x36818000-0x377fdb2f]
[0.00] Move RAMDISK from [mem 0x7eff7000-0x7ffdcb2f] to [mem 
0x36818000-0x377fdb2f]
[0.00] ACPI: RSDP 000fd8c0 00014 (v00 BOCHS )
[0.00] ACPI: RSDT 7fffe380 00034 (v01 BOCHS  BXPCRSDT 0001 BXPC 
0001)
[0.00] ACPI: FACP 7f80 00074 (v01 BOCHS  BXPCFACP 0001 BXPC 
0001)
[0.00] ACPI: DSDT 7fffe3c0 011A9 (v01   BXPC   BXDSDT 0001 INTL 
20100528)
[0.00] ACPI: FACS 7f40 00040
[0.00] ACPI: SSDT 76e0 00858 (v01 BOCHS  BXPCSSDT 0001 BXPC 
0001)
[0.00] ACPI: APIC 75b0 00090 (v01 BOCHS  BXPCAPIC 0001 BXPC 
0001)
[0.00] ACPI: HPET 7570 00038 (v01 BOCHS  BXPCHPET 0001 BXPC 
0001)
[0.00] ACPI: Local APIC address 0xfee0
[0.00] 1160MB HIGHMEM available.
[0.00] 887MB LOWMEM available.
[0.00]   mapped low ram: 0 - 377fe000
[0.00]   low ram: 

Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-22 Thread Mitch Harder
On Thu, Aug 22, 2013 at 1:47 AM, Nicholas Lee em...@nickle.es wrote:

 [   45.914275] [ cut here ]
 [   45.914406] kernel BUG at fs/btrfs/volumes.c:4417!
 [   45.914489] invalid opcode:  [#1] PREEMPT SMP

I can't say if this will fix your problem or not, but the 3.10.x
kernel has a patch to pass this error back instead of halting with a
BUG() at this point.


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-22 Thread Chris Murphy

On Aug 22, 2013, at 4:58 PM, Chris Murphy li...@colorremedies.com wrote:
 1
 2
 3
 4
 5

6. What was the mkfs.btrfs command used? In particular are you certain the 
metadata profile is default (DUP)?

7. If you have a very recent btrfs-progs (few months at most), or better if you 
can build from btrfs-next, it may be worth running btrfsck *without* the repair 
option, to see what it has to say about the situation.

Chris Murphy


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-22 Thread Nicholas Lee
1. It's md-raid, with an lvm on top, and this is running in a virtual machine 
with lvm also enabled. 
2. Originally, I was working from the Arch LiveCD, but I later created another 
disk to install ArchBang to.
3. I'm waiting for the check to complete.
4. SMART comes up clean

smartctl -x /dev/sdg | grep SCT
SCT capabilities:  (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
GP/S  Log at address 0xe0 has 1 sectors [SCT Command/Status]
GP/S  Log at address 0xe1 has 1 sectors [SCT Data Transfer]
SCT Status Version:  3
SCT Version (vendor specific):   256 (0x0100)
SCT Support Level:   1
SCT Temperature History Version: 2
SCT Error Recovery Control:

5. It returns a value of 30.

I'm running chunk-recover, but I'm not going to let it write anything. I figure 
it'll take a while for it to scan, given the large size of the drive. 


On 22.08.2013, at 18:58, Chris Murphy li...@colorremedies.com wrote:

 Non-expert on btrfs errors, so hopefully someone else will still reply with 
 recovery advice. I have some foundational questions on the setup that may 
 relate, if you don't already know what precipitated this failure:
 
 
 1.
 You said it's md raid5, but I see /dev/mapper/main--storage--vg-root and dm-1 
 or dm-2, so I wonder if this is md raid with LVM on top; or if this is LVM 
 raid5 (which directly implements raid5 at LV level, without mdadm, but does 
 use md code underneath)?
 
 2.
 In one dmesg I see /dev/dm-2 referenced with errors, and in another 
 /dev/dm-1. Is it actually the same btrfs volume, and if so I wonder why it's 
 sometimes being mapped to a different dm device?
 
 3.
 If it's an md device, when was the last time a scrub check was run?
 echo check > /sys/block/mdX/md/sync_action
 then after that completes:
 cat /sys/block/mdX/md/mismatch_cnt
 
 Or if LVM raid5, I think this is only recently added:
 http://www.redhat.com/archives/lvm-devel/2013-April/msg00042.html
 
 4.
 smartctl -x for each drive; are there any indications of reallocated sectors, 
 pending sectors, bad blocks, ECC errors, CRC or UDMA errors? The output of 
 the above command should also include the SCT Error Recovery Control value 
 for each drive; what's that value?
 
 5.
 What is returned for any one of the drives:
 
 cat /sys/block/sdX/device/timeout
 
 Thanks,
 
 Chris Murphy
 
 
 On Aug 22, 2013, at 1:38 PM, Nicholas Lee em...@nickle.es wrote:
 
 Full pastebin here: http://cwillu.com:8080/96.245.194.45#6
 
 [   9.213212] Btrfs loaded
 [9.245673] device fsid 2ffb2450-f74f-4cfb-a3be-bb5e3c6d32ec devid 1 
 transid 23568 /dev/dm-1
 [  102.886834] device fsid 2ffb2450-f74f-4cfb-a3be-bb5e3c6d32ec devid 1 
 transid 23568 /dev/mapper/main--storage--vg-root
 [  102.888348] btrfs: enabling auto recovery
 [  102.888354] btrfs: disabling disk space caching
 [  102.888357] btrfs: disabling disk space caching
 [  102.911068] BTRFS critical (device dm-1): unable to find logical 
 1781900460032 len 4096
 [  102.911103] BTRFS emergency (device dm-1): No mapping for 
 1781900460032-1781900464128
 
 [  102.911108] btrfs: failed to read tree root on dm-1
 [  102.911186] BTRFS critical (device dm-1): unable to find logical 
 1781900460032 len 4096
 [  102.911217] BTRFS emergency (device dm-1): No mapping for 
 1781900460032-1781900464128
 
 [  102.911222] btrfs: failed to read tree root on dm-1
 [  102.911235] BTRFS critical (device dm-1): unable to find logical 
 1198824710144 len 4096
 [  102.911240] BTRFS emergency (device dm-1): No mapping for 
 1198824710144-1198824714240
 
 [  102.911243] btrfs: failed to read tree root on dm-1
 [  102.911255] BTRFS critical (device dm-1): unable to find logical 
 1198518919168 len 4096
 [  102.911286] BTRFS emergency (device dm-1): No mapping for 
 1198518919168-1198518923264
 
 [  102.911290] btrfs: failed to read tree root on dm-1
 [  102.911302] BTRFS critical (device dm-1): unable to find logical 
 582755782656 len 4096
 [  102.911308] BTRFS emergency (device dm-1): No mapping for 
 582755782656-582755786752
 
 [  102.911311] btrfs: failed to read tree root on dm-1
 [  102.986797] btrfs: open_ctree failed
 
 
 On 22.08.2013, at 15:23, Nicholas Lee em...@nickle.es wrote:
 
 After updating the kernel and using btrfs-progs-git from the AUR, I'm now 
 getting this output. Does this yield any new insight?
 
 [  473.305408] btrfs: failed to read tree root on dm-2
 [  473.30] BTRFS critical (device dm-2): unable to find logical 
 1781900460032 len 4096
 [  473.305591] BTRFS emergency (device dm-2): No mapping for 
 1781900460032-1781900464128
 
 
 On 22.08.2013, at 10:09, Mitch Harder mitch.har...@sabayonlinux.org wrote:
 
 On Thu, Aug 22, 2013 at 1:47 AM, Nicholas Lee em...@nickle.es wrote:
 
 [   45.914275] [ cut here ]
 [   

Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-22 Thread Chris Murphy

On Aug 22, 2013, at 6:59 PM, Nicholas Lee em...@nickle.es wrote:
 
 smartctl -x /dev/sdg | grep SCT

The grep filtered out the current read/write values; try this:

smartctl -l scterc /dev/sdg


If it's higher than 300 (30.0 seconds) then you should change it:

smartctl -l scterc,70,70 /dev/sdX

for all drives. And then I'd filter through dmesg to see if you have any PHY or 
read errors or ATA reset messages.
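
The comparison above is easy to get wrong because of the units, so here is a minimal Python sketch of it. It assumes, as the "300 (30.0 seconds)" figure implies, that scterc values are in tenths of a second, and it takes 30 seconds as the limit because that is what /sys/block/sdX/device/timeout returned earlier in this thread. The function names are hypothetical.

```python
def erc_seconds(deciseconds):
    """Convert an SCT ERC value (tenths of a second) to seconds."""
    return deciseconds / 10.0


def erc_needs_lowering(erc_deciseconds, kernel_timeout_seconds=30):
    """True if the drive's error recovery time is higher than the timeout.

    erc_deciseconds is the read or write value reported by
    `smartctl -l scterc`; kernel_timeout_seconds is the value read from
    /sys/block/sdX/device/timeout (30 in this thread).
    """
    return erc_seconds(erc_deciseconds) > kernel_timeout_seconds
```

With the suggested setting of 70, `erc_seconds(70)` is 7.0 seconds, well under the 30-second timeout, so `erc_needs_lowering(70)` is False.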


Chris Murphy


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-22 Thread Duncan
Duncan posted on Thu, 22 Aug 2013 23:53:28 + as excerpted:

 btrfs wiki[1]

[1] https://btrfs.wiki.kernel.org/


-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman
