Re: Filesystem Corruption

2018-12-03 Thread remi
On Mon, Dec 3, 2018, at 4:31 AM, Stefan Malte Schumacher wrote:

> I have noticed an unusual amount of crc-errors in downloaded rars,
> beginning about a week ago. But let's start with the preliminaries. I
> am using Debian Stretch.
> Kernel: Linux mars 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4
> (2018-08-21) x86_64 GNU/Linux
> 
> [5390748.884929] Buffer I/O error on dev dm-0, logical block
> 976701312, async page read


Excuse me for butting in when there are *many* more qualified people on this list.

But assuming the rar crc errors are related to your unexplained buffer I/O 
errors (and not some weird coincidence of simply bad downloads), I would 
start, immediately, by testing the memory.  RAM corruption can wreak havoc 
with btrfs (with any filesystem really, but I think btrfs has special 
challenges in this regard), and this looks like a memory error to me.
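For what it's worth, a quick first pass that doesn't need a reboot is the 
userspace memtester tool, though an overnight memtest86+ run from a boot 
stick is far more thorough.  A minimal sketch (the 2 GiB size and the 3 
passes are arbitrary examples):

  sudo apt-get install memtester
  sudo memtester 2048M 3   # lock and test ~2 GiB of RAM, 3 passes, while the box is otherwise idle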



Re: Filesystem Corruption

2018-12-03 Thread Qu Wenruo


On 2018/12/3 5:31 PM, Stefan Malte Schumacher wrote:
> Hello,
> 
> I have noticed an unusual amount of crc-errors in downloaded rars,
> beginning about a week ago. But let's start with the preliminaries. I
> am using Debian Stretch.
> Kernel: Linux mars 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4
> (2018-08-21) x86_64 GNU/Linux
> BTRFS-Tools btrfs-progs  4.7.3-1
> Smartctl shows no errors for any of the drives in the filesystem.
> 
> Btrfs /dev/stats shows zero errors, but dmesg gives me a lot of
> filesystem related error messages.
> 
> [5390748.884929] Buffer I/O error on dev dm-0, logical block
> 976701312, async page read
> This error is shown many times in the log.

There is no "btrfs:" prefix, so this looks more like an error message from
the block layer; no wonder btrfs shows no errors at all.

What is the underlying device mapper?

Furthermore, is there any kernel message with "btrfs" (case-insensitive)
in it?
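For example, something along these lines should answer both (dm-0 taken 
straight from your log):

  dmsetup ls --tree        # what kind of mapping is dm-0 (LVM, dm-crypt, ...)?
  lsblk -s /dev/dm-0       # the physical devices sitting underneath it
  dmesg | grep -i btrfs    # any btrfs messages at all?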

Thanks,
Qu
> 
> This seems to affect just newly written files. This is the output of
> btrfs scrub status:
> scrub status for 1609e4e1-4037-4d31-bf12-f84a691db5d8
> scrub started at Tue Nov 27 06:02:04 2018 and finished after 07:34:16
> total bytes scrubbed: 17.29TiB with 0 errors
> 
> What is the probable cause of these errors? How can I fix this?
> 
> Thanks in advance for your advice
> Stefan
> 





Re: Filesystem corruption?

2018-10-22 Thread Qu Wenruo


On 2018/10/23 4:02 AM, Gervais, Francois wrote:
> Hi,
> 
> I think I lost power on my btrfs disk and it looks like it is now in an 
> unfunctional state.

What does the word "unfunctional" mean?

Unable to mount? Or what else?

> 
> Any idea how I could debug that issue?
> 
> Here is what I have:
> 
> kernel 4.4.0-119-generic

The kernel is somewhat old now.

> btrfs-progs v4.4

The progs are definitely too old.

It's highly recommended to use the latest btrfs-progs for its better
"btrfs check" code.

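A rough sketch of building a current btrfs-progs next to the packaged one 
(see the project's INSTALL file for the exact build dependencies; the clone 
URL is the usual kernel.org location):

  git clone https://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
  cd btrfs-progs
  ./autogen.sh && ./configure && make
  sudo ./btrfs check /dev/sdd   # run the freshly built binary straight from the build dir
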
> 
> 
> 
> sudo btrfs check /dev/sdd
> Checking filesystem on /dev/sdd
> UUID: 9a14b7a1-672c-44da-b49a-1f6566db3e44
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs

So no error reported from all these essential trees.
Unless there is some bug in btrfs-progs 4.4, your fs should be mostly OK.

> checking quota groups
> Ignoring qgroup relation key 310
[snip]
> Ignoring qgroup relation key 71776119061217590

Just a lot of qgroup relation key problems.
Not a big problem, especially considering you're using an older kernel
without the proper qgroup fixes.

Just in case, please run "btrfs check" with the latest btrfs-progs (v4.17.1)
to see if it reports any extra errors.

Despite that, if the fs can be mounted RW, mounting it and then executing
"btrfs quota disable <mountpoint>" should disable quota and solve the problem.
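Roughly, with /mnt standing in for whatever scratch mount point you use:

  sudo mount /dev/sdd /mnt
  sudo btrfs quota disable /mnt
  sudo umount /mnt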

Thanks,
Qu

> found 29301522460 bytes used err is 0
> total csum bytes: 27525424
> total tree bytes: 541573120
> total fs tree bytes: 494223360
> total extent tree bytes: 16908288
> btree space waste bytes: 85047903
> file data blocks allocated: 273892241408
>  referenced 44667650048
> extent buffer leak: start 29360128 len 16384
> extent buffer leak: start 740524032 len 16384
> extent buffer leak: start 446840832 len 16384
> extent buffer leak: start 142819328 len 16384
> extent buffer leak: start 143179776 len 16384
> extent buffer leak: start 184107008 len 16384
> extent buffer leak: start 190513152 len 16384
> extent buffer leak: start 190939136 len 16384
> extent buffer leak: start 239943680 len 16384
> extent buffer leak: start 29392896 len 16384
> extent buffer leak: start 295223296 len 16384
> extent buffer leak: start 30556160 len 16384
> extent buffer leak: start 29376512 len 16384
> extent buffer leak: start 29409280 len 16384
> extent buffer leak: start 29491200 len 16384
> extent buffer leak: start 29556736 len 16384
> extent buffer leak: start 29720576 len 16384
> extent buffer leak: start 29884416 len 16384
> extent buffer leak: start 30097408 len 16384
> extent buffer leak: start 30179328 len 16384
> extent buffer leak: start 30228480 len 16384
> extent buffer leak: start 30277632 len 16384
> extent buffer leak: start 30343168 len 16384
> extent buffer leak: start 30392320 len 16384
> extent buffer leak: start 30457856 len 16384
> extent buffer leak: start 30507008 len 16384
> extent buffer leak: start 30572544 len 16384
> extent buffer leak: start 30621696 len 16384
> extent buffer leak: start 30670848 len 16384
> extent buffer leak: start 3072 len 16384
> extent buffer leak: start 30769152 len 16384
> extent buffer leak: start 30801920 len 16384
> extent buffer leak: start 30867456 len 16384
> extent buffer leak: start 30916608 len 16384
> extent buffer leak: start 102498304 len 16384
> extent buffer leak: start 204488704 len 16384
> extent buffer leak: start 237912064 len 16384
> extent buffer leak: start 328499200 len 16384
> extent buffer leak: start 684539904 len 16384
> extent buffer leak: start 849362944 len 16384
> 





Re: filesystem corruption

2014-11-04 Thread Duncan
Zygo Blaxell posted on Mon, 03 Nov 2014 23:31:45 -0500 as excerpted:

 On Mon, Nov 03, 2014 at 10:11:18AM -0700, Chris Murphy wrote:
 
 On Nov 2, 2014, at 8:43 PM, Zygo Blaxell zblax...@furryterror.org
 wrote:
  btrfs seems to assume the data is correct on both disks (the
  generation numbers and checksums are OK) but gets confused by equally
  plausible but different metadata on each disk.  It doesn't take long
  before the filesystem becomes data soup or crashes the kernel.
 
 This is a pretty significant problem to still be present, honestly. I
 can understand the catchup mechanism is probably not built yet,
 but clearly the two devices don't have the same generation. The lower
 generation device should probably be booted/ignored or declared missing
 in the meantime to prevent trashing the file system.
 
 The problem with generation numbers is when both devices get divergent
 generation numbers but we can't tell them apart

[snip very reasonable scenario]

 Now we have two disks with equal generation numbers. 
 Generations 6..9 on sda are not the same as generations 6..9 on sdb, so
 if we mix the two disks' metadata we get bad confusion.
 
 It needs to be more than a sequential number.  If one of the disks
 disappears we need to record this fact on the surviving disks, and also
 cope with _both_ disks claiming to be the surviving one.

Zygo's absolutely correct.  There is an existing catchup mechanism, but 
the tracking is /purely/ sequential generation number based, and if the 
two generation sequences diverge, Welcome to the (data) Twilight Zone!

I noted this in my own early pre-deployment raid1 mode testing as well, 
except that I didn't at that point know about sequence numbers and never 
got as far as letting the filesystem make data soup of itself.

What I did was this:

1) Create a two-device raid1 data and metadata filesystem, mount it and 
stick some data on it.

2) Unmount, pull a device, mount degraded the remaining device.

3) Change a file.

4) Unmount, switch devices, mount degraded the other device.

5) Change the same file in a different/incompatible way.

6) Unmount, plug both devices in again, mount (not degraded).

7) Wait for the sync I was used to from mdraid, which of course didn't 
occur.

8) Check the file to see which version showed up.  I don't recall which 
version it was, but it wasn't the common pre-change version.

9) Unmount, pull each device one at a time, mounting the other one 
degraded and checking the file again.

10) The file on each device remained different, without a warning or 
indication of any problem at all when I mounted undegraded in 6/7.
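
Roughly, in command form (sdX/sdY and /mnt are placeholders for the devices 
and mount point involved):

  mkfs.btrfs -m raid1 -d raid1 /dev/sdX /dev/sdY
  mount /dev/sdX /mnt; echo base > /mnt/test.txt; umount /mnt
  # physically pull sdY, then:
  mount -o degraded /dev/sdX /mnt; echo change-A >> /mnt/test.txt; umount /mnt
  # swap: pull sdX, reattach sdY, then:
  mount -o degraded /dev/sdY /mnt; echo change-B >> /mnt/test.txt; umount /mnt
  # reattach both, mount normally:
  mount /dev/sdX /mnt; cat /mnt/test.txt   # no warning, no resync; each disk keeps its own version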

Had I initiated a scrub, presumably it would have seen the difference and 
if one was a newer generation, it would have taken it, overwriting the 
other.  I don't know what it would have done if both were the same 
generation, tho the file being small (just a few line text file, big 
enough to test the effect of differing edits), I guess it would take one 
version or the other.  If the file was large enough to be multiple 
extents, however, I've no idea whether it'd take one or the other, or 
possibly combine the two, picking extents where they differed more or 
less randomly.

By that time the lack of warning and absolute resolution to one version 
or the other even after mounting undegraded and accessing the file with 
incompatible versions on each of the two devices was bothering me 
sufficiently that I didn't test any further.

Since it's just me I have to worry about (unlike a multi-admin corporate 
scenario where you can never be /sure/ what the other admins will do 
regardless of agreed procedure), I simply set myself a set of rules very 
similar to what Zygo proposed:

1) If for whatever reason I ever split a btrfs raid1 with the intent or 
even the possibility of bringing the pieces back together again, if at 
all possible, never mount the split pieces writable -- mount read-only.

2) If a writable mount is required, keep the writable mounts to one 
device of the split.  As long as the other device is never mounted 
writable, it will have an older generation when they're reunited and a 
scrub should take care of things, reliably resolving to the updated 
written device, rewriting the older generation on the other device.

What I'd do here is physically put the removed side of the raid1 in 
storage, far enough from the remaining side that I couldn't possibly get 
them mixed up.  I'd clearly label it as well, creating a defense in 
depth of at least two, the labeling and the physical separation and 
storage of the read-only device.

3) If for whatever reason the originally read-only side must be mounted 
writable, very clearly mark the originally mounted-writable device 
POISONED/TOXIC!!  *NEVER* *EVER* let such a POISONED device anywhere near 
its original raid1 mate, until it is wiped, such that there's no 
possibility of btrfs getting confused and contaminated with the poisoned 
data.
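
In command terms, rules 1 and 2 boil down to something like this (sdX being 
whichever half is at hand, /mnt a placeholder):

  mount -o ro,degraded /dev/sdX /mnt   # rule 1: inspect or copy without advancing the generation
  mount -o rw,degraded /dev/sdX /mnt   # rule 2: only ever on the ONE half designated writable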

Given how unimpressed I was 

Re: filesystem corruption

2014-11-04 Thread Chris Murphy

On Nov 3, 2014, at 9:31 PM, Zygo Blaxell zblax...@furryterror.org wrote:

 On Mon, Nov 03, 2014 at 10:11:18AM -0700, Chris Murphy wrote:
 
 On Nov 2, 2014, at 8:43 PM, Zygo Blaxell zblax...@furryterror.org wrote:
 btrfs seems to assume the data is correct on both disks (the generation
 numbers and checksums are OK) but gets confused by equally plausible but
 different metadata on each disk.  It doesn't take long before the
 filesystem becomes data soup or crashes the kernel.
 
 This is a pretty significant problem to still be present, honestly. I
 can understand the catchup mechanism is probably not built yet,
 but clearly the two devices don't have the same generation. The lower
 generation device should probably be booted/ignored or declared missing
 in the meantime to prevent trashing the file system.
 
 The problem with generation numbers is when both devices get divergent
 generation numbers but we can't tell them apart, e.g.
 
   1.  sda generation = 5, sdb generation = 5
 
   2.  sdb temporarily disconnects, so we are degraded on just sda
 
   3.  sda gets more generations 6..9
 
   4.  sda temporarily disconnects, so we have no disks at all.
 
   5.  the machine reboots, gets sdb back but not sda
 
 If we allow degraded here, then:
 
   6.  sdb gets more generations 6..9
 
   7.  sdb disconnects, no disks so no filesystem
 
   8.  the machine reboots again, this time with sda and sdb present
 
 Now we have two disks with equal generation numbers.  Generations 6..9
 on sda are not the same as generations 6..9 on sdb, so if we mix the
 two disks' metadata we get bad confusion.
 
 It needs to be more than a sequential number.  If one of the disks
 disappears we need to record this fact on the surviving disks, and also
 cope with _both_ disks claiming to be the surviving one.

I agree this is also a problem. But the most common case is where we know that 
sda generation is newer (larger value) and most recently modified, and sdb has 
not since been modified but needs to be caught up. As far as I know the only 
way to do that on Btrfs right now is a full balance; it doesn't catch up just 
by being reconnected with a normal mount.


Chris Murphy


Re: filesystem corruption

2014-11-04 Thread Duncan
Chris Murphy posted on Tue, 04 Nov 2014 11:28:39 -0700 as excerpted:

 It needs to be more than a sequential number.  If one of the disks
 disappears we need to record this fact on the surviving disks, and also
 cope with _both_ disks claiming to be the surviving one.
 
 I agree this is also a problem. But the most common case is where we
 know that sda generation is newer (larger value) and most recently
 modified, and sdb has not since been modified but needs to be caught up.
 As far as I know the only way to do that on Btrfs right now is a full
 balance, it doesn't catch up just be being reconnected with a normal
 mount.

I thought it was a scrub that would take care of that, not a balance?

(Maybe do both to be sure?)
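
Something like this on the reunited, non-degraded mount, with /mnt as a 
placeholder:

  btrfs scrub start -Bd /mnt   # -B waits for completion, -d prints per-device stats
  btrfs balance start /mnt     # much heavier, but rewrites everything if you want both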

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: filesystem corruption

2014-11-04 Thread Robert White

On 11/04/2014 10:28 AM, Chris Murphy wrote:

On Nov 3, 2014, at 9:31 PM, Zygo Blaxell zblax...@furryterror.org wrote:

Now we have two disks with equal generation numbers.  Generations 6..9
on sda are not the same as generations 6..9 on sdb, so if we mix the
two disks' metadata we get bad confusion.

It needs to be more than a sequential number.  If one of the disks
disappears we need to record this fact on the surviving disks, and also
cope with _both_ disks claiming to be the surviving one.


I agree this is also a problem. But the most common case is where we know that 
sda generation is newer (larger value) and most recently modified, and sdb has 
not since been modified but needs to be caught up. As far as I know the only 
way to do that on Btrfs right now is a full balance; it doesn't catch up just 
by being reconnected with a normal mount.



I would think that any time any system or fraction thereof is mounted 
with both degraded and rw status, a degraded flag should be set 
somewhere/somehow in the superblock etc.


The only way to clear this flag would be to reach a reconciled state. 
That state could be reached in one of several ways. Removing the missing 
mirror element would be a fast reconcile; doing a balance or scrub would 
be a slow reconcile for a filesystem where all the media are returned to 
service (e.g. the missing volume of a RAID 1 etc. is returned).


Generation numbers are pretty good, but I'd put on a rider that any 
generation number or equivalent incremented while the system is degraded 
should have a unique quantum (say a GUID) generated and stored along with 
the generation number. The mere existence of this quantum would act as 
the degraded flag.


Any check/compare/access related to the generation number would know to 
notice that the GUID is in place and do the necessary resolution. If 
successful the GUID would be discarded.


As to how this could be implemented, I'm not fully conversant on the 
internal layout.


One possibility would be to add a block reference, or indeed to replace 
the current storage for generation numbers completely with a block 
reference to a block containing the generation number and the potential 
GUID. The main value of having an out-of-structure reference is that its 
content is less space constrained, and it could be shared by multiple 
usages. In the case, for instance, where the block is added (as opposed 
to replacing the generation number), only one such block would be needed 
per degraded,rw mount, and it could be attached to as many filesystem 
structures as needed.



Just as metadata under DUP is divergent after a degraded mount, a 
generation block would be divergent, and likely in a different location 
than its peers on a subsequent restored geometry.


A generation block could have other niceties like the date/time and the 
devices present (or absent); such information could conceivably be used 
to intelligently disambiguate references. For instance, if one degraded 
mount had sda and sdb, and a second had sdb and sdc, then it'd be known 
that sdb was dominant for having been present every time.



Re: filesystem corruption

2014-11-04 Thread Zygo Blaxell
On Tue, Nov 04, 2014 at 11:28:39AM -0700, Chris Murphy wrote:
 On Nov 3, 2014, at 9:31 PM, Zygo Blaxell zblax...@furryterror.org wrote:
  It needs to be more than a sequential number.  If one of the disks
  disappears we need to record this fact on the surviving disks, and also
  cope with _both_ disks claiming to be the surviving one.
 
 I agree this is also a problem. But the most common case is where we
 know that sda generation is newer (larger value) and most recently
 modified, and sdb has not since been modified but needs to be caught
 up. As far as I know the only way to do that on Btrfs right now is
  a full balance; it doesn't catch up just by being reconnected with a
 normal mount.

The data on the disks might be inconsistent, so resynchronization must
read from only the good copy.  A balance could just spread corruption
around if it reads from two out-of-sync mirrors.  (Maybe it already does
the right thing if sdb was not modified...?).

The full resync operation is more like btrfs device replace, except that
it's replacing a disk in-place (i.e. without removing it first), and it
would not read from the non-good disk.
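
The closest existing tool is probably "btrfs replace"; a sketch, assuming 
devid 2 is the stale mirror, /dev/sdc is a spare disk and /mnt the mount point:

  btrfs filesystem show /mnt               # note the devid of the stale device
  btrfs replace start -r 2 /dev/sdc /mnt   # -r: avoid reading from the source while a good mirror exists
  btrfs replace status /mnt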

 
 Chris Murphy




Re: filesystem corruption

2014-11-03 Thread Chris Murphy

On Nov 2, 2014, at 8:43 PM, Zygo Blaxell zblax...@furryterror.org wrote:

 On Sun, Nov 02, 2014 at 02:57:22PM -0700, Chris Murphy wrote:
 
 For example if I have a two device Btrfs raid1 for both data and
 metadata, and one device is removed and I mount -o degraded,rw one
 of them and make some small changes, unmount, then reconnect the
 missing device and mount NOT degraded - what happens?  I haven't tried
 this. 
 
 I have.  It's a filesystem-destroying disaster.  Never do it, never let
 it happen accidentally.  Make sure that if a disk gets temporarily
 disconnected, you either never mount it degraded, or never let it come
 back (i.e. take the disk to another machine and wipefs it).  Don't ever,
 ever put 'degraded' in /etc/fstab mount options.  Nope.  No.

Well I guess I now see why opensuse's plan for Btrfs by default proscribes 
multiple-device Btrfs volumes. The described scenario is really common with 
users; I see it often on linux-raid@. And md doesn't have this problem. The 
worst case scenario is when devices don't have bitmaps, and then a whole 
device rebuild has to happen rather than just a quick catch-up.



 
 btrfs seems to assume the data is correct on both disks (the generation
 numbers and checksums are OK) but gets confused by equally plausible but
 different metadata on each disk.  It doesn't take long before the
 filesystem becomes data soup or crashes the kernel.

This is a pretty significant problem to still be present, honestly. I can 
understand the catchup mechanism is probably not built yet, but clearly the 
two devices don't have the same generation. The lower generation device should 
probably be booted/ignored or declared missing in the meantime to prevent 
trashing the file system.


Chris Murphy



Re: filesystem corruption

2014-11-03 Thread Zygo Blaxell
On Mon, Nov 03, 2014 at 10:11:18AM -0700, Chris Murphy wrote:
 
 On Nov 2, 2014, at 8:43 PM, Zygo Blaxell zblax...@furryterror.org wrote:
  btrfs seems to assume the data is correct on both disks (the generation
  numbers and checksums are OK) but gets confused by equally plausible but
  different metadata on each disk.  It doesn't take long before the
  filesystem becomes data soup or crashes the kernel.
 
 This is a pretty significant problem to still be present, honestly. I
 can understand the catchup mechanism is probably not built yet,
 but clearly the two devices don't have the same generation. The lower
 generation device should probably be booted/ignored or declared missing
 in the meantime to prevent trashing the file system.

The problem with generation numbers is when both devices get divergent
generation numbers but we can't tell them apart, e.g.

1.  sda generation = 5, sdb generation = 5

2.  sdb temporarily disconnects, so we are degraded on just sda

3.  sda gets more generations 6..9

4.  sda temporarily disconnects, so we have no disks at all.

5.  the machine reboots, gets sdb back but not sda

If we allow degraded here, then:

6.  sdb gets more generations 6..9

7.  sdb disconnects, no disks so no filesystem

8.  the machine reboots again, this time with sda and sdb present

Now we have two disks with equal generation numbers.  Generations 6..9
on sda are not the same as generations 6..9 on sdb, so if we mix the
two disks' metadata we get bad confusion.

It needs to be more than a sequential number.  If one of the disks
disappears we need to record this fact on the surviving disks, and also
cope with _both_ disks claiming to be the surviving one.



 
 Chris Murphy
 


signature.asc
Description: Digital signature


Re: filesystem corruption

2014-11-02 Thread Chris Murphy

On Nov 1, 2014, at 10:49 PM, Robert White rwh...@pobox.com wrote:

 On 10/31/2014 10:34 AM, Tobias Holst wrote:
 I am now using another system with kernel 3.17.2 and btrfs-tools 3.17
 and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add
 the second one as there are only two slots in that server.
 
 This is what I got:
 
  tobby@ubuntu: sudo btrfs check /dev/sdb1
 warning, device 2 is missing
 warning devid 2 not found already
 root item for root 1746, current bytenr 80450240512, current gen
 163697, current level 2, new bytenr 40074067968, new gen 163707, new
 level 2
 Found 1 roots with an outdated root item.
 Please run a filesystem check with the option --repair to fix them.
 
  tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
 enabling repair mode
 warning, device 2 is missing
 warning devid 2 not found already
 Unable to find block group for 0
 extent-tree.c:289: find_search_start: Assertion `1` failed.
 
 The read-only snapshots taken under 3.17.1 are your core problem.
 
 Now btrfsck is refusing to operate on the degraded RAID because degraded RAID 
 is degraded so it's read-only. (this is an educated guess).

Degradedness and writability are orthogonal. If there's some problem with the 
fs that prevents it from being mountable rw, then that'd apply for both normal 
and degraded operation. If the fs is OK, it should permit writable degraded 
mounts.

 Since btrfsck is _not_ a mount type of operation it's got no degraded mode 
 that would let you deal with half a RAID as far as I know.

That's a problem. I can see why a repair might need an additional flag (maybe 
force) to repair a volume that has the minimum number of devices for degraded 
mounting, but not all are present. Maybe we wouldn't want it to be easy to 
accidentally run a repair that changes the file system when a device happens 
to be inadvertently missing but could be found and connected later.

I think related to this is a btrfs equivalent of a bitmap. The metadata already 
has this information in it, but possibly right now btrfs lacks the equivalent 
behavior of mdadm re-add when a previously missing device is reconnected. If it 
has a bitmap then it doesn't have to be completely rebuilt; the bitmap contains 
information telling md how to catch up the re-added device, i.e. only that 
which is different needs to be written upon a re-add.
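
For comparison, the md workflow being described is roughly this (array and 
member names are only examples):

  mdadm --grow --bitmap=internal /dev/md0   # enable a write-intent bitmap
  mdadm /dev/md0 --re-add /dev/sdb1         # returning member is caught up from the bitmap, not rebuilt from scratch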

For example if I have a two device Btrfs raid1 for both data and metadata, and 
one device is removed and I mount -o degraded,rw one of them and make some 
small changes, unmount, then reconnect the missing device and mount NOT 
degraded - what happens? I haven't tried this. And I also don't know if a full 
balance (hours) is needed to catch up the formerly missing device. With md 
this is very fast - seconds/minutes depending on how much has been changed.


Chris Murphy



Re: filesystem corruption

2014-11-02 Thread Tobias Holst
Thank you for your reply.

I'll answer in-line.


2014-11-02 5:49 GMT+01:00 Robert White rwh...@pobox.com:
 On 10/31/2014 10:34 AM, Tobias Holst wrote:

 I am now using another system with kernel 3.17.2 and btrfs-tools 3.17
 and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add
 the second one as there are only two slots in that server.

 This is what I got:

   tobby@ubuntu: sudo btrfs check /dev/sdb1
 warning, device 2 is missing
 warning devid 2 not found already
 root item for root 1746, current bytenr 80450240512, current gen
 163697, current level 2, new bytenr 40074067968, new gen 163707, new
 level 2
 Found 1 roots with an outdated root item.
 Please run a filesystem check with the option --repair to fix them.

   tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
 enabling repair mode
 warning, device 2 is missing
 warning devid 2 not found already
 Unable to find block group for 0
 extent-tree.c:289: find_search_start: Assertion `1` failed.


 The read-only snapshots taken under 3.17.1 are your core problem.

OK


 Now btrfsck is refusing to operate on the degraded RAID because degraded
 RAID is degraded so it's read-only. (this is an educated guess). Since
 btrfsck is _not_ a mount type of operation it's got no degraded mode that
 would let you deal with half a RAID as far as I know.

OK, good to know.


 In your case...

 It is _known_ that you need to be _not_ running 3.17.0 or 3.17.1 if you are
 going to make read-only snapshots safely.
 It is _known_ that you need to be running 3.17.2 to get a number of fixes
 that impact your circumstance.
 It is _known_ that you need to be running btrfs-progs 3.17 to repair the
 read-only snapshots that are borked up, and that you must _not_ have
 previously tried to repair the problem with an older btrfsck.

No, I didn't try to repair it with older kernels/btrfs-tools.


 Were I you, I would...

 Put the two disks back in the same computer before something bad happens.

 Upgrade that computer to 3.17.2 and 3.17 respectively.

As I mentioned before I only have two slots and my system on this
btrfs-raid1 is not working anymore. Not just when accessing
ro-snapshots - it crashes every time at the login prompt. So now I
installed Ubuntu 14.04 to an USB stick (so I can readd both btrfs
HDDs) and upgraded the kernel to 3.17.2 and btrfs-tools to 3.17.


 Take a backup (because I am paranoid like that, though current threat seems
 negligible).

I already have a backup. :)


 btrfsck your raid with --repair.

OK. And this is what I get now:

tobby@ubuntu: sudo btrfs check /dev/sda1
root item for root 1746, current bytenr 80450240512, current gen
163697, current level 2, new bytenr 40074067968, new gen 163707, new
level 2
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.

tobby@ubuntu: sudo btrfs check /dev/sda1 --repair
enabling repair mode
fixing root item for root 1746, current bytenr 80450240512, current
gen 163697, current level 2, new bytenr 40074067968, new gen 163707,
new level 2
Fixed 1 roots.
Checking filesystem on /dev/sda1
UUID: 3ad065be-2525-4547-87d3-0e195497f9cf
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 18446744073709551607 inode 258 errors 1000, some csum missing
found 36031450184 bytes used err is 1
total csum bytes: 59665716
total tree bytes: 3523330048
total fs tree bytes: 3234054144
total extent tree bytes: 202358784
btree space waste bytes: 755547262
file data blocks allocated: 122274091008
 referenced 211741990912
Btrfs v3.17


 Alternately, if you previously tried to btrfsck the raid with a version
 prior to 3.17 tools after the read-only snapshot(s) problem, you will need
 to resort to mkfs.btrfs to solve the problem. But Hey! you have two disks,
 so break the RAID, then mkfs one of them, then copy the data, then re-make
 the RAID such that the new FS rules.

 Enjoy your system no longer taking racy read-only snapshots... 8-)



And this worked! :) Server is back online without restoring any
files from the backup. Looks good to me!

But I can't do a balance anymore?

root@t-mon:~# btrfs balance start /dev/sda1
ERROR: can't access '/dev/sda1'

Regards
Tobias


Re: filesystem corruption

2014-11-02 Thread Zygo Blaxell
On Sun, Nov 02, 2014 at 02:57:22PM -0700, Chris Murphy wrote:
 On Nov 1, 2014, at 10:49 PM, Robert White rwh...@pobox.com wrote:
 
  On 10/31/2014 10:34 AM, Tobias Holst wrote:
  I am now using another system with kernel 3.17.2 and btrfs-tools 3.17
  and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add
  the second one as there are only two slots in that server.
  
  This is what I got:
  
   tobby@ubuntu: sudo btrfs check /dev/sdb1
  warning, device 2 is missing
  warning devid 2 not found already
  root item for root 1746, current bytenr 80450240512, current gen
  163697, current level 2, new bytenr 40074067968, new gen 163707, new
  level 2
  Found 1 roots with an outdated root item.
  Please run a filesystem check with the option --repair to fix them.
  
   tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
  enabling repair mode
  warning, device 2 is missing
  warning devid 2 not found already
  Unable to find block group for 0
  extent-tree.c:289: find_search_start: Assertion `1` failed.
  
  The read-only snapshots taken under 3.17.1 are your core problem.
  
  Now btrfsck is refusing to operate on the degraded RAID because
  degraded RAID is degraded so it's read-only. (this is an educated
  guess).
 
 Degradedness and writability are orthogonal. If there's some problem
 with the fs that prevents it from being mountable rw, then that'd
 apply for both normal and degraded operation. If the fs is OK, it
 should permit writable degraded mounts.
 
 Since btrfsck is _not_ a mount type of operation it's got no degraded
  mode that would let you deal with half a RAID as far as I know.
 
 That's a problem. I can see why a repair might need an additional flag
 (maybe force) to repair a volume that has the minimum number of devices
 for degraded mounting, but not all are present. Maybe we wouldn't want
 it to be easy to accidentally run a repair that changes the file system
 when a device happens to be missing inadvertently that could be found
 and connected later.
 
 I think related to this is a btrfs equivalent of a bitmap. The metadata
 already has this information in it, but possibly right now btrfs
 lacks the equivalent behavior of mdadm readd when a previously missing
 device is reconnected. If it has a bitmap then it doesn't have to be
 completely rebuilt, the bitmap contains information telling md how to
 catch up the readded device, i.e. only that which is different needs
 to be written upon a readd.
 
 For example if I have a two device Btrfs raid1 for both data and
 metadata, and one device is removed and I mount -o degraded,rw one
 of them and make some small changes, unmount, then reconnect the
 missing device and mount NOT degraded - what happens?  I haven't tried
 this. 

I have.  It's a filesystem-destroying disaster.  Never do it, never let
it happen accidentally.  Make sure that if a disk gets temporarily
disconnected, you either never mount it degraded, or never let it come
back (i.e. take the disk to another machine and wipefs it).  Don't ever,
ever put 'degraded' in /etc/fstab mount options.  Nope.  No.
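
Concretely, on that other machine (sdX being the member that must never 
rejoin as-is):

  wipefs -a /dev/sdX   # clear the btrfs signature so the old array can never reassemble with it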

btrfs seems to assume the data is correct on both disks (the generation
numbers and checksums are OK) but gets confused by equally plausible but
different metadata on each disk.  It doesn't take long before the
filesystem becomes data soup or crashes the kernel.

There is more than one way to get to this point.  Take LVM snapshots of
the devices in a btrfs RAID1 array, and 'btrfs device scan' will see two
different versions of each btrfs device in a btrfs filesystem (one for
the origin LV and one for the snapshot).  btrfs then assembles LVs of
different vintages randomly (e.g. one from the mount command line, one
from an earlier LVM snapshot of the second disk) with disastrous results
similar to the above.  IMHO if btrfs sees multiple devices with the same
UUIDs, it should reject all of them and require an explicit device list;
however, mdadm has a way to deal with this that would also work.
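
The LVM case is easy to reproduce (VG/LV names here are made up):

  lvcreate -s -n btrfs1-snap -L 10G vg0/btrfs1   # snapshot one member of the btrfs RAID1
  blkid /dev/vg0/btrfs1 /dev/vg0/btrfs1-snap     # both report the identical btrfs UUID
  btrfs device scan                              # now two candidates exist for the same devid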

mdadm puts event counters and timestamps in the device superblocks to
prevent any such accidental disjoint assembly and modification of members
of an array.  If disks go temporarily offline with separate modifications
then mdadm refuses to accept disks with different counter+timestamp data
(so you'll get all the disks but one rejected, or only one disk with all
others rejected).  The rejected disk(s) has to go through full device
recovery before rejoining the array--someone has to use mdadm to add
the rejected disk as if it was a new, blank one.

Currently btrfs won't mount a degraded array by default, which prevents
unrecoverable inconsistency.  That's a safe behavior for now, but sooner
or later btrfs will need to be able to safely boot unattended on a
degraded RAID1 root filesystem.

 And I also don't know if a full balance (hours) is needed to
 catch up the formerly missing device. With md this is very fast -
 seconds/minutes depending on how much has been changed.

I schedule 

Re: filesystem corruption

2014-11-02 Thread Robert White

On 11/02/2014 06:55 PM, Tobias Holst wrote:

But I can't do a balance anymore?

root@t-mon:~# btrfs balance start /dev/sda1
ERROR: can't access '/dev/sda1'


Balance takes place on a mounted filesystem, not on a native block device.

So...

mount -t btrfs /dev/sda1 /some/path/somewhere
btrfs balance start /some/path/somewhere




Re: filesystem corruption

2014-11-01 Thread Robert White

On 10/31/2014 10:34 AM, Tobias Holst wrote:

I am now using another system with kernel 3.17.2 and btrfs-tools 3.17
and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add
the second one as there are only two slots in that server.

This is what I got:

  tobby@ubuntu: sudo btrfs check /dev/sdb1
warning, device 2 is missing
warning devid 2 not found already
root item for root 1746, current bytenr 80450240512, current gen
163697, current level 2, new bytenr 40074067968, new gen 163707, new
level 2
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.

  tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
enabling repair mode
warning, device 2 is missing
warning devid 2 not found already
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.


The read-only snapshots taken under 3.17.1 are your core problem.

Now btrfsck is refusing to operate on the degraded RAID because degraded 
RAID is degraded so it's read-only. (this is an educated guess). Since 
btrfsck is _not_ a mount type of operation it's got no degraded mode 
that would let you deal with half a RAID as far as I know.


In your case...

It is _known_ that you need to be _not_ running 3.17.0 or 3.17.1 if you 
are going to make read-only snapshots safely.
It is _known_ that you need to be running 3.17.2 to get a number of 
fixes that impact your circumstance.
It is _known_ that you need to be running btrfs-progs 3.17 to repair the 
read-only snapshots that are borked up, and that you must _not_ have 
previously tried to repair the problem with an older btrfsck.


Were I you, I would...

Put the two disks back in the same computer before something bad happens.

Upgrade that computer to 3.17.2 and 3.17 respectively.

Take a backup (because I am paranoid like that, though current threat 
seems negligible).


btrfsck your raid with --repair.

Alternately, if you previously tried to btrfsck the raid with a version 
prior to 3.17 tools after the read-only snapshot(s) problem, you will 
need to resort to mkfs.btrfs to solve the problem. But Hey! you have two 
disks, so break the RAID, then mkfs one of them, then copy the data, 
then re-make the RAID such that the new FS rules.


Enjoy your system no longer taking racy read-only snapshots... 8-)




Re: filesystem corruption

2014-10-31 Thread Tobias Holst
I am now using another system with kernel 3.17.2 and btrfs-tools 3.17
and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add
the second one as there are only two slots in that server.

This is what I got:

 tobby@ubuntu: sudo btrfs check /dev/sdb1
warning, device 2 is missing
warning devid 2 not found already
root item for root 1746, current bytenr 80450240512, current gen
163697, current level 2, new bytenr 40074067968, new gen 163707, new
level 2
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.

 tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
enabling repair mode
warning, device 2 is missing
warning devid 2 not found already
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs[0x42bd62]
btrfs[0x42ffe5]
btrfs[0x430211]
btrfs[0x4246ec]
btrfs[0x424d11]
btrfs[0x426af3]
btrfs[0x41b18c]
btrfs[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7ffca1119ec5]
btrfs[0x40b497]

This can be repeated as often as I want ;) Nothing changed.

Regards
Tobias


2014-10-31 3:41 GMT+01:00 Rich Freeman r-bt...@thefreemanclan.net:
 On Thu, Oct 30, 2014 at 9:02 PM, Tobias Holst to...@tobby.eu wrote:
 Addition:
 I found some posts here about a general file system corruption in 3.17
 and 3.17.1 - is this the cause?
 Additionally I am using ro-snapshots - maybe this is the cause, too?

 Anyway: Can I fix that or do I have to reinstall? Haven't touched the
 filesystem, just did a scrub (found 0 errors).


 Yup - ro-snapshots is a big problem in 3.17.  You can probably recover now by:
 1.  Update your kernel to 3.17.2 - that takes care of all the big
 known 3.16/17 issues in general.
 2.  Run btrfs check using btrfs-tools 3.17.  That can clean up the
 broken snapshots in your filesystem.

 That is fairly likely to get your filesystem working normally again.
 It worked for me.  I was getting some balance issues when trying to
 add another device and I'm not sure if 3.17.2 totally fixed that - I
 ended up cancelling the balance and it will be a while before I have
 to balance this particular filesystem again, so I'll just hold off and
 hope things stabilize.

 --
 Rich


Re: filesystem corruption

2014-10-30 Thread Tobias Holst
Addition:
I found some posts here about a general file system corruption in 3.17
and 3.17.1 - is this the cause?
Additionally I am using ro-snapshots - maybe this is the cause, too?

Anyway: Can I fix that or do I have to reinstall? Haven't touched the
filesystem, just did a scrub (found 0 errors).

Regards
Tobias


2014-10-31 1:29 GMT+01:00 Tobias Holst to...@tobby.eu:
 Hi

 I was using a btrfs RAID1 with two disks under Ubuntu 14.04, kernel
 3.13 and btrfs-tools 3.14.1 for weeks without issues.

 Now I updated to kernel 3.17.1 and btrfs-tools 3.17. After a reboot
 everything looked fine and I started some tests. While running
 duperemover (just scanning, not doing anything) and a balance at the
 same time the load suddenly went up to 30 and the system was not
 responding anymore. Everything working with the filesystem stopped
 responding. So I did a hard reset.

 I was able to reboot, but on the login prompt nothing happened but a
 kernel bug. Same back in kernel 3.13.

 Now I started a live system (Ubuntu 14.10, kernel 3.16.x, btrfs-tools
 3.14.1), and mounted the btrfs filesystem. I can browse through the
 files but sometimes, especially when accessing my snapshots or trying
 to create a new snapshot, the kernel bug appears and the filesystem
 hangs.

 It shows this:
 Oct 31 00:09:14 ubuntu kernel: [  187.661731] [ cut here
 ]
 Oct 31 00:09:14 ubuntu kernel: [  187.661770] WARNING: CPU: 1 PID:
 4417 at /build/buildd/linux-3.16.0/fs/btrfs/relocation.c:924
 build_backref_tree+0xcab/0x1240 [btrfs]()
 Oct 31 00:09:14 ubuntu kernel: [  187.661772] Modules linked in:
 nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm
 dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth
 6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp lp
 parport squashfs overlayfs nls_utf8 isofs btrfs xor raid6_pq dm_mirror
 dm_region_hash dm_log hid_generic usbhid hid uas usb_storage ahci
 e1000e libahci ptp pps_core
 Oct 31 00:09:14 ubuntu kernel: [  187.661800] CPU: 1 PID: 4417 Comm:
 btrfs-balance Tainted: G C3.16.0-23-generic #31-Ubuntu
 Oct 31 00:09:14 ubuntu kernel: [  187.661802] Hardware name:
 Supermicro PDSML/PDSML+, BIOS 6.00 03/06/2009
 Oct 31 00:09:14 ubuntu kernel: [  187.661804]  0009
 8800a0ae7a00 8177fcbc 
 Oct 31 00:09:14 ubuntu kernel: [  187.661807]  8800a0ae7a38
 8106fd8d 8800a1440750 8800a1440b48
 Oct 31 00:09:14 ubuntu kernel: [  187.661809]  88020a8ce000
 0001 88020b6b0d00 8800a0ae7a48
 Oct 31 00:09:14 ubuntu kernel: [  187.661812] Call Trace:
 Oct 31 00:09:14 ubuntu kernel: [  187.661820]  [8177fcbc]
 dump_stack+0x45/0x56
 Oct 31 00:09:14 ubuntu kernel: [  187.661825]  [8106fd8d]
 warn_slowpath_common+0x7d/0xa0
 Oct 31 00:09:14 ubuntu kernel: [  187.661827]  [8106fe6a]
 warn_slowpath_null+0x1a/0x20
 Oct 31 00:09:14 ubuntu kernel: [  187.661842]  [c01b734b]
 build_backref_tree+0xcab/0x1240 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661857]  [c01b7ae1]
 relocate_tree_blocks+0x201/0x600 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661872]  [c01b88d8] ?
 add_data_references+0x268/0x2a0 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661887]  [c01b96fd]
 relocate_block_group+0x25d/0x6b0 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661902]  [c01b9d36]
 btrfs_relocate_block_group+0x1e6/0x2f0 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661916]  [c0190988]
 btrfs_relocate_chunk.isra.27+0x58/0x720 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661926]  [c0140dc1] ?
 btrfs_set_path_blocking+0x41/0x80 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661935]  [c0145dfd] ?
 btrfs_search_slot+0x48d/0xa40 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661950]  [c018b49b] ?
 release_extent_buffer+0x2b/0xd0 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661964]  [c018b95f] ?
 free_extent_buffer+0x4f/0xa0 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661979]  [c01936c3]
 __btrfs_balance+0x4d3/0x8d0 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661993]  [c0193d48]
 btrfs_balance+0x288/0x600 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.662008]  [c019411d]
 balance_kthread+0x5d/0x80 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.662022]  [c01940c0] ?
 btrfs_balance+0x600/0x600 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.662026]  [81094aeb]
 kthread+0xdb/0x100
 Oct 31 00:09:14 ubuntu kernel: [  187.662029]  [81094a10] ?
 kthread_create_on_node+0x1c0/0x1c0
 Oct 31 00:09:14 ubuntu kernel: [  187.662032]  [81787c3c]
 ret_from_fork+0x7c/0xb0
 Oct 31 00:09:14 ubuntu kernel: [  187.662035]  [81094a10] ?
 kthread_create_on_node+0x1c0/0x1c0
 Oct 31 00:09:14 ubuntu kernel: [  187.662037] ---[ end trace
 fb7849e4a6f20424 ]---

 and this:
 Oct 31 00:09:14 ubuntu kernel: [  187.682629] [ cut here
 

Re: filesystem corruption

2014-10-30 Thread Rich Freeman
On Thu, Oct 30, 2014 at 9:02 PM, Tobias Holst to...@tobby.eu wrote:
 Addition:
 I found some posts here about a general file system corruption in 3.17
 and 3.17.1 - is this the cause?
 Additionally I am using ro-snapshots - maybe this is the cause, too?

 Anyway: Can I fix that or do I have to reinstall? Haven't touched the
 filesystem, just did a scrub (found 0 errors).


Yup - ro-snapshots is a big problem in 3.17.  You can probably recover now by:
1.  Update your kernel to 3.17.2 - that takes care of all the big
known 3.16/17 issues in general.
2.  Run btrfs check using btrfs-tools 3.17.  That can clean up the
broken snapshots in your filesystem.

That is fairly likely to get your filesystem working normally again.
It worked for me.  I was getting some balance issues when trying to
add another device and I'm not sure if 3.17.2 totally fixed that - I
ended up cancelling the balance and it will be a while before I have
to balance this particular filesystem again, so I'll just hold off and
hope things stabilize.
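
In practice that's just the following, with sdX1 standing in for your device:

  uname -r                # confirm you're actually booted into 3.17.2
  btrfs --version         # confirm btrfs-progs 3.17
  btrfs check /dev/sdX1   # read-only first; add --repair only once you have a backup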

--
Rich