Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-05 Thread Kai Krakow
On Mon, 4 Apr 2016 17:09:14 -0600, Chris Murphy wrote:

> > Why couldn't/shouldn't I remove snapshots before detaching the seed
> > device? I want to keep them on the seed but they are useless to me
> > on the sprout.  
> 
> You can remove snapshots before or after detaching the seed device, it
> doesn't matter, but such snapshot removal only affects the sprout. You
> wrote:
> 
> "remove all left-over snapshots from the seed"
> 
> The seed is read only, you can't modify the contents of the seed
> device.

Sorry, not a native speaker... What I actually meant was to remove the
snapshots that originated from the seed and which I don't need in the
sprout.

-- 
Regards,
Kai

Replies to list-only preferred.



Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-04 Thread Duncan
Kai Krakow posted on Mon, 04 Apr 2016 21:26:28 +0200 as excerpted:

> I'll go test the soon-to-die SSD as soon as it replaced. I think it's
> still far from failing with bitrot. It was overprovisioned by 30% most
> of the time, with the spare space trimmed.

Same here, FWIW.  In fact, I had expected to get ~128 GB SSDs and ended 
up getting 256 GB, such that I was only using about 130 GiB, so depending 
on what the overprovisioning percentage is calculated relative to, I was 
and am near 50% or 100% overprovisioned.

So in my case I think the SSD was simply defective, such that the 
overprovisioning and trim simply didn't help.  The other two devices of 
identical brand and model, bought from the same store at the same time 
and so very likely from the same manufacturing lot, were and are just 
fine.  (One of them shows a trivial non-zero raw value for attribute 5, 
reallocated sector count, and 182, erase fail count total, tho both 
attributes remain at a "cooked" value of 100; the other one, actually 
the one of the original pair that wasn't replaced, shows no issues at 
all.)

But based on that experience, while overprovisioning may help in terms of 
normal wearout, it doesn't necessarily help at all if the device is 
actually going bad.

> It certainly should have a
> lot of sectors for wear levelling. In addition, smartctl shows no sector
> errors at all - except for one: raw_read_error_rate. I'm not sure what
> all those sensors tell me, but that one I'm also seeing on hard disks
> which show absolutely no data damage.
> 
> In fact, I see those counters for my hard disks. But dd to /dev/null of
> the complete raw hard disk shows no sector errors. It seems good. But
> well, counting 1+1 together: I currently see data damage. But I guess
> that's unrelated.
> 
> Is there some documentation somewhere what each of those sensors
> technically mean and how to read the raw values and thresh values?

Nothing user/admin level that I'm aware of.  I'm sure there are some 
SMART docs somewhere that describe them as part of the standard, but they 
could easily be effectively unavailable for those unwilling to pay a big-
corporate-sized consortium membership fee (as was the case with one of 
the CompactDisc specs, Orange Book IIRC, at one point).

I know there's some discussion by allusion in the smartctl manpage and 
docs, but many attributes appear to be manufacturer specific and/or to 
have been reverse-engineered by the smartctl devs, meaning even /they/ 
don't really have access to proper documentation for at least some 
attributes.

Which is sad, but in a majority proprietary or at best don't-care 
market...

> I'm also seeing multi_zone_error_rate on my spinning rust.

> According to smartctl health check and smartctl extended selftest,
> there's no problems at all - and the smart error log is empty. There has
> never been an ATA error in dmesg... No relocated sectors... From my
> naive view the drives still look good.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-04 Thread Chris Murphy
On Mon, Apr 4, 2016 at 2:50 PM, Kai Krakow  wrote:

>> Anyway the 2nd 4 is not possible. The seed is ro by definition so you
>> can't remove snapshots from the seed. If you remove them from the
>> mounted rw sprout volume, they're removed from the sprout, not the
>> seed. If you want them on the sprout, but not on the seed, you need to
>> delete snapshots only after the seed is a.) removed from the sprout
>> and b.) made no longer a seed with btrfstune -S 0 and c.) mounted rw.
>
> If I understand right, the seed device won't change? So whatever action
> I apply to the sprout pool, I can later remove the seed from the pool
> and it will still be kind of untouched. Except, I'll have to return it
> no non-seed mode (step b).

Correct. In a sense, making a volume a seed is like making it a
volume-wide read-only snapshot. Any changes are applied via COW only
to added device(s).

>
> Why couldn't/shouldn't I remove snapshots before detaching the seed
> device? I want to keep them on the seed but they are useless to me on
> the sprout.

You can remove snapshots before or after detaching the seed device, it
doesn't matter, but such snapshot removal only affects the sprout. You
wrote:

"remove all left-over snapshots from the seed"

The seed is read only, you can't modify the contents of the seed device.

What you should do is just delete the snapshots you don't want
migrated over to the sprout right away before you even do the balance
-dconvert -mconvert. That way you aren't wasting time moving things
over that you don't want. To be clear:

btrfstune -S 1 /dev/seed    # set the seeding flag on the backup device
mount /dev/seed /mnt/
btrfs dev add /dev/new1 /mnt/
btrfs dev add /dev/new2 /mnt/
mount -o remount,rw /mnt/
btrfs sub del /mnt/blah /mnt/blah2 /mnt/blah3 /mnt/blah4
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/
btrfs dev del /dev/seed /mnt/

If you're doing any backups once remounting rw, note those backups
will only be on the sprout. Backups will not be on the seed because
it's read-only.


>
> What happens to the UUIDs when I separate seed and sprout?

Nothing. They remain intact and unique, per volume.
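
You can verify that yourself with something along these lines (device
names are placeholders):

btrfs filesystem show     # lists each volume with its own UUID and members
blkid /dev/seed /dev/new1 # seed keeps its UUID; the sprout member reports a new one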




>
> I'd now reboot into the system to see if it's working.

Note you'll need to change grub.cfg, possibly fstab, and possibly the
initramfs, all three of which may be referencing the old volume.
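
Roughly along these lines (the UUIDs below are placeholders, double-check
with blkid before editing anything):

# the sprout gets its own volume UUID, distinct from the seed's
blkid -s UUID -o value /dev/new1
# point fstab (and root= in grub.cfg) at the new UUID
sed -i 's/UUID=<seed-uuid>/UUID=<sprout-uuid>/' /etc/fstab
# then regenerate grub.cfg and the initramfs with your distro's usual tools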


> By then, it's
> time for some cleanup (remove the previously deferred "trashes" and
> retention snapshots), then separate the seed from the sprout. During
> that time, I could already use my system again while it's migrating for
> me in the background.
>
> I'd then return the seed back to non-seed, so it can take the role of
> my backup storage again. I'd do a rebalance now.

OK? I don't know why you need to balance the seed at all, let alone
afterward, but it seems like it might be a more efficient replication
if you balanced before making it a seed?


>
> During the whole process, the backup storage will still stay safe for
> me. If something goes wrong, I could easily start over.
>
> Did I miss something? Is it too much of an experimental kind of stuff?

I'm not sure where all the bugs are. It's good to find bugs though and
get them squashed. I have an idea of making live media use Btrfs
instead of using a loop mounted file to back a rw lvm snapshot device
(persistent overlay), which I think is really fragile and a lot more
complicated in the initramfs. It's also good to take advantage of
checksumming after having written an ISO to flash media, where users
often don't verify or something can mount the USB stick rw and
immediately modify the stick in such a way that media verification
will fail anyway. So, a number of plusses, I'd like to see the seed
device be robust.


>
BTW: The way it is arranged now, the backup storage is bootable by
setting the scratch area subvolume as the rootfs on the kernel cmdline;
USB drivers are included in the kernel, and it's tested and works. I
guess this isn't possible while the backup storage acts as a seed
device? But I have an initrd with the latest btrfs-progs on my boot
device (which is a UEFI ESP, so not related to btrfs at all), so I
should be able to use that to revert changes preventing me from booting.



-- 
Chris Murphy


Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-04 Thread Kai Krakow
On Mon, 4 Apr 2016 22:50:18 +0200, Kai Krakow wrote:

> On Mon, 4 Apr 2016 13:57:50 -0600, Chris Murphy wrote:
> 
> > On Mon, Apr 4, 2016 at 1:36 PM, Kai Krakow 
> > wrote:
> >   
> > >
> >  [...]
>  [...]  
> > >
> > > In the following sense: I should disable the automounter and
> > > backup job for the seed device while I let my data migrate back
> > > to main storage in the background...
> > 
> > The sprout can be written to just fine by the backup, just
> > understand that the seed and sprout volume UUID are different. Your
> > automounter is probably looking for the seed's UUID, and that seed
> > can only be mounted ro. The sprout UUID however can be mounted rw.
> > 
> > I would probably skip the automounter. Do the seed setup, mount it,
> > add all devices you're planning to add, then -o
> > remount,rw,compress... , and then activate the backup. But maybe
> > your backup also is looking for UUID? If so, that needs to be
> > updated first. Once the balance -dconvert=raid1 and -mconvert=raid1
> > is finished, then you can remove the seed device. And now might be
> > a good time to give the raid1 a new label, I think it inherits the
> > label of the seed but I'm not certain of this.
> > 
> >   
> > > My intention is to use fully my system while btrfs migrates the
> > > data from seed to main storage. Then, afterwards I'd like to
> > > continue using the seed device for backups.
> > >
> > > I'd probably do the following:
> > >
> > > 1. create btrfs pool, attach seed
> > 
> > I don't understand that step in terms of commands. Sprouts are made
> > with btrfs dev add, not with mkfs. There is no pool creation. You
> > make a seed. You mount it. Add devices to it. Then remount it.  
> 
> Hmm, yes. I didn't think this through into detail yet. It actually
> works that way. I more commonly referenced to the general approach.
> 
> But I think this answers my question... ;-)
> 
> > > 2. recreate my original subvolume structure by snapshotting the
> > > backup scratch area multiple times into each subvolume
> > > 3. rearrange the files in each subvolume to match their intended
> > > use by using rm and mv
> > > 4. reboot into full system
> > > 4. remove all left-over snapshots from the seed
> > > 5. remove (detach) the seed device
> > 
> > You have two 4's.  
> 
> Oh... Sorry... I think one week of 80 work hours, and another of 60
> was a bit too much... ;-)
> 
> > Anyway the 2nd 4 is not possible. The seed is ro by definition so
> > you can't remove snapshots from the seed. If you remove them from
> > the mounted rw sprout volume, they're removed from the sprout, not
> > the seed. If you want them on the sprout, but not on the seed, you
> > need to delete snapshots only after the seed is a.) removed from
> > the sprout and b.) made no longer a seed with btrfstune -S 0 and
> > c.) mounted rw.  
> 
> If I understand right, the seed device won't change? So whatever
> action I apply to the sprout pool, I can later remove the seed from
> the pool and it will still be kind of untouched. Except, I'll have to
> return it no non-seed mode (step b).
> 
> Why couldn't/shouldn't I remove snapshots before detaching the seed
> device? I want to keep them on the seed but they are useless to me on
> the sprout.
> 
> What happens to the UUIDs when I separate seed and sprout?
> 
> This is my layout:
> 
> /dev/sde1 contains my backup storage: btrfs with multiple weeks worth
> of retention in form of ro snapshots, and one scratch area in which
> the backup is performed. Snapshots are created from the scratch area.
> The scratch area is one single subvolume updated by rsync.
> 
> I want to turn this into a seed for my newly created btrfs pool. This
> one has subvolumes for /home, /home/my_user, /distribution_name/rootfs
> and a few more (like var/log etc).
> 
> Since the backup is not split by those subvolumes but contains just
> the single runtime view of my system rootfs, I'm planning to clone
> this single subvolume back into each of my previously used subvolumes
> which in turn of course now contain all the same complete filesystem
> tree. Thus, in the next step, I'm planning to mv/rm the contents to
> get back to the original subvolume structure - mv should be a fast
> operation here, rm probably not so but I don't bother. I could defer
> that until later by moving those rm-candidates into some trash folder
> per subvolume.
> 
> Now, I still have the ro-snapshots worth of multiple weeks of
> retention. I only need those in my backup storage, not in the storage
> proposed to become my bootable system. So I'd simply remove them. I
> could also defer that until later easily.
> 
> This should get my system back into working state pretty fast and
> easily if I didn't miss a point.
> 
> I'd now reboot into the system to see if it's working. By then, it's
> time for some cleanup (remove the previously deferred "trashes" and
> retention 

Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-04 Thread Kai Krakow
On Mon, 4 Apr 2016 13:57:50 -0600, Chris Murphy wrote:

> On Mon, Apr 4, 2016 at 1:36 PM, Kai Krakow 
> wrote:
> 
> >  
>  [...]  
> >>
> >> ?  
> >
> > In the following sense: I should disable the automounter and backup
> > job for the seed device while I let my data migrate back to main
> > storage in the background...  
> 
> The sprout can be written to just fine by the backup, just understand
> that the seed and sprout volume UUID are different. Your automounter
> is probably looking for the seed's UUID, and that seed can only be
> mounted ro. The sprout UUID however can be mounted rw.
> 
> I would probably skip the automounter. Do the seed setup, mount it,
> add all devices you're planning to add, then -o remount,rw,compress...
> , and then activate the backup. But maybe your backup also is looking
> for UUID? If so, that needs to be updated first. Once the balance
> -dconvert=raid1 and -mconvert=raid1 is finished, then you can remove
> the seed device. And now might be a good time to give the raid1 a new
> label, I think it inherits the label of the seed but I'm not certain
> of this.
> 
> 
> > My intention is to use fully my system while btrfs migrates the data
> > from seed to main storage. Then, afterwards I'd like to continue
> > using the seed device for backups.
> >
> > I'd probably do the following:
> >
> > 1. create btrfs pool, attach seed  
> 
> I don't understand that step in terms of commands. Sprouts are made
> with btrfs dev add, not with mkfs. There is no pool creation. You make
> a seed. You mount it. Add devices to it. Then remount it.

Hmm, yes. I haven't thought this through in detail yet. It actually
works that way. I was referring more to the general approach.

But I think this answers my question... ;-)

> > 2. recreate my original subvolume structure by snapshotting the
> > backup scratch area multiple times into each subvolume
> > 3. rearrange the files in each subvolume to match their intended
> > use by using rm and mv
> > 4. reboot into full system
> > 4. remove all left-over snapshots from the seed
> > 5. remove (detach) the seed device  
> 
> You have two 4's.

Oh... Sorry... I think one week of 80 work hours, and another of 60 was
a bit too much... ;-)

> Anyway the 2nd 4 is not possible. The seed is ro by definition so you
> can't remove snapshots from the seed. If you remove them from the
> mounted rw sprout volume, they're removed from the sprout, not the
> seed. If you want them on the sprout, but not on the seed, you need to
> delete snapshots only after the seed is a.) removed from the sprout
> and b.) made no longer a seed with btrfstune -S 0 and c.) mounted rw.

If I understand right, the seed device won't change? So whatever action
I apply to the sprout pool, I can later remove the seed from the pool
and it will still be kind of untouched. Except that I'll have to return
it to non-seed mode (step b).

Why couldn't/shouldn't I remove snapshots before detaching the seed
device? I want to keep them on the seed but they are useless to me on
the sprout.

What happens to the UUIDs when I separate seed and sprout?

This is my layout:

/dev/sde1 contains my backup storage: btrfs with multiple weeks worth
of retention in form of ro snapshots, and one scratch area in which the
backup is performed. Snapshots are created from the scratch area. The
scratch area is one single subvolume updated by rsync.

I want to turn this into a seed for my newly created btrfs pool. This
one has subvolumes for /home, /home/my_user, /distribution_name/rootfs
and a few more (like var/log etc).

Since the backup is not split by those subvolumes but contains just the
single runtime view of my system rootfs, I'm planning to clone this
single subvolume back into each of my previously used subvolumes, each
of which will then, of course, contain the same complete filesystem
tree. Thus, in the next step, I'm planning to mv/rm the contents to get
back to the original subvolume structure - mv should be a fast operation
here, rm probably not, but I don't mind. I could defer that until later
by moving those rm-candidates into some per-subvolume trash folder.
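
In commands it would look roughly like this (subvolume names are just
placeholders for my real layout, untested):

btrfs sub snapshot /mnt/backup-scratch /mnt/rootfs
btrfs sub snapshot /mnt/backup-scratch /mnt/home
btrfs sub snapshot /mnt/backup-scratch /mnt/var-log
# then mv/rm inside each snapshot until only the part that belongs there is left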

Now, I still have the ro-snapshots worth of multiple weeks of
retention. I only need those in my backup storage, not in the storage
proposed to become my bootable system. So I'd simply remove them. I
could also defer that until later easily.

This should get my system back into working state pretty fast and
easily if I didn't miss a point.

I'd now reboot into the system to see if it's working. By then, it's
time for some cleanup (remove the previously deferred "trashes" and
retention snapshots), then separate the seed from the sprout. During
that time, I could already use my system again while it's migrating for
me in the background.

I'd then return the seed back to non-seed, so it can take the role of
my backup storage again. I'd do a rebalance now.

During the whole process, the backup storage will still stay safe for
me. If something goes wrong, I could easily start over.

Did I miss something? Is it too much of an experimental kind of stuff?

Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-04 Thread Chris Murphy
On Mon, Apr 4, 2016 at 1:36 PM, Kai Krakow  wrote:

>
>> > I guess the
>> > seed source cannot be mounted or modified...
>>
>> ?
>
> In the following sense: I should disable the automounter and backup job
> for the seed device while I let my data migrate back to main storage in
> the background...

The sprout can be written to just fine by the backup, just understand
that the seed and sprout volume UUID are different. Your automounter
is probably looking for the seed's UUID, and that seed can only be
mounted ro. The sprout UUID however can be mounted rw.

I would probably skip the automounter. Do the seed setup, mount it,
add all devices you're planning to add, then -o remount,rw,compress...
, and then activate the backup. But maybe your backup also is looking
for UUID? If so, that needs to be updated first. Once the balance
-dconvert=raid1 and -mconvert=raid1 is finished, then you can remove
the seed device. And now might be a good time to give the raid1 a new
label, I think it inherits the label of the seed but I'm not certain
of this.
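
The label part is a one-liner once the sprout is mounted rw (the name
is up to you, of course):

btrfs filesystem label /mnt NEWLABEL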


> My intention is to use fully my system while btrfs migrates the data
> from seed to main storage. Then, afterwards I'd like to continue using
> the seed device for backups.
>
> I'd probably do the following:
>
> 1. create btrfs pool, attach seed

I don't understand that step in terms of commands. Sprouts are made
with btrfs dev add, not with mkfs. There is no pool creation. You make
a seed. You mount it. Add devices to it. Then remount it.


> 2. recreate my original subvolume structure by snapshotting the backup
>scratch area multiple times into each subvolume
> 3. rearrange the files in each subvolume to match their intended use by
>using rm and mv
> 4. reboot into full system
> 4. remove all left-over snapshots from the seed
> 5. remove (detach) the seed device

You have two 4's.

Anyway the 2nd 4 is not possible. The seed is ro by definition so you
can't remove snapshots from the seed. If you remove them from the
mounted rw sprout volume, they're removed from the sprout, not the
seed. If you want them on the sprout, but not on the seed, you need to
delete snapshots only after the seed is a.) removed from the sprout
and b.) made no longer a seed with btrfstune -S 0 and c.) mounted rw.




-- 
Chris Murphy


Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-04 Thread Kai Krakow
On Sun, 3 Apr 2016 18:51:07 -0600, Chris Murphy wrote:

> > BTW: Is it possible to use my backup drive (it's btrfs single-data
> > dup-metadata, single device) as a seed device for my newly created
> > btrfs pool (raid0-data, raid1-metadata, three devices)?  
> 
> Yes.
> 
> I just tried doing the conversion to raid1 before and after seed
> removal, but with the small amount of data (4GiB) I can't tell a
> difference. It seems like -dconvert=raid with seed still connected
> makes two rw copies (i.e. there's a ro copy which is the original, and
> then two rw copies on 2 of the 3 devices I added all at the same time
> to the seed), and the 'btrfs dev remove' command to remove the seed
> happened immediately, suggested the prior balances had already
> migrated copies off the seed. This may or may not be optimal for your
> case.
> 
> Two gotchas.
> 
> I ran into this bug:
> btrfs fi usage crash when volume contains seed device
> https://bugzilla.kernel.org/show_bug.cgi?id=115851
> 
> And there is a phantom single chunk on one of the new rw devices that
> was added. Data,single: Size:1.00GiB, Used:0.00B
>/dev/dm-8   1.00GiB
> 
> It's still there after the -dconvert=raid1 and separate -mconvert=raid
> and after seed device removal. A balance start without filters removes
> it, chances are had I used -dconvert=raid1,soft it would have vanished
> also but I didn't retest for that.

Good to know, thanks.

> > I guess the
> > seed source cannot be mounted or modified...  
> 
> ?

In the following sense: I should disable the automounter and backup job
for the seed device while I let my data migrate back to main storage in
the background...

My intention is to fully use my system while btrfs migrates the data
from seed to main storage. Then, afterwards, I'd like to continue using
the seed device for backups.

I'd probably do the following:

1. create btrfs pool, attach seed
2. recreate my original subvolume structure by snapshotting the backup
   scratch area multiple times into each subvolume
3. rearrange the files in each subvolume to match their intended use by
   using rm and mv
4. reboot into full system
4. remove all left-over snapshots from the seed
5. remove (detach) the seed device
6. rebalance
7. switch bcache to write-back mode (or attach bcache only now)


-- 
Regards,
Kai

Replies to list-only preferred.



Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-04 Thread Kai Krakow
On Mon, 4 Apr 2016 04:34:54 + (UTC), Duncan <1i5t5.dun...@cox.net> wrote:

> Meanwhile, putting bcache into write-around mode, so it makes no
> further changes to the ssd and only uses it for reads, is probably
> wise, and should help limit further damage.  Tho if in that mode
> bcache still does writeback of existing dirty and cached data to the
> backing store, some further damage could occur from that.  But I
> don't know enough about bcache to know what its behavior and level of
> available configuration in that regard actually are.  As long as it's
> not trying to write anything from the ssd to the backing store, I
> think further damage should be very limited.

bcache has 0 dirty data most of the time for me - even in writeback
mode. It writes back during idle time and at a reduced rate; usually
that finishes within a few minutes.

Switching the cache to write-around initiates instant write-back of all
dirty data, so within seconds it goes down to zero and the cache
becomes detachable.
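
For reference, the sysfs knobs I use for that look roughly like this
(bcache0 being the cached device, adjust as needed):

cat /sys/block/bcache0/bcache/dirty_data
# switch the cache mode; writearound stops caching writes entirely
echo writearound > /sys/block/bcache0/bcache/cache_mode
# once dirty_data has drained to 0, the cache can be detached
echo 1 > /sys/block/bcache0/bcache/detach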

I'll go test the soon-to-die SSD as soon as it is replaced. I think it's
still far from failing with bitrot. It was overprovisioned by 30% most
of the time, with the spare space trimmed. It certainly should have a
lot of spare sectors for wear levelling. In addition, smartctl shows no
sector errors at all - except for one: raw_read_error_rate. I'm not
sure what all those sensors tell me, but that one I'm also seeing on
hard disks which show absolutely no data damage.

In fact, I see those counters for my hard disks too. But a dd to
/dev/null of the complete raw hard disk shows no sector errors. It
seems good. But well, putting two and two together: I currently see
data damage. But I guess that's unrelated.
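
For the record, the check I mean is nothing fancier than something like:

dd if=/dev/sdX of=/dev/null bs=1M status=progress
# any unreadable sector would abort the read and show up as an I/O error in dmesg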

Is there some documentation somewhere on what each of those sensors
technically means, and how to read the raw values and threshold values?

I'm also seeing multi_zone_error_rate on my spinning rust.

According to the smartctl health check and the smartctl extended
selftest, there are no problems at all - and the SMART error log is
empty. There has never been an ATA error in dmesg... No reallocated
sectors... From my naive view the drives still look good.
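
For completeness, the checks I ran were along these lines (device name
is a placeholder):

smartctl -H /dev/sdX          # overall health self-assessment
smartctl -l error /dev/sdX    # SMART error log
smartctl -t long /dev/sdX     # start an extended self-test
smartctl -l selftest /dev/sdX # self-test results once it has finished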

-- 
Regards,
Kai

Replies to list-only preferred.



Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-03 Thread Duncan
Kai Krakow posted on Mon, 04 Apr 2016 00:19:25 +0200 as excerpted:

> The corruptions seem to be different by the following observation:
> 
> While the VDI file was corrupted over and over again with a csum error,
> I could simply remove it and restore from backup. The last thing I did
> was ddescue it from the damaged version to my backup device, than rsync
> the file back to the originating device (which created a new file
> side-by-side, so in a new area of disk space, then replace-by-renamed
> the old one). I didn't run VirtualBox since back then but the file
> didn't become corrupted either since then.
> 
> But now, according to btrfsck, a csum error instead came up in another
> big file from Steam. This time, when I rm the file, the kernel
> backtraces and sends btrfs to RO mode. The file cannot be removed. I'm
> going to leave it that way currently, the file won't be used currently.
> And I can simply ignore it for backup and restore, it's not an important
> one. Better have an "incorrectable" csum error there than having one
> jumping unpredictably across my files.

While my dying ssd experience was with btrfs raid1 direct on a pair of 
ssds, extrapolating from what I learned about the ssd behavior to your 
case with bcache caching to the ssd, then writing back to the spinning 
rust backing store, presumably in btrfs single-device mode with single 
data and either single or dup metadata (there are enough other cases 
interwoven on this thread that it's no longer clear to me which posted btrfs 
fi show, etc., applies to this case, so I'm guessing, as I believe presenting 
it as more than a single device at the btrfs level would require multiple 
bcache devices, tho of course you could do that by partitioning the 
ssd)...

Would lead me to predict very much the behavior you're seeing, if the 
caching ssd was dying.

As bcache is running below btrfs, btrfs won't know anything about it, and 
therefore, will behave, effectively, as if it's not there -- an error on 
the ssd will look like an error on the btrfs, period.  (As I'm assuming a 
single btrfs device, which device of the btrfs doesn't come into 
question, tho which copy of dup metadata might... but that's an entirely 
different can of worms since I'm not sure whether the bcache would end up 
deduping the dup metadata or not, and the ssd might do the same, and...)

And with bcache doing write-behind from the ssd to the backing store, 
underneath the level at which btrfs could detect and track csum 
corruption, if it's corrupt on the ssd, that corruption then transfers to 
the backing store as btrfs won't know that transfer is happening at all 
and thus won't be in the loop to detect the csum error at that stage.


Meanwhile, what I saw on the pair of ssds, one going bad, in btrfs raid1 
mode, was that a btrfs scrub *WOULD* successfully detect the csum errors 
on the bad ssd, and rewrite it from the remaining good copy.

Keep in mind that this is without snapshots, so that rewrite, while COW, 
would then release the old copy back into the free space pool.  In so 
doing, it would trigger the ssd firmware to copy the rest of the erase-
block and erase it, and that in turn would trigger the firmware to detect 
the bad sector and replace it with one from its spare-sectors list.  As a 
result, it would tick up the raw value of attribute #5, 
Reallocated_Sector_Ct, as well as 182, Erase_Fail_Count_Total, in smartctl 
-A (tho the two attributes didn't increase in numeric lock-step, both 
were increasing over time, primarily when I ran scrubs).


But it was mostly (almost entirely) when I ran the scrubs and 
consequently rewrote the corrupted sectors from the copy on the good 
device, that it would trigger those erase-fails and sector reallocations.

Anyway, the failing ssd's issues gradually got worse, until I was having 
to scrub and trigger both filesystem recopy and bad ssd sector rewrites 
any time I wrote anything major to the filesystem as well as at cold-boot 
(leaving the system off for several hours apparently accelerated the 
sector rot within stable data, while the powered-on state kept the flash 
cells charged high enough they didn't rot so fast and it was mostly or 
entirely new/changed data I had to worry about).  Eventually I simply 
decided I was tired of the now more or less constant hassle and I wasn't 
learning much new any more from the decaying device's behavior, and I 
replaced it.


Translating that to your case, if your caching ssd is dying and some 
sectors are now corrupted, unless there's a second btrfs copy of that 
block to copy over the bad version with, it's unlikely to trigger those 
sector reallocations.

Tho actually rewriting them (or at the device firmware level, COWing them 
and erasing the old erase-blocks), as bcache will be doing if it dumps 
the current cache content and fills those blocks with something else, 
should trigger the same thing, tho unless bcache can force-dump and 
recache or something, I don't believe there's 

Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-03 Thread Chris Murphy
On Sun, Apr 3, 2016 at 4:19 PM, Kai Krakow  wrote:

> BTW: Is it possible to use my backup drive (it's btrfs single-data
> dup-metadata, single device) as a seed device for my newly created
> btrfs pool (raid0-data, raid1-metadata, three devices)?

Yes.

I just tried doing the conversion to raid1 before and after seed
removal, but with the small amount of data (4GiB) I can't tell a
difference. It seems like -dconvert=raid1 with the seed still connected
makes two rw copies (i.e. there's a ro copy which is the original, and
then two rw copies on 2 of the 3 devices I added all at the same time
to the seed), and the 'btrfs dev remove' command to remove the seed
completed immediately, suggesting the prior balances had already
migrated copies off the seed. This may or may not be optimal for your
case.

Two gotchas.

I ran into this bug:
btrfs fi usage crash when volume contains seed device
https://bugzilla.kernel.org/show_bug.cgi?id=115851

And there is a phantom single chunk on one of the new rw devices that was added.
Data,single: Size:1.00GiB, Used:0.00B
   /dev/dm-8   1.00GiB

It's still there after the -dconvert=raid1 and separate -mconvert=raid1,
and after seed device removal. A balance start without filters removes
it; chances are that had I used -dconvert=raid1,soft it would have
vanished as well, but I didn't retest that.
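
If anyone wants to reproduce or clean that up, the two variants I'm
referring to are roughly (mount point is a placeholder):

btrfs balance start -dconvert=raid1,soft /mnt  # soft: only touch chunks not yet at the target profile
btrfs balance start /mnt                       # full balance; rewrites everything and drops the leftover single chunk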


> I guess the
> seed source cannot be mounted or modified...

?



-- 
Chris Murphy


Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-03 Thread Kai Krakow
On Sun, 3 Apr 2016 05:06:19 + (UTC), Duncan <1i5t5.dun...@cox.net> wrote:

> Kai Krakow posted on Sun, 03 Apr 2016 06:02:02 +0200 as excerpted:
> 
> > No, other files are affected, too. And it looks like those files are
> > easily affected even when removed and recreated from whatever backup
> > source.  
> 
> I've seen you say that several times now, I think.  But none of those 
> times has it apparently occurred to you to double-check whether it's
> the /same/ corruptions every time, or at least, if you checked it,
> I've not seen it actually /reported/.  (Note that I didn't say you
> didn't report it, only that I've not seen it.  A difference there is!
> =:^)

Believe me, I would double check... But this FS is (and the affected
files are) just too big to create test cases, and backups, and copies,
and you know what...

So the only chance I see is to offer help improving "btrfsck --repair"
before I wipe and restore from backup. Except for the unlikely case that
"--repair" improves to a point where it gets my FS back in order. ;-)

I'll have to wait for my new bcache SSD to arrive. In its current state
(lifetime at 97%) I don't want to push all my file data through it.

Then I'll back up the current state (the damaged files are skipped
anyway because they haven't been "modified" according to mtime), so
I'll get a clean backup except for the VDI file and some big Steam
files (which can actually easily be downloaded again through the
client).

And yes, you are right that I didn't check whether it is the same
corruption every time. But that's also a bit difficult to do because
I'd need either enough spare disk space to keep copies of the files to
compare against, or to set up some block-identifying checksumming like
a hash tree.

> If I'm getting repeated corruptions of something, that's the first
> thing I'd check, is there some repeating pattern to those
> corruptions, same place in the file, same "wanted" value (expected),
> same "got" value, (not expected if it's reporting corruption), etc.

Way to go, usually...

> Then I'd try different variations like renaming the file, putting it
> in a different directory with all of the same other files, putting it
> in a different directory with all different files, putting it in a
> different directory by itself, putting it in the same directory but
> in a different subvolume... you get the point.

Here's the point: shuffling files around should be done across
different filesystems. I neither have any spare filesystems to do that,
nor can I currently afford the time to shuffle around such big files -
it takes multiple hours to copy them. Already looking forward to
restoring the backup... *sigh*

BTW: Is it possible to use my backup drive (it's btrfs single-data
dup-metadata, single device) as a seed device for my newly created
btrfs pool (raid0-data, raid1-metadata, three devices)? I guess the
seed source cannot be mounted or modified...

> Then I'd try different mount options, with and without compression,
> with different kinds of compression, with compress-force and with
> simple compress, with and without autodefrag...

As a first step I've switched bcache to write-around mode. It should
prevent (or at least reduce) further corruption if bcache is at fault.
And it's the safer choice anyway for a soon-to-die SSD.

> I could try it with nocow enabled for the file (note that the file
> has to be created with nocow before it gets content, for nocow to
> take effect), tho of course that'll turn off btrfs checksumming, but
> I could still for instance md5sum the original source and the nocowed
> test version and see if it tests clean that way.

I already thought about putting the VDI back to nocow... I had this
before. But then csum errors would go unnoticed, so I don't think that
is adequate.

In consequence, though, I could actually md5sum the files as you wrote,
because there won't be read errors due to csum mismatches. And I could
detect corruption that way.

> I could try it with nocow on the file but with a bunch of snapshots 
> interwoven with writing changes to the file (obviously this will kill 
> comparison against the original, but I could arrange to write the
> same changes to the test file on btrfs, and to a control copy of the
> file on non-btrfs, and then md5sum or whatever compare them).

That would probably work but I do not quite trust it due to the
corruptions already on disk which seemingly damage specific files or
areas on the disk.

> Then, if I had the devices available to do so, I'd try it in a
> different btrfs of the same layout (same redundancy mode and number
> of devices), both single and dup mode on a single device, etc.

In that sense: If I had the disks available I already would've taken a
block-by-block copy and then restored from backup.

> And again if available, I'd try swapping the filesystem to different 
> machines...

Maybe another time... ;-)

Actually, I only have that one system here. I could do that with the
other system I have problems with 

Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-03 Thread Chris Murphy
On Sat, Apr 2, 2016 at 10:02 PM, Kai Krakow  wrote:
> On Sat, 2 Apr 2016 18:14:17 -0600, Chris Murphy wrote:

> Also I think, having options nossd+autodefrag+lzo shouldn't be an
> exotic or unsupported option. Having this on top of bcache should just
> work.

I'm not suggesting it shouldn't work. But in fact something isn't
working. Bugs happen. Regressions happen. This is a process of
elimination project to find out either why, or under what
condition(s), it doesn't work.


> Does it make sense while I still have the corruptions in the FS? I'd
> like to wait for Qu whether I should recreate the FS or whether I
> should take some image, or send info to improve btrfsck...

It's up to you. I think it's fair to say the file system should not be
corrupting files so long as it's willing to write to the volume. So
that's a problem in and of itself; it should sooner go read only.

It's completely reasonable to take a btrfs-image, back everything up,
and then try a 'btrfs check --repair' and see if it can fix things up.
If not, that makes the btrfs-image more valuable.
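
Something along these lines, with the filesystem unmounted and the
device name obviously being a placeholder:

btrfs-image -c9 -t4 /dev/sdX1 /other/disk/fs-metadata.img  # metadata-only image, useful for the devs
btrfs check /dev/sdX1                                      # read-only check first
btrfs check --repair /dev/sdX1                             # only once backups are current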



> I think the latter two are easily the least probable sort of bugs. But
> I'll give it a try. For the time being, I could switch bcache to
> write-around mode - so it could at least not corrupt btrfs during
> writes.

I don't know enough about bcache to speculate what can happen if there
are already fs corruptions. Is it possible bcache makes things worse?
No idea.


-- 
Chris Murphy


Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-02 Thread Duncan
Kai Krakow posted on Sun, 03 Apr 2016 06:02:02 +0200 as excerpted:

> No, other files are affected, too. And it looks like those files are
> easily affected even when removed and recreated from whatever backup
> source.

I've seen you say that several times now, I think.  But none of those 
times has it apparently occurred to you to double-check whether it's the 
/same/ corruptions every time, or at least, if you checked it, I've not 
seen it actually /reported/.  (Note that I didn't say you didn't report 
it, only that I've not seen it.  A difference there is! =:^)

If I'm getting repeated corruptions of something, that's the first thing 
I'd check, is there some repeating pattern to those corruptions, same 
place in the file, same "wanted" value (expected), same "got" value, (not 
expected if it's reporting corruption), etc.

Then I'd try different variations like renaming the file, putting it in a 
different directory with all of the same other files, putting it in a 
different directory with all different files, putting it in a different 
directory by itself, putting it in the same directory but in a different 
subvolume... you get the point.

Then I'd try different mount options, with and without compression, with 
different kinds of compression, with compress-force and with simple 
compress, with and without autodefrag...

I could try it with nocow enabled for the file (note that the file has to 
be created with nocow before it gets content, for nocow to take effect), 
tho of course that'll turn off btrfs checksumming, but I could still for 
instance md5sum the original source and the nocowed test version and see 
if it tests clean that way.
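
A minimal way to set that up might look like this (paths are made up;
nocow must be applied before any data lands in the file):

mkdir /mnt/test
chattr +C /mnt/test                          # new files below inherit nocow
cp --reflink=never /backup/disk.vdi /mnt/test/
md5sum /backup/disk.vdi /mnt/test/disk.vdi   # compare source and nocow copy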

I could try it with nocow on the file but with a bunch of snapshots 
interwoven with writing changes to the file (obviously this will kill 
comparison against the original, but I could arrange to write the same 
changes to the test file on btrfs, and to a control copy of the file on 
non-btrfs, and then md5sum or whatever compare them).

Then, if I had the devices available to do so, I'd try it in a different 
btrfs of the same layout (same redundancy mode and number of devices), 
both single and dup mode on a single device, etc.

And again if available, I'd try swapping the filesystem to different 
machines...

OK, so trying /all/ the above might be a bit overboard but I think you 
get the point.  Try to find some pattern or common element in the whole 
thing, and report back the results at least for the "simple" experiments 
like whether the corruption appears to be the same (same got at the same 
spot) or different, and whether putting the file in a different subdir or 
using a different name for it matters at all.  =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-02 Thread Kai Krakow
On Sat, 2 Apr 2016 18:14:17 -0600, Chris Murphy wrote:

> On Sat, Apr 2, 2016 at 2:16 PM, Kai Krakow 
> wrote:
> 
> > I'll go checking the RAM for problems - tho that would be the first
> > time in twenty years that a RAM module hadn't errors from the
> > beginning. Well, you'll never know. But I expect no error since
> > usually this would mean all sorts of different and random problems
> > which I don't have. Problems are very specific, which is atypical
> > for RAM errors.  
> 
> Well so far it's just the VDI that's experiencing csum mismatch
> errors, right? So that's not bad RAM, which would affect other files
> too. And same for a failing SSD.

No, other files are affected, too. And it looks like those files are
easily affected even when removed and recreated from whatever backup
source.

> I think you've got a bug somewhere and it's just hard to say where it
> is based on the available information. I've already lost track if
> others have all of the exact same setup you do: bcache + nossd +
> autodefrag + lzo + VirtualBox writing to VDI on this Btrfs volume.
> There are others who have some of those options, but I don't know if
> there's anyone who has all of those going on.

I haven't run VirtualBox since the incident, so I'd rule out VirtualBox.
Currently, there seems to be no csum error for the VDI file; instead,
another file now gets corruptions, even after being recreated. I think
it is the result of another corruption and thus a side effect.

Also, I think having the options nossd+autodefrag+lzo shouldn't be an
exotic or unsupported combination. Having this on top of bcache should
just work.

Let's not rule out that bcache had a problem, although I'd usually
expect bcache to freak out with internal btree corruption in that case.

> Maybe Qu has some suggestions, but if it were me I'd do this. Build
> mainline 4.5.0, it's a known quantity by Btrfs devs.

4.5.0-gentoo currently only adds a few patches, so I could easily build
vanilla.

> Build the kernel
> with BTRFS_FS_CHECK_INTEGRITY enabled in kernel config. And when you
> mount the file system, don't use mount option check_int, just use your
> regular mount options and try to reproduce the VDI corruption. If you
> can reproduce it, then start over, this time with check_int mount
> option included along with the others you're using and try to
> reproduce. It's possible there will be fairly verbose kernel messages,
> so use boot parameter log_buf_len=1M and then that way you can use
> dmesg rather than depending on journalctl -k which sometimes drops
> messages if there are too many.

Does it make sense while I still have the corruptions in the FS? I'd
like to wait for Qu to say whether I should recreate the FS, take some
image, or send info to improve btrfsck...

I'm pretty sure I do not have reproducible corruptions which are not
caused by another corruption - so check_int would probably be of less
use currently.

> If you reproduce the corruption while check_int is enabled, kernel
> messages should have clues and then you can put that in a file and
> attach to the list or open a bug. FWIW, I'm pretty sure your MUA is
> wrapping poorly, when I look at this URL for your post with smartctl
> output, it wraps in a way that's essentially impossible to sort out at
> a glance. Whether it's your MUA or my web browser pretty much doesn't
> matter, it's not legible so what I do is just attach as file to a bug
> report or if small enough onto the list itself.
> http://www.spinics.net/lists/linux-btrfs/msg53790.html

Claws Mail is just too smart for me... It showed up correctly in the
editor before hitting the send button. I wish I could go back to KNode
(that did its job right). But it's currently an unsupported orphan
project of KDE. :-(

> Finally, I would retest yet again with check_int_data as a mount
> option and try to reproduce. This is reported to be dirt slow, but it
> might capture something that check_int doesn't. But I admit this is
> throwing spaghetti on the wall, and is something of a goose chase just
> because I don't know what else to recommend other than iterating all
> of your mount options from none, adding just one at a time, and trying
> to reproduce. That somehow sounds more tedious. But chances are you'd
> find out what mount option is causing it; OR maybe you'd find out the
> corruption always happens, even with defaults, even without bcache, in
> which case that'd seem to implicate either a gentoo patch, or a
> virtual box bug of some sort.

I think the latter two are easily the least probable sort of bugs. But
I'll give it a try. For the time being, I could switch bcache to
write-around mode - so it could at least not corrupt btrfs during
writes.

-- 
Regards,
Kai

Replies to list-only preferred.



Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-02 Thread Chris Murphy
On Sat, Apr 2, 2016 at 2:16 PM, Kai Krakow  wrote:

> I'll go checking the RAM for problems - tho that would be the first
> time in twenty years that a RAM module hadn't errors from the
> beginning. Well, you'll never know. But I expect no error since usually
> this would mean all sorts of different and random problems which I
> don't have. Problems are very specific, which is atypical for RAM
> errors.

Well so far it's just the VDI that's experiencing csum mismatch
errors, right? So that's not bad RAM, which would affect other files
too. And same for a failing SSD.

I think you've got a bug somewhere and it's just hard to say where it
is based on the available information. I've already lost track if
others have all of the exact same setup you do: bcache + nossd +
autodefrag + lzo + VirtualBox writing to VDI on this Btrfs volume.
There are others who have some of those options, but I don't know if
there's anyone who has all of those going on.

Maybe Qu has some suggestions, but if it were me I'd do this. Build
mainline 4.5.0, it's a known quantity by Btrfs devs. Build the kernel
with BTRFS_FS_CHECK_INTEGRITY enabled in kernel config. And when you
mount the file system, don't use mount option check_int, just use your
regular mount options and try to reproduce the VDI corruption. If you
can reproduce it, then start over, this time with check_int mount
option included along with the others you're using and try to
reproduce. It's possible there will be fairly verbose kernel messages,
so use boot parameter log_buf_len=1M and then that way you can use
dmesg rather than depending on journalctl -k which sometimes drops
messages if there are too many.
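
Spelled out, that's roughly the following (the mount options here are
only an example; keep whatever you normally use):

# kernel config, needed before check_int/check_int_data are accepted at all
CONFIG_BTRFS_FS_CHECK_INTEGRITY=y
# kernel command line, so dmesg keeps the verbose output
log_buf_len=1M
# second round only, after reproducing the corruption without it:
mount -o nossd,autodefrag,compress=lzo,check_int /dev/sdX /mnt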

If you reproduce the corruption while check_int is enabled, kernel
messages should have clues and then you can put that in a file and
attach to the list or open a bug. FWIW, I'm pretty sure your MUA is
wrapping poorly, when I look at this URL for your post with smartctl
output, it wraps in a way that's essentially impossible to sort out at
a glance. Whether it's your MUA or my web browser pretty much doesn't
matter, it's not legible so what I do is just attach as file to a bug
report or if small enough onto the list itself.
http://www.spinics.net/lists/linux-btrfs/msg53790.html

Finally, I would retest yet again with check_int_data as a mount
option and try to reproduce. This is reported to be dirt slow, but it
might capture something that check_int doesn't. But I admit this is
throwing spaghetti on the wall, and is something of a goose chase just
because I don't know what else to recommend other than iterating all
of your mount options from none, adding just one at a time, and trying
to reproduce. That somehow sounds more tedious. But chances are you'd
find out what mount option is causing it; OR maybe you'd find out the
corruption always happens, even with defaults, even without bcache, in
which case that'd seem to implicate either a gentoo patch, or a
virtual box bug of some sort.



-- 
Chris Murphy


Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-02 Thread Kai Krakow
On Sat, 2 Apr 2016 19:17:55 +0200, Henk Slager wrote:

> On Sat, Apr 2, 2016 at 11:00 AM, Kai Krakow 
> wrote:
> > On Fri, 1 Apr 2016 01:27:21 +0200, Henk Slager wrote:
> >  
> >> It is not clear to me what 'Gentoo patch-set r1' is and does. So
> >> just boot a vanilla v4.5 kernel from kernel.org and see if you get
> >> csum errors in dmesg.  
> >
> > It is the gentoo patchset, I don't think anything there relates to
> > btrfs:
> > https://dev.gentoo.org/~mpagano/genpatches/trunk/4.5/
> >  
> >> Also, where does 'duplicate object' come from? dmesg ? then please
> >> post its surroundings, straight from dmesg.  
> >
> > It was in dmesg. I already posted it in the other thread and Qu took
> > note of it. Apparently, I didn't manage to capture anything else
> > than:
> >
> > btrfs_run_delayed_refs:2927: errno=-17 Object already exists
> >
> > It hit me unexpected. This was the first time btrfs went RO for me.
> > It was with kernel 4.4.5 I think.
> >
> > I suspect this is the outcome of unnoticed corruptions that sneaked
> > in earlier over some period of time. The system had no problems
> > until this incident, and only then I discovered the huge pile of
> > corruptions when I ran btrfsck.
> >
> > I'm also pretty convinced now that VirtualBox itself is not the
> > problem but only victim of these corruptions, that's why it
> > primarily shows up in the VDI file.
> >
> > However, I now found csum errors in unrelated files (see other post
> > in this thread), even for files not touched in a long time.  
> 
> Ok, this is some good further status and background. That there are
> more csum errors elsewhere is quite worrying I would say. You said HW
> is tested, are you sure there no rare undetected failures, like due to
> overclocking or just aging or whatever. It might just be that spurious
> HW errors just now start to happen and are unrelated to kernel upgrade
> from 4.4.x to 4.5.
> I had once a RAM module going bad; Windows7 ran fine (at least no
> crashes), but when I booted with Linux/btrfs, all kinds of strange
> btrfs errors started to appear including csum errors.

I'll go checking the RAM for problems - tho that would be the first
time in twenty years that a RAM module hadn't errors from the
beginning. Well, you'll never know. But I expect no error since usually
this would mean all sorts of different and random problems which I
don't have. Problems are very specific, which is atypical for RAM
errors.

The hardware is not overclocked, every part was tested when installed.

> The other thing you could think about is the SSD cache partition. I
> don't remember if blocks from RAM to SSD get an extra CRC attached
> (independent of BTRFS). But if data gets corrupted while in the SSD,
> you could get very nasty errors, how nasty depends a bit on the
> various bcache settings. It is not unthinkable that dirty changed data
> gets written to the harddisks. But at least btrfs (scub) can detect
> that (the situation you are in now).

Well, the SSD could in fact soon become a problem. It's at 97% of its
lifetime according to SMART. I'm probably somewhere near 85TB (that's
the lifetime spec of the SSD) of written data within one year, thanks
to some unfortunate disk replacement (btrfs replace) action with btrfs
through bcache, and weekly scrubs (which do not just read, but also
write).

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       1
  5 Reallocate_NAND_Blk_Cnt 0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       8705
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       286
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   003   003   000    Old_age   Always       -       2913
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       112
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       1036
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   067   057   000    Old_age   Always       -       33 (Min/Max 20/43)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202

Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-02 Thread Henk Slager
On Sat, Apr 2, 2016 at 11:00 AM, Kai Krakow  wrote:
> On Fri, 1 Apr 2016 01:27:21 +0200, Henk Slager wrote:
>
>> It is not clear to me what 'Gentoo patch-set r1' is and does. So just
>> boot a vanilla v4.5 kernel from kernel.org and see if you get csum
>> errors in dmesg.
>
> It is the gentoo patchset, I don't think anything there relates to
> btrfs:
> https://dev.gentoo.org/~mpagano/genpatches/trunk/4.5/
>
>> Also, where does 'duplicate object' come from? dmesg ? then please
>> post its surroundings, straight from dmesg.
>
> It was in dmesg. I already posted it in the other thread and Qu took
> note of it. Apparently, I didn't manage to capture anything else than:
>
> btrfs_run_delayed_refs:2927: errno=-17 Object already exists
>
> It hit me unexpectedly. This was the first time btrfs went RO for me.
> It was with kernel 4.4.5, I think.
>
> I suspect this is the outcome of unnoticed corruptions that sneaked in
> earlier over some period of time. The system had no problems until this
> incident, and only then I discovered the huge pile of corruptions when I
> ran btrfsck.
>
> I'm also pretty convinced now that VirtualBox itself is not the problem
> but only victim of these corruptions, that's why it primarily shows up
> in the VDI file.
>
> However, I now found csum errors in unrelated files (see other post in
> this thread), even for files not touched in a long time.

Ok, this is some good further status and background. That there are
more csum errors elsewhere is quite worrying, I would say. You said the
HW is tested, but are you sure there are no rare undetected failures,
like ones due to overclocking or just aging? It might just be that
spurious HW errors are only now starting to happen and are unrelated to
the kernel upgrade from 4.4.x to 4.5.
I once had a RAM module going bad; Windows 7 ran fine (at least no
crashes), but when I booted with Linux/btrfs, all kinds of strange
btrfs errors started to appear, including csum errors.

The other thing you could think about is the SSD cache partition. I
don't remember if blocks going from RAM to SSD get an extra CRC
attached (independent of BTRFS). But if data gets corrupted while in
the SSD, you could get very nasty errors; how nasty depends a bit on
the various bcache settings. It is not unthinkable that dirty, changed
data gets written to the hard disks. But at least btrfs (scrub) can
detect that (the situation you are in now).
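
(For completeness, that detection is just a normal scrub plus its
error counters, e.g.:

$ btrfs scrub start -Bd /
$ btrfs scrub status /

-B keeps the scrub in the foreground, -d prints per-device statistics.)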

Maybe to further isolate just btrfs, you could temporarily rule out
bcache by making sure the cache is clean (rough sketch below) and then
increasing the start sectors of the second partitions on the hard disks
by 16 sectors (8 KiB), so they point past the bcache superblock, and
then rebooting. Of course, after any write to those partitions you will
have to recreate the whole bcache setup.
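
A minimal sketch of the "cache is clean" part, via the bcache sysfs
interface (run as root, adjust bcache0/1/2 as needed):

$ echo writethrough > /sys/block/bcache0/bcache/cache_mode
$ cat /sys/block/bcache0/bcache/dirty_data    # wait until this reads 0.0k
$ cat /sys/block/bcache0/bcache/state         # should say "clean"

Only then is it safe to look at the backing partitions directly.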

But maybe the fs has simply been silently corrupted by bugs in older
kernels, and now kernel 4.5 cannot handle it anymore, and any further
use of the fs increases the corruption.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-02 Thread Kai Krakow
Am Fri, 1 Apr 2016 01:27:21 +0200
schrieb Henk Slager :

> It is not clear to me what 'Gentoo patch-set r1' is and does. So just
> boot a vanilla v4.5 kernel from kernel.org and see if you get csum
> errors in dmesg.

It is the gentoo patchset, I don't think anything there relates to
btrfs:
https://dev.gentoo.org/~mpagano/genpatches/trunk/4.5/

> Also, where does 'duplicate object' come from? dmesg ? then please
> post its surroundings, straight from dmesg.

It was in dmesg. I already posted it in the other thread and Qu took
note of it. Apparently, I didn't manage to capture anything else than:

btrfs_run_delayed_refs:2927: errno=-17 Object already exists

It hit me unexpectedly. This was the first time btrfs went RO for me.
It was with kernel 4.4.5, I think.

I suspect this is the outcome of unnoticed corruptions that sneaked in
earlier over some period of time. The system had no problems until this
incident, and only then I discovered the huge pile of corruptions when I
ran btrfsck.

I'm also pretty convinced now that VirtualBox itself is not the problem
but only victim of these corruptions, that's why it primarily shows up
in the VDI file.

However, I now found csum errors in unrelated files (see other post in
this thread), even for files not touched in a long time.

-- 
Regards,
Kai

Replies to list-only preferred.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-04-02 Thread Kai Krakow
Am Fri, 1 Apr 2016 09:10:44 +0800
schrieb Qu Wenruo :

> The real problem is that the extent has a mismatched reference.
> Normally it can be fixed by the --init-extent-tree option, but it
> usually means a bigger problem, especially as it has already caused a
> kernel delayed-ref problem.
> 
> Not to mention the error "extent item 11271947091968 has multiple
> extent items", which makes the problem more serious.
> 
> 
> I assume some older kernel has already screwed up the extent tree;
> although the delayed-ref code was bug-prone, it has improved in recent
> years.
> 
> But it seems the fs tree is less damaged, so I assume the extent tree
> corruption could be fixed by "--init-extent-tree".
> 
> For the only fs tree error (missing csum), if "btrfsck
> --init-extent-tree --repair" works without any problem, the simplest
> fix would be to just remove the file.
> Or you can use a lot of CPU time and disk IO to rebuild the whole
> csum tree, using the "--init-csum-tree" option.

Okay, so I'm going to inode-resolve the file with csum errors.
Actually, it's a file from Steam which has been there for ages and
never showed csum errors before, which makes me wonder if csum errors
can sneak into long-existing files through other corruptions.
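
Mapping the numbers from the btrfsck output back to a path should go
roughly like this (root 4336 is the subvolume id, 4284125 the inode,
and the subvolume path is just a placeholder):

$ btrfs subvolume list / | grep 'ID 4336'
$ btrfs inspect-internal inode-resolve 4284125 /path/to/that/subvolume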

I now removed this file and had to reboot because btrfs went RO. Here's
the backtrace:

https://gist.github.com/kakra/a7be40c23e08fc6e237f9108371afadf

[137619.835374] [ cut here ]
[137619.835385] WARNING: CPU: 1 PID: 4840 at fs/btrfs/extent-tree.c:1625 
lookup_inline_extent_backref+0x156/0x620()
[137619.835394] Modules linked in: nvidia_drm(PO) uas usb_storage vboxnetadp(O) 
vboxnetflt(O) vboxdrv(O) nvidia_modeset(PO) nvidia(PO)
[137619.835405] CPU: 1 PID: 4840 Comm: rm Tainted: P   O
4.5.0-gentoo-r1 #1
[137619.835407] Hardware name: To Be Filled By O.E.M. To Be Filled By 
O.E.M./Z68 Pro3, BIOS L2.16A 02/22/2013
[137619.835409]   8159eae9  
81ea1d08
[137619.835412]  810c6e37 8803d56a4d20 88040c7daa00 
0a4075114000
[137619.835415]  00201000  81489836 
001d
[137619.835418] Call Trace:
[137619.835423]  [] ? dump_stack+0x46/0x5d
[137619.835429]  [] ? warn_slowpath_common+0x77/0xb0
[137619.835432]  [] ? lookup_inline_extent_backref+0x156/0x620
[137619.835435]  [] ? btrfs_get_token_32+0xee/0x110
[137619.835440]  [] ? __set_page_dirty_nobuffers+0xf8/0x150
[137619.835443]  [] ? insert_inline_extent_backref+0x54/0xe0
[137619.835450]  [] ? __slab_free+0x98/0x220
[137619.835453]  [] ? kmem_cache_alloc+0x14d/0x160
[137619.835456]  [] ? 
__btrfs_inc_extent_ref.isra.64+0x99/0x270
[137619.835459]  [] ? __btrfs_run_delayed_refs+0x673/0x1020
[137619.835463]  [] ? 
btrfs_release_extent_buffer_page+0x71/0x120
[137619.835466]  [] ? release_extent_buffer+0x3f/0x90
[137619.835469]  [] ? btrfs_run_delayed_refs+0x8f/0x2b0
[137619.835473]  [] ? btrfs_truncate_inode_items+0x8b8/0xdc0
[137619.835477]  [] ? btrfs_evict_inode+0x3fe/0x550
[137619.835481]  [] ? evict+0xb7/0x180
[137619.835484]  [] ? do_unlinkat+0x12c/0x2d0
[137619.835488]  [] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[137619.835491] ---[ end trace 6e8061336c42ff93 ]---
[137619.835494] [ cut here ]
[137619.835497] WARNING: CPU: 1 PID: 4840 at fs/btrfs/extent-tree.c:2946 
btrfs_run_delayed_refs+0x279/0x2b0()
[137619.835499] BTRFS: Transaction aborted (error -5)
[137619.835500] Modules linked in: nvidia_drm(PO) uas usb_storage vboxnetadp(O) 
vboxnetflt(O) vboxdrv(O) nvidia_modeset(PO) nvidia(PO)
[137619.835506] CPU: 1 PID: 4840 Comm: rm Tainted: PW  O
4.5.0-gentoo-r1 #1
[137619.835508] Hardware name: To Be Filled By O.E.M. To Be Filled By 
O.E.M./Z68 Pro3, BIOS L2.16A 02/22/2013
[137619.835509]   8159eae9 880255d1bc98 
81ea1d08
[137619.835512]  810c6e37 88040c7daa00 880255d1bce8 
01c6
[137619.835514]  8803211b4510 000b 810c6eb7 
81e8a0a0
[137619.835517] Call Trace:
[137619.835519]  [] ? dump_stack+0x46/0x5d
[137619.835522]  [] ? warn_slowpath_common+0x77/0xb0
[137619.835525]  [] ? warn_slowpath_fmt+0x47/0x50
[137619.835528]  [] ? btrfs_run_delayed_refs+0x279/0x2b0
[137619.835531]  [] ? btrfs_truncate_inode_items+0x8b8/0xdc0
[137619.835535]  [] ? btrfs_evict_inode+0x3fe/0x550
[137619.835538]  [] ? evict+0xb7/0x180
[137619.835541]  [] ? do_unlinkat+0x12c/0x2d0
[137619.835543]  [] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[137619.835545] ---[ end trace 6e8061336c42ff94 ]---
[137619.835547] BTRFS: error (device bcache2) in btrfs_run_delayed_refs:2946: 
errno=-5 IO failure
[137619.835550] BTRFS info (device bcache2): forced readonly
[137619.886069] pending csums is 410705920

So it looks like fixing one error introduces other errors. Should I try
init-extent-tree after taking a backup?

BTW: "btrfsck --repair" does not work: I complains about unsupported
cases due to compression of 

Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-03-31 Thread Qu Wenruo



Henk Slager wrote on 2016/04/01 01:27 +0200:

On Thu, Mar 31, 2016 at 10:44 PM, Kai Krakow  wrote:

Hello!

I already reported this in another thread but it was a bit confusing by
intermixing multiple volumes. So let's start a new thread:

Since one of the last kernel upgrades, I'm experiencing one VDI file
(containing a NTFS image with Windows 7) getting damaged when running
the machine in VirtualBox. I became aware of this after experiencing
a "duplicate object" error, after which btrfs went RO. I fixed it by
deleting the VDI and restoring from backup - but now I get csum errors
as soon as some VM IO goes into the VDI file.

The FS is still usable. One effect is, that after reading all files
with rsync (to copy to my backup), each call of "du" or "df" hangs, also
similar calls to "btrfs {sub|fi} ..." show the same effect. I guess one
outcome of this is that the FS does not properly unmount during
shutdown.

Kernel is 4.5.0 by now (the FS is much much older, dates back to 3.x
series, and never had problems), including Gentoo patch-set r1.


One possibility could be that the vbox kernel modules somehow corrupt
btrfs kernel memory since kernel 4.5.

In order to make this reproducible (or an attempt to reproduce) for
others, you could unload VirtualBox stuff and restore the VDI file
from backup (or whatever big file) and then make pseudo-random, but
reproducible writes to the file.

It is not clear to me what 'Gentoo patch-set r1' is and does. So just
boot a vanilla v4.5 kernel from kernel.org and see if you get csum
errors in dmesg.

Also, where does 'duplicate object' come from? dmesg ? then please
post its surroundings, straight from dmesg.


The device layout is:

$ lsblk -o NAME,MODEL,FSTYPE,LABEL,MOUNTPOINT
NAMEMODELFSTYPE LABEL  MOUNTPOINT
sda Crucial_CT128MX1
├─sda1   vfat   ESP/boot
├─sda2
└─sda3   bcache
   ├─bcache0  btrfs  system
   ├─bcache1  btrfs  system
   └─bcache2  btrfs  system /usr/src
sdb SAMSUNG HD103SJ
├─sdb1   swap   swap0  [SWAP]
└─sdb2   bcache
   └─bcache2  btrfs  system /usr/src
sdc SAMSUNG HD103SJ
├─sdc1   swap   swap1  [SWAP]
└─sdc2   bcache
   └─bcache1  btrfs  system
sdd SAMSUNG HD103UJ
├─sdd1   swap   swap2  [SWAP]
└─sdd2   bcache
   └─bcache0  btrfs  system

Mount options are:

$ mount|fgrep btrfs
/dev/bcache2 on / type btrfs 
(rw,noatime,compress=lzo,nossd,discard,space_cache,autodefrag,subvolid=256,subvol=/gentoo/rootfs)

The FS uses mraid=1 and draid=0.

Output of btrfsck is:
(also available here:
https://gist.github.com/kakra/bfcce4af242f6548f4d6b45c8afb46ae)

$ btrfsck /dev/disk/by-label/system
checking extents
ref mismatch on [10443660537856 524288] extent item 1, found 2

This   10443660537856  number is bigger than the  1832931324360 number
found for total bytes. AFAIK, this is already wrong.


Nope. That's a btrfs logical address, which can be beyond the real
disk bytenr.


The easiest way to reproduce such a case is to write something into a
256M btrfs and then balance the fs several times.


Then all chunks can be at bytenr beyond 256M.
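
For example (throwaway loop device, all paths arbitrary):

$ truncate -s 256M /tmp/small.img
$ mkfs.btrfs /tmp/small.img
$ mount -o loop /tmp/small.img /mnt/tmp
$ dd if=/dev/urandom of=/mnt/tmp/f bs=1M count=100 conv=fsync
$ for i in 1 2 3; do btrfs balance start /mnt/tmp; done

Each balance relocates chunks to newly allocated logical addresses, so
the chunk/extent bytenr values quickly end up far beyond the 256M
device size.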

The real problem is that the extent has a mismatched reference.
Normally it can be fixed by the --init-extent-tree option, but it
usually means a bigger problem, especially as it has already caused a
kernel delayed-ref problem.


Not to mention the error "extent item 11271947091968 has multiple extent
items", which makes the problem more serious.



I assume some older kernel has already screwed up the extent tree;
although the delayed-ref code was bug-prone, it has improved in recent
years.


But it seems the fs tree is less damaged, so I assume the extent tree
corruption could be fixed by "--init-extent-tree".


For the only fs tree error (missing csum), if "btrfsck
--init-extent-tree --repair" works without any problem, the simplest
fix would be to just remove the file.
Or you can use a lot of CPU time and disk IO to rebuild the whole csum
tree, using the "--init-csum-tree" option.


Thanks,
Qu



[...]


checking fs roots
root 4336 inode 4284125 errors 1000, some csum missing

What is in this inode?


Checking filesystem on /dev/disk/by-label/system
UUID: d2bb232a-2e8f-4951-8bcc-97e237f1b536
found 1832931324360 bytes used err is 1
total csum bytes: 1730105656
total tree bytes: 6494474240
total fs tree bytes: 3789783040
total extent tree bytes: 608219136
btree space waste bytes: 1221460063
file data blocks allocated: 2406059724800
  referenced 2040857763840

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfsck: backpointer mismatch (and multiple other errors)

2016-03-31 Thread Henk Slager
On Thu, Mar 31, 2016 at 10:44 PM, Kai Krakow  wrote:
> Hello!
>
> I already reported this in another thread but it was a bit confusing by
> intermixing multiple volumes. So let's start a new thread:
>
> Since one of the last kernel upgrades, I'm experiencing one VDI file
> (containing a NTFS image with Windows 7) getting damaged when running
> the machine in VirtualBox. I became aware of this after experiencing
> a "duplicate object" error, after which btrfs went RO. I fixed it by
> deleting the VDI and restoring from backup - but now I get csum errors
> as soon as some VM IO goes into the VDI file.
>
> The FS is still usable. One effect is, that after reading all files
> with rsync (to copy to my backup), each call of "du" or "df" hangs, also
> similar calls to "btrfs {sub|fi} ..." show the same effect. I guess one
> outcome of this is that the FS does not properly unmount during
> shutdown.
>
> Kernel is 4.5.0 by now (the FS is much much older, dates back to 3.x
> series, and never had problems), including Gentoo patch-set r1.

One possibility could be that the vbox kernel modules somehow corrupt
btrfs kernel memory since kernel 4.5.

In order to make this reproducible (or an attempt to reproduce) for
others, you could unload VirtualBox stuff and restore the VDI file
from backup (or whatever big file) and then make pseudo-random, but
reproducible writes to the file.
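
As a sketch, fio can do reproducible "random" writes if you pin the
random seed (file name, sizes and seed below are just placeholders):

$ fio --name=vdi-repro --filename=/path/to/restored.vdi \
      --ioengine=psync --rw=randwrite --bs=64k \
      --size=4g --io_size=1g --randseed=12345

Running the same command twice should hit the same offsets in the same
order, so a corruption that depends on the write pattern has a chance
to show up again.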

It is not clear to me what 'Gentoo patch-set r1' is and does. So just
boot a vanilla v4.5 kernel from kernel.org and see if you get csum
errors in dmesg.

Also, where does 'duplicate object' come from? dmesg ? then please
post its surroundings, straight from dmesg.

> The device layout is:
>
> $ lsblk -o NAME,MODEL,FSTYPE,LABEL,MOUNTPOINT
> NAMEMODELFSTYPE LABEL  MOUNTPOINT
> sda Crucial_CT128MX1
> ├─sda1   vfat   ESP/boot
> ├─sda2
> └─sda3   bcache
>   ├─bcache0  btrfs  system
>   ├─bcache1  btrfs  system
>   └─bcache2  btrfs  system /usr/src
> sdb SAMSUNG HD103SJ
> ├─sdb1   swap   swap0  [SWAP]
> └─sdb2   bcache
>   └─bcache2  btrfs  system /usr/src
> sdc SAMSUNG HD103SJ
> ├─sdc1   swap   swap1  [SWAP]
> └─sdc2   bcache
>   └─bcache1  btrfs  system
> sdd SAMSUNG HD103UJ
> ├─sdd1   swap   swap2  [SWAP]
> └─sdd2   bcache
>   └─bcache0  btrfs  system
>
> Mount options are:
>
> $ mount|fgrep btrfs
> /dev/bcache2 on / type btrfs 
> (rw,noatime,compress=lzo,nossd,discard,space_cache,autodefrag,subvolid=256,subvol=/gentoo/rootfs)
>
> The FS uses mraid=1 and draid=0.
>
> Output of btrfsck is:
> (also available here:
> https://gist.github.com/kakra/bfcce4af242f6548f4d6b45c8afb46ae)
>
> $ btrfsck /dev/disk/by-label/system
> checking extents
> ref mismatch on [10443660537856 524288] extent item 1, found 2
This   10443660537856  number is bigger than the  1832931324360 number
found for total bytes. AFAIK, this is already wrong.

[...]

> checking fs roots
> root 4336 inode 4284125 errors 1000, some csum missing
What is in this inode?

> Checking filesystem on /dev/disk/by-label/system
> UUID: d2bb232a-2e8f-4951-8bcc-97e237f1b536
> found 1832931324360 bytes used err is 1
> total csum bytes: 1730105656
> total tree bytes: 6494474240
> total fs tree bytes: 3789783040
> total extent tree bytes: 608219136
> btree space waste bytes: 1221460063
> file data blocks allocated: 2406059724800
>  referenced 2040857763840
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfsck: backpointer mismatch (and multiple other errors)

2016-03-31 Thread Kai Krakow
Hello!

I already reported this in another thread but it was a bit confusing by
intermixing multiple volumes. So let's start a new thread:

Since one of the last kernel upgrades, I'm experiencing one VDI file
(containing a NTFS image with Windows 7) getting damaged when running
the machine in VirtualBox. I became aware of this after experiencing
a "duplicate object" error, after which btrfs went RO. I fixed it by
deleting the VDI and restoring from backup - but now I get csum errors
as soon as some VM IO goes into the VDI file.

The FS is still usable. One effect is, that after reading all files
with rsync (to copy to my backup), each call of "du" or "df" hangs, also
similar calls to "btrfs {sub|fi} ..." show the same effect. I guess one
outcome of this is that the FS does not properly unmount during
shutdown.

Kernel is 4.5.0 by now (the FS is much much older, dates back to 3.x
series, and never had problems), including Gentoo patch-set r1.

The device layout is:

$ lsblk -o NAME,MODEL,FSTYPE,LABEL,MOUNTPOINT
NAMEMODELFSTYPE LABEL  MOUNTPOINT
sda Crucial_CT128MX1
├─sda1   vfat   ESP/boot
├─sda2
└─sda3   bcache
  ├─bcache0  btrfs  system
  ├─bcache1  btrfs  system
  └─bcache2  btrfs  system /usr/src
sdb SAMSUNG HD103SJ
├─sdb1   swap   swap0  [SWAP]
└─sdb2   bcache
  └─bcache2  btrfs  system /usr/src
sdc SAMSUNG HD103SJ
├─sdc1   swap   swap1  [SWAP]
└─sdc2   bcache
  └─bcache1  btrfs  system
sdd SAMSUNG HD103UJ
├─sdd1   swap   swap2  [SWAP]
└─sdd2   bcache
  └─bcache0  btrfs  system

Mount options are:

$ mount|fgrep btrfs
/dev/bcache2 on / type btrfs 
(rw,noatime,compress=lzo,nossd,discard,space_cache,autodefrag,subvolid=256,subvol=/gentoo/rootfs)

The FS uses mraid=1 and draid=0.

Output of btrfsck is:
(also available here:
https://gist.github.com/kakra/bfcce4af242f6548f4d6b45c8afb46ae)

$ btrfsck /dev/disk/by-label/system
checking extents
ref mismatch on [10443660537856 524288] extent item 1, found 2
Backref 10443660537856 root 256 owner 23536425 offset 1310720 num_refs 0 not 
found in extent tree
Incorrect local backref count on 10443660537856 root 256 owner 23536425 offset 
1310720 found 1 wanted 0 back 0x4ceee750
Backref disk bytenr does not match extent record, bytenr=10443660537856, ref 
bytenr=10443660914688
Backref bytes do not match extent backref, bytenr=10443660537856, ref 
bytes=524288, backref bytes=69632
backpointer mismatch on [10443660537856 524288]
extent item 11271946579968 has multiple extent items
ref mismatch on [11271946579968 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271946579968, ref 
bytenr=11271946629120
backpointer mismatch on [11271946579968 110592]
extent item 11271946690560 has multiple extent items
ref mismatch on [11271946690560 114688] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271946690560, ref 
bytenr=11271946739712
Backref bytes do not match extent backref, bytenr=11271946690560, ref 
bytes=114688, backref bytes=110592
backpointer mismatch on [11271946690560 114688]
extent item 11271946805248 has multiple extent items
ref mismatch on [11271946805248 114688] extent item 1, found 3
Backref disk bytenr does not match extent record, bytenr=11271946805248, ref 
bytenr=11271946850304
Backref bytes do not match extent backref, bytenr=11271946805248, ref 
bytes=114688, backref bytes=53248
Backref disk bytenr does not match extent record, bytenr=11271946805248, ref 
bytenr=11271946903552
Backref bytes do not match extent backref, bytenr=11271946805248, ref 
bytes=114688, backref bytes=49152
backpointer mismatch on [11271946805248 114688]
extent item 11271946919936 has multiple extent items
ref mismatch on [11271946919936 61440] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271946919936, ref 
bytenr=11271946952704
Backref bytes do not match extent backref, bytenr=11271946919936, ref 
bytes=61440, backref bytes=110592
backpointer mismatch on [11271946919936 61440]
extent item 11271946981376 has multiple extent items
ref mismatch on [11271946981376 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271946981376, ref 
bytenr=11271947063296
backpointer mismatch on [11271946981376 110592]
extent item 11271947091968 has multiple extent items
ref mismatch on [11271947091968 110592] extent item 1, found 2
Backref disk bytenr does not match extent record, bytenr=11271947091968, ref 
bytenr=11271947173888
Backref bytes do not match extent backref, bytenr=11271947091968, ref 
bytes=110592, backref bytes=114688
backpointer mismatch on [11271947091968 110592]
extent item 11271947202560 has multiple