Re: cannot mount read-write because of unsupported optional features (2)

2016-11-01 Thread Tobias Holst
Ah thanks! Looks like I missed this bug on the list...

But as I understand it, it should be mountable with btrfs-progs
>=4.7.3 or when mounting it with "-o clear_cache,nospace_cache". But
neither works. Is it now only mountable with kernel >=4.9?
That would be interesting, since I never ran a kernel newer than 4.8...
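
For reference, here is where the "(2)" comes from - a sketch, assuming
(as Qu explains in the quote below) that a 4.8 kernel only knows bit
0x1 (FREE_SPACE_TREE) of compat_ro_flags but not bit 0x2
(FREE_SPACE_TREE_VALID):

  # unsupported bits = on-disk compat_ro_flags minus what the kernel supports
  printf 'unsupported: 0x%x\n' $(( 0x3 & ~0x1 ))   # -> 0x2

  # the kernel only refuses read-write mounts over unknown compat_ro bits
  # ("cannot mount read-write ..."), so read-only should still work:
  mount -o ro /dev/sdi1 /mnt/boot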

Regards,
Tobias


2016-11-01 6:24 GMT+01:00 Qu Wenruo <quwen...@cn.fujitsu.com>:
>
>
> At 11/01/2016 12:46 PM, Tobias Holst wrote:
>>
>> Hi
>>
>> I can't mount my boot partition anymore. When I try it by entering
>> "mount /dev/sdi1 /mnt/boot/" I get:
>>>
>>> mount: wrong fs type, bad option, bad superblock on /dev/sdi1,
>>>   missing codepage or helper program, or other error
>>>
>>>   In some cases useful info is found in syslog - try
>>>   dmesg | tail or so.
>>
>>
>> "dmesg | tail" gives me:
>>>
>>> BTRFS info (device sdi1): using free space tree
>>> BTRFS info (device sdi1): has skinny extents
>>> BTRFS error (device sdi1): cannot mount read-write because of unsupported
>>> optional features (2)
>>> BTRFS: open_ctree failed
>>
>>
>> I am using an Ubuntu 16.10 Live CD with Kernel 4.8 and btrfs-progs v4.8.2.
>>
>> "btrfs inspect-internal dump-super /dev/sdi1" gives me the following:
>>>
>>> superblock: bytenr=65536, device=/dev/sdi1
>>> ---------------------------------------------------------
>>> csum_type               0 (crc32c)
>>> csum_size               4
>>> csum                    0x0f346b08 [match]
>>> bytenr                  65536
>>> flags                   0x1
>>>                         ( WRITTEN )
>>> magic                   _BHRfS_M [match]
>>> fsid                    67ac5740-1ced-4d59-8999-03bb3195ec49
>>> label                   t-hyper-boot
>>> generation              64
>>> root                    20971520
>>> sys_array_size          129
>>> chunk_root_generation   43
>>> root_level              1
>>> chunk_root              12587008
>>> chunk_root_level        0
>>> log_root                0
>>> log_root_transid        0
>>> log_root_level          0
>>> total_bytes             2147483648
>>> bytes_used              102813696
>>> sectorsize              4096
>>> nodesize                4096
>>> leafsize                4096
>>> stripesize              4096
>>> root_dir                6
>>> num_devices             1
>>> compat_flags            0x0
>>> compat_ro_flags         0x3
>
> compat_ro_flags 0x3 is FREE_SPACE_TREE and FREE_SPACE_TREE_VALID.
>
> FREE_SPACE_TREE_VALID was introduced later to fix an endianness bug in the
> free space tree.
>
> And that seems to be the cause.
>
> Thanks,
> Qu
>
>>> incompat_flags          0x34d
>>>                         ( MIXED_BACKREF |
>>>                           MIXED_GROUPS |
>>>                           COMPRESS_LZO |
>>>                           EXTENDED_IREF |
>>>                           SKINNY_METADATA |
>>>                           NO_HOLES )
>>> cache_generation        53
>>> uuid_tree_generation    64
>>> dev_item.uuid           273d5800-add1-45bb-8a11-ecd6d8c1503e
>>> dev_item.fsid           67ac5740-1ced-4d59-8999-03bb3195ec49 [match]
>>> dev_item.type           0
>>> dev_item.total_bytes    2147483648
>>> dev_item.bytes_used     667680768
>>> dev_item.io_align       4096
>>> dev_item.io_width       4096
>>> dev_item.sector_size    4096
>>> dev_item.devid          1
>>> dev_item.dev_group      0
>>> dev_item.seek_speed     0
>>> dev_item.bandwidth      0
>>> dev_item.generation     0
>>
>>
>> Does anyone have an idea why I can't mount it anymore?
>>
>> Regards,
>> Tobias


cannot mount read-write because of unsupported optional features (2)

2016-10-31 Thread Tobias Holst
Hi

I can't mount my boot partition anymore. When I try it by entering
"mount /dev/sdi1 /mnt/boot/" I get:
> mount: wrong fs type, bad option, bad superblock on /dev/sdi1,
>   missing codepage or helper program, or other error
>
>   In some cases useful info is found in syslog - try
>   dmesg | tail or so.

"dmesg | tail" gives me:
> BTRFS info (device sdi1): using free space tree
> BTRFS info (device sdi1): has skinny extents
> BTRFS error (device sdi1): cannot mount read-write because of unsupported 
> optional features (2)
> BTRFS: open_ctree failed

I am using an Ubuntu 16.10 Live CD with Kernel 4.8 and btrfs-progs v4.8.2.

"btrfs inspect-internal dump-super /dev/sdi1" gives me the following:
> superblock: bytenr=65536, device=/dev/sdi1
> ---------------------------------------------------------
> csum_type               0 (crc32c)
> csum_size               4
> csum                    0x0f346b08 [match]
> bytenr                  65536
> flags                   0x1
>                         ( WRITTEN )
> magic                   _BHRfS_M [match]
> fsid                    67ac5740-1ced-4d59-8999-03bb3195ec49
> label                   t-hyper-boot
> generation              64
> root                    20971520
> sys_array_size          129
> chunk_root_generation   43
> root_level              1
> chunk_root              12587008
> chunk_root_level        0
> log_root                0
> log_root_transid        0
> log_root_level          0
> total_bytes             2147483648
> bytes_used              102813696
> sectorsize              4096
> nodesize                4096
> leafsize                4096
> stripesize              4096
> root_dir                6
> num_devices             1
> compat_flags            0x0
> compat_ro_flags         0x3
> incompat_flags          0x34d
>                         ( MIXED_BACKREF |
>                           MIXED_GROUPS |
>                           COMPRESS_LZO |
>                           EXTENDED_IREF |
>                           SKINNY_METADATA |
>                           NO_HOLES )
> cache_generation        53
> uuid_tree_generation    64
> dev_item.uuid           273d5800-add1-45bb-8a11-ecd6d8c1503e
> dev_item.fsid           67ac5740-1ced-4d59-8999-03bb3195ec49 [match]
> dev_item.type           0
> dev_item.total_bytes    2147483648
> dev_item.bytes_used     667680768
> dev_item.io_align       4096
> dev_item.io_width       4096
> dev_item.sector_size    4096
> dev_item.devid          1
> dev_item.dev_group      0
> dev_item.seek_speed     0
> dev_item.bandwidth      0
> dev_item.generation     0

Does anyone have an idea why I can't mount it anymore?

Regards,
Tobias


"parent transid verify failed"

2016-06-11 Thread Tobias Holst
Hi

I am getting some "parent transid verify failed" errors. Is there any
way to find out what's affected? Are these errors in metadata, data or
both? And if they are errors in the data, how can I find out which
files are affected?
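
If it is data, one thing to try (a sketch; the logical address below is
only an example taken from one of my earlier reports, and metadata
addresses will not resolve to a file at all):

  # map a data logical address from such an error back to file path(s)
  btrfs inspect-internal logical-resolve 25033166798848 /mnt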

Regards,
Tobias


Re: [PATCH v4 0/9] Btrfs: free space B-tree

2015-11-03 Thread Tobias Holst
Ah, thanks for the information!
Happy testing :)

2015-11-03 19:34 GMT+01:00 Chris Mason <c...@fb.com>:
> On Tue, Nov 03, 2015 at 07:13:37PM +0100, Tobias Holst wrote:
>> Hi
>>
>> Anything new on this topic?
>>
>> I think it would be a great thing and should be merged as soon as it
>> is stable. :)
>
> I've been testing it, but my plan is 4.5.
>
> -chris


Re: [PATCH v4 0/9] Btrfs: free space B-tree

2015-11-03 Thread Tobias Holst
Hi

Anything new on this topic?

I think it would be a great thing and should be merged as soon as it
is stable. :)

Regards,
Tobias


2015-10-02 13:47 GMT+02:00 Austin S Hemmelgarn :
> On 2015-09-29 23:50, Omar Sandoval wrote:
>>
>> Hi,
>>
>> Here's one more reroll of the free space B-tree patches, a more scalable
>> alternative to the free space cache. Minimal changes this time around, I
>> mainly wanted to resend this after Holger and I cleared up his bug
>> report here: http://www.spinics.net/lists/linux-btrfs/msg47165.html. It
>> initially looked like it was a bug in a patch that Josef sent, then in
>> this series, but finally Holger and I figured out that it was something
>> else in the queue of patches he carries around, we just don't know what
>> yet (I'm in the middle of looking into it). While trying to reproduce
>> that bug, I ran xfstests about a trillion times and a bunch of stress
>> tests, so this is fairly well tested now. Additionally, the last time
>> around, Holger and Austin both bravely offered their Tested-bys on the
>> series. I wasn't sure which patch(es) to tack them onto so here they
>> are:
>>
>> Tested-by: Holger Hoffstätte 
>> Tested-by: Austin S. Hemmelgarn 
>
> I've re-run the same testing I did for the last iteration, and also tested
> that the btrfs_end_transaction thing mentioned below works right now
> (Ironically that's one of the few things I didn't think of testing last time
> :)), so the Tested-by from me is current now.
>
>>
>> Thanks, everyone!
>>
>> Omar
>>
>> Changes from v3->v4:
>>
>> - Added a missing btrfs_end_transaction() to
>> btrfs_create_free_space_tree() and
>>btrfs_clear_free_space_tree() in the error cases after we abort the
>>transaction (see
>> http://www.spinics.net/lists/linux-btrfs/msg47545.html)
>> - Rebased the kernel patches on v4.3-rc3
>> - Rebased the progs patches on v4.2.1
>>
>> v3: http://www.spinics.net/lists/linux-btrfs/msg47095.html
>>
>> Changes from v2->v3:
>>
>> - Fixed a warning in the free space tree sanity tests caught by Zhao Lei.
>> - Moved the addition of a block group to the free space tree to occur
>> either on
>>the first attempt to modify the free space for the block group or in
>>btrfs_create_pending_block_groups(), whichever happens first. This
>> avoids a
>>deadlock (lock recursion) when modifying the free space tree requires
>>allocating a new block group. In order to do this, it was simpler to
>> change
>>the on-disk semantics: the superblock stripes will now appear to be
>> free space
>>according to the free space tree, but load_free_space_tree() will still
>>exclude them when building the in-memory free space cache.
>> - Changed the free_space_tree option to space_cache=v2 and made
>> clear_cache
>>clear the free space tree. If the free space tree has been created,
>>the mount will fail unless space_cache=v2 or nospace_cache,clear_cache
>>is given because we cannot allow the free space tree to get out of
>>date.
>> - Did a once-over of the code and caught a couple of error handling typos.
>>
>> v2: http://www.spinics.net/lists/linux-btrfs/msg46796.html
>>
>> Changes from v1->v2:
>>
>> - Cleaned up a bunch of unnecessary instances of "if (ret) goto out; ret =
>> 0"
>> - Added aborts in the free space tree code closer to the site the error is
>>encountered: where we add or remove block groups, add or remove free
>> space,
>>and also when we convert formats
>> - Moved loading of the free space tree into caching_thread() and added a
>> new
>>patch 3 in preparation for it
>> - Commented a bunch of stuff in the extent buffer bitmap operations and
>>refactored some of the complicated logic
>>
>> v1: http://www.spinics.net/lists/linux-btrfs/msg46713.html
>>
>> Omar Sandoval (9):
>>Btrfs: add extent buffer bitmap operations
>>Btrfs: add extent buffer bitmap sanity tests
>>Btrfs: add helpers for read-only compat bits
>>Btrfs: refactor caching_thread()
>>Btrfs: introduce the free space B-tree on-disk format
>>Btrfs: implement the free space B-tree
>>Btrfs: add free space tree sanity tests
>>Btrfs: wire up the free space tree to the extent tree
>>Btrfs: add free space tree mount option
>>
>>   fs/btrfs/Makefile  |5 +-
>>   fs/btrfs/ctree.h   |  157 +++-
>>   fs/btrfs/disk-io.c |   38 +
>>   fs/btrfs/extent-tree.c |   98 +-
>>   fs/btrfs/extent_io.c   |  183 +++-
>>   fs/btrfs/extent_io.h   |   10 +-
>>   fs/btrfs/free-space-tree.c | 1584
>> 
>>   fs/btrfs/free-space-tree.h |   72 ++
>>   fs/btrfs/super.c   |   56 +-
>>   fs/btrfs/tests/btrfs-tests.c   |   52 ++
>>   fs/btrfs/tests/btrfs-tests.h   |   10 +
>>   fs/btrfs/tests/extent-io-tests.c   

Re: "free_raid_bio" crash on RAID6

2015-11-02 Thread Tobias Holst
Hi

No, I never figured this out... After a while of waiting for answers I
just started over and took the data from my backup.

> Did you try removing the bad drive and did the system keep crashing anyway?

As you can see in my first mail, the drive was already removed when
this error started to happen ("some devices missing"). ;)

Regards,
Tobias


2015-10-18 16:14 GMT+02:00 Philip Seeger <p0h0i0l0...@gmail.com>:
> Hi Tobias
>
> On 07/20/2015 06:20 PM, Tobias Holst wrote:
>>
>> My btrfs-RAID6 seems to be broken again :(
>>
>> When reading from it I get several of these:
>> [  176.349943] BTRFS info (device dm-4): csum failed ino 1287707
>> extent 21274957705216 csum 2830458701 wanted 426660650 mirror 2
>>
>> then followed by a "free_raid_bio"-crash:
>>
>> [  176.349961] [ cut here ]
>> [  176.349981] WARNING: CPU: 6 PID: 110 at
>> /home/kernel/COD/linux/fs/btrfs/raid56.c:831
>> __free_raid_bio+0xfc/0x130 [btrfs]()
>> ...
>
>
> It's been 3 months now; have you ever figured this out? Do you know if the
> bug has been identified and fixed, or have you filed a bugzilla report?
>
>> One drive is broken, so at the moment it is mounted with "-o
>> defaults,ro,degraded,recovery,compress=lzo,space_cache,subvol=raid".
>
>
> Did you try removing the bad drive and did the system keep crashing anyway?
>
>
>
> Philip


free_raid_bio crash on RAID6

2015-07-20 Thread Tobias Holst
Hi

My btrfs-RAID6 seems to be broken again :(

When reading from it I get several of these:
[  176.349943] BTRFS info (device dm-4): csum failed ino 1287707
extent 21274957705216 csum 2830458701 wanted 426660650 mirror 2

then followed by a "free_raid_bio" crash:

[  176.349961] [ cut here ]
[  176.349981] WARNING: CPU: 6 PID: 110 at
/home/kernel/COD/linux/fs/btrfs/raid56.c:831
__free_raid_bio+0xfc/0x130 [btrfs]()
[  176.349982] Modules linked in: iosf_mbi kvm_intel kvm ppdev
crct10dif_pclmul crc32_pclmul dm_crypt ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper serio_raw 8250_fintek
i2c_piix4 pvpanic cryptd mac_hid virtio_rng parport_pc lp parport
btrfs xor raid6_pq cirrus syscopyarea sysfillrect sysimgblt ttm
drm_kms_helper mpt2sas drm raid_class psmouse floppy
scsi_transport_sas pata_acpi
[  176.349998] CPU: 6 PID: 110 Comm: kworker/u16:2 Not tainted
4.1.2-040102-generic #201507101335
[  176.34] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[  176.350007] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[  176.350008]  c026fc18 8800baa4f978 817d076c

[  176.350010]   8800baa4f9b8 81079b0a
0246
[  176.350011]  88034e7baa68 88008619b800 fffb

[  176.350013] Call Trace:
[  176.350023]  [817d076c] dump_stack+0x45/0x57
[  176.350026]  [81079b0a] warn_slowpath_common+0x8a/0xc0
[  176.350029]  [81079bfa] warn_slowpath_null+0x1a/0x20
[  176.350036]  [c025e91c] __free_raid_bio+0xfc/0x130 [btrfs]
[  176.350041]  [c025f351] rbio_orig_end_io+0x51/0xa0 [btrfs]
[  176.350047]  [c02610e3] __raid56_parity_recover+0x1d3/0x210 [btrfs]
[  176.350052]  [c0261cb0] raid56_parity_recover+0x110/0x180 [btrfs]
[  176.350058]  [c0216cdb] btrfs_map_bio+0xdb/0x4e0 [btrfs]
[  176.350065]  [c0236024]
btrfs_submit_compressed_read+0x354/0x4e0 [btrfs]
[  176.350070]  [c01ee681] btrfs_submit_bio_hook+0x1d1/0x1e0 [btrfs]
[  176.350076]  [81376dbe] ? bio_add_page+0x5e/0x70
[  176.350083]  [c020c176] ?
btrfs_create_repair_bio+0xe6/0x110 [btrfs]
[  176.350089]  [c020c6ab] end_bio_extent_readpage+0x50b/0x560 [btrfs]
[  176.350094]  [c020c1a0] ?
btrfs_create_repair_bio+0x110/0x110 [btrfs]
[  176.350096]  [8137934b] bio_endio+0x5b/0xa0
[  176.350103]  [811d9e19] ? kmem_cache_free+0x1d9/0x1f0
[  176.350104]  [813793a2] bio_endio_nodec+0x12/0x20
[  176.350109]  [c01e10df] end_workqueue_fn+0x3f/0x50 [btrfs]
[  176.350115]  [c021b522] normal_work_helper+0xc2/0x2b0 [btrfs]
[  176.350121]  [c021b7e2] btrfs_endio_helper+0x12/0x20 [btrfs]
[  176.350124]  [8109324f] process_one_work+0x14f/0x420
[  176.350127]  [81093a08] worker_thread+0x118/0x530
[  176.350128]  [810938f0] ? rescuer_thread+0x3d0/0x3d0
[  176.350129]  [81098f89] kthread+0xc9/0xe0
[  176.350130]  [81098ec0] ? kthread_create_on_node+0x180/0x180
[  176.350134]  [817d86a2] ret_from_fork+0x42/0x70
[  176.350135]  [81098ec0] ? kthread_create_on_node+0x180/0x180
[  176.350136] ---[ end trace 81289955f20d48ee ]---

Did I find a kernel bug? What can/should I do?

Don't worry about my data; I have tape backups of the important data.
I just want to help fix RAID-related btrfs bugs.

Hardware: KVM with all drives attached to a passed-through SAS controller
System: Ubuntu 14.04.2
Kernel: 4.1.2
btrfs-tools: 4.0
It's a btrfs RAID6 on top of 6 LUKS-encrypted volumes, created with
"-O extref,raid56,skinny-metadata,no-holes". Normally it's mounted
with "defaults,compress=lzo,space_cache,autodefrag,subvol=raid".
One drive is broken, so at the moment it is mounted with "-o
defaults,ro,degraded,recovery,compress=lzo,space_cache,subvol=raid".

It's pretty much full, so "btrfs fi show" shows:
Label: 't-raid'  uuid: 3938baeb-cb02-4909-8e75-6ec2f47d1d19
Total devices 6 FS bytes used 14.44TiB
devid2 size 3.64TiB used 3.64TiB path /dev/mapper/sdb_crypt
devid3 size 3.64TiB used 3.64TiB path /dev/mapper/sdc_crypt
devid4 size 3.64TiB used 3.64TiB path /dev/mapper/sdd_crypt
devid5 size 3.64TiB used 3.64TiB path /dev/mapper/sde_crypt
devid6 size 3.64TiB used 3.64TiB path /dev/mapper/sdf_crypt
*** Some devices missing

and "btrfs fi df /raid" shows:
Data, RAID6: total=14.52TiB, used=14.42TiB
System, RAID6: total=64.00MiB, used=1.00MiB
Metadata, RAID6: total=24.00GiB, used=21.78GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


Regards,
Tobias


Re: Uncorrectable errors on RAID6

2015-06-15 Thread Tobias Holst
Hi Qu, hi all,

> RO snapshot, I remember there is a RO snapshot bug, but seems fixed in 4.x?
Yes, that bug has already been fixed.

> For recovery, first just try cp -r mnt/* to grab what's still completely OK.
> Maybe recovery mount option can do some help in the process?
That's what I did now. I mounted with "recovery" and copied all of my
important data. But several folders/files couldn't be read - the whole
system stopped responding, with nothing in the logs, nothing on the
screen, everything frozen. So I have to take these files out of my
backup.
Also, several files produced "checksum verify failed", "csum failed"
and "no csum found" errors in the syslog.
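
For the record, what I ran was roughly the following (device and target
paths are placeholders):

  # "recovery" makes btrfs try older tree roots; ro avoids further damage
  mount -o ro,recovery /dev/mapper/sdb_crypt /mnt/raid
  cp -a /mnt/raid/important/ /backup/important/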

> Then you may try btrfs restore, which is the safest method, won't
> write any byte into the offline disks.
Yes, but I would need at least the same storage space as for the
original data - and I don't have as much free space somewhere else (or
not quickly available).
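
For completeness, that variant would look roughly like this - it only
reads from the damaged devices and writes everything into the target
path, which is why I'd need the space:

  btrfs restore /dev/mapper/sdb_crypt /some/big/enough/target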

> Lastly, you can try the btrfsck --repair, *WITH BINARY BACKUP OF YOUR DISKS*
I don't have a bitwise copy of my disks, but all important data is
secure now. So I tried it, see below.
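
A binary backup in the sense Qu means would be one raw image per member,
e.g. (just a sketch - I don't have the space for six of these):

  dd if=/dev/mapper/sdb_crypt of=/backup/sdb_crypt.img bs=1M conv=noerror,sync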

> BTW, if you decided to use btrfsck --repair, please upload the full
> output, since we can use it to improve the b-tree recovery codes.
OK, see below.

> (Yeah, welcome to be a laboratory mouse of real world b-tree recovery codes)
Haha, right. Since I have been testing the experimental RAID6 features
of btrfs for a while, I know what it means to be a laboratory mouse ;)

So back to btrfsck. I started it and after a while this happened in
the syslog. Again and again: https://paste.ee/p/BIs56
According to the internet this is a known but very rare problem with
my LSI 9211-8i controller. It happens when the PCIe generation
autodetection detects the card as a PCIe 3.0 card instead of 2.0 and
heavy I/O is happening. Because I never ever had this bug before, it
must be coincidence... but it is not the root cause of this broken
filesystem.
As a result there were many "blk_update_request: I/O error", "FAILED
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE", "Add. Sense: Power
on, reset, or bus device reset occurred" and "Buffer I/O error"/"lost
async page write" messages in the syslog.

The result of "btrfsck --repair" up to this point: https://paste.ee/p/nzzAo
Then btrfsck died: https://paste.ee/p/0Brku

Now I rebooted and forced the card to PCIe generation 2.0, so this bug
shouldn't happen again, and started "btrfsck --repair" again.
This time it ran without controller-problems and you can find the full
output here: 
https://ssl-account.com/oc.tobby.eu/public.php?service=files&t=8b93f56a69ea04886e9bc2c8534b32f6
(huge, about 13MB)

Result: One folder (out of four) in my root directory is completely
gone (about 8 TB). Two folders seem to be OK (about 1.4 TB). And the
last folder is OK in terms of folder and subfolder structure, but
nearly all subfolders are empty (only 230 GB of 3.1 TB are still there).
So roughly 90% of the data is gone now.

I will now destroy the filesystem, create a new btrfs RAID6 and fetch
the data out of my backups. I hope my logs help a little bit in finding
the cause. I didn't have the time to try to reproduce this broken
filesystem - did you try it with loop devices?

Regards,
Tobias


2015-05-29 4:27 GMT+02:00 Qu Wenruo quwen...@cn.fujitsu.com:


  Original Message  
 Subject: Re: Uncorrectable errors on RAID6
 From: Tobias Holst to...@tobby.eu
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2015-05-29 10:00

 Thanks, Qu, sad news... :-(
 No, I also didn't defrag with older kernels. Maybe I did it a while
 ago with 3.19.x, but there was a scrub afterwards and it showed no
 error, so this shouldn't be the problem. The things described above
 were all done with 4.0.3/4.0.4.

 Balances and scrubs all stop at ~1.5 TiB of ~13.3TiB. Balance with an
 error in the log, scrub just doesn't do anything according to dstat
 without any error and still shows running.

 The errors/problems started during the first balance but maybe this
 only showed them and is not the cause.

 Here are detailed debug infos to (maybe?) recreate the problem. This is
 exactly what happened here over some time. As I can only tell when it
 definitely was clean (scrub at the beginning of May) and when it
 definitely was broken (now, end of May), there may be some more
 steps necessary to reproduce, because several things happened in the
 meantime:
 - filesystem was created with mkfs.btrfs -f -m raid6 -d raid6 -L
 t-raid -O extref,raid56,skinny-metadata,no-holes with 6
 LUKS-encrypted HDDs on kernel 3.19

 LUKS...
 Even LUKS is much stabler than btrfs, and may not be related to the
 bug, but your setup is quite complex anyway.

 - mounted with options
 defaults,compress-force=zlib,space_cache,autodefrag


 Normally I'd not recommend compress-force, as btrfs can auto-detect the
 compression ratio.
 But such a complex setup, with such mount options on a LUKS base, should
 be quite a good playground to produce some bugs.

 - copied all data onto it
 - all data

"checksum verify failed" vs. "csum failed"

2015-06-11 Thread Tobias Holst
Hi

Just a question to understand my logs. It doesn't matter where these
errors come from, I just want to understand them. What is the
difference between these two message types?

 BTRFS: dm-4 checksum verify failed on 6318462353408 wanted 25D94CD6 found 
 8BA427D4 level 1
vs.
 BTRFS warning (device dm-4): csum failed ino 27594 off 1266679808 csum 
 1065556573 expected csum 0

Maybe the first one was a correctable error and the second one not?

Regards,
Tobias


Re: Uncorrectable errors on RAID6

2015-05-28 Thread Tobias Holst
Thanks, Qu, sad news... :-(
No, I also didn't defrag with older kernels. Maybe I did it a while
ago with 3.19.x, but there was a scrub afterwards and it showed no
error, so this shouldn't be the problem. The things described above
were all done with 4.0.3/4.0.4.

Balances and scrubs all stop at ~1.5 TiB of ~13.3 TiB. Balance stops with an
error in the log; scrub just doesn't do anything according to dstat,
shows no error and still reports "running".

The errors/problems started during the first balance, but maybe this
only exposed them and is not the cause.

Here are detailed debug infos to (maybe?) recreate the problem. This is
exactly what happened here over some time. As I can only tell when it
definitely was clean (scrub at the beginning of May) and when it
definitely was broken (now, end of May), there may be some more
steps necessary to reproduce, because several things happened in the
meantime:
- filesystem was created with mkfs.btrfs -f -m raid6 -d raid6 -L
t-raid -O extref,raid56,skinny-metadata,no-holes with 6
LUKS-encrypted HDDs on kernel 3.19
- mounted with options defaults,compress-force=zlib,space_cache,autodefrag
- copied all data onto it
- all data on the devices is now compressed with zlib
- until now the filesystem is ok, scrub shows no errors
- now mount it with defaults,compress-force=lzo,space_cache instead
- use kernel 4.0.3/4.0.4
- create a r/o-snapshot
- defrag some data with -clzo
- have some (not much) I/O during the process
- this should approx. double the size of the defragged data because
your snapshot contains your data compressed with zlib and your volume
contains your data compressed with lzo
- delete the snapshot
- wait some time until the cleaning is complete, still some other I/O
during this
- this doesn't free as much data as the snapshot contained (?)
- is this ok? Maybe here the problem already existed/started
- defrag the rest of all data on the devices with -clzo, still some
other I/O during this
- now start a balance of the whole array
- errors will spam the log and it's broken.

I hope it is possible to reproduce the errors and find out exactly
when this happens. I'll do the same steps again, too, but maybe there
is someone else who could try it as well? With some small loop devices
just for testing, this shouldn't take too long, even if it sounds like
it ;-)
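
A sketch of such a loop-device setup (sizes and paths are just examples,
LUKS is left out, and I haven't verified that this actually reproduces it):

  # six small backing files standing in for the six drives
  for i in 1 2 3 4 5 6; do
      truncate -s 10G /tmp/disk$i.img
      losetup /dev/loop$i /tmp/disk$i.img
  done
  mkfs.btrfs -f -m raid6 -d raid6 -L t-raid \
      -O extref,raid56,skinny-metadata,no-holes /dev/loop[1-6]
  mount -o compress-force=zlib,space_cache,autodefrag /dev/loop1 /mnt

  # ...fill with data, then follow the steps above:
  mount -o remount,compress-force=lzo,space_cache /mnt
  btrfs subvolume snapshot -r /mnt /mnt/snap
  btrfs filesystem defragment -r -clzo /mnt
  btrfs subvolume delete /mnt/snap
  btrfs balance start /mnt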

Back to my actual data: Are there any tips on how to recover? Mount
with "recovery", copy everything over and check the log for which files
seem to be broken? Or some (dangerous) tricks on how to repair this
broken filesystem?
I do have a full backup, but it's very slow and may take weeks
(months?) if I have to recover everything.

Regards,
Tobias



2015-05-29 2:36 GMT+02:00 Qu Wenruo quwen...@cn.fujitsu.com:


  Original Message  
 Subject: Re: Uncorrectable errors on RAID6
 From: Tobias Holst to...@tobby.eu
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2015-05-28 21:13

 Ah it's already done. You can find the error-log over here:
 https://paste.ee/p/sxCKF

 In short there are several of these:
 bytenr mismatch, want=6318462353408, have=56676169344768
 checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
 checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
 checksum verify failed on 8955306033152 found 5B5F717A wanted C44CA54E
 checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A
 checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A

 and these:
 ref mismatch on [13431504896 16384] extent item 1, found 0
 Backref 13431504896 root 7 not referenced back 0x1202acc0
 Incorrect global backref count on 13431504896 found 1 wanted 0
 backpointer mismatch on [13431504896 16384]
 owner ref check failed [13431504896 16384]

 and these:
 ref mismatch on [1951739412480 524288] extent item 0, found 1
 Backref 1951739412480 root 5 owner 27852 offset 644349952 num_refs 0
 not found in extent tree
 Incorrect local backref count on 1951739412480 root 5 owner 27852
 offset 644349952 found 1 wanted 0 back 0x1a92aa20
 backpointer mismatch on [1951739412480 524288]

 Any ideas? :)

 The metadata is really corrupted...

 I'd recommend to salvage your data as soon as possible.

 For the reason, as you didn't run replace, it should at least not be the
 bug spotted by Zhao Lei.

 BTW, did you run defrag on older kernels?
 IIRC, old kernels had a bug with snapshot-aware defrag, so it was later
 disabled in newer kernels.
 Not sure if it's related.

 Balance may be related, but I'm not familiar with balance on RAID5/6,
 so it's hard to say.

 Sorry for unable to provide much help.

 But if you have enough time to find a stable method to reproduce the bug,
 it's best to try it on a loop device; it would definitely help us to debug.

 Thanks,
 Qu


 Regards
 Tobias


 2015-05-28 14:57 GMT+02:00 Tobias Holst to...@tobby.eu:

 Hi Qu,

 no, I didn't run a replace. But I ran a defrag with -clzo on all
 files while there was slight I/O on the devices. Don't know if
 this could cause

Re: Uncorrectable errors on RAID6

2015-05-28 Thread Tobias Holst
Ah it's already done. You can find the error-log over here:
https://paste.ee/p/sxCKF

In short there are several of these:
bytenr mismatch, want=6318462353408, have=56676169344768
checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
checksum verify failed on 8955306033152 found 5B5F717A wanted C44CA54E
checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A
checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A

and these:
ref mismatch on [13431504896 16384] extent item 1, found 0
Backref 13431504896 root 7 not referenced back 0x1202acc0
Incorrect global backref count on 13431504896 found 1 wanted 0
backpointer mismatch on [13431504896 16384]
owner ref check failed [13431504896 16384]

and these:
ref mismatch on [1951739412480 524288] extent item 0, found 1
Backref 1951739412480 root 5 owner 27852 offset 644349952 num_refs 0
not found in extent tree
Incorrect local backref count on 1951739412480 root 5 owner 27852
offset 644349952 found 1 wanted 0 back 0x1a92aa20
backpointer mismatch on [1951739412480 524288]

Any ideas? :)

Regards
Tobias


2015-05-28 14:57 GMT+02:00 Tobias Holst to...@tobby.eu:
 Hi Qu,

 no, I didn't run a replace. But I ran a defrag with -clzo on all
 files while there was slight I/O on the devices. Don't know if
 this could cause corruption, too?

 Later on I deleted an r/o snapshot, which should have freed a large
 amount of storage space. It didn't free as much as it should have, so
 after a few days I started a balance to free the space. During the
 balance the first checksum errors happened and the whole balance process crashed:

 [19174.342882] BTRFS: dm-5 checksum verify failed on 6318462353408
 wanted 25D94CD6 found 8BA427D4 level 1
 [19174.365473] BTRFS: dm-5 checksum verify failed on 6318462353408
 wanted 25D94CD6 found 8BA427D4 level 1
 [19174.365651] BTRFS: dm-5 checksum verify failed on 6318462353408
 wanted 25D94CD6 found 8BA427D4 level 1
 [19174.366168] BTRFS: dm-5 checksum verify failed on 6318462353408
 wanted 25D94CD6 found 8BA427D4 level 1
 [19174.366250] BTRFS: dm-5 checksum verify failed on 6318462353408
 wanted 25D94CD6 found 8BA427D4 level 1
 [19174.366392] BTRFS: dm-5 checksum verify failed on 6318462353408
 wanted 25D94CD6 found 8BA427D4 level 1
 [19174.367313] [ cut here ]
 [19174.367340] kernel BUG at /home/kernel/COD/linux/fs/btrfs/relocation.c:242!
 [19174.367384] invalid opcode:  [#1] SMP
 [19174.367418] Modules linked in: iosf_mbi kvm_intel kvm
 crct10dif_pclmul ppdev dm_crypt crc32_pclmul ghash_clmulni_intel
 aesni_intel aes_x86_64 lrw gf128mul glue_helper parport_pc ablk_helper
 cryptd mac_hid 8250_fintek virtio_rng serio_raw i2c_piix4 pvpanic lp
 parport btrfs xor raid6_pq cirrus syscopyarea sysfillrect sysimgblt
 ttm mpt2sas drm_kms_helper raid_class scsi_transport_sas drm floppy
 psmouse pata_acpi
 [19174.367656] CPU: 1 PID: 4960 Comm: btrfs Not tainted
 4.0.4-040004-generic #201505171336
 [19174.367703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
 BIOS Bochs 01/01/2011
 [19174.367752] task: 8804274e8000 ti: 880367b5 task.ti:
 880367b5
 [19174.367797] RIP: 0010:[c05ec4ba]  [c05ec4ba]
 backref_cache_cleanup+0xea/0x100 [btrfs]
 [19174.367867] RSP: 0018:880367b53bd8  EFLAGS: 00010202
 [19174.367905] RAX: 88008250d8f8 RBX: 88008250d820 RCX: 
 00018021
 [19174.367948] RDX: 88008250d8d8 RSI: 88008250d8e8 RDI: 
 4000
 [19174.367992] RBP: 880367b53bf8 R08: 880418b77780 R09: 
 00018021
 [19174.368037] R10: c05ec1d9 R11: 00018bf8 R12: 
 0001
 [19174.368081] R13: 88008250d8e8 R14: fffb R15: 
 880367b53c28
 [19174.368125] FS:  7f7fd6831c80() GS:88043fc4()
 knlGS:
 [19174.368172] CS:  0010 DS:  ES:  CR0: 80050033
 [19174.368210] CR2: 7f65f7564770 CR3: 0003ac92f000 CR4: 
 001407e0
 [19174.368257] Stack:
 [19174.368279]  fffb 88008250d800 88042b3d46e0
 88006845f990
 [19174.368327]  880367b53c78 c05f25eb 880367b53c78
 0002
 [19174.368376]  00ff880429e4c670 a910d8fb7e00 
 
 [19174.368424] Call Trace:
 [19174.368459]  [c05f25eb] relocate_block_group+0x2cb/0x510 [btrfs]
 [19174.368509]  [c05f29e0]
 btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs]
 [19174.368562]  [c05c6eab]
 btrfs_relocate_chunk.isra.75+0x4b/0xd0 [btrfs]
 [19174.368615]  [c05c82e8] __btrfs_balance+0x348/0x460 [btrfs]
 [19174.368663]  [c05c87b5] btrfs_balance+0x3b5/0x5d0 [btrfs]
 [19174.368710]  [c05d5cac] btrfs_ioctl_balance+0x1cc/0x530 [btrfs]
 [19174.368756]  [811b52e0] ? handle_mm_fault+0xb0/0x160
 [19174.368802]  [c05d7c7e] btrfs_ioctl+0x69e/0xb20 [btrfs]
 [19174.368845]  [8120f5b5

Re: Uncorrectable errors on RAID6

2015-05-28 Thread Tobias Holst
]  [c05ec43c] ? backref_cache_cleanup+0x6c/0x100 [btrfs]
[19174.369827]  [c05f25eb] relocate_block_group+0x2cb/0x510 [btrfs]
[19174.369827]  [c05f29e0]
btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs]
[19174.369827]  [c05c6eab]
btrfs_relocate_chunk.isra.75+0x4b/0xd0 [btrfs]
[19174.369827]  [c05c82e8] __btrfs_balance+0x348/0x460 [btrfs]
[19174.369827]  [c05c87b5] btrfs_balance+0x3b5/0x5d0 [btrfs]
[19174.369827]  [c05d5cac] btrfs_ioctl_balance+0x1cc/0x530 [btrfs]
[19174.369827]  [811b52e0] ? handle_mm_fault+0xb0/0x160
[19174.369827]  [c05d7c7e] btrfs_ioctl+0x69e/0xb20 [btrfs]
[19174.369827]  [8120f5b5] do_vfs_ioctl+0x75/0x320
[19174.369827]  [8120f8f1] SyS_ioctl+0x91/0xb0
[19174.369827]  [817f098d] system_call_fastpath+0x16/0x1b
[19174.369827] Code: 4e 8b 2c 23 eb cd 66 0f 1f 44 00 00 48 83 c4 28
5b 41 5c 41 5d 41 5e 41 5f 5d c3 90 be 00 10 00 00 4c 89 ef e8 a3 ee
ff ff eb c7 0f 0b 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00
[19174.369827] RIP  [8106875f] cpa_flush_array+0x10f/0x120
[19174.369827]  RSP 880367b52cf8
[19174.369827] ---[ end trace 60adc437bd944044 ]---

After a reboot and a remount it always tried to resume the balance
and then crashed again, so I had to be quick to do a "btrfs balance
cancel". Then I started the scrub and got these uncorrectable errors I
mentioned in the first mail.

I just unmounted it and started a btrfsck. Will post the output when it's done.
It's already showing me several of these:

checksum verify failed on 18523667709952 found C240FB11 wanted 1ED6A587
checksum verify failed on 18523667709952 found C240FB11 wanted 1ED6A587
checksum verify failed on 18523667709952 found 5EAB6BFE wanted BA48D648
checksum verify failed on 18523667709952 found 8E19F60E wanted E3A34D18
checksum verify failed on 18523667709952 found C240FB11 wanted 1ED6A587
bytenr mismatch, want=18523667709952, have=10838194617263884761


Thanks,
Tobias



2015-05-28 4:49 GMT+02:00 Qu Wenruo quwen...@cn.fujitsu.com:


  Original Message  
 Subject: Uncorrectable errors on RAID6
 From: Tobias Holst to...@tobby.eu
 To: linux-btrfs@vger.kernel.org linux-btrfs@vger.kernel.org
 Date: 2015-05-28 10:18

 Hi

 I am doing a scrub on my 6-drive btrfs RAID6. Last time it found zero
 errors, but now I am getting this in my log:

 [ 6610.888020] BTRFS: checksum error at logical 478232346624 on dev
 /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
 [ 6610.888025] BTRFS: checksum error at logical 478232346624 on dev
 /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
 [ 6610.888029] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 1,
 gen 0
 [ 6611.271334] BTRFS: unable to fixup (regular) error at logical
 478232346624 on dev /dev/dm-2
 [ 6611.831370] BTRFS: checksum error at logical 478232346624 on dev
 /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
 [ 6611.831373] BTRFS: checksum error at logical 478232346624 on dev
 /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
 [ 6611.831375] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 2,
 gen 0
 [ 6612.396402] BTRFS: unable to fixup (regular) error at logical
 478232346624 on dev /dev/dm-2
 [ 6904.027456] BTRFS: checksum error at logical 478232346624 on dev
 /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
 [ 6904.027460] BTRFS: checksum error at logical 478232346624 on dev
 /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
 [ 6904.027463] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 3,
 gen 0

 Looks like it is always the same sector.

  "btrfs scrub status" shows me:
 scrub status for a34ce68b-bb9f-49f0-91fe-21a924ef11ae
  scrub started at Thu May 28 02:25:31 2015, running for 6759
 seconds
  total bytes scrubbed: 448.87GiB with 14 errors
  error details: read=8 csum=6
  corrected errors: 3, uncorrectable errors: 11, unverified errors:
 0

  What does it mean and why are these errors uncorrectable even on a RAID6?
 Can I find out, which files are affected?

 If it's OK for you to put the fs offline,
 btrfsck is the best method to check what happens, although it may take a
 long time.

 There is a known bug that replace can cause checksum error, found by Zhao
 Lei.
 So did you run replace while there was still some other disk I/O happening?

 Thanks,
 Qu


 system: Ubuntu 14.04.2
 kernel version 4.0.4
 btrfs-tools version: 4.0

 Regards
 Tobias




Uncorrectable errors on RAID6

2015-05-27 Thread Tobias Holst
Hi

I am doing a scrub on my 6-drive btrfs RAID6. Last time it found zero
errors, but now I am getting this in my log:

[ 6610.888020] BTRFS: checksum error at logical 478232346624 on dev
/dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6610.888025] BTRFS: checksum error at logical 478232346624 on dev
/dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6610.888029] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 6611.271334] BTRFS: unable to fixup (regular) error at logical
478232346624 on dev /dev/dm-2
[ 6611.831370] BTRFS: checksum error at logical 478232346624 on dev
/dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6611.831373] BTRFS: checksum error at logical 478232346624 on dev
/dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6611.831375] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 6612.396402] BTRFS: unable to fixup (regular) error at logical
478232346624 on dev /dev/dm-2
[ 6904.027456] BTRFS: checksum error at logical 478232346624 on dev
/dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6904.027460] BTRFS: checksum error at logical 478232346624 on dev
/dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6904.027463] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0

Looks like it is always the same sector.

"btrfs scrub status" shows me:
scrub status for a34ce68b-bb9f-49f0-91fe-21a924ef11ae
scrub started at Thu May 28 02:25:31 2015, running for 6759 seconds
total bytes scrubbed: 448.87GiB with 14 errors
error details: read=8 csum=6
corrected errors: 3, uncorrectable errors: 11, unverified errors: 0

What does it mean and why are these errors uncorrectable even on a RAID6?
Can I find out which files are affected?
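
For data extents the logical address from these messages can be mapped
back to files (a sketch; here the error is in "tree 2", i.e. metadata,
so I would expect no file to match):

  btrfs inspect-internal logical-resolve 478232346624 /mnt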

system: Ubuntu 14.04.2
kernel version 4.0.4
btrfs-tools version: 4.0

Regards
Tobias


Re: Repair broken btrfs raid6?

2015-02-15 Thread Tobias Holst
OK, I see. Maybe even more is damaged...

Now I finished my second backup of the important data and just
killed this damaged raid. I created a new one and now I am restoring
my data. Let's hope it will last longer this time :)

Regards,
Tobias


2015-02-15 4:30 GMT+01:00 Liu Bo bo.li@oracle.com:
 On Fri, Feb 13, 2015 at 10:54:22PM +0100, Tobias Holst wrote:
 It's me again. I just found out why my system crashed during the backup.

 I don't know what it means, but maybe it helps you?

 The warning means the checksum has somehow become inconsistent with the file
 extents, but there are no clear clues about the cause :-(

 Thanks,

 -liubo


 WARNING: CPU: 7 PID: 22878 at
 /home/kernel/COD/linux/fs/btrfs/extent_io.c:5203
 read_extent_buffer+0xe3/0x120 [btrfs]()
 Modules linked in: raid0(E) ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E)
 ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) xor(E)
 raid6_pq(E) iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E)
 crct10dif_pclmul(E) ppdev(E) crc32_pclmul(E) ghash_clmulni_intel(E)
 aesni_intel(E) aes_x86_64(E) virtio_rng(E) lrw(E) gf128mul(E)
 glue_helper(E) ablk_helper(E) cryptd(E) serio_raw(E) 8250_fintek(E)
 parport_pc(E) pvpanic(E) i2c_piix4(E) mac_hid(E) lp(E) parport(E)
 cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) mpt2sas(E) ttm(E)
 drm_kms_helper(E) raid_class(E) floppy(E) psmouse(E)
 scsi_transport_sas(E) drm(E)
  [c05089f3] read_extent_buffer+0xe3/0x120 [btrfs]
  [c04d7dde] __btrfs_lookup_bio_sums.isra.8+0x2ce/0x540 [btrfs]
  [c04d82a6] btrfs_lookup_bio_sums+0x36/0x40 [btrfs]
  [c05301e6] btrfs_submit_compressed_read+0x316/0x4e0 [btrfs]
  [c04ea031] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
  [c05010ca] submit_one_bio+0x6a/0xa0 [btrfs]
  [c0504958] submit_extent_page.isra.34+0xe8/0x210 [btrfs]
  [c0506087] __do_readpage+0x3f7/0x640 [btrfs]
  [c05057a0] ? clean_io_failure+0x1b0/0x1b0 [btrfs]
  [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
  [c0506606] __extent_readpages.constprop.45+0x266/0x290 [btrfs]
  [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
  [c050734e] extent_readpages+0x15e/0x1a0 [btrfs]
  [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
  [c04e771f] btrfs_readpages+0x1f/0x30 [btrfs]
  [c04dc969] ? btrfs_congested_fn+0x49/0xb0 [btrfs]
  hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E)
 xor(E) raid6_pq(E) iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E)
 crct10dif_pclmul(E) ppdev(E) crc32_pclmul(E)
  [c05089ca] ? read_extent_buffer+0xba/0x120 [btrfs]
  [c04d7dde] __btrfs_lookup_bio_sums.isra.8+0x2ce/0x540 [btrfs]
  [c04d82a6] btrfs_lookup_bio_sums+0x36/0x40 [btrfs]
  [c05301e6] btrfs_submit_compressed_read+0x316/0x4e0 [btrfs]
  [c04ea031] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
  [c05010ca] submit_one_bio+0x6a/0xa0 [btrfs]
  [c0504958] submit_extent_page.isra.34+0xe8/0x210 [btrfs]
  [c0506087] __do_readpage+0x3f7/0x640 [btrfs]
  [c05057a0] ? clean_io_failure+0x1b0/0x1b0 [btrfs]
  [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
  [c0506606] __extent_readpages.constprop.45+0x266/0x290 [btrfs]
  [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
  [c050734e] extent_readpages+0x15e/0x1a0 [btrfs]
  [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
  [c04e771f] btrfs_readpages+0x1f/0x30 [btrfs]
  [c04dc969] ? btrfs_congested_fn+0x49/0xb0 [btrfs]
 Modules linked in: raid0(E) ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E)
 ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E)
  [c05089ca] ? read_extent_buffer+0xba/0x120 [btrfs]
  [c04d7dde] __btrfs_lookup_bio_sums.isra.8+0x2ce/0x540 [btrfs]
  [c04d82a6] btrfs_lookup_bio_sums+0x36/0x40 [btrfs]
  [c05301e6] btrfs_submit_compressed_read+0x316/0x4e0 [btrfs]
  [c04ea031] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
  [c05010ca] submit_one_bio+0x6a/0xa0 [btrfs]
  [c0504958] submit_extent_page.isra.34+0xe8/0x210 [btrfs]
  [c0506087] __do_readpage+0x3f7/0x640 [btrfs]
  [c05057a0] ? clean_io_failure+0x1b0/0x1b0 [btrfs]
  [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
  [c0506606] __extent_readpages.constprop.45+0x266/0x290 [btrfs]
  [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
  [c050734e] extent_readpages+0x15e/0x1a0 [btrfs]
  [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
  [c04e771f] btrfs_readpages+0x1f/0x30 [btrfs]
  [c04dc969] ? btrfs_congested_fn+0x49/0xb0 [btrfs]

 Regards,
 Tobias


 2015-02-13 19:26 GMT+01:00 Tobias Holst to...@tobby.eu:
  2015-02-13 9:06 GMT+01:00 Liu Bo bo.li@oracle.com:
  On Fri, Feb 13, 2015 at 12:22:16AM +0100, Tobias Holst wrote:
  Hi
 
  I don't remember the exact mkfs.btrfs options anymore but
   ls /sys/fs/btrfs/[UUID]/features/
  shows the following

Re: Repair broken btrfs raid6?

2015-02-13 Thread Tobias Holst
2015-02-13 9:06 GMT+01:00 Liu Bo bo.li@oracle.com:
 On Fri, Feb 13, 2015 at 12:22:16AM +0100, Tobias Holst wrote:
 Hi

 I don't remember the exact mkfs.btrfs options anymore but
  ls /sys/fs/btrfs/[UUID]/features/
 shows the following output:
  big_metadata  compress_lzo  extended_iref  mixed_backref  raid56

 Well... mkfs.btrfs can specify a '-m' for the metadata profile and a '-d'
 for the data profile; the default profile for metadata is RAID1,
 so we're not sure if your metadata is RAID1 or RAID6. If RAID1 and both
 copies are corrupted, then please use your backup.

Ah, I used RAID6 for both, so btrfs fi df /[mountpoint] looks like this:
Data, RAID6: total=13.11TiB, used=13.10TiB
System, RAID6: total=64.00MiB, used=928.00KiB
Metadata, RAID6: total=25.00GiB, used=23.29GiB
GlobalReserve, single: total=512.00MiB, used=0.00B



 I also tested my device with a short
  hdparm -tT /dev/dm-5
 and got
  /dev/mapper/sdc_crypt:
   Timing cached reads:   30712 MB in  2.00 seconds = 15376.11 MB/sec
   Timing buffered disk reads: 444 MB in  3.01 seconds = 147.51 MB/sec

 Looks ok to me. Should I test more?

 Okay, looks good.


 I bought a few new hard drives so currently I am copying all my data
 to a second (faster) backup, so I can maybe overwrite the current file
 system, if it's not repairable.

 Another question, have you tried mount -o recovery, did it work?

Yes and no. At the moment I have it mounted with
"defaults,recovery,ro,compress-force=lzo,nospace_cache,clear_cache". I
am still getting some errors in the syslog, but fewer than before. Also
it doesn't become unreadable after a while like before. But it seems to
be a little bit slow sometimes, and twice the whole system froze until
I did a hard reset.


 Thanks,

 -liubo

 Regards,
 Tobias


 2015-02-12 10:16 GMT+01:00 Liu Bo bo.li@oracle.com:
  On Wed, Feb 11, 2015 at 03:46:33PM +0100, Tobias Holst wrote:
  Hmm, it looks like it is getting worse... Here are some parts of my
  syslog, including two crashed btrfs-threads:
 
  So I am still getting many of these:
   BTRFS (device dm-5): parent transid verify failed on 25033166798848 
   wanted 108976 found 108958
   BTRFS warning (device dm-5): page private not zero on page 
   25033166798848
   BTRFS warning (device dm-5): page private not zero on page 
   25033166802944
   BTRFS warning (device dm-5): page private not zero on page 
   25033166807040
   BTRFS warning (device dm-5): page private not zero on page 
   25033166811136
 
  First we probably make sure that your device is well setup, since these
  messages usually occur after a drive is removed(the device is somehow 
  droping
  writes), the below -EIO also implies btrfs cannot read/write data from or 
  to that drive.
 
  And in theory, RAID6 can tolerate two drive failures, so what's your 
  mkfs.btrfs option?
 
  Thanks,
 
  -liubo
 
   BTRFS info (device dm-5): force lzo compression
   BTRFS info (device dm-5): disk space caching is enabled
   BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 
   found B18E3934 level 0
   BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 
   found B18E3934 level 0
   BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 
   found B18E3934 level 0
   BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 
   found B18E3934 level 0
   BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 
   found B18E3934 level 0
   BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 
   found B18E3934 level 0
 
  Then there is this crash of super/btrfs_abort_transaction:
   [ cut here ]
   WARNING: CPU: 0 PID: 30526 at 
   /home/kernel/COD/linux/fs/btrfs/super.c:260 
   __btrfs_abort_transaction+0x5f/0x140 [btrfs]()
   BTRFS: Transaction aborted (error -5)
   Modules linked in: ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) 
   msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) 
   iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) 
   crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) aesni_intel(E) 
   aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) 
   cryptd(E) 8250_fintek(E) serio_raw(E) virtio_rng(E) parport_pc(E) 
   mac_hid(E) pvpanic(E) i2c_piix4(E) lp(E) parport(E) cirrus(E) 
   syscopyarea(E) sysfillrect(E) sysimgblt(E) ttm(E) mpt2sas(E) 
   drm_kms_helper(E) raid_class(E) floppy(E) psmouse(E) drm(E) 
   scsi_transport_sas(E)
   CPU: 0 PID: 30526 Comm: kworker/u16:6 Tainted: GW   E  
   3.19.0-031900-generic #201502091451
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 
   01/01/2011
   Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
   0104 880002743c18 817c4c00 0007
   880002743c68 880002743c58 81076e87 880002743c58
   88020a8694d0 8801fb715800 fffb 0ae8
   Call Trace:
   [817c4c00] dump_stack+0x45/0x57

Re: Repair broken btrfs raid6?

2015-02-13 Thread Tobias Holst
It's me again. I just found out why my system crashed during the backup.

I don't know what it means, but maybe it helps you?

WARNING: CPU: 7 PID: 22878 at
/home/kernel/COD/linux/fs/btrfs/extent_io.c:5203
read_extent_buffer+0xe3/0x120 [btrfs]()
Modules linked in: raid0(E) ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E)
ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) xor(E)
raid6_pq(E) iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E)
crct10dif_pclmul(E) ppdev(E) crc32_pclmul(E) ghash_clmulni_intel(E)
aesni_intel(E) aes_x86_64(E) virtio_rng(E) lrw(E) gf128mul(E)
glue_helper(E) ablk_helper(E) cryptd(E) serio_raw(E) 8250_fintek(E)
parport_pc(E) pvpanic(E) i2c_piix4(E) mac_hid(E) lp(E) parport(E)
cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) mpt2sas(E) ttm(E)
drm_kms_helper(E) raid_class(E) floppy(E) psmouse(E)
scsi_transport_sas(E) drm(E)
 [c05089f3] read_extent_buffer+0xe3/0x120 [btrfs]
 [c04d7dde] __btrfs_lookup_bio_sums.isra.8+0x2ce/0x540 [btrfs]
 [c04d82a6] btrfs_lookup_bio_sums+0x36/0x40 [btrfs]
 [c05301e6] btrfs_submit_compressed_read+0x316/0x4e0 [btrfs]
 [c04ea031] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
 [c05010ca] submit_one_bio+0x6a/0xa0 [btrfs]
 [c0504958] submit_extent_page.isra.34+0xe8/0x210 [btrfs]
 [c0506087] __do_readpage+0x3f7/0x640 [btrfs]
 [c05057a0] ? clean_io_failure+0x1b0/0x1b0 [btrfs]
 [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
 [c0506606] __extent_readpages.constprop.45+0x266/0x290 [btrfs]
 [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
 [c050734e] extent_readpages+0x15e/0x1a0 [btrfs]
 [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
 [c04e771f] btrfs_readpages+0x1f/0x30 [btrfs]
 [c04dc969] ? btrfs_congested_fn+0x49/0xb0 [btrfs]
 hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E)
xor(E) raid6_pq(E) iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E)
crct10dif_pclmul(E) ppdev(E) crc32_pclmul(E)
 [c05089ca] ? read_extent_buffer+0xba/0x120 [btrfs]
 [c04d7dde] __btrfs_lookup_bio_sums.isra.8+0x2ce/0x540 [btrfs]
 [c04d82a6] btrfs_lookup_bio_sums+0x36/0x40 [btrfs]
 [c05301e6] btrfs_submit_compressed_read+0x316/0x4e0 [btrfs]
 [c04ea031] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
 [c05010ca] submit_one_bio+0x6a/0xa0 [btrfs]
 [c0504958] submit_extent_page.isra.34+0xe8/0x210 [btrfs]
 [c0506087] __do_readpage+0x3f7/0x640 [btrfs]
 [c05057a0] ? clean_io_failure+0x1b0/0x1b0 [btrfs]
 [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
 [c0506606] __extent_readpages.constprop.45+0x266/0x290 [btrfs]
 [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
 [c050734e] extent_readpages+0x15e/0x1a0 [btrfs]
 [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
 [c04e771f] btrfs_readpages+0x1f/0x30 [btrfs]
 [c04dc969] ? btrfs_congested_fn+0x49/0xb0 [btrfs]
Modules linked in: raid0(E) ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E)
ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E)
 [c05089ca] ? read_extent_buffer+0xba/0x120 [btrfs]
 [c04d7dde] __btrfs_lookup_bio_sums.isra.8+0x2ce/0x540 [btrfs]
 [c04d82a6] btrfs_lookup_bio_sums+0x36/0x40 [btrfs]
 [c05301e6] btrfs_submit_compressed_read+0x316/0x4e0 [btrfs]
 [c04ea031] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
 [c05010ca] submit_one_bio+0x6a/0xa0 [btrfs]
 [c0504958] submit_extent_page.isra.34+0xe8/0x210 [btrfs]
 [c0506087] __do_readpage+0x3f7/0x640 [btrfs]
 [c05057a0] ? clean_io_failure+0x1b0/0x1b0 [btrfs]
 [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
 [c0506606] __extent_readpages.constprop.45+0x266/0x290 [btrfs]
 [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
 [c050734e] extent_readpages+0x15e/0x1a0 [btrfs]
 [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
 [c04e771f] btrfs_readpages+0x1f/0x30 [btrfs]
 [c04dc969] ? btrfs_congested_fn+0x49/0xb0 [btrfs]

Regards,
Tobias


2015-02-13 19:26 GMT+01:00 Tobias Holst to...@tobby.eu:
 2015-02-13 9:06 GMT+01:00 Liu Bo bo.li@oracle.com:
 On Fri, Feb 13, 2015 at 12:22:16AM +0100, Tobias Holst wrote:
 Hi

 I don't remember the exact mkfs.btrfs options anymore but
  ls /sys/fs/btrfs/[UUID]/features/
 shows the following output:
  big_metadata  compress_lzo  extended_iref  mixed_backref  raid56

 Well... mkfs.btrfs can specify a '-m' for the metadata profile and a '-d'
 for the data profile; the default profile for metadata is RAID1,
 so we're not sure if your metadata is RAID1 or RAID6. If RAID1 and both
 copies are corrupted, then please use your backup.

 Ah, I used RAID6 for both, so btrfs fi df /[mountpoint] looks like this:
 Data, RAID6: total=13.11TiB, used=13.10TiB
 System, RAID6: total=64.00MiB, used=928.00KiB
 Metadata, RAID6: total=25.00GiB, used=23.29GiB
 GlobalReserve, single: total=512.00MiB

Re: Repair broken btrfs raid6?

2015-02-12 Thread Tobias Holst
Hi

I don't remember the exact mkfs.btrfs options anymore but
 ls /sys/fs/btrfs/[UUID]/features/
shows the following output:
 big_metadata  compress_lzo  extended_iref  mixed_backref  raid56

I also tested my device with a short
 hdparm -tT /dev/dm-5
and got
 /dev/mapper/sdc_crypt:
  Timing cached reads:   30712 MB in  2.00 seconds = 15376.11 MB/sec
  Timing buffered disk reads: 444 MB in  3.01 seconds = 147.51 MB/sec

Looks ok to me. Should I test more?

I bought a few new hard drives so currently I am copying all my data
to a second (faster) backup, so I can maybe overwrite the current file
system, if it's not repairable.

Regards,
Tobias


2015-02-12 10:16 GMT+01:00 Liu Bo bo.li@oracle.com:
 On Wed, Feb 11, 2015 at 03:46:33PM +0100, Tobias Holst wrote:
 Hmm, it looks like it is getting worse... Here are some parts of my
 syslog, including two crashed btrfs-threads:

 So I am still getting many of these:
  BTRFS (device dm-5): parent transid verify failed on 25033166798848 wanted 
  108976 found 108958
  BTRFS warning (device dm-5): page private not zero on page 25033166798848
  BTRFS warning (device dm-5): page private not zero on page 25033166802944
  BTRFS warning (device dm-5): page private not zero on page 25033166807040
  BTRFS warning (device dm-5): page private not zero on page 25033166811136

 First we should probably make sure that your device is set up well, since
 these messages usually occur after a drive is removed (the device is somehow
 dropping writes); the below -EIO also implies btrfs cannot read/write data
 from or to that drive.

 And in theory, RAID6 can tolerate two drive failures, so what's your 
 mkfs.btrfs option?

 Thanks,

 -liubo

  BTRFS info (device dm-5): force lzo compression
  BTRFS info (device dm-5): disk space caching is enabled
  BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
  BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
  BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
  BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
  BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
  BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0

 Then there is this crash of super/btrfs_abort_transaction:
  [ cut here ]
  WARNING: CPU: 0 PID: 30526 at /home/kernel/COD/linux/fs/btrfs/super.c:260 __btrfs_abort_transaction+0x5f/0x140 [btrfs]()
  BTRFS: Transaction aborted (error -5)
  Modules linked in: ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) 
  msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) 
  iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) 
  crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) aesni_intel(E) 
  aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) 
  8250_fintek(E) serio_raw(E) virtio_rng(E) parport_pc(E) mac_hid(E) 
  pvpanic(E) i2c_piix4(E) lp(E) parport(E) cirrus(E) syscopyarea(E) 
  sysfillrect(E) sysimgblt(E) ttm(E) mpt2sas(E) drm_kms_helper(E) 
  raid_class(E) floppy(E) psmouse(E) drm(E) scsi_transport_sas(E)
  CPU: 0 PID: 30526 Comm: kworker/u16:6 Tainted: G W E 3.19.0-031900-generic #201502091451
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
  0104 880002743c18 817c4c00 0007
  880002743c68 880002743c58 81076e87 880002743c58
  88020a8694d0 8801fb715800 fffb 0ae8
  Call Trace:
  [817c4c00] dump_stack+0x45/0x57
  [81076e87] warn_slowpath_common+0x97/0xe0
  [81076f86] warn_slowpath_fmt+0x46/0x50
  [c06375cf] __btrfs_abort_transaction+0x5f/0x140 [btrfs]
  [c0655105] btrfs_run_delayed_refs.part.82+0x175/0x290 [btrfs]
  [c0655237] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
  [c0655507] delayed_ref_async_start+0x37/0x90 [btrfs]
  [c069720e] normal_work_helper+0x7e/0x1b0 [btrfs]
  [c0697572] btrfs_extent_refs_helper+0x12/0x20 [btrfs]
  [8108f76d] process_one_work+0x14d/0x460
  [8109014b] worker_thread+0x11b/0x3f0
  [81090030] ? create_worker+0x1e0/0x1e0
  [81095d59] kthread+0xc9/0xe0
  [81095c90] ? flush_kthread_worker+0x90/0x90
  [817d1e7c] ret_from_fork+0x7c/0xb0
  [81095c90] ? flush_kthread_worker+0x90/0x90
  ---[ end trace dd65465954546462 ]---
  BTRFS: error (device dm-5) in btrfs_run_delayed_refs:2792: errno=-5 IO failure
  BTRFS info (device dm-5): forced readonly

 and this crash of delayed-ref/btrfs_select_ref_head:
  [ cut here ]
  WARNING: CPU: 7 PID: 3159 at /home/kernel/COD/linux/fs/btrfs/delayed-ref.c:438 btrfs_select_ref_head+0x120/0x130 [btrfs]

Re: Repair broken btrfs raid6?

2015-02-11 Thread Tobias Holst
 [btrfs]
 [c0652cd1] __btrfs_run_delayed_refs+0x1e1/0x5f0 [btrfs]
 [c0654ffa] btrfs_run_delayed_refs.part.82+0x6a/0x290 [btrfs]
 [c0664e5c] ? join_transaction.isra.31+0x13c/0x380 [btrfs]
 [c0655237] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
 [c0665e50] btrfs_commit_transaction+0xb0/0xa70 [btrfs]
 [c0663d95] transaction_kthread+0x1d5/0x250 [btrfs]
 [c0663bc0] ? open_ctree+0x1f40/0x1f40 [btrfs]
 [81095d59] kthread+0xc9/0xe0
 [81095c90] ? flush_kthread_worker+0x90/0x90
 [817d1e7c] ret_from_fork+0x7c/0xb0
 [81095c90] ? flush_kthread_worker+0x90/0x90
 ---[ end trace dd65465954546463 ]---
 BTRFS warning (device dm-5): Skipping commit of aborted transaction.
 BTRFS: error (device dm-5) in cleanup_transaction:1670: errno=-5 IO failure


Any thoughts? Would it help to unplug the dm-5 device, which seems to
be causing these errors, and then balance the array?

Regards,
Tobias

2015-02-09 23:45 GMT+01:00 Tobias Holst to...@tobby.eu:
 Hi

 I'm having some trouble with my six-drive btrfs raid6 (each drive
 encrypted with LUKS). First off: yes, I do have backups, but it may
 take at least days, maybe weeks or even some months to restore
 everything from the (offsite) backups. So it is not essential to
 recover the data, but it would be great ;-)

 OS: Ubuntu 14.04
 Kernel: 3.19.0
 btrfs-progs: 3.19-rc2

 When booting my server I am getting this in the syslog:
 [8.026362] BTRFS: device label tobby-btrfs devid 3 transid 108721 /dev/dm-0
 [8.118896] BTRFS: device label tobby-btrfs devid 6 transid 108721 /dev/dm-1
 [8.202477] BTRFS: device label tobby-btrfs devid 1 transid 108721 /dev/dm-2
 [8.520988] BTRFS: device label tobby-btrfs devid 4 transid 108721 /dev/dm-3
 [8.70] BTRFS info (device dm-3): force lzo compression
 [8.74] BTRFS info (device dm-3): disk space caching is enabled
 [8.556310] BTRFS: failed to read the system array on dm-3
 [8.592135] BTRFS: open_ctree failed
 [9.039187] BTRFS: device label tobby-btrfs devid 2 transid 108721 /dev/dm-4
 [9.107779] BTRFS: device label tobby-btrfs devid 5 transid 108721 /dev/dm-5
Looks like there is something wrong on drive 3, giving me "open_ctree
failed". I have to press "S" to skip mounting of the btrfs volume. It
boots, and with "sudo mount --all" I can successfully mount the btrfs
volume. Sometimes it takes one or two minutes, but it will mount.

 After a while I am sometimes/randomly getting this in the syslog:
 [ 1161.283246] BTRFS: dm-5 checksum verify failed on 39099619901440 wanted BB5B0AD5 found 6B6F5040 level 0
 Looks like something else is broken on dm-5... But shouldn't this be
 repaired with the new raid56-repair-features of kernel 3.19?

 After some more time I am getting this:
 [637017.631044] BTRFS (device dm-4): parent transid verify failed on 39099305132032 wanted 108722 found 108719
Then it is not possible to access the mounted volume anymore. I have
to "umount -l" to unmount it, and then I can remount it. Until it
happens again (after some time)...

 I also tried a balance and a scrub but they crash. Syslog is full of
 messages like the following examples:
 [ 3355.523157] csum_tree_block: 53 callbacks suppressed
 [ 3355.523160] BTRFS: dm-5 checksum verify failed on 39099306917888 wanted F90D8231 found 5981C697 level 0
 [ 4006.935632]  BTRFS (device dm-5): parent transid verify failed on 30525418536960 wanted 108975 found 108767
 and btrfs scrub status /[device] gives me the following output:
 scrub status for [UUID]
scrub started at Mon Feb  9 18:16:38 2015 and was aborted after 2008 seconds
total bytes scrubbed: 113.04GiB with 0 errors

 So a short summary:
 - btrfs raid6 on 3.19.0 with btrfs-progs 3.19-rc2
 - does not mount at boot up, open_ctree failed (disk 3)
 - mounts successfully after bootup
 - randomly checksum verify failed (disk 5)
 - balance and scrub crash after some time
 - after a while the volume gets unreadable, saying parent transid
 verify failed (disk 4 or 5)

 And it looks like there still is no way to btrfsck a raid6.

 Any ideas how to repair this filesystem?

 Regards,
 Tobias


Re: Repair broken btrfs raid6?

2015-02-10 Thread Tobias Holst
2015-02-10 8:17 GMT+01:00 Kai Krakow hurikha...@gmail.com:
 Tobias Holst to...@tobby.eu wrote:

 and btrfs scrub status /[device] gives me the following output:
 scrub status for [UUID]
scrub started at Mon Feb  9 18:16:38 2015 and was aborted after 2008
seconds total bytes scrubbed: 113.04GiB with 0 errors

 Does not look quite right to me:

 Why should a scrub on a six-drive btrfs array which is probably multiple
 terabytes big (as you state a restore from backup would take days) take only
 ~2000 seconds? And scrub only ~120 GB worth of data? Either your 6 devices
 are really small (then why RAID-6), or your data is very sparse (then why
 does it take so long), or scrub prematurely aborts and never checks the
 complete devices (I guess this is it).

Yes, sorry, I didn't post an output of btrfs filesystem show - but here it is:

Label: 'tobby-btrfs'  uuid: b689ab76-7ff5-434c-a2c6-03efb45faa46
Total devices 6 FS bytes used 13.13TiB
devid1 size 3.64TiB used 3.28TiB path /dev/mapper/sde_crypt
devid2 size 3.64TiB used 3.28TiB path /dev/mapper/sdd_crypt
devid3 size 3.64TiB used 3.28TiB path /dev/mapper/sdf_crypt
devid4 size 3.64TiB used 3.28TiB path /dev/mapper/sda_crypt
devid5 size 3.64TiB used 3.28TiB path /dev/mapper/sdb_crypt
devid6 size 3.64TiB used 3.28TiB path /dev/mapper/sdc_crypt
btrfs-progs v3.19-rc2

So there are ~13TiB of data on this raid6 - but as it says, the scrub
was aborted after 2008 seconds (about half an hour) and ~120GB of data.
Then a "parent transid verify failed" happened, the volume got
unreadable and the scrub was aborted. Until a remount of the btrfs -
and until it happens again...


 And that's what it actually says: aborted after 2008 seconds. I'd expect
 "finished after ... seconds" if I remember my scrub runs correctly (which I
 currently don't run regularly because they take a long time and IO
 performance suffers while they run).

 --
 Replies to list only preferred.



btrfs features

2015-02-09 Thread Tobias Holst
Hi

I am just looking at the features enabled on my btrfs volume.
 ls /sys/fs/btrfs/[UUID]/features/
shows the following output:
 big_metadata  compress_lzo  extended_iref  mixed_backref  raid56

So "big_metadata" means I am not using skinny-metadata,
"compress_lzo" means I am using compression, and "raid56" means I am
using the experimental RAID features of btrfs.



But the other two flags are a little bit unclear... I think
"extended_iref" is the "extref" feature of mkfs.btrfs - right?

I am not sure about the "mixed_backref" feature. What does it mean? Is
it the mixed-bg feature of mkfs.btrfs?

I am also trying to change these features. I am missing skinny extents;
this can be enabled by "btrfstune -x [one device of my raid]",
correct?

And how can I enable the missing no-holes-feature on my volume?
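
(A sketch of what I have in mind, run against an unmounted device - the
device path is a placeholder, and -x is only the skinny-extents switch; the
no-holes part is exactly what I am asking about:)

 btrfstune -x /dev/mapper/sda_crypt   # enable skinny metadata extent refs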

Regards,
Tobias


Repair broken btrfs raid6?

2015-02-09 Thread Tobias Holst
Hi

I'm having some trouble with my six-drive btrfs raid6 (each drive
encrypted with LUKS). First off: yes, I do have backups, but it may
take at least days, maybe weeks or even some months to restore
everything from the (offsite) backups. So it is not essential to
recover the data, but it would be great ;-)

OS: Ubuntu 14.04
Kernel: 3.19.0
btrfs-progs: 3.19-rc2

When booting my server I am getting this in the syslog:
 [8.026362] BTRFS: device label tobby-btrfs devid 3 transid 108721 /dev/dm-0
 [8.118896] BTRFS: device label tobby-btrfs devid 6 transid 108721 /dev/dm-1
 [8.202477] BTRFS: device label tobby-btrfs devid 1 transid 108721 /dev/dm-2
 [8.520988] BTRFS: device label tobby-btrfs devid 4 transid 108721 /dev/dm-3
 [8.70] BTRFS info (device dm-3): force lzo compression
 [8.74] BTRFS info (device dm-3): disk space caching is enabled
 [8.556310] BTRFS: failed to read the system array on dm-3
 [8.592135] BTRFS: open_ctree failed
 [9.039187] BTRFS: device label tobby-btrfs devid 2 transid 108721 /dev/dm-4
 [9.107779] BTRFS: device label tobby-btrfs devid 5 transid 108721 /dev/dm-5
Looks like there is something wrong on drive 3, giving me "open_ctree
failed". I have to press "S" to skip mounting of the btrfs volume. It
boots, and with "sudo mount --all" I can successfully mount the btrfs
volume. Sometimes it takes one or two minutes, but it will mount.
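
(When it refuses like this, a read-only degraded mount can at least make the
data reachable for a look - a sketch, the mount point is a placeholder:)

 mount -o ro,degraded /dev/dm-2 /mnt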

After a while I am sometimes/randomly getting this in the syslog:
 [ 1161.283246] BTRFS: dm-5 checksum verify failed on 39099619901440 wanted BB5B0AD5 found 6B6F5040 level 0
Looks like something else is broken on dm-5... But shouldn't this be
repaired with the new raid56-repair-features of kernel 3.19?

After some more time I am getting this:
 [637017.631044] BTRFS (device dm-4): parent transid verify failed on 39099305132032 wanted 108722 found 108719
Then it is not possible to access the mounted volume anymore. I have
to "umount -l" to unmount it, and then I can remount it. Until it
happens again (after some time)...

I also tried a balance and a scrub but they crash. Syslog is full of
messages like the following examples:
 [ 3355.523157] csum_tree_block: 53 callbacks suppressed
 [ 3355.523160] BTRFS: dm-5 checksum verify failed on 39099306917888 wanted F90D8231 found 5981C697 level 0
 [ 4006.935632]  BTRFS (device dm-5): parent transid verify failed on 30525418536960 wanted 108975 found 108767
and btrfs scrub status /[device] gives me the following output:
 scrub status for [UUID]
scrub started at Mon Feb  9 18:16:38 2015 and was aborted after 2008 seconds
total bytes scrubbed: 113.04GiB with 0 errors

So a short summary:
- btrfs raid6 on 3.19.0 with btrfs-progs 3.19-rc2
- does not mount at boot up, open_ctree failed (disk 3)
- mounts successfully after bootup
- randomly checksum verify failed (disk 5)
- balance and scrub crash after some time
- after a while the volume gets unreadable, saying parent transid
verify failed (disk 4 or 5)

And it looks like there still is no way to btrfsck a raid6.

Any ideas how to repair this filesystem?

Regards,
Tobias


Re: how to repair a damaged filesystem with btrfs raid5

2015-02-02 Thread Tobias Holst
Hi.

There is a known bug when you re-plug a missing hdd of a btrfs raid
without wiping the device first. In the worst case this results in a
totally corrupted filesystem, as it sometimes did during my tests of
the raid6 implementation. With raid1 it may just go back in time to
the point when you unplugged the device - which is also bad, but still
no complete data loss. In raid6 it was sometimes worse.

Sounds like you did that (plugged in the missing device without wiping it)?

Next thing: scrub and filesystem check for raid5/6 are not fully
implemented/completed (yet), as Duncan said. They will be (mostly)
included in 3.19, but maybe with bugs.

You may try a balance instead of a scrub, as this should read and
check your data and then write it back. This worked for me most of the
time during my personal raid6 stability and stress tests. But maybe
your filesystem has already been corrupted...
Give it a try :)
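
(A sketch of the sequence I mean, assuming the array still mounts - the
device path and mount point are placeholders; wipefs destroys the signatures
on that device, so only run it on the disk that should be treated as new:)

 wipefs -a /dev/mapper/crypt-1              # clear the stale btrfs signature
 btrfs device add /dev/mapper/crypt-1 /mnt/data
 btrfs balance start /mnt/data              # rewrites and re-checks the data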

Regards
Tobias


2015-01-27 10:12 GMT+01:00 Alexander Fieroch
alexander.fier...@mpi-dortmund.mpg.de:
 Hello,

 I'm testing btrfs RAID5 on three encrypted hdds (dm-crypt) and I'm
 simulating a hard disk failure by unplugging one device while writing some
 files.
 Now the filesystem is damaged. Is there any chance yet to repair the
 filesystem?

 My operating system is Ubuntu Server (vivid) with kernel 3.18 and btrfs
 3.18.1 (external PPA).
 I've unplugged device sdb with UUID 65f62f63-6526-4d5e-82d4-adf6d7508092 and
 crypt device name /dev/mapper/crypt-1. This one should be repaired.
 Attached is the dmesg log file with corresponding errors.

 btrfs check does not seem to work.

 # btrfs check --repair /dev/mapper/crypt-1
 enabling repair mode
 Checking filesystem on /dev/mapper/crypt-1
 UUID: 504c2850-3977-4340-8849-18dd3ac2e5e4
 checking extents
 Check tree block failed, want=165396480, have=5385177728513973313
 Check tree block failed, want=165396480, have=5385177728513973313
 Check tree block failed, want=165396480, have=65536
 Check tree block failed, want=165396480, have=5385177728513973313
 Check tree block failed, want=165396480, have=5385177728513973313
 read block failed check_tree_block
 Check tree block failed, want=165740544, have=6895225932619678086
 Check tree block failed, want=165740544, have=6895225932619678086
 Check tree block failed, want=165740544, have=65536
 Check tree block failed, want=165740544, have=6895225932619678086
 Check tree block failed, want=165740544, have=6895225932619678086
 read block failed check_tree_block
 Check tree block failed, want=165756928, have=13399486021073017810
 Check tree block failed, want=165756928, have=13399486021073017810
 Check tree block failed, want=165756928, have=65536
 Check tree block failed, want=165756928, have=13399486021073017810
 Check tree block failed, want=165756928, have=13399486021073017810
 read block failed check_tree_block
 Check tree block failed, want=165773312, have=12571697019259051064
 Check tree block failed, want=165773312, have=12571697019259051064
 Check tree block failed, want=165773312, have=65536
 Check tree block failed, want=165773312, have=12571697019259051064
 Check tree block failed, want=165773312, have=12571697019259051064
 read block failed check_tree_block
 Check tree block failed, want=165789696, have=4069002570438424782
 Check tree block failed, want=165789696, have=4069002570438424782
 Check tree block failed, want=165789696, have=65536
 Check tree block failed, want=165789696, have=4069002570438424782
 Check tree block failed, want=165789696, have=4069002570438424782
 read block failed check_tree_block
 Check tree block failed, want=165838848, have=9612508092910615774
 Check tree block failed, want=165838848, have=9612508092910615774
 Check tree block failed, want=165838848, have=65536
 Check tree block failed, want=165838848, have=9612508092910615774
 Check tree block failed, want=165838848, have=9612508092910615774
 read block failed check_tree_block
 ref mismatch on [99516416 16384] extent item 1, found 0
 failed to repair damaged filesystem, aborting



 Trying a btrfs scrub, it finishes with uncorrectable errors:
 # btrfs scrub start -d /dev/mapper/crypt-1
 scrub started on /dev/mapper/crypt-1, fsid 504c2850-3977-4340-8849-18dd3ac2e5e4 (pid=2014)
 # btrfs scrub status -d /mnt/data/
 scrub status for 504c2850-3977-4340-8849-18dd3ac2e5e4
 scrub device /dev/mapper/crypt-1 (id 1) history
 scrub started at Mon Jan 26 14:36:57 2015 and finished after 617 seconds
 total bytes scrubbed: 29.78GiB with 10906 errors
 error details: csum=10906
 corrected errors: 0, uncorrectable errors: 10906, unverified errors: 0
 scrub device /dev/mapper/crypt-2 (id 2) no stats available
 scrub device /dev/mapper/crypt-3 (id 3) no stats available


 Any chance to fix the errors or do I have to wait for the next btrfs
 version?
 Thank you very much,
 Alexander


 # uname -a
 Linux antares 3.18.0-9-generic #10-Ubuntu SMP Mon Jan 12 21:41:54 UTC 2015
 x86_64 x86_64 x86_64 

Re: filesystem corruption

2014-11-02 Thread Tobias Holst
Thank you for your reply.

I'll answer in-line.


2014-11-02 5:49 GMT+01:00 Robert White rwh...@pobox.com:
 On 10/31/2014 10:34 AM, Tobias Holst wrote:

 I am now using another system with kernel 3.17.2 and btrfs-tools 3.17
 and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add
 the second one as there are only two slots in that server.

 This is what I got:

   tobby@ubuntu: sudo btrfs check /dev/sdb1
 warning, device 2 is missing
 warning devid 2 not found already
 root item for root 1746, current bytenr 80450240512, current gen
 163697, current level 2, new bytenr 40074067968, new gen 163707, new
 level 2
 Found 1 roots with an outdated root item.
 Please run a filesystem check with the option --repair to fix them.

   tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
 enabling repair mode
 warning, device 2 is missing
 warning devid 2 not found already
 Unable to find block group for 0
 extent-tree.c:289: find_search_start: Assertion `1` failed.


 The read-only snapshots taken under 3.17.1 are your core problem.

OK


 Now btrfsck is refusing to operate on the degraded RAID because a degraded
 RAID is read-only (this is an educated guess). Since
 btrfsck is _not_ a mount type of operation, it has no degraded mode that
 would let you deal with half a RAID, as far as I know.

OK, good to know.


 In your case...

 It is _known_ that you need to be _not_ running 3.17.0 or 3.17.1 if you are
 going to make read-only snapshots safely.
 It is _known_ that you need to be running 3.17.2 to get a number of fixes
 that impact your circumstance.
 It is _known_ that you need to be running btrfs-progs 3.17 to repair the
 read-only snapshots that are borked up, and that you must _not_ have
 previously tried to repair the problem with an older btrfsck.

No, I didn't try to repair it with older kernels/btrfs-tools.


 Were I you, I would...

 Put the two disks back in the same computer before something bad happens.

 Upgrade that computer to 3.17.2 and 3.17 respectively.

As I mentioned before, I only have two slots and my system on this
btrfs-raid1 is not working anymore - not just when accessing
ro-snapshots; it crashes every time at the login prompt. So now I
installed Ubuntu 14.04 on a USB stick (so I can re-add both btrfs
HDDs) and upgraded the kernel to 3.17.2 and btrfs-tools to 3.17.


 Take a backup (because I am paranoid like that, though current threat seems
 negligible).

I already have a backup. :)


 btrfsck your raid with --repair.

OK. And this is what I get now:

tobby@ubuntu: sudo btrfs check /dev/sda1
root item for root 1746, current bytenr 80450240512, current gen
163697, current level 2, new bytenr 40074067968, new gen 163707, new
level 2
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.

tobby@ubuntu: sudo btrfs check /dev/sda1 --repair
enabling repair mode
fixing root item for root 1746, current bytenr 80450240512, current
gen 163697, current level 2, new bytenr 40074067968, new gen 163707,
new level 2
Fixed 1 roots.
Checking filesystem on /dev/sda1
UUID: 3ad065be-2525-4547-87d3-0e195497f9cf
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 18446744073709551607 inode 258 errors 1000, some csum missing
found 36031450184 bytes used err is 1
total csum bytes: 59665716
total tree bytes: 3523330048
total fs tree bytes: 3234054144
total extent tree bytes: 202358784
btree space waste bytes: 755547262
file data blocks allocated: 122274091008
 referenced 211741990912
Btrfs v3.17


 Alternatively, if you previously tried to btrfsck the raid with tools prior
 to 3.17 after the read-only snapshot(s) problem, you will need
 to resort to mkfs.btrfs to solve the problem. But hey, you have two disks,
 so break the RAID, then mkfs one of them, then copy the data, then re-make
 the RAID such that the new FS rules.

 Enjoy your system no longer taking racy read-only snapshots... 8-)



And this worked! :) Server is back online without restoring any
files from the backup. Looks good to me!

But I can't do a balance anymore?

root@t-mon:~# btrfs balance start /dev/sda1
ERROR: can't access '/dev/sda1'
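
(For what it's worth, balance expects the mounted path rather than the device
node, so a form like the following should get past that error - the mount
point is a placeholder:)

 btrfs balance start /mnt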

Regards
Tobias


Re: filesystem corruption

2014-10-31 Thread Tobias Holst
I am now using another system with kernel 3.17.2 and btrfs-tools 3.17
and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add
the second one as there are only two slots in that server.

This is what I got:

 tobby@ubuntu: sudo btrfs check /dev/sdb1
warning, device 2 is missing
warning devid 2 not found already
root item for root 1746, current bytenr 80450240512, current gen
163697, current level 2, new bytenr 40074067968, new gen 163707, new
level 2
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.

 tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
enabling repair mode
warning, device 2 is missing
warning devid 2 not found already
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs[0x42bd62]
btrfs[0x42ffe5]
btrfs[0x430211]
btrfs[0x4246ec]
btrfs[0x424d11]
btrfs[0x426af3]
btrfs[0x41b18c]
btrfs[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7ffca1119ec5]
btrfs[0x40b497]

This can be repeated as often as I want ;) Nothing changed.

Regards
Tobias


2014-10-31 3:41 GMT+01:00 Rich Freeman r-bt...@thefreemanclan.net:
 On Thu, Oct 30, 2014 at 9:02 PM, Tobias Holst to...@tobby.eu wrote:
 Addition:
 I found some posts here about a general file system corruption in 3.17
 and 3.17.1 - is this the cause?
 Additionally I am using ro-snapshots - maybe this is the cause, too?

 Anyway: Can I fix that or do I have to reinstall? Haven't touched the
 filesystem, just did a scrub (found 0 errors).


 Yup - ro-snapshots is a big problem in 3.17.  You can probably recover now by:
 1.  Update your kernel to 3.17.2 - that takes care of all the big
 known 3.16/17 issues in general.
 2.  Run btrfs check using btrfs-tools 3.17.  That can clean up the
 broken snapshots in your filesystem.

 That is fairly likely to get your filesystem working normally again.
 It worked for me.  I was getting some balance issues when trying to
 add another device and I'm not sure if 3.17.2 totally fixed that - I
 ended up cancelling the balance and it will be a while before I have
 to balance this particular filesystem again, so I'll just hold off and
 hope things stabilize.

 --
 Rich


filesystem corruption

2014-10-30 Thread Tobias Holst
Hi

I was using a btrfs RAID1 with two disks under Ubuntu 14.04, kernel
3.13 and btrfs-tools 3.14.1 for weeks without issues.

Now I updated to kernel 3.17.1 and btrfs-tools 3.17. After a reboot
everything looked fine and I started some tests. While running
duperemove (just scanning, not doing anything) and a balance at the
same time, the load suddenly went up to 30 and the system was not
responding anymore. Everything working with the filesystem stopped
responding. So I did a hard reset.

I was able to reboot, but at the login prompt nothing happened except a
kernel bug. Same thing back on kernel 3.13.

Now I started a live system (Ubuntu 14.10, kernel 3.16.x, btrfs-tools
3.14.1), and mounted the btrfs filesystem. I can browse through the
files but sometimes, especially when accessing my snapshots or trying
to create a new snapshot, the kernel bug appears and the filesystem
hangs.

It shows this:
Oct 31 00:09:14 ubuntu kernel: [  187.661731] [ cut here
]
Oct 31 00:09:14 ubuntu kernel: [  187.661770] WARNING: CPU: 1 PID:
4417 at /build/buildd/linux-3.16.0/fs/btrfs/relocation.c:924
build_backref_tree+0xcab/0x1240 [btrfs]()
Oct 31 00:09:14 ubuntu kernel: [  187.661772] Modules linked in:
nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm
dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth
6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp lp
parport squashfs overlayfs nls_utf8 isofs btrfs xor raid6_pq dm_mirror
dm_region_hash dm_log hid_generic usbhid hid uas usb_storage ahci
e1000e libahci ptp pps_core
Oct 31 00:09:14 ubuntu kernel: [  187.661800] CPU: 1 PID: 4417 Comm:
btrfs-balance Tainted: G C3.16.0-23-generic #31-Ubuntu
Oct 31 00:09:14 ubuntu kernel: [  187.661802] Hardware name:
Supermicro PDSML/PDSML+, BIOS 6.00 03/06/2009
Oct 31 00:09:14 ubuntu kernel: [  187.661804]  0009
8800a0ae7a00 8177fcbc 
Oct 31 00:09:14 ubuntu kernel: [  187.661807]  8800a0ae7a38
8106fd8d 8800a1440750 8800a1440b48
Oct 31 00:09:14 ubuntu kernel: [  187.661809]  88020a8ce000
0001 88020b6b0d00 8800a0ae7a48
Oct 31 00:09:14 ubuntu kernel: [  187.661812] Call Trace:
Oct 31 00:09:14 ubuntu kernel: [  187.661820]  [8177fcbc]
dump_stack+0x45/0x56
Oct 31 00:09:14 ubuntu kernel: [  187.661825]  [8106fd8d]
warn_slowpath_common+0x7d/0xa0
Oct 31 00:09:14 ubuntu kernel: [  187.661827]  [8106fe6a]
warn_slowpath_null+0x1a/0x20
Oct 31 00:09:14 ubuntu kernel: [  187.661842]  [c01b734b]
build_backref_tree+0xcab/0x1240 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661857]  [c01b7ae1]
relocate_tree_blocks+0x201/0x600 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661872]  [c01b88d8] ?
add_data_references+0x268/0x2a0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661887]  [c01b96fd]
relocate_block_group+0x25d/0x6b0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661902]  [c01b9d36]
btrfs_relocate_block_group+0x1e6/0x2f0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661916]  [c0190988]
btrfs_relocate_chunk.isra.27+0x58/0x720 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661926]  [c0140dc1] ?
btrfs_set_path_blocking+0x41/0x80 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661935]  [c0145dfd] ?
btrfs_search_slot+0x48d/0xa40 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661950]  [c018b49b] ?
release_extent_buffer+0x2b/0xd0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661964]  [c018b95f] ?
free_extent_buffer+0x4f/0xa0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661979]  [c01936c3]
__btrfs_balance+0x4d3/0x8d0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661993]  [c0193d48]
btrfs_balance+0x288/0x600 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.662008]  [c019411d]
balance_kthread+0x5d/0x80 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.662022]  [c01940c0] ?
btrfs_balance+0x600/0x600 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.662026]  [81094aeb]
kthread+0xdb/0x100
Oct 31 00:09:14 ubuntu kernel: [  187.662029]  [81094a10] ?
kthread_create_on_node+0x1c0/0x1c0
Oct 31 00:09:14 ubuntu kernel: [  187.662032]  [81787c3c]
ret_from_fork+0x7c/0xb0
Oct 31 00:09:14 ubuntu kernel: [  187.662035]  [81094a10] ?
kthread_create_on_node+0x1c0/0x1c0
Oct 31 00:09:14 ubuntu kernel: [  187.662037] ---[ end trace
fb7849e4a6f20424 ]---

and this:
Oct 31 00:09:14 ubuntu kernel: [  187.682629] [ cut here
]
Oct 31 00:09:14 ubuntu kernel: [  187.682635] kernel BUG at
/build/buildd/linux-3.16.0/fs/btrfs/extent-tree.c:868!
Oct 31 00:09:14 ubuntu kernel: [  187.682638] invalid opcode:  [#1] SMP
Oct 31 00:09:14 ubuntu kernel: [  187.682642] Modules linked in:
nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm
dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth
6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp 

Re: filesystem corruption

2014-10-30 Thread Tobias Holst
Addition:
I found some posts here about a general file system corruption in 3.17
and 3.17.1 - is this the cause?
Additionally I am using ro-snapshots - maybe this is the cause, too?

Anyway: Can I fix that or do I have to reinstall? Haven't touched the
filesystem, just did a scrub (found 0 errors).

Regards
Tobias


2014-10-31 1:29 GMT+01:00 Tobias Holst to...@tobby.eu:
 Hi

 I was using a btrfs RAID1 with two disks under Ubuntu 14.04, kernel
 3.13 and btrfs-tools 3.14.1 for weeks without issues.

 Now I updated to kernel 3.17.1 and btrfs-tools 3.17. After a reboot
 everything looked fine and I started some tests. While running
 duperemove (just scanning, not doing anything) and a balance at the
 same time, the load suddenly went up to 30 and the system was not
 responding anymore. Everything working with the filesystem stopped
 responding. So I did a hard reset.

 I was able to reboot, but at the login prompt nothing happened except a
 kernel bug. Same thing back on kernel 3.13.

 Now I started a live system (Ubuntu 14.10, kernel 3.16.x, btrfs-tools
 3.14.1), and mounted the btrfs filesystem. I can browse through the
 files but sometimes, especially when accessing my snapshots or trying
 to create a new snapshot, the kernel bug appears and the filesystem
 hangs.

 It shows this:
 Oct 31 00:09:14 ubuntu kernel: [  187.661731] [ cut here
 ]
 Oct 31 00:09:14 ubuntu kernel: [  187.661770] WARNING: CPU: 1 PID:
 4417 at /build/buildd/linux-3.16.0/fs/btrfs/relocation.c:924
 build_backref_tree+0xcab/0x1240 [btrfs]()
 Oct 31 00:09:14 ubuntu kernel: [  187.661772] Modules linked in:
 nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm
 dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth
 6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp lp
 parport squashfs overlayfs nls_utf8 isofs btrfs xor raid6_pq dm_mirror
 dm_region_hash dm_log hid_generic usbhid hid uas usb_storage ahci
 e1000e libahci ptp pps_core
 Oct 31 00:09:14 ubuntu kernel: [  187.661800] CPU: 1 PID: 4417 Comm:
 btrfs-balance Tainted: G C3.16.0-23-generic #31-Ubuntu
 Oct 31 00:09:14 ubuntu kernel: [  187.661802] Hardware name:
 Supermicro PDSML/PDSML+, BIOS 6.00 03/06/2009
 Oct 31 00:09:14 ubuntu kernel: [  187.661804]  0009
 8800a0ae7a00 8177fcbc 
 Oct 31 00:09:14 ubuntu kernel: [  187.661807]  8800a0ae7a38
 8106fd8d 8800a1440750 8800a1440b48
 Oct 31 00:09:14 ubuntu kernel: [  187.661809]  88020a8ce000
 0001 88020b6b0d00 8800a0ae7a48
 Oct 31 00:09:14 ubuntu kernel: [  187.661812] Call Trace:
 Oct 31 00:09:14 ubuntu kernel: [  187.661820]  [8177fcbc]
 dump_stack+0x45/0x56
 Oct 31 00:09:14 ubuntu kernel: [  187.661825]  [8106fd8d]
 warn_slowpath_common+0x7d/0xa0
 Oct 31 00:09:14 ubuntu kernel: [  187.661827]  [8106fe6a]
 warn_slowpath_null+0x1a/0x20
 Oct 31 00:09:14 ubuntu kernel: [  187.661842]  [c01b734b]
 build_backref_tree+0xcab/0x1240 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661857]  [c01b7ae1]
 relocate_tree_blocks+0x201/0x600 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661872]  [c01b88d8] ?
 add_data_references+0x268/0x2a0 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661887]  [c01b96fd]
 relocate_block_group+0x25d/0x6b0 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661902]  [c01b9d36]
 btrfs_relocate_block_group+0x1e6/0x2f0 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661916]  [c0190988]
 btrfs_relocate_chunk.isra.27+0x58/0x720 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661926]  [c0140dc1] ?
 btrfs_set_path_blocking+0x41/0x80 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661935]  [c0145dfd] ?
 btrfs_search_slot+0x48d/0xa40 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661950]  [c018b49b] ?
 release_extent_buffer+0x2b/0xd0 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661964]  [c018b95f] ?
 free_extent_buffer+0x4f/0xa0 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661979]  [c01936c3]
 __btrfs_balance+0x4d3/0x8d0 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.661993]  [c0193d48]
 btrfs_balance+0x288/0x600 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.662008]  [c019411d]
 balance_kthread+0x5d/0x80 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.662022]  [c01940c0] ?
 btrfs_balance+0x600/0x600 [btrfs]
 Oct 31 00:09:14 ubuntu kernel: [  187.662026]  [81094aeb]
 kthread+0xdb/0x100
 Oct 31 00:09:14 ubuntu kernel: [  187.662029]  [81094a10] ?
 kthread_create_on_node+0x1c0/0x1c0
 Oct 31 00:09:14 ubuntu kernel: [  187.662032]  [81787c3c]
 ret_from_fork+0x7c/0xb0
 Oct 31 00:09:14 ubuntu kernel: [  187.662035]  [81094a10] ?
 kthread_create_on_node+0x1c0/0x1c0
 Oct 31 00:09:14 ubuntu kernel: [  187.662037] ---[ end trace
 fb7849e4a6f20424 ]---

 and this:
 Oct 31 00:09:14 ubuntu kernel: [  187.682629] [ cut here

Re: general thoughts and questions + general and RAID5/6 stability?

2014-09-23 Thread Tobias Holst
If it is unknown which of these options were used at btrfs
creation time - is it possible to check the state of these options
afterwards, on a mounted or unmounted filesystem?
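
(Two ways that should answer this - the UUID and device are placeholders; the
sysfs listing needs the filesystem mounted, btrfs-show-super does not:)

 ls /sys/fs/btrfs/[UUID]/features/
 btrfs-show-super /dev/sdX1 | grep -i flags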


2014-09-23 15:38 GMT+02:00 Austin S Hemmelgarn ahferro...@gmail.com:

 Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the 
 following list of features:
 mixed-bg- mixed data and metadata block groups
 extref  - increased hard-link limit per file to 65536
 raid56  - raid56 extended format
 skinny-metadata - reduced size metadata extent refs
 no-holes- no explicit hole extents for files


Re: Blocked tasks on 3.15.1

2014-08-07 Thread Tobias Holst
Hi

Is there anything new on this topic? I am using Ubuntu 14.04.1 and
experiencing the same problem.
- 6 HDDs
- LUKS on every HDD
- btrfs RAID6 over these 6 crypt devices
No LVM, no nodatacow files.
Mount-options: defaults,compress-force=lzo,space_cache
With the original 3.13-kernel (3.13.0-32-generic) it is working fine.

Then I tried the following kernels from here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/
linux-image-3.14.15-031415-generic_3.14.15-031415.201407311853_amd64.deb
- not even booting, kernel panic at boot.
linux-image-3.15.6-031506-generic_3.15.6-031506.201407172034_amd64.deb,
linux-image-3.15.7-031507-generic_3.15.7-031507.201407281235_amd64.deb,
and linux-image-3.16.0-031600-generic_3.16.0-031600.201408031935_amd64.deb
cause the hangs described in this thread. When doing big IO
(unpacking a .rar archive of multiple GB) the filesystem stops
working. The load stays very high but nothing actually happens on the
drives according to dstat. htop shows a "D" (uninterruptible sleep
(usually IO)) for many kworker threads.
Unmounting the btrfs filesystem only works with the -l (lazy) option.
Reboot or shutdown doesn't work because of the blocking threads, so
only a power cut works. After the reboot, the last data written before
the hang is lost.
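
(To capture what those D-state kworkers are blocked on, the magic-SysRq
blocked-task dump is handy - it needs root, and the output lands in the
kernel log:)

 echo w > /proc/sysrq-trigger
 dmesg | tail -n 100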

I am now back on 3.13.

Regards


2014-07-25 4:27 GMT+02:00 Cody P Schafer d...@codyps.com:

 On Tue, Jul 22, 2014 at 9:53 AM, Chris Mason c...@fb.com wrote:
 
 
  On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
 
  Running 3.15.6 with this patch applied on top:
   - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ 
  /home/nyx/`
  - no extra error messages printed (`dmesg | grep racing`) compared to
  without the patch
 
  I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with
  3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
 
  To recap some details (so I can have it all in one place):
   - /home/ is btrfs with compress=lzo
 
  BTRFS RAID 1 with lzo.
 
   - I have _not_ created any nodatacow files.
 
  Me neither.
 
   - Full stack is: sata - dmcrypt - lvm - btrfs (I noticed others
  mentioning the use of dmcrypt)
 
  Same, except no dmcrypt.
 
 
  Thanks for the help in tracking this down everyone.  We'll get there!
  Are you all running multi-disk systems (from a btrfs POV, more than one
  device?)  I don't care how many physical drives this maps to, just does
  btrfs think there's more than one drive.

 No, both of my btrfs filesystems are single disk.


Re: How to handle a RAID5 array with a failing drive? - raid5 mostly works, just no rebuilds

2014-03-20 Thread Tobias Holst
I think after the balance it was a fine, non-degraded RAID again... As
far as I remember.

Tobby


2014-03-20 1:46 GMT+01:00 Marc MERLIN m...@merlins.org:

 On Thu, Mar 20, 2014 at 01:44:20AM +0100, Tobias Holst wrote:
  It looks like I had the same problem with the RAID6 implementation of
  btrfs. Rebuild with balance worked, but when a drive was
  removed while mounted and then re-added, the chaos began. I tried it a
  few times. So when a drive fails (and this is just because of
  a lost connection or similar non-severe problems), it is necessary
  to wipe the disk first before re-adding it, so btrfs will add it as a
  new disk and not try to re-add the old one.

 Good to know you got this too.

 Just to confirm: did you get it to rebuild, or once a drive is lost/gets
 behind, you're in degraded mode forever for those blocks?

 Or were you able to balance?

 Marc
 --
 A mouse is a device used to point at the xterm you want to type in - A.S.R.
 Microsoft is to operating systems what McDonalds is to gourmet cooking
 Home page: http://marc.merlins.org/


Re: Massive BTRFS performance degradation

2014-03-09 Thread Tobias Holst
2014-03-09 18:36 GMT+01:00 Austin S Hemmelgarn ahferro...@gmail.com:
 On 03/09/2014 04:17 AM, Swâmi Petaramesh wrote:
 On Sunday, 9 March 2014 at 08:48:20, KC wrote:
 I am experiencing massive performance degradation on my BTRFS
 root partition on SSD.

 BTW, is BTRFS still an SSD-killer? It had this reputation a while
 ago, and I'm not sure if this is still the case, but I don't dare
 (yet) convert one of my laptops that has an SSD to BTRFS...

 Actually, because of the COW nature of BTRFS, it should be better for
 SSDs than stuff like ext4 (which DOES kill SSDs when journaling is
 enabled, because it ends up doing thousands of read-modify-write cycles
 to the same 128k of the disk under just generic usage).  Just make
 sure that you use the 'ssd' and 'discard' mount options.

Every modern SSD does wear leveling. Doing a read-modify-write cycle
on the same block doesn't mean it writes to the same memory cell. The
SSD controller distributes the write cycles over all (empty) cells, so
in the best case every cell in the SSD is used equally, no matter
whether you do random writes or write the same block over and over.
This works better with lots of empty space on the SSD; that's why you
should never use more than 90% of the space on an SSD. Garbage
collection and TRIM also help the SSD controller find empty cells.
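
(A quick way to verify that the drive supports TRIM, and to trim on demand
instead of mounting with 'discard' - device and mount point are placeholders:)

 hdparm -I /dev/sda | grep -i trim
 fstrim -v /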