Re: 4.11.6 / more corruption / root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)

2017-07-31 Thread Marc MERLIN
On Tue, Aug 01, 2017 at 12:07:14AM +0300, Ivan Sizov wrote: > 2017-07-09 10:57 GMT+03:00 Martin Steigerwald <mar...@lichtvoll.de>: > > Hello Marc. > > > > Marc MERLIN - 08.07.17, 21:34: > >> Sigh, > >> > >> This is now the 3rd filesystem

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-07-16 Thread Marc MERLIN
On Sun, Jul 16, 2017 at 04:01:53PM +0200, Giuseppe Della Bianca wrote: > > On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote: > > > Dear Chris and other developers, > ]zac[ > > Others on this thread with the same error: did anyone recover from this > >

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-07-15 Thread Marc MERLIN
On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote: > Dear Chris and other developers, > > Can you look at this bug which has been happening since 2012 on apparently > all kernels between at least > 3.4 and 4.11. > I didn't look in detail at each thread (took long e

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-07-14 Thread Marc MERLIN
the device read only. On Mon, Jul 10, 2017 at 11:21:55PM -0700, Marc MERLIN wrote: > Looks like btrfs has decided to give me hell. > I'm still recovering my system. > The biggest filesystem seems to work, but I just had it go read only: > > [ cut here ] > W

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists

2017-07-14 Thread Marc MERLIN
On Thu, Jul 13, 2017 at 12:17:16PM -0600, Chris Murphy wrote: > Well I'd say it's a bug, but that's not a revelation. Is there a > snapshot being deleted in the approximate time frame for this? I see a Yep :) I run btrfs-snaps and it happens right around that time. It creates a snapshot and

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists

2017-07-12 Thread Marc MERLIN
On Tue, Jul 11, 2017 at 09:48:12AM -0700, Marc MERLIN wrote: > On Tue, Jul 11, 2017 at 10:00:40AM -0600, Chris Murphy wrote: > > > ---[ end trace feb4b95c83ac065f ]--- > > > BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 > > > Object already e

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists

2017-07-11 Thread Marc MERLIN
On Tue, Jul 11, 2017 at 04:43:06PM -0600, Chris Murphy wrote: > Assuming it works, settle on 4.9 until 4.14 shakes out a bit. Given > your setup and the penalty for even small problems, it's probably > better to go low risk and that means longterm kernels. Maybe one of > the three systems can use

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists

2017-07-11 Thread Marc MERLIN
On Tue, Jul 11, 2017 at 10:00:40AM -0600, Chris Murphy wrote: > > ---[ end trace feb4b95c83ac065f ]--- > > BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object > > already exists > > BTRFS info (device dm-2): forced readonly > > You've already had this same traceback, not

BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists

2017-07-11 Thread Marc MERLIN
Looks like btrfs has decided to give me hell. I'm still recovering my system. The biggest filesystem seems to work, but I just had it go read only: [ cut here ] WARNING: CPU: 5 PID: 3734 at fs/btrfs/extent-tree.c:2960 btrfs_run_delayed_refs+0xb6/0x1dc BTRFS: Transaction

Re: Can I drop/reset files with corrupted data if they are in a read only snapshot?

2017-07-10 Thread Marc MERLIN
Thanks for the Cc/ping, I appreciate it On Sun, Jul 09, 2017 at 11:38:51AM +, Duncan wrote: > At your own risk you can try using btrfs property to set the ro snapshot > to rw. Then you can delete the corrupted files and reset the snapshot > back to ro. > > Of course you'll need to do the

Re: 4.11.6 / more corruption / root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)

2017-07-09 Thread Marc MERLIN
On Sat, Jul 08, 2017 at 09:34:17PM -0700, Marc MERLIN wrote: > Sigh, > > This is now the 3rd filesystem I have (on 3 different machines) that is > getting corruption of some kind (on 4.11.6). > This is starting to look suspicious :-/ > > Can I fix this filesystem in some

Can I drop/reset files with corrupted data if they are in a read only snapshot?

2017-07-08 Thread Marc MERLIN
Sorry for the mails, I still have one more problem I'm trying to work through. My filesystem that probably got real corruption due to an unstable block layer underneath (my 2 other machines with other problems did not have an unstable block layer and just started having problems recently, which is

We really need a better/working btrfs check --repair

2017-07-08 Thread Marc MERLIN
+Chris On Sat, Jul 08, 2017 at 09:34:17PM -0700, Marc MERLIN wrote: > gargamel:/var/local/scr/host# btrfs check --repair /dev/mapper/crypt_bcache2 > enabling repair mode > Checking filesystem on /dev/mapper/crypt_bcache2 > UUID: c4e6f9ca-e9a2-43d7-befa-763fc2cd5a57 > checkin

4.11.6 / more corruption / root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)

2017-07-08 Thread Marc MERLIN
Sigh, This is now the 3rd filesystem I have (on 3 different machines) that is getting corruption of some kind (on 4.11.6). This is starting to look suspicious :-/ Can I fix this filesystem in some other way? gargamel:/var/local/scr/host# btrfs check --repair /dev/mapper/crypt_bcache2 enabling

Re: Leveldb in google-chrome incompatible with btrfs?

2017-07-07 Thread Marc MERLIN
(removing pwnall at chromium.org to cut spam) On Thu, Jul 06, 2017 at 10:46:08PM -0700, Omar Sandoval wrote: > ┌[osandov@vader ~/.config] > └$ ls -al google-chrome-busted/** > ls: cannot access 'google-chrome-busted/Local State': No such file or > directory > google-chrome-busted/Default: > ls:

Re: ctree.c:197: update_ref_for_cow: BUG_ON `ret` triggered, value -5

2017-07-07 Thread Marc MERLIN
On Fri, Jul 07, 2017 at 05:33:20PM +0800, Lu Fengqi wrote: > I apologise for my late reply. As a colleague left, I have to take over his > work recently. no worries. > >Mmmh, never mind, it seems that the software raid suffered yet another > >double disk failure due to some undermined flakiness

Re: ctree.c:197: update_ref_for_cow: BUG_ON `ret` triggered, value -5

2017-07-06 Thread Marc MERLIN
On Thu, Jul 06, 2017 at 10:37:18PM -0700, Marc MERLIN wrote: > I'm still trying to fix my filesystem. > It seems to work well enough since the damage is apparently localized, but > I'd really want check --repair to actually bring it back to a working > state, but now i

ctree.c:197: update_ref_for_cow: BUG_ON `ret` triggered, value -5

2017-07-06 Thread Marc MERLIN
I'm still trying to fix my filesystem. It seems to work well enough since the damage is apparently localized, but I'd really want check --repair to actually bring it back to a working state, but now it's crashing This is btrfs tools from git from a few days ago Failed to find [4068943577088,

Re: Leveldb in google-chrome incompatible with btrfs?

2017-07-06 Thread Marc MERLIN
On Thu, Jul 06, 2017 at 04:44:51PM -0700, Omar Sandoval wrote: > In the bug report, you commented that CURRENT contained MANIFEST-010814, > is that indeed the case or was it actually something newer? If it was > the newer one, then it's still tricky how we'd end up that way but not > as

Re: Leveldb in google-chrome incompatible with btrfs?

2017-07-06 Thread Marc MERLIN
On Thu, Jul 06, 2017 at 04:01:41PM -0700, Omar Sandoval wrote: > What doesn't add up about your bug report is that your CURRENT points to > a MANIFEST-010814 way behind all of the other files in that directory, > which are numbered 022745+. If there were a bug here, I'd expect the > stale MANIFEST

Re: Leveldb in google-chrome incompatible with btrfs?

2017-07-06 Thread Marc MERLIN
On Thu, Jul 06, 2017 at 02:13:20PM -0700, Omar Sandoval wrote: > On Thu, Jul 06, 2017 at 08:00:46AM -0700, Marc MERLIN wrote: > > I don't know who else uses google-chrome here, but for me, for as long as > > I've used btrfs (3+ years now), I've had no end of troubles recovering f

Leveldb in google-chrome incompatible with btrfs?

2017-07-06 Thread Marc MERLIN
I don't know who else uses google-chrome here, but for me, for as long as I've used btrfs (3+ years now), I've had no end of troubles recovering from a linux crash, and google-chrome has had problems recovering my tabs and usually complains about plenty of problems, some are corruption looking.

Re: How to fix errors that check --mode lomem finds, but --mode normal doesn't?

2017-06-29 Thread Marc MERLIN
On Thu, Jun 29, 2017 at 09:36:15PM +0800, Lu Fengqi wrote: > On Wed, Jun 28, 2017 at 07:43:48AM -0700, Marc MERLIN wrote: > >[cc trimmed] > > > >On Wed, Jun 28, 2017 at 03:10:27PM +0800, Lu Fengqi wrote: > >> Because the output is abnormal, except for the relevant

Re: How to fix errors that check --mode lomem finds, but --mode normal doesn't?

2017-06-28 Thread Marc MERLIN
ular extent can co-exist with inlined extent, the extent_end of inlined extent also need to record. Reported-by: Marc MERLIN <m...@merlins.org> Signed-off-by: Lu Fengqi <lufq.f...@cn.fujitsu.com> --- Changlog: v2: Just fix reported-by v3: Output verbose information when file extent in

Re: How to fix errors that check --mode lomem finds, but --mode normal doesn't?

2017-06-27 Thread Marc MERLIN
On Mon, Jun 26, 2017 at 06:46:16PM +0800, Lu Fengqi wrote: > Thanks for the updated information. I'm sorry that the false alert make > you feel nervous. If you can help me find out whether those are real errors that I need to fix (and can't yet since there is no --repair), or whether they are

Re: How to fix errors that check --mode lomem finds, but --mode normal doesn't?

2017-06-23 Thread Marc MERLIN
On Fri, Jun 23, 2017 at 09:17:50AM -0700, Marc MERLIN wrote: > Thanks for looking at this. > I have applied your patch and I'm still re-running check in lowmem. It takes > about 24H so I'll > post the full results when it's done. Ok, here is the output of the check with btrfs-p

Re: How to fix errors that check --mode lomem finds, but --mode normal doesn't?

2017-06-23 Thread Marc MERLIN
On Fri, Jun 23, 2017 at 04:54:01PM +0800, Lu Fengqi wrote: > On 2017-06-23 12:06, Marc MERLIN wrote: > > > Well, there is only the output from extent tree. > > > > > > I was also expecting output from subvolume (11930) tree. > > > > > > It co

Re: How to fix errors that check --mode lomem finds, but --mode normal doesn't?

2017-06-22 Thread Marc MERLIN
On Thu, Jun 22, 2017 at 12:08:44PM +0800, Qu Wenruo wrote: > > On Thu, Jun 22, 2017 at 10:22:57AM +0800, Qu Wenruo wrote: > > > > gargamel:~# btrfs check -p --mode lowmem /dev/mapper/dshelf2 > > > > Checking filesystem on /dev/mapper/dshelf2 > > > > UUID: 85441c59-ad11-4b25-b1fe-974f9e4acede > >

Re: How to fix errors that check --mode lomem finds, but --mode normal doesn't?

2017-06-21 Thread Marc MERLIN
Ok, first it finished (almost 24H) (...) ERROR: root 3862 EXTENT_DATA[18170706 135168] interrupt ERROR: root 3862 EXTENT_DATA[18170706 1048576] interrupt ERROR: root 3864 EXTENT_DATA[109336 4096] interrupt ERROR: errors found in fs roots found 5544779108352 bytes used, error(s) found total csum

Re: How to fix errors that check --mode lomem finds, but --mode normal doesn't?

2017-06-21 Thread Marc MERLIN
On Wed, Jun 21, 2017 at 05:22:15PM -0600, Chris Murphy wrote: > I don't know what it means. Maybe Qu has some idea. He might want a > btrfs-image of this file system to see if it's a bug. There are still > some bugs found with lowmem mode, so these could be bogus messages. > But the file system

How to fix errors that check --mode lomem finds, but --mode normal doesn't?

2017-06-21 Thread Marc MERLIN
On Tue, Jun 20, 2017 at 08:43:52PM -0700, Marc MERLIN wrote: > On Tue, Jun 20, 2017 at 09:31:42PM -0600, Chris Murphy wrote: > > On Tue, Jun 20, 2017 at 5:12 PM, Marc MERLIN <m...@merlins.org> wrote: > > > > > I'm now going to remount this with nospace_cache to

Re: 4.11.3: BTRFS critical (device dm-1): unable to add free space :-17 => btrfs check --repair runs clean

2017-06-20 Thread Marc MERLIN
On Tue, Jun 20, 2017 at 09:26:27PM -0600, Chris Murphy wrote: > Right now Btrfs isn't scalable if you have to repair it because large > volumes run into this problem; one of the reasons for the lowmem mode. > > It's a separate bug that it OOMs even with swap, I don't know why it > won't use that,

Re: 4.11.3: BTRFS critical (device dm-1): unable to add free space :-17 => btrfs check --repair runs clean

2017-06-20 Thread Marc MERLIN
On Tue, Jun 20, 2017 at 09:31:42PM -0600, Chris Murphy wrote: > On Tue, Jun 20, 2017 at 5:12 PM, Marc MERLIN <m...@merlins.org> wrote: > > > I'm now going to remount this with nospace_cache to see if your guess about > > space_cache was correct. > > Other suggest

Re: 4.11.3: BTRFS critical (device dm-1): unable to add free space :-17 => btrfs check --repair runs clean

2017-06-20 Thread Marc MERLIN
On Tue, Jun 20, 2017 at 04:12:03PM -0700, Marc MERLIN wrote: > Given that check --repair ran clean when I ran it yesterday after this first > happened, > and I then ran mount -o clear_cache , the cache got rebuilt, and I got the > problem again, > this is not looking g

Re: 4.11.3: BTRFS critical (device dm-1): unable to add free space :-17 => btrfs check --repair runs clean

2017-06-20 Thread Marc MERLIN
On Tue, Jun 20, 2017 at 08:44:29AM -0700, Marc MERLIN wrote: > On Tue, Jun 20, 2017 at 03:36:01PM +, Hugo Mills wrote: > > > Thanks for having a look. Is it a bug, or is it a problem with my storage > > > subsystem? > > > >Well, I'd say it's probably a pr

Re: 4.11.3: BTRFS critical (device dm-1): unable to add free space :-17 => btrfs check --repair runs clean

2017-06-20 Thread Marc MERLIN
On Tue, Jun 20, 2017 at 03:36:01PM +, Hugo Mills wrote: > > Thanks for having a look. Is it a bug, or is it a problem with my storage > > subsystem? > >Well, I'd say it's probably a problem with some inconsistent data > on the disk. How that data got there is another matter -- it may be >

Re: 4.11.3: BTRFS critical (device dm-1): unable to add free space :-17 => btrfs check --repair runs clean

2017-06-20 Thread Marc MERLIN
On Tue, Jun 20, 2017 at 03:23:54PM +, Hugo Mills wrote: > On Tue, Jun 20, 2017 at 07:39:16AM -0700, Marc MERLIN wrote: > > My filesystem got remounted read only, and yet after a lengthy > > btrfs check --repair, it ran clean. > > > > Any idea what went wrong? >

4.11.3: BTRFS critical (device dm-1): unable to add free space :-17 => btrfs check --repair runs clean

2017-06-20 Thread Marc MERLIN
My filesystem got remounted read only, and yet after a lengthy btrfs check --repair, it ran clean. Any idea what went wrong? [846332.992285] WARNING: CPU: 4 PID: 4095 at fs/btrfs/free-space-cache.c:1476 tree_insert_offset+0x78/0xb1 [846333.744721] BTRFS critical (device dm-1): unable to add free

Re: BTRFS converted from EXT4 becomes read-only after reboot

2017-05-23 Thread Marc MERLIN
On Tue, May 23, 2017 at 02:53:21PM -0700, Marc MERLIN wrote: > On Tue, May 23, 2017 at 03:51:43PM -0600, Chris Murphy wrote: > > On Tue, May 23, 2017 at 3:49 PM, Marc MERLIN <m...@merlins.org> wrote: > > > On Tue, May 23, 2017 at 03:38:01PM -0600, Chris Murphy wrote: >

Re: BTRFS converted from EXT4 becomes read-only after reboot

2017-05-23 Thread Marc MERLIN
On Tue, May 23, 2017 at 03:51:43PM -0600, Chris Murphy wrote: > On Tue, May 23, 2017 at 3:49 PM, Marc MERLIN <m...@merlins.org> wrote: > > On Tue, May 23, 2017 at 03:38:01PM -0600, Chris Murphy wrote: > >> > I've tried an ext4 to btrfs conversion 3 times in the last 3 ye

Re: BTRFS converted from EXT4 becomes read-only after reboot

2017-05-23 Thread Marc MERLIN
On Tue, May 23, 2017 at 03:38:01PM -0600, Chris Murphy wrote: > > I've tried an ext4 to btrfs conversion 3 times in the last 3 years, it > > never worked properly any of those times, sadly. > > Since the 4.6 total rewrite? There are also recent bug fixes related > to convert in the changelog, it

Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-23 Thread Marc MERLIN
On Mon, May 22, 2017 at 09:19:34AM +, Duncan wrote: > btrfs check is userspace, not kernelspace. The btrfs-transacti threads That was my understanding, yes, but since I got it to starve my system, including in kernel OOM issues I pasted in my last message and just referenced in

Re: BTRFS converted from EXT4 becomes read-only after reboot

2017-05-23 Thread Marc MERLIN
On Thu, May 04, 2017 at 03:55:28AM +, Duncan wrote: > > But that alone may not fix it, I think you need a newer kernel... > > Well, while the 4.4 LTS kernel series /is/ getting a bit long in the > tooth by now, it's still the second newest LTS series available, 4.9 > being the newest. > >

Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-23 Thread Marc MERLIN
On Tue, May 02, 2017 at 05:01:02AM +, Duncan wrote: > Marc MERLIN posted on Mon, 01 May 2017 20:23:46 -0700 as excerpted: > > > Also, how is --mode=lowmem being useful? > > FWIW, I just watched your talk that's linked from the wiki, and wondered > what you were doing t

Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-23 Thread Marc MERLIN
On Tue, May 23, 2017 at 07:21:33AM -0400, Austin S. Hemmelgarn wrote: > > Yeah although I have no idea how much swap is needed for it to > > succeed. I'm not sure what the relationship is to fs metadata chunk > > size to btrfs check RAM requirement is; but if it wants all of the > > metadata in

WARNING: CPU: 5 PID: 19734 at fs/btrfs/send.c:6290 btrfs_ioctl_send+0xad/0xde2

2017-05-22 Thread Marc MERLIN
This is probably not a bug I should report and simply an issue with the filesystem I'm trying to get data out of, but reporting it just in case it's useful somehow. /* * This is done when we lookup the root, it should already be complete * by the time we get here. */

Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-22 Thread Marc MERLIN
On Mon, May 22, 2017 at 05:26:25PM -0600, Chris Murphy wrote: > On Mon, May 22, 2017 at 10:31 AM, Marc MERLIN <m...@merlins.org> wrote: > > > > > I already have 24GB of RAM in that machine, adding more for the real > > fsck repair to run, is going to be diffi

Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-22 Thread Marc MERLIN
On Sun, May 21, 2017 at 06:35:53PM -0700, Marc MERLIN wrote: > On Sun, May 21, 2017 at 04:45:57PM -0700, Marc MERLIN wrote: > > On Sun, May 21, 2017 at 02:47:33PM -0700, Marc MERLIN wrote: > > > gargamel:~# btrfs check --repair /dev/mapper/dshelf1 > > > enabling

Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-21 Thread Marc MERLIN
On Sun, May 21, 2017 at 04:45:57PM -0700, Marc MERLIN wrote: > On Sun, May 21, 2017 at 02:47:33PM -0700, Marc MERLIN wrote: > > gargamel:~# btrfs check --repair /dev/mapper/dshelf1 > > enabling repair mode > > Checking filesystem on /dev/mapper/dshelf1 > > UU

Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-21 Thread Marc MERLIN
On Sun, May 21, 2017 at 02:47:33PM -0700, Marc MERLIN wrote: > gargamel:~# btrfs check --repair /dev/mapper/dshelf1 > enabling repair mode > Checking filesystem on /dev/mapper/dshelf1 > UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d > checking extents > > This causes a bu

4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-21 Thread Marc MERLIN
gargamel:~# btrfs check --repair /dev/mapper/dshelf1 enabling repair mode Checking filesystem on /dev/mapper/dshelf1 UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d checking extents This causes a bunch of these: btrfs-transacti: page allocation stalls for 23508ms, order:0,

Re: 4.11.0: kernel BUG at fs/btrfs/ctree.h:1779!

2017-05-19 Thread Marc MERLIN
On Sat, May 20, 2017 at 12:57:09AM +, Hugo Mills wrote: >I think from the POV of removing these BUG_ONs, it doesn't matter > which FS causes them. "All" you need to know is where the error > happened. From there, you can (in theory) work out what was wrong and > handle it more elegantly

Re: 4.11.0: kernel BUG at fs/btrfs/ctree.h:1779!

2017-05-19 Thread Marc MERLIN
On Sat, May 20, 2017 at 12:37:47AM +, Hugo Mills wrote: > > Can I make another plea for just removing all those BUG/BUG_ON? > > They really have no place in production code, there is no excuse for a > > filesystem to bring down the entire and in the process not even tell you > > which of your

Re: 4.11.0: kernel BUG at fs/btrfs/ctree.h:1779!

2017-05-19 Thread Marc MERLIN
On Fri, May 19, 2017 at 12:03:58PM -0700, Liu Bo wrote: > Hi Marc, > > On Thu, May 18, 2017 at 09:16:38PM -0700, Marc MERLIN wrote: > > Looks like all the unhelpful BUG() aren't gone yet :-/ > > This one is really not helpful, I don't even know which one of my > > f

4.11.0: kernel BUG at fs/btrfs/ctree.h:1779!

2017-05-18 Thread Marc MERLIN
Looks like all the unhelpful BUG() aren't gone yet :-/ This one is really not helpful, I don't even know which one of my filesystems caused the crash :( Why is this not remounting the filesystem read only? Really, from a user and admin perspective, this is really not helpful. Could someone who

Re: balancing every night broke balancing so now I can't balance anymore?

2017-05-14 Thread Marc MERLIN
On Sun, May 14, 2017 at 09:21:11PM +, Hugo Mills wrote: > > 2) balance -musage=0 > > 3) balance -musage=20 > >In most cases, this is going to make ENOSPC problems worse, not > better. The reason for doing this kind of balance is to recover unused > space and allow it to be reallocated.

Re: balancing every night broke balancing so now I can't balance anymore?

2017-05-14 Thread Marc MERLIN
On Sun, May 14, 2017 at 09:13:35PM +0200, Hans van Kranenburg wrote: > On 05/13/2017 10:54 PM, Marc MERLIN wrote: > > Kernel 4.11, btrfs-progs v4.7.3 > > > > I run scrub and balance every night, been doing this for 1.5 years on this > > filesystem. > > What are

Re: 4.11: da_remove called for id=16 which is not allocated.

2017-05-14 Thread Marc MERLIN
My apologies, this was for the bcache list, sorry about this. On Sun, May 14, 2017 at 08:25:22AM -0700, Marc MERLIN wrote: > > gargamel:/sys/block/bcache16/bcache# echo 1 > stop > > bcache: bcache_device_free() bcache16 stopped > [ cut here ] > WARNI

4.11: da_remove called for id=16 which is not allocated.

2017-05-14 Thread Marc MERLIN
gargamel:/sys/block/bcache16/bcache# echo 1 > stop bcache: bcache_device_free() bcache16 stopped [ cut here ] WARNING: CPU: 7 PID: 11051 at lib/idr.c:383 ida_remove+0xe8/0x10b ida_remove called for id=16 which is not allocated. Modules linked in: uas usb_storage veth

balancing every night broke balancing so now I can't balance anymore?

2017-05-13 Thread Marc MERLIN
Kernel 4.11, btrfs-progs v4.7.3 I run scrub and balance every night, been doing this for 1.5 years on this filesystem. But it has just started failing: saruman:~# btrfs balance start -musage=0 /mnt/btrfs_pool1 Done, had to relocate 0 out of 235 chunks saruman:~# btrfs balance start -dusage=0

Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-05 Thread Marc MERLIN
Thanks again for your answer. Obviously even if my filesystem is toast, it's useful to learn from what happened. On Fri, May 05, 2017 at 01:03:02PM +0800, Qu Wenruo wrote: > > > So unfortunately, your fs/subvolume trees are also corrupted. > > > And almost no chance to do a graceful recovery. > >

Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-04 Thread Marc MERLIN
On Fri, May 05, 2017 at 09:19:29AM +0800, Qu Wenruo wrote: > Sorry for not noticing the link. no problem, it was only one line amongst many :) Thanks much for having had a look. > [Conclusion] > After checking the full result, some of fs/subvolume trees are corrupted. > > [Details] > Some

Re: btrfs check --repair: failed to repair damaged filesystem, aborting

2017-05-03 Thread Marc MERLIN
On Wed, May 03, 2017 at 11:32:26AM +0500, Roman Mamedov wrote: > > Actually, another thought: > > Is there or should there be a way to repair around the bit that cannot > > be repaired? > > Separately, or not, can I locate which bits are causing the repair to > > fail and maybe get a pointer to

Re: btrfs check --repair: failed to repair damaged filesystem, aborting

2017-05-03 Thread Marc MERLIN
On Tue, May 02, 2017 at 11:00:08PM -0700, Marc MERLIN wrote: > David, > > I think you maintain btrfs-progs, but I'm not sure if you're in charge > of check --repair. > Could you comment on the bottom of the mail, namely: > > failed to repair damaged filesystem, aborting >

Re: btrfs check --repair: failed to repair damaged filesystem, aborting

2017-05-03 Thread Marc MERLIN
May 02, 2017 at 11:47:22AM -0700, Marc MERLIN wrote: > (cc trimmed) > > The one in debian/unstable crashed: > gargamel:~# btrfs --version > btrfs-progs v4.7.3 > gargamel:~# btrfs check --repair /dev/mapper/dshelf2 > bytenr mismatch, want=2899180224512, have=3981076597540

btrfs check --repair: failed to repair damaged filesystem, aborting

2017-05-02 Thread Marc MERLIN
(cc trimmed) The one in debian/unstable crashed: gargamel:~# btrfs --version btrfs-progs v4.7.3 gargamel:~# btrfs check --repair /dev/mapper/dshelf2 bytenr mismatch, want=2899180224512, have=3981076597540270796 extent-tree.c:2721: alloc_reserved_tree_block: Assertion `ret` failed. btrfs[0x43e418]

Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-01 Thread Marc MERLIN
On Mon, May 01, 2017 at 10:56:06PM -0600, Chris Murphy wrote: > > Right, of course, I was being way over optimistic here. I kind of forgot > > that metadata wasn't COW, my bad. > > Well it is COW. But there's more to the file system than fs trees, and > just because an fs tree gets snapshot

Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-01 Thread Marc MERLIN
Hi Chris, Thanks for the reply, much appreciated. On Mon, May 01, 2017 at 07:50:22PM -0600, Chris Murphy wrote: > What about btrfs check (no repair), without and then also with --mode=lowmem? > > In theory I like the idea of a 24 hour rollback; but in normal usage > Btrfs will eventually free up

Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-01 Thread Marc MERLIN
was able to cancel the btrfs balance, but it goes read only at the drop of a hat, even when I'm trying to delete recent snapshots and all data that was potentially written in the last 24H On Mon, May 01, 2017 at 10:06:41AM -0700, Marc MERLIN wrote: > I have a filesystem that sadly got corrup

4.11 relocate crash, null pointer

2017-05-01 Thread Marc MERLIN
I have a filesystem that sadly got corrupted by a SAS card I just installed yesterday. I don't think in a case like this, there is a way to roll back all writes across all subvolumes in the last 24H, correct? Is the best thing to go in each subvolume, delete the recent snapshots and

Re: 4.8.8, bcache deadlock and hard lockup

2016-11-30 Thread Marc MERLIN
On Wed, Nov 30, 2016 at 03:57:28PM -0800, Eric Wheeler wrote: > > I'll start another separate thread with the btrfs folks on how much > > pressure is put on the system, but on your side it would be good to help > > ensure that bcache doesn't crash the system altogether if too many > > requests are

Re: btrfs flooding the I/O subsystem and hanging the machine, with bcache cache turned off

2016-11-30 Thread Marc MERLIN
+folks from linux-mm thread for your suggestion On Wed, Nov 30, 2016 at 01:00:45PM -0500, Austin S. Hemmelgarn wrote: > > swraid5 < bcache < dmcrypt < btrfs > > > > Copying with btrfs send/receive causes massive hangs on the system. > > Please see this explanation from Linus on why the

Re: 4.8.8, bcache deadlock and hard lockup

2016-11-30 Thread Marc MERLIN
On Wed, Nov 30, 2016 at 08:46:46AM -0800, Marc MERLIN wrote: > +btrfs mailing list, see below why > > On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote: > > On Mon, 27 Nov 2016, Coly Li wrote: > > > > > > Yes, too many work queues... I guess th

Re: btrfs flooding the I/O subsystem and hanging the machine, with bcache cache turned off

2016-11-30 Thread Marc MERLIN
On Wed, Nov 30, 2016 at 08:46:46AM -0800, Marc MERLIN wrote: > +btrfs mailing list, see below why > > Ok, Linus helped me find a workaround for this problem: > https://lkml.org/lkml/2016/11/29/667 > namely: >echo 2 > /proc/sys/vm/dirty_ratio >echo 1 > /proc/sy

Re: 4.8.8, bcache deadlock and hard lockup

2016-11-30 Thread Marc MERLIN
+btrfs mailing list, see below why On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote: > On Mon, 27 Nov 2016, Coly Li wrote: > > > > Yes, too many work queues... I guess the locking might be caused by some > > very obscure reference of closure code. I cannot have any clue if I > >

Re: when btrfs scrub reports errors and btrfs check --repair does not

2016-11-13 Thread Marc MERLIN
On Sun, Nov 13, 2016 at 08:13:29PM +0500, Roman Mamedov wrote: > On Sun, 13 Nov 2016 07:06:30 -0800 > Marc MERLIN <m...@merlins.org> wrote: > > > So first: > > a) find -inum returns some inodes that don't match > > b) but argh, multiple files (very differen

Re: when btrfs scrub reports errors and btrfs check --repair does not

2016-11-13 Thread Marc MERLIN
On Fri, Nov 11, 2016 at 07:17:08PM -0800, Marc MERLIN wrote: > On Fri, Nov 11, 2016 at 11:55:21AM +0800, Qu Wenruo wrote: > > It seems to be orphan inodes. > > Btrfs doesn't remove all the contents of an inode at rm time. > > It just unlink the inode and put it into a state

Re: when btrfs scrub reports errors and btrfs check --repair does not

2016-11-11 Thread Marc MERLIN
On Fri, Nov 11, 2016 at 11:55:21AM +0800, Qu Wenruo wrote: > It seems to be orphan inodes. > Btrfs doesn't remove all the contents of an inode at rm time. > It just unlink the inode and put it into a state called orphan inodes.(Can't > be referred from any directory). BTRFS warning (device dm-6):

Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-10 Thread Marc MERLIN
On Tue, Nov 08, 2016 at 06:05:19PM -0800, Marc MERLIN wrote: > On Wed, Nov 09, 2016 at 09:50:08AM +0800, Qu Wenruo wrote: > > Yeah, quite possible! > > > > The truth is, current btrfs check only checks: > > 1) Metadata > >while --check-data-csum option will ch

Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-08 Thread Marc MERLIN
On Wed, Nov 09, 2016 at 09:50:08AM +0800, Qu Wenruo wrote: > Yeah, quite possible! > > The truth is, current btrfs check only checks: > 1) Metadata >while --check-data-csum option will check data, but still >follow the restriction 3). > 2) Crossing reference of metadata (contents of

Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-08 Thread Marc MERLIN
On Tue, Nov 08, 2016 at 09:17:43AM +0800, Qu Wenruo wrote: > > > At 11/08/2016 09:06 AM, Marc MERLIN wrote: > >On Tue, Nov 08, 2016 at 08:43:34AM +0800, Qu Wenruo wrote: > >>That's strange, balance is done completely in kernel space. > >> > >>Unles

Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-07 Thread Marc MERLIN
On Tue, Nov 08, 2016 at 08:43:34AM +0800, Qu Wenruo wrote: > That's strange, balance is done completely in kernel space. > > Unless we're calling vfs_* function we won't go through the extra check. > > What's the error reported? See below. Note however that it may be because btrfs received

Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-07 Thread Marc MERLIN
On Tue, Nov 08, 2016 at 08:35:54AM +0800, Qu Wenruo wrote: > >Understood. One big thing (for me) I forgot to confirm: > >1) btrfs receive > > Unfortunately, receive is completely done in userspace. > Only send works inside kernel. right, I've confirmed that btrfs receive fails. It looks like

Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-07 Thread Marc MERLIN
On Mon, Nov 07, 2016 at 02:16:37PM +0800, Qu Wenruo wrote: > > > At 11/07/2016 01:36 PM, Marc MERLIN wrote: > > (sorry for the bad subject line from the mdadm list on the previous mail) > > > > On Mon, Nov 07, 2016 at 12:18:10PM +0800, Qu Wenruo wrote:

Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-06 Thread Marc MERLIN
(sorry for the bad subject line from the mdadm list on the previous mail) On Mon, Nov 07, 2016 at 12:18:10PM +0800, Qu Wenruo wrote: > I'm totally wrong here. > > DirectIO needs the 'buf' parameter of read()/pread() to be 512 bytes > aligned. > > While we are using a lot of stack memory() and
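The alignment requirement quoted above can be checked from user space. The following is only a rough Python sketch, not the btrfs-progs code: `buffer_address` and `is_aligned` are hypothetical helper names, and the 512-byte figure is simply the one quoted in the message. It relies on the fact that an anonymous `mmap` starts on a page boundary, so it satisfies any 512-byte alignment rule.

```python
import ctypes
import mmap

ALIGNMENT = 512  # alignment quoted in the message above


def buffer_address(buf):
    """Return the memory address of a writable buffer object (hypothetical helper)."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def is_aligned(buf, alignment=ALIGNMENT):
    """True if the buffer starts on a multiple of `alignment` bytes."""
    return buffer_address(buf) % alignment == 0


# Anonymous mappings are page-aligned, hence also 512-byte aligned.
aligned_buf = mmap.mmap(-1, 4096)
print(is_aligned(aligned_buf))
```

A buffer allocated on the stack or inside an ordinary heap object carries no such guarantee, which would be consistent with the concern about stack memory raised in the message.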

Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?

2016-11-06 Thread Marc MERLIN
On Mon, Nov 07, 2016 at 09:11:54AM +0800, Qu Wenruo wrote: > > Well, turns out you were right. My array is 14TB and dd was only able to > > copy 8.8TB out of it. > > > > I wonder if it's a bug with bcache and source devices that are too big? > > At least we know it's not a problem of
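The 8.8TB figure is close to 2^43 bytes (8 TiB). One possible reading, which the snippets here do not confirm, is a limit of one signed 32-bit page index times a 4 KiB page. The sketch below only does that arithmetic; the page size, the index width, and the helper name `beyond_limit` are assumptions made for illustration.

```python
PAGE_SIZE = 4096        # assumed page size in bytes
SIGNED_INDEX_BITS = 31  # assumed: usable bits of a signed 32-bit page index

# Largest byte offset reachable under those assumptions: 2**43 bytes = 8 TiB.
limit_bytes = PAGE_SIZE << SIGNED_INDEX_BITS


def beyond_limit(offset_bytes, limit=limit_bytes):
    """True if a byte offset lies at or past the assumed limit (hypothetical helper)."""
    return offset_bytes >= limit


print(limit_bytes / 1e12)            # limit expressed in decimal terabytes
print(beyond_limit(13500329066496))  # chunk root offset quoted elsewhere in this listing
```

Under these assumptions, the chunk root offsets requested in the "cannot read chunk root" thread fall past the limit, which would match reads failing only in the upper part of the 14TB device.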

Re: btrfs check --repair: ERROR: cannot read chunk root

2016-11-04 Thread Marc MERLIN
On Fri, Nov 04, 2016 at 02:00:43PM +0500, Roman Mamedov wrote: > On Fri, 4 Nov 2016 01:01:13 -0700 > Marc MERLIN <m...@merlins.org> wrote: > > > Basically I have this: > > sde8:64 0 3.7T 0 > > └─sde1

Re: btrfs check --repair: ERROR: cannot read chunk root

2016-11-04 Thread Marc MERLIN
On Mon, Oct 31, 2016 at 09:21:40PM -0700, Marc MERLIN wrote: > On Tue, Nov 01, 2016 at 12:13:38PM +0800, Qu Wenruo wrote: > > Would you try to locate the range where we start to fail to read? > > > > I still think the root problem is we failed to read the dev

Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-31 Thread Marc MERLIN
On Tue, Nov 01, 2016 at 12:13:38PM +0800, Qu Wenruo wrote: > Would you try to locate the range where we start to fail to read? > > I still think the root problem is we failed to read the device in user > space. Understood. I'll run this then: myth:~# dd if=/dev/mapper/crypt_bcache0

Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-31 Thread Marc MERLIN
So, I'm willing to wait 2 more days before I wipe this filesystem and start over if I can't get check --repair to work on it. If you need longer, please let me know if you have an upcoming patch for me to try and I'll wait. Thanks, Marc On Mon, Oct 31, 2016 at 08:04:22AM -0700, Marc MERLIN wrote

Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-31 Thread Marc MERLIN
On Mon, Oct 31, 2016 at 08:44:12AM +, Hugo Mills wrote: > > Any idea on special dm setup which can make us fail to read out some > > data range? > >I've seen both btrfs check and btrfs dump-super give wrong answers > (particularly, some addresses end up larger than the device, for some >

Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-31 Thread Marc MERLIN
On Mon, Oct 31, 2016 at 02:32:53PM +0800, Qu Wenruo wrote: > > > At 10/31/2016 02:25 PM, Marc MERLIN wrote: > >On Mon, Oct 31, 2016 at 02:04:10PM +0800, Qu Wenruo wrote: > >>>Sorry for asking, am I doing this wrong? > >>>myth:~# dd if=/dev/mapper/cryp

Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-31 Thread Marc MERLIN
On Mon, Oct 31, 2016 at 02:04:10PM +0800, Qu Wenruo wrote: > >Sorry for asking, am I doing this wrong? > >myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 > >skip=26367830208 > >dd: reading `/dev/mapper/crypt_bcache0': Invalid argument > >0+0 records in > >0+0 records out > >0

Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-30 Thread Marc MERLIN
On Mon, Oct 31, 2016 at 01:27:56PM +0800, Qu Wenruo wrote: > Would you please dump the following bytes? > That's the chunk root tree block on your disk. > > offset: 13500329066496 length: 16384 > offset: 13500330213376 length: 16384 Sorry for asking, am I doing this wrong? myth:~# dd
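The dd arguments quoted in the follow-up (bs=512 count=32 skip=26367830208) follow from dividing the requested byte offset and length by the block size. A minimal Python sketch of that conversion, where `dd_args` is a hypothetical helper name and not part of any tool:

```python
SECTOR = 512  # block size used in the quoted dd command


def dd_args(offset_bytes, length_bytes, block_size=SECTOR):
    """Convert a byte offset and length into dd's (skip, count) in blocks."""
    if offset_bytes % block_size or length_bytes % block_size:
        raise ValueError("offset and length must be multiples of the block size")
    return offset_bytes // block_size, length_bytes // block_size


# The two chunk root tree block copies requested above, 16384 bytes each.
for offset in (13500329066496, 13500330213376):
    skip, count = dd_args(offset, 16384)
    print(f"dd bs={SECTOR} skip={skip} count={count}")
```

For the first offset this gives skip=26367830208 and count=32, matching the command in the reply, so the "Invalid argument" error there is not explained by a miscalculated offset.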

Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-30 Thread Marc MERLIN
On Sun, Oct 30, 2016 at 07:06:16PM -0700, Marc MERLIN wrote: > On Mon, Oct 31, 2016 at 09:02:50AM +0800, Qu Wenruo wrote: > > Your chunk root is corrupted, and since chunk tree provides the > > underlying disk layout, even for single device, so if we failed to read > > i

Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-30 Thread Marc MERLIN
On Mon, Oct 31, 2016 at 09:02:50AM +0800, Qu Wenruo wrote: > Your chunk root is corrupted, and since chunk tree provides the > underlying disk layout, even for single device, so if we failed to read > it, then it will never be able to be mounted. That's the thing though, I can mount the

btrfs check --repair: ERROR: cannot read chunk root

2016-10-30 Thread Marc MERLIN
I have a filesystem on top of md raid5 that got a few problems due to the underlying block layer (bad data cable). The filesystem mounts fine, but had a few issues. Scrub runs (I didn't let it finish, it takes a _long_ time). But check --repair won't even run at all: myth:~# btrfs --version

Re: Is stability a joke?

2016-09-11 Thread Marc MERLIN
On Sun, Sep 11, 2016 at 02:39:14PM +0200, Waxhead wrote: > That is exactly the same reason I don't edit the wiki myself. I could of > course get it started and hopefully someone will correct what I write, but I > feel that if I start this off I don't have deep enough knowledge to do a > proper

Re: btrfs and containers

2016-03-09 Thread Marc MERLIN
On Wed, Mar 09, 2016 at 02:21:26PM -0700, Chris Murphy wrote: > > I have a very stripped down docker image that actually mounts a portion > > of my root filesystem read only. > > While it's running out of a btrfs filesystem, you can't run btrfs > > commands against it: > > 05233e5c91f0:/# btrfs

Re: btrfs and containers

2016-03-09 Thread Marc MERLIN
On Mon, Mar 07, 2016 at 11:55:47PM +0100, Tobias Hunger wrote: > Hi, > > I have been running systemd-nspawn containers on top of a btrfs > filesystem for a while now. > > This works great: Snapshots are a huge help to manage containers! > > But today I ran btrfs subvol list . *inside* a
