On Tue, Aug 29, 2017 at 02:30:19PM +, Josef Bacik wrote:
> Sorry Marc, I’ll wire up a bcc script to try and catch when this
> happens. In order for it to work it’ll need to read the extent tree in
> before you mount the fs, is that something you’ll be able to swing or is
> this your root fs?
On Sat, Jul 15, 2017 at 04:12:45PM -0700, Marc MERLIN wrote:
> On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> > Dear Chris and other developers,
> >
> > Can you look at this bug which has been happening since 2012 on apparently
> > all kernels betwee
On Mon, Jul 31, 2017 at 03:00:53PM -0700, Justin Maggard wrote:
> Marc, do you have quotas enabled? IIRC, you're a send/receive user.
> The combination of quotas and btrfs receive can corrupt your
> filesystem, as shown by the xfstest I sent to the list a little while
> ago.
Thanks for checking.
On Tue, Aug 01, 2017 at 12:07:14AM +0300, Ivan Sizov wrote:
> 2017-07-09 10:57 GMT+03:00 Martin Steigerwald :
> > Hello Marc.
> >
> > Marc MERLIN - 08.07.17, 21:34:
> >> Sigh,
> >>
> >> This is now the 3rd filesystem I have (on 3 different machines)
On Sun, Jul 16, 2017 at 04:01:53PM +0200, Giuseppe Della Bianca wrote:
> > On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> > > Dear Chris and other developers,
> ]zac[
> > Others on this thread with the same error: did anyone recover from this
> >
On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> Dear Chris and other developers,
>
> Can you look at this bug which has been happening since 2012 on apparently
> all kernels between at least
> 3.4 and 4.11.
> I didn't look in detail at each thread (took l
d the device read only.
On Mon, Jul 10, 2017 at 11:21:55PM -0700, Marc MERLIN wrote:
> Looks like btrfs has decided to give me hell.
> I'm still recovering my system.
> The biggest filesystem seems to work, but I just had it go read only:
>
> [ cut here ]
On Thu, Jul 13, 2017 at 12:17:16PM -0600, Chris Murphy wrote:
> Well I'd say it's a bug, but that's not a revelation. Is there a
> snapshot being deleted in the approximate time frame for this? I see a
Yep :)
I run btrfs-snaps and it happens right around that time.
It creates a snapshot and delete
On Tue, Jul 11, 2017 at 09:48:12AM -0700, Marc MERLIN wrote:
> On Tue, Jul 11, 2017 at 10:00:40AM -0600, Chris Murphy wrote:
> > > ---[ end trace feb4b95c83ac065f ]---
> > > BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17
> > > Object already e
On Tue, Jul 11, 2017 at 04:43:06PM -0600, Chris Murphy wrote:
> Assuming it works, settle on 4.9 until 4.14 shakes out a bit. Given
> your setup and the penalty for even small problems, it's probably
> better to go low risk and that means longterm kernels. Maybe one of
> the three systems can use a
On Tue, Jul 11, 2017 at 10:00:40AM -0600, Chris Murphy wrote:
> > ---[ end trace feb4b95c83ac065f ]---
> > BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object
> > already exists
> > BTRFS info (device dm-2): forced readonly
>
> You've already had this same traceback, not s
Looks like btrfs has decided to give me hell.
I'm still recovering my system.
The biggest filesystem seems to work, but I just had it go read only:
[ cut here ]
WARNING: CPU: 5 PID: 3734 at fs/btrfs/extent-tree.c:2960
btrfs_run_delayed_refs+0xb6/0x1dc
BTRFS: Transaction ab
Thanks for the Cc/ping, I appreciate it
On Sun, Jul 09, 2017 at 11:38:51AM +, Duncan wrote:
> At your own risk you can try using btrfs property to set the ro snapshot
> to rw. Then you can delete the corrupted files and reset the snapshot
> back to ro.
>
> Of course you'll need to do the s
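Duncan's suggestion can be sketched as a short command sequence (the snapshot path is hypothetical; this assumes a btrfs-progs with `btrfs property` support, and as he says, it is at your own risk):

```shell
# Hypothetical snapshot path; substitute the ro snapshot that holds the
# corrupted files.
snap=/mnt/btrfs_pool1/some-ro-snapshot

btrfs property get "$snap" ro        # should report ro=true
btrfs property set "$snap" ro false  # temporarily make it writable
rm "$snap/path/to/corrupted-file"    # delete the damaged file(s)
btrfs property set "$snap" ro true   # restore read-only
```

One caveat worth noting for a send/receive user: making a received snapshot writable can invalidate it as an incremental parent for future receives.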
On Sat, Jul 08, 2017 at 09:34:17PM -0700, Marc MERLIN wrote:
> Sigh,
>
> This is now the 3rd filesystem I have (on 3 different machines) that is
> getting corruption of some kind (on 4.11.6).
> This is starting to look suspicious :-/
>
> Can I fix this filesystem in some
Sorry for the mails, I still have one more problem I'm trying to work
through.
My filesystem that probably got real corruption due to an unstable block
layer underneath (my 2 other machines with other problems did not have
an unstable block layer and just started having problems recently, which
is
+Chris
On Sat, Jul 08, 2017 at 09:34:17PM -0700, Marc MERLIN wrote:
> gargamel:/var/local/scr/host# btrfs check --repair /dev/mapper/crypt_bcache2
> enabling repair mode
> Checking filesystem on /dev/mapper/crypt_bcache2
> UUID: c4e6f9ca-e9a2-43d7-befa-763fc2cd5a57
> checkin
Sigh,
This is now the 3rd filesystem I have (on 3 different machines) that is
getting corruption of some kind (on 4.11.6).
This is starting to look suspicious :-/
Can I fix this filesystem in some other way?
gargamel:/var/local/scr/host# btrfs check --repair /dev/mapper/crypt_bcache2
enabling re
(removing pwnall at chromium.org to cut spam)
On Thu, Jul 06, 2017 at 10:46:08PM -0700, Omar Sandoval wrote:
> ┌[osandov@vader ~/.config]
> └$ ls -al google-chrome-busted/**
> ls: cannot access 'google-chrome-busted/Local State': No such file or
> directory
> google-chrome-busted/Default:
> ls: c
On Fri, Jul 07, 2017 at 05:33:20PM +0800, Lu Fengqi wrote:
> I apologise for my late reply. A colleague left recently, so I have had to
> take over his work.
no worries.
> >Mmmh, never mind, it seems that the software raid suffered yet another
> >double disk failure due to some undetermined flakiness
On Thu, Jul 06, 2017 at 10:37:18PM -0700, Marc MERLIN wrote:
> I'm still trying to fix my filesystem.
> It seems to work well enough since the damage is apparently localized, but
> I'd really want check --repair to actually bring it back to a working
> state, but now it
I'm still trying to fix my filesystem.
It seems to work well enough since the damage is apparently localized, but
I'd really want check --repair to actually bring it back to a working
state, but now it's crashing
This is btrfs tools from git from a few days ago
Failed to find [4068943577088, 168,
On Thu, Jul 06, 2017 at 04:44:51PM -0700, Omar Sandoval wrote:
> In the bug report, you commented that CURRENT contained MANIFEST-010814,
> is that indeed the case or was it actually something newer? If it was
> the newer one, then it's still tricky how we'd end up that way but not
> as outlandish.
On Thu, Jul 06, 2017 at 04:01:41PM -0700, Omar Sandoval wrote:
> What doesn't add up about your bug report is that your CURRENT points to
> a MANIFEST-010814 way behind all of the other files in that directory,
> which are numbered 022745+. If there were a bug here, I'd expect the
> stale MANIFEST
On Thu, Jul 06, 2017 at 02:13:20PM -0700, Omar Sandoval wrote:
> On Thu, Jul 06, 2017 at 08:00:46AM -0700, Marc MERLIN wrote:
> > I don't know who else uses google-chrome here, but for me, for as long as
> > I've used btrfs (3+ years now), I've had no end of troubl
I don't know who else uses google-chrome here, but for me, for as long as
I've used btrfs (3+ years now), I've had no end of troubles recovering from
a linux crash, and google-chrome has had problems recovering my tabs and
usually complains about plenty of problems, some of which look like corruption.
Th
On Thu, Jun 29, 2017 at 09:36:15PM +0800, Lu Fengqi wrote:
> On Wed, Jun 28, 2017 at 07:43:48AM -0700, Marc MERLIN wrote:
> >[cc trimmed]
> >
> >On Wed, Jun 28, 2017 at 03:10:27PM +0800, Lu Fengqi wrote:
> >> Because the output is abnormal, except for the relevant
rd the extent_end of each regular extent to check
if there is a gap between the regular extents. Normally there is only one
inlined extent, so the extent_end of the inlined extent is useless. However,
if regular extents can co-exist with an inlined extent, the extent_end of
the inlined extent also needs to be recorded.
Repor
On Mon, Jun 26, 2017 at 06:46:16PM +0800, Lu Fengqi wrote:
> Thanks for the updated information. I'm sorry that the false alert made
> you feel nervous.
If you can help me find out whether those are real errors that I need to fix
(and can't yet since there is no --repair), or whether they are not
On Fri, Jun 23, 2017 at 09:17:50AM -0700, Marc MERLIN wrote:
> Thanks for looking at this.
> I have applied your patch and I'm still re-running check in lowmem. It takes
> about 24H so I'll
> post the full results when it's done.
Ok, here is the output of the che
On Fri, Jun 23, 2017 at 04:54:01PM +0800, Lu Fengqi wrote:
> On 2017年06月23日 12:06, Marc MERLIN wrote:
> > > Well, there is only the output from extent tree.
> > >
> > > I was also expecting output from subvolue (11930) tree.
> > >
> > > It co
On Thu, Jun 22, 2017 at 12:08:44PM +0800, Qu Wenruo wrote:
> > On Thu, Jun 22, 2017 at 10:22:57AM +0800, Qu Wenruo wrote:
> > > > gargamel:~# btrfs check -p --mode lowmem /dev/mapper/dshelf2
> > > > Checking filesystem on /dev/mapper/dshelf2
> > > > UUID: 85441c59-ad11-4b25-b1fe-974f9e4acede
> > >
Ok, first it finished (almost 24H)
(...)
ERROR: root 3862 EXTENT_DATA[18170706 135168] interrupt
ERROR: root 3862 EXTENT_DATA[18170706 1048576] interrupt
ERROR: root 3864 EXTENT_DATA[109336 4096] interrupt
ERROR: errors found in fs roots
found 5544779108352 bytes used, error(s) found
total csum by
On Wed, Jun 21, 2017 at 05:22:15PM -0600, Chris Murphy wrote:
> I don't know what it means. Maybe Qu has some idea. He might want a
> btrfs-image of this file system to see if it's a bug. There are still
> some bugs found with lowmem mode, so these could be bogus messages.
> But the file system cl
On Tue, Jun 20, 2017 at 08:43:52PM -0700, Marc MERLIN wrote:
> On Tue, Jun 20, 2017 at 09:31:42PM -0600, Chris Murphy wrote:
> > On Tue, Jun 20, 2017 at 5:12 PM, Marc MERLIN wrote:
> >
> > > I'm now going to remount this with nospace_cache to see if your guess
>
On Tue, Jun 20, 2017 at 09:26:27PM -0600, Chris Murphy wrote:
> Right now Btrfs isn't scalable if you have to repair it because large
> volumes run into this problem; one of the reasons for the lowmem mode.
>
> It's a separate bug that it OOMs even with swap, I don't know why it
> won't use that,
On Tue, Jun 20, 2017 at 09:31:42PM -0600, Chris Murphy wrote:
> On Tue, Jun 20, 2017 at 5:12 PM, Marc MERLIN wrote:
>
> > I'm now going to remount this with nospace_cache to see if your guess about
> > space_cache was correct.
> > Other suggestions also welcome :)
On Tue, Jun 20, 2017 at 04:12:03PM -0700, Marc MERLIN wrote:
> Given that check --repair ran clean when I ran it yesterday after this first
> happened,
> and I then ran mount -o clear_cache , the cache got rebuilt, and I got the
> problem again,
> this is not looking goo
On Tue, Jun 20, 2017 at 08:44:29AM -0700, Marc MERLIN wrote:
> On Tue, Jun 20, 2017 at 03:36:01PM +, Hugo Mills wrote:
> > > Thanks for having a look. Is it a bug, or is it a problem with my storage
> > > subsystem?
> >
> >Well, I'd say it's pro
On Tue, Jun 20, 2017 at 03:36:01PM +, Hugo Mills wrote:
> > Thanks for having a look. Is it a bug, or is it a problem with my storage
> > subsystem?
>
>Well, I'd say it's probably a problem with some inconsistent data
> on the disk. How that data got there is another matter -- it may be
>
On Tue, Jun 20, 2017 at 03:23:54PM +, Hugo Mills wrote:
> On Tue, Jun 20, 2017 at 07:39:16AM -0700, Marc MERLIN wrote:
> > My filesystem got remounted read only, and yet after a lengthy
> > btrfs check --repair, it ran clean.
> >
> > Any idea what went wrong?
>
My filesystem got remounted read only, and yet after a lengthy
btrfs check --repair, it ran clean.
Any idea what went wrong?
[846332.992285] WARNING: CPU: 4 PID: 4095 at fs/btrfs/free-space-cache.c:1476
tree_insert_offset+0x78/0xb1
[846333.744721] BTRFS critical (device dm-1): unable to add free
On Tue, May 23, 2017 at 02:53:21PM -0700, Marc MERLIN wrote:
> On Tue, May 23, 2017 at 03:51:43PM -0600, Chris Murphy wrote:
> > On Tue, May 23, 2017 at 3:49 PM, Marc MERLIN wrote:
> > > On Tue, May 23, 2017 at 03:38:01PM -0600, Chris Murphy wrote:
> > >> > I
On Tue, May 23, 2017 at 03:51:43PM -0600, Chris Murphy wrote:
> On Tue, May 23, 2017 at 3:49 PM, Marc MERLIN wrote:
> > On Tue, May 23, 2017 at 03:38:01PM -0600, Chris Murphy wrote:
> >> > I've tried an ext4 to btrfs conversion 3 times in the last 3 years, it
> >
On Tue, May 23, 2017 at 03:38:01PM -0600, Chris Murphy wrote:
> > I've tried an ext4 to btrfs conversion 3 times in the last 3 years, it
> > never worked properly any of those times, sadly.
>
> Since the 4.6 total rewrite? There are also recent bug fixes related
> to convert in the changelog, it s
On Mon, May 22, 2017 at 09:19:34AM +, Duncan wrote:
> btrfs check is userspace, not kernelspace. The btrfs-transacti threads
That was my understanding, yes, but since I got it to starve my system,
including the in-kernel OOM issues I pasted in my last message and just
referenced in https://bugzi
On Thu, May 04, 2017 at 03:55:28AM +, Duncan wrote:
> > But that alone may not fix it, I think you need a newer kernel...
>
> Well, while the 4.4 LTS kernel series /is/ getting a bit long in the
> tooth by now, it's still the second newest LTS series available, 4.9
> being the newest.
>
> A
On Tue, May 02, 2017 at 05:01:02AM +, Duncan wrote:
> Marc MERLIN posted on Mon, 01 May 2017 20:23:46 -0700 as excerpted:
>
> > Also, how is --mode=lowmem being useful?
>
> FWIW, I just watched your talk that's linked from the wiki, and wondered
> what you were do
On Tue, May 23, 2017 at 07:21:33AM -0400, Austin S. Hemmelgarn wrote:
> > Yeah although I have no idea how much swap is needed for it to
> > succeed. I'm not sure what the relationship is to fs metadata chunk
> > size to btrfs check RAM requirement is; but if it wants all of the
> > metadata in RAM
This is probably not a bug I should report, but simply an issue with the
filesystem I'm trying to get data out of, but reporting it just in case
it's useful somehow.
/*
* This is done when we lookup the root, it should already be complete
* by the time we get here.
*/
WARN_ON(send_root->orphan_c
On Mon, May 22, 2017 at 05:26:25PM -0600, Chris Murphy wrote:
> On Mon, May 22, 2017 at 10:31 AM, Marc MERLIN wrote:
>
> >
> > I already have 24GB of RAM in that machine, adding more for the real
> > fsck repair to run, is going to be difficult and ndb would take days I
On Sun, May 21, 2017 at 06:35:53PM -0700, Marc MERLIN wrote:
> On Sun, May 21, 2017 at 04:45:57PM -0700, Marc MERLIN wrote:
> > On Sun, May 21, 2017 at 02:47:33PM -0700, Marc MERLIN wrote:
> > > gargamel:~# btrfs check --repair /dev/mapper/dshelf1
> > > enabling
On Sun, May 21, 2017 at 04:45:57PM -0700, Marc MERLIN wrote:
> On Sun, May 21, 2017 at 02:47:33PM -0700, Marc MERLIN wrote:
> > gargamel:~# btrfs check --repair /dev/mapper/dshelf1
> > enabling repair mode
> > Checking filesystem on /dev/mapper/dshelf1
> > UU
On Sun, May 21, 2017 at 02:47:33PM -0700, Marc MERLIN wrote:
> gargamel:~# btrfs check --repair /dev/mapper/dshelf1
> enabling repair mode
> Checking filesystem on /dev/mapper/dshelf1
> UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
> checking extents
>
> This causes a bu
gargamel:~# btrfs check --repair /dev/mapper/dshelf1
enabling repair mode
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
This causes a bunch of these:
btrfs-transacti: page allocation stalls for 23508ms, order:0,
mode:0x1400840(GFP_NOFS|__GF
On Sat, May 20, 2017 at 12:57:09AM +, Hugo Mills wrote:
>I think from the POV of removing these BUG_ONs, it doesn't matter
> which FS causes them. "All" you need to know is where the error
> happened. From there, you can (in theory) work out what was wrong and
> handle it more elegantly tha
On Sat, May 20, 2017 at 12:37:47AM +, Hugo Mills wrote:
> > Can I make another plea for just removing all those BUG/BUG_ON?
> > They really have no place in production code, there is no excuse for a
> > filesystem to bring down the entire and in the process not even tell you
> > which of your f
On Fri, May 19, 2017 at 12:03:58PM -0700, Liu Bo wrote:
> Hi Marc,
>
> On Thu, May 18, 2017 at 09:16:38PM -0700, Marc MERLIN wrote:
> > Looks like all the unhelpful BUG() aren't gone yet :-/
> > This one is really not helpful, I don't even know which one of my
&
Looks like all the unhelpful BUG() aren't gone yet :-/
This one is really not helpful, I don't even know which one of my filesystems
caused the crash :(
Why is this not remounting the filesystem read only?
Really, from a user and admin perspective, this is really not helpful.
Could someone who k
On Sun, May 14, 2017 at 09:21:11PM +, Hugo Mills wrote:
> > 2) balance -musage=0
> > 3) balance -musage=20
>
>In most cases, this is going to make ENOSPC problems worse, not
> better. The reason for doign this kind of balance is to recover unused
> space and allow it to be reallocated. The
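For reference, the two balances under discussion look like this (mountpoint as used elsewhere in the thread; filter semantics per btrfs-balance(8)):

```shell
# -musage=0: rewrite only metadata chunks that are completely empty,
# handing them back to the unallocated pool. Cheap, but as Hugo notes
# it mostly just reclaims already-unused chunks.
btrfs balance start -musage=0 /mnt/btrfs_pool1

# -musage=20: also rewrite metadata chunks that are less than 20% used,
# compacting their contents into fewer chunks and freeing the rest.
btrfs balance start -musage=20 /mnt/btrfs_pool1
```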
On Sun, May 14, 2017 at 09:13:35PM +0200, Hans van Kranenburg wrote:
> On 05/13/2017 10:54 PM, Marc MERLIN wrote:
> > Kernel 4.11, btrfs-progs v4.7.3
> >
> > I run scrub and balance every night, been doing this for 1.5 years on this
> > filesystem.
>
> What are
My apologies, this was for the bcache list, sorry about this.
On Sun, May 14, 2017 at 08:25:22AM -0700, Marc MERLIN wrote:
>
> gargamel:/sys/block/bcache16/bcache# echo 1 > stop
>
> bcache: bcache_device_free() bcache16 stopped
> [ cut here ]
> WARNI
gargamel:/sys/block/bcache16/bcache# echo 1 > stop
bcache: bcache_device_free() bcache16 stopped
[ cut here ]
WARNING: CPU: 7 PID: 11051 at lib/idr.c:383 ida_remove+0xe8/0x10b
ida_remove called for id=16 which is not allocated.
Modules linked in: uas usb_storage veth ip6ta
Kernel 4.11, btrfs-progs v4.7.3
I run scrub and balance every night, been doing this for 1.5 years on this
filesystem.
But it has just started failing:
saruman:~# btrfs balance start -musage=0 /mnt/btrfs_pool1
Done, had to relocate 0 out of 235 chunks
saruman:~# btrfs balance start -dusage=0 /mn
Thanks again for your answer. Obviously even if my filesystem is toast,
it's useful to learn from what happened.
On Fri, May 05, 2017 at 01:03:02PM +0800, Qu Wenruo wrote:
> > > So unfortunately, your fs/subvolume trees are also corrupted.
> > > And almost no chance to do a graceful recovery.
> >
On Fri, May 05, 2017 at 09:19:29AM +0800, Qu Wenruo wrote:
> Sorry for not noticing the link.
no problem, it was only one line amongst many :)
Thanks much for having had a look.
> [Conclusion]
> After checking the full result, some of fs/subvolume trees are corrupted.
>
> [Details]
> Some examp
On Wed, May 03, 2017 at 11:32:26AM +0500, Roman Mamedov wrote:
> > Actually, another thought:
> > Is there or should there be a way to repair around the bit that cannot
> > be repaired?
> > Separately, or not, can I locate which bits are causing the repair to
> > fail and maybe get a pointer to the
On Tue, May 02, 2017 at 11:00:08PM -0700, Marc MERLIN wrote:
> David,
>
> I think you maintain btrfs-progs, but I'm not sure if you're in charge
> of check --repair.
> Could you comment on the bottom of the mail, namely:
> > failed to repair damaged filesystem, a
Rest:
On Tue, May 02, 2017 at 11:47:22AM -0700, Marc MERLIN wrote:
> (cc trimmed)
>
> The one in debian/unstable crashed:
> gargamel:~# btrfs --version
> btrfs-progs v4.7.3
> gargamel:~# btrfs check --repair /dev/mapper/dshelf2
> bytenr mismatch, want=2899180224512, hav
(cc trimmed)
The one in debian/unstable crashed:
gargamel:~# btrfs --version
btrfs-progs v4.7.3
gargamel:~# btrfs check --repair /dev/mapper/dshelf2
bytenr mismatch, want=2899180224512, have=3981076597540270796
extent-tree.c:2721: alloc_reserved_tree_block: Assertion `ret` failed.
btrfs[0x43e418]
On Mon, May 01, 2017 at 10:56:06PM -0600, Chris Murphy wrote:
> > Right, of course, I was being way over optimistic here. I kind of forgot
> > that metadata wasn't COW, my bad.
>
> Well it is COW. But there's more to the file system than fs trees, and
> just because an fs tree gets snapshot doesn'
Hi Chris,
Thanks for the reply, much appreciated.
On Mon, May 01, 2017 at 07:50:22PM -0600, Chris Murphy wrote:
> What about btfs check (no repair), without and then also with --mode=lowmem?
>
> In theory I like the idea of a 24 hour rollback; but in normal usage
> Btrfs will eventually free up
now that I was able to cancel the btrfs
balance,
but it goes read only at the drop of a hat, even when I'm trying to delete
recent snapshots and all data that was potentially written in the last 24H
On Mon, May 01, 2017 at 10:06:41AM -0700, Marc MERLIN wrote:
> I have a filesystem that s
I have a filesystem that sadly got corrupted by a SAS card I just installed
yesterday.
I don't think that in a case like this there is a way to roll back all
writes across all subvolumes in the last 24H, correct?
Is the best thing to go in each subvolume, delete the recent snapshots and
rename
On Wed, Nov 30, 2016 at 03:57:28PM -0800, Eric Wheeler wrote:
> > I'll start another separate thread with the btrfs folks on how much
> > pressure is put on the system, but on your side it would be good to help
> > ensure that bcache doesn't crash the system altogether if too many
> > requests are
+folks from linux-mm thread for your suggestion
On Wed, Nov 30, 2016 at 01:00:45PM -0500, Austin S. Hemmelgarn wrote:
> > swraid5 < bcache < dmcrypt < btrfs
> >
> > Copying with btrfs send/receive causes massive hangs on the system.
> > Please see this explanation from Linus on why the workaround
On Wed, Nov 30, 2016 at 08:46:46AM -0800, Marc MERLIN wrote:
> +btrfs mailing list, see below why
>
> On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote:
> > On Mon, 27 Nov 2016, Coly Li wrote:
> > >
> > > Yes, too many work queues... I guess th
On Wed, Nov 30, 2016 at 08:46:46AM -0800, Marc MERLIN wrote:
> +btrfs mailing list, see below why
>
> Ok, Linus helped me find a workaround for this problem:
> https://lkml.org/lkml/2016/11/29/667
> namely:
>echo 2 > /proc/sys/vm/dirty_ratio
>echo 1 > /proc/sy
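As a sketch, the workaround from the lkml thread can be made persistent with a sysctl.d drop-in (the filename is hypothetical; the values are the ones quoted above):

```shell
# Keep the dirty-page backlog tiny so writeback can't pile up behind the
# slow swraid5 < bcache < dmcrypt stack.
cat >/etc/sysctl.d/99-small-dirty-ratio.conf <<'EOF'
vm.dirty_ratio = 2
vm.dirty_background_ratio = 1
EOF
sysctl --system   # reload all sysctl config without a reboot
```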
+btrfs mailing list, see below why
On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote:
> On Mon, 27 Nov 2016, Coly Li wrote:
> >
> > Yes, too many work queues... I guess the locking might be caused by some
> > very obscure reference of closure code. I cannot have any clue if I
> > canno
On Sun, Nov 13, 2016 at 08:13:29PM +0500, Roman Mamedov wrote:
> On Sun, 13 Nov 2016 07:06:30 -0800
> Marc MERLIN wrote:
>
> > So first:
> > a) find -inum returns some inodes that don't match
> > b) but argh, multiple files (very different) have the same inode n
On Fri, Nov 11, 2016 at 07:17:08PM -0800, Marc MERLIN wrote:
> On Fri, Nov 11, 2016 at 11:55:21AM +0800, Qu Wenruo wrote:
> > It seems to be orphan inodes.
> > Btrfs doesn't remove all the contents of an inode at rm time.
> > It just unlinks the inode and puts it into a
On Fri, Nov 11, 2016 at 11:55:21AM +0800, Qu Wenruo wrote:
> It seems to be orphan inodes.
> Btrfs doesn't remove all the contents of an inode at rm time.
> It just unlinks the inode and puts it into a state called orphan inodes. (Can't
> be referred from any directory).
BTRFS warning (device dm-6):
On Tue, Nov 08, 2016 at 06:05:19PM -0800, Marc MERLIN wrote:
> On Wed, Nov 09, 2016 at 09:50:08AM +0800, Qu Wenruo wrote:
> > Yeah, quite possible!
> >
> > The truth is, current btrfs check only checks:
> > 1) Metadata
> >while --check-data-csum option will ch
On Wed, Nov 09, 2016 at 09:50:08AM +0800, Qu Wenruo wrote:
> Yeah, quite possible!
>
> The truth is, current btrfs check only checks:
> 1) Metadata
>while --check-data-csum option will check data, but still
>follow the restriction 3).
> 2) Crossing reference of metadata (contents of metada
On Tue, Nov 08, 2016 at 09:17:43AM +0800, Qu Wenruo wrote:
>
>
> At 11/08/2016 09:06 AM, Marc MERLIN wrote:
> >On Tue, Nov 08, 2016 at 08:43:34AM +0800, Qu Wenruo wrote:
> >>That's strange, balance is done completely in kernel space.
> >>
> >>
On Tue, Nov 08, 2016 at 08:43:34AM +0800, Qu Wenruo wrote:
> That's strange, balance is done completely in kernel space.
>
> Unless we're calling vfs_* function we won't go through the extra check.
>
> What's the error reported?
See below. Note however that it may be because btrfs received messe
On Tue, Nov 08, 2016 at 08:35:54AM +0800, Qu Wenruo wrote:
> >Understood. One big thing (for me) I forgot to confirm:
> >1) btrfs receive
>
> Unfortunately, receive is completely done in userspace.
> Only send works inside kernel.
right, I've confirmed that btrfs receive fails.
It looks like btr
On Mon, Nov 07, 2016 at 02:16:37PM +0800, Qu Wenruo wrote:
>
>
> At 11/07/2016 01:36 PM, Marc MERLIN wrote:
> > (sorry for the bad subject line from the mdadm list on the previous mail)
> >
> > On Mon, Nov 07, 2016 at 12:18:10PM +0800, Qu Wenruo wrote:
&
(sorry for the bad subject line from the mdadm list on the previous mail)
On Mon, Nov 07, 2016 at 12:18:10PM +0800, Qu Wenruo wrote:
> I'm totally wrong here.
>
> DirectIO needs the 'buf' parameter of read()/pread() to be 512 bytes
> aligned.
>
> While we are using a lot of stack memory() and n
On Mon, Nov 07, 2016 at 09:11:54AM +0800, Qu Wenruo wrote:
> > Well, turns out you were right. My array is 14TB and dd was only able to
> > copy 8.8TB out of it.
> >
> > I wonder if it's a bug with bcache and source devices that are too big?
>
> At least we know it's not a problem of btrfs-progs.
On Fri, Nov 04, 2016 at 02:00:43PM +0500, Roman Mamedov wrote:
> On Fri, 4 Nov 2016 01:01:13 -0700
> Marc MERLIN wrote:
>
> > Basically I have this:
> > sde8:64 0 3.7T 0
> > └─sde1 8:65 0 3.7T 0
> >
On Mon, Oct 31, 2016 at 09:21:40PM -0700, Marc MERLIN wrote:
> On Tue, Nov 01, 2016 at 12:13:38PM +0800, Qu Wenruo wrote:
> > Would you try to locate the range where we start to fail to read?
> >
> > I still think the root problem is we failed to read the dev
On Tue, Nov 01, 2016 at 12:13:38PM +0800, Qu Wenruo wrote:
> Would you try to locate the range where we start to fail to read?
>
> I still think the root problem is we failed to read the device in user
> space.
Understood.
I'll run this then:
myth:~# dd if=/dev/mapper/crypt_bcache0 of=/dev/nul
So, I'm willing to wait 2 more days before I wipe this filesystem and
start over if I can't get check --repair to work on it.
If you need longer, please let me know if you have an upcoming patch for me
to try, and I'll wait.
Thanks,
Marc
On Mon, Oct 31, 2016 at 08:04:22AM -0700, Ma
On Mon, Oct 31, 2016 at 08:44:12AM +, Hugo Mills wrote:
> > Any idea on special dm setup which can make us fail to read out some
> > data range?
>
>I've seen both btrfs check and btrfs dump-super give wrong answers
> (particularly, some addresses end up larger than the device, for some
> r
On Mon, Oct 31, 2016 at 02:32:53PM +0800, Qu Wenruo wrote:
>
>
> At 10/31/2016 02:25 PM, Marc MERLIN wrote:
> >On Mon, Oct 31, 2016 at 02:04:10PM +0800, Qu Wenruo wrote:
> >>>Sorry for asking, am I doing this wrong?
> >>>myth:~# dd if=/dev/mapper/cryp
On Mon, Oct 31, 2016 at 02:04:10PM +0800, Qu Wenruo wrote:
> >Sorry for asking, am I doing this wrong?
> >myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32
> >skip=26367830208
> >dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
> >0+0 records in
> >0+0 records out
> >0
On Mon, Oct 31, 2016 at 01:27:56PM +0800, Qu Wenruo wrote:
> Would you please dump the following bytes?
> That's the chunk root tree block on your disk.
>
> offset: 13500329066496 length: 16384
> offset: 13500330213376 length: 16384
Sorry for asking, am I doing this wrong?
myth:~# dd if=/dev/map
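For what it's worth, the arithmetic for Qu's request works out cleanly: 512 divides both offsets exactly, so each 16384-byte tree block is 32 sectors starting at offset/512. A minimal sketch (the real command targets /dev/mapper/crypt_bcache0; a scratch file stands in for it here):

```shell
offset=13500329066496   # first tree block offset from Qu's mail
length=16384
echo "skip=$((offset / 512)) count=$((length / 512))"   # skip=26367830208 count=32

# Same extraction pattern against a scratch file instead of the device:
dd if=/dev/zero of=/tmp/scratch.img bs=1M count=1 2>/dev/null
dd if=/tmp/scratch.img of=/tmp/dump1 bs=512 skip=16 count=32 2>/dev/null
stat -c %s /tmp/dump1   # 16384
```

Qu's later mail attributes the "Invalid argument" to DirectIO buffer alignment in the reading program rather than to the offsets themselves.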
On Sun, Oct 30, 2016 at 07:06:16PM -0700, Marc MERLIN wrote:
> On Mon, Oct 31, 2016 at 09:02:50AM +0800, Qu Wenruo wrote:
> > Your chunk root is corrupted, and since chunk tree provides the
> > underlying disk layout, even for single device, so if we failed to read
> > it,
On Mon, Oct 31, 2016 at 09:02:50AM +0800, Qu Wenruo wrote:
> Your chunk root is corrupted, and since chunk tree provides the
> underlying disk layout, even for single device, so if we failed to read
> it, then it will never be able to be mounted.
That's the thing though, I can mount the filesys
I have a filesystem on top of md raid5 that got a few problems due to the
underlying block layer (bad data cable).
The filesystem mounts fine, but had a few issues
Scrub runs (I didn't let it finish, it takes a _long_ time)
But check --repair won't even run at all:
myth:~# btrfs --version
btrfs-pr