Re: Scrub: no space left on device
Marc MERLIN posted on Tue, 08 Dec 2015 08:06:15 -0800 as excerpted:

> On Tue, Dec 08, 2015 at 04:46:32PM +0100, Lionel Bouton wrote:
>> On 08/12/2015 16:37, Holger Hoffstätte wrote:
>> > On 12/08/15 16:06, Marc MERLIN wrote:
>> >>
>> >> Why would scrub need space and why would it cancel if there isn't
>> >> enough of it? (kernel 4.3)
>> >>
>> >> btrfs scrub start -Bd /dev/mapper/pool1
>> >> ERROR: scrubbing /dev/mapper/pool1 failed for device id 1
>> >> (No space left on device)
>> >> scrub device /dev/mapper/pool1 (id 1) canceled

>> > Scrub rewrites metadata (apparently even in -r aka readonly mode),
>> > and that can lead to temporary metadata expansion (stuff gets COWed
>> > around); it's a bit surprising but makes sense if you think about it.

Are you sure about that? My / is mounted ro by default, and if I try to
scrub it in normal mode, it'll error out due to read-only. But I can run
a read-only scrub just fine, and if I find errors, I simply mount it
writable and redo the scrub without the -r. (My / is only 8 GiB, under
half used including metadata, on a fast SSD, so scrubs complete in under
30 seconds, and doing a read-only scrub followed by a mount-writable and
a second fixing scrub if necessary is trivial.)

>> Sorry I'm not sure why metadata is rewritten if no error is detected.

But scrub will of course do copy-on-write if there's an error, and it's
possible that on initialization it checks for space to do a few COWs if
necessary, before it actually checks for the -r read-only flag.

I try to leave at least enough unallocated space to do a balance, which
(except with -dusage=0 or -musage=0) writes a new chunk to rewrite
existing chunks into. So I'm unlikely to ever get close enough to out of
space to trigger such an initialization-time space check, and thus
wouldn't know whether there is one, or whether it comes before the -r
check.

> And this is what I got:
> legolas:~# btrfs balance start -musage=10 -v /mnt/btrfs_pool1/
> Dumping filters: flags 0x6, state 0x0, force is off
>   METADATA (flags 0x2): balancing, usage=10
>   SYSTEM (flags 0x2): balancing, usage=10
> ERROR: error during balancing '/mnt/btrfs_pool1/' - No space left on
> device There may be more info in syslog - try dmesg | tail
>
> Ok, that sucks.
>
> legolas:~# btrfs balance start -musage=0 -v /mnt/btrfs_pool1/
> Dumping filters: flags 0x6, state 0x0, force is off
>   METADATA (flags 0x2): balancing, usage=0
>   SYSTEM (flags 0x2): balancing, usage=0
> Done, had to relocate 0 out of 618 chunks
>
> This worked. Mmmh, I thought this wouldn't be necessary anymore in 4.3
> kernels?

Well, it said it had to relocate zero chunks, so it _appears_ that it
didn't do anything, which would be expected on reasonably current
kernels as they already clean up zero-usage chunks automatically.
*BUT*...

> legolas:~# btrfs balance start -musage=10 -v /mnt/btrfs_pool1
> Dumping filters: flags 0x6, state 0x0, force is off
>   METADATA (flags 0x2): balancing, usage=10
>   SYSTEM (flags 0x2): balancing, usage=10
> Done, had to relocate 1 out of 618 chunks

... if it did nothing in the -musage=0 case above, why did the
-musage=10 case fail before, but succeed after? That's a very good
question I don't have an answer to. Good question for the devs and
others that actually read code.

Meanwhile, note that if it relocates only a single chunk (of non-zero
usage), under normal circumstances it'll take exactly the same amount of
space as before, because it'd allocate a new chunk of exactly the same
size as the one it was rewriting. However, once remaining unallocated
space gets tight enough, it starts allocating smaller than normal
chunks, which may be what happened this time. Presumably that chunk was
originally allocated when the filesystem still had much more unallocated
free space, so it was a standard size chunk. When it was rewritten,
unallocated space was much tighter, so a smaller chunk would likely be
written, which would then be rather fuller than it was previously, as it
would have the same amount of metadata in it, but be a smaller chunk.

And, perhaps partially answering my own question above, the balance with
-musage=0 somehow triggered a space reevaluation, thus allowing the
-musage=10 balance to run afterward when it wouldn't before, even though
the -musage=0 didn't actually relocate (to /dev/null as they'd be empty,
IOW, delete) any empty chunks.

But... it still shouldn't happen, as if -musage=0 didn't relocate
anything, it shouldn't trigger a space reevaluation that -musage=10
wouldn't trigger on its own, so while this might partially answer what
happened, it does nothing to explain /why/ it happened. I'd call it a
bug in the balance code, as the result of the -musage=10 should be
exactly the same before and after, because the -musage=0 didn't actually
relocate/delete anything.

> And now I'm back in business...
>
> Still, this is a bit disappointing and at the v
Re: Scrub: no space left on device
On Tue, Dec 08, 2015 at 05:24:16PM +0100, Holger Hoffstätte wrote:
> On 12/08/15 17:06, Marc MERLIN wrote:
> > Label: 'btrfs_pool1' uuid: 5ee24229-2431-448a-868e-2c325d10bfa7
> >     Total devices 1 FS bytes used 524.26GiB
> >     devid    1 size 615.01GiB used 614.94GiB path /dev/mapper/pool1
>
> This is what I was alluding to. You could have started a -dusage balance
> *before* the scrub so that one or several data chunks get freed.
> Balancing metadata when you're out of space accomplishes nothing and
> will very likely fail, just as you saw. You have ~90GB usable space, but
> that space is spread over chunks with low utilisation.

Yes, my partition got a bit full, I freed up space, and unfortunately we
still don't have a background rebalance to fix this, so I did run a
manual one.
But my filesystem was usable, I was writing to it just fine. I was just
very surprised that scrub needed to rewrite blocks on a single disk
device.

You could make the case that scrub and a usage=0 balance should be run
together.

In the meantime, I upgraded my script:
http://marc.merlins.org/perso/btrfs/2014-03.html#Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair
http://marc.merlins.org/linux/scripts/btrfs-scrub

I figured there is no good reason not to run a usage=20 balance on
metadata and data every night.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
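
The nightly routine described in this message (a usage-filtered balance of data and metadata, plus the scrub) could be sketched roughly as below. This is an illustration only, not the btrfs-scrub script linked above: the function names, the ordering (balance before scrub, as suggested earlier in the thread) and the default of 20 are assumptions. The only parts taken from the thread are the `btrfs balance start -dusage=N -musage=N` filters and `btrfs scrub start -Bd`. The sketch only builds the command lines; it does not run them.

```python
import shlex


def balance_argv(mountpoint: str, usage: int = 20) -> list[str]:
    """Build argv for a usage-filtered balance of data and metadata chunks."""
    if not 0 <= usage <= 100:
        raise ValueError("usage is a percentage between 0 and 100")
    return [
        "btrfs", "balance", "start",
        f"-dusage={usage}",  # filter data chunks by how full they are
        f"-musage={usage}",  # same filter for metadata chunks
        mountpoint,
    ]


def scrub_argv(device: str) -> list[str]:
    """Build argv for a foreground scrub with per-device statistics."""
    return ["btrfs", "scrub", "start", "-Bd", device]


def nightly_plan(mountpoint: str, device: str, usage: int = 20) -> list[str]:
    """Return shell-quoted commands in the order a nightly job might run them.

    Assumed ordering: a usage=0 pass first (it unblocked the later balance
    in this thread), then the real filter, then the scrub.
    """
    steps = [
        balance_argv(mountpoint, 0),
        balance_argv(mountpoint, usage),
        scrub_argv(device),
    ]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    for command in nightly_plan("/mnt/btrfs_pool1", "/dev/mapper/pool1"):
        print(command)
```

Keeping the commands as argv lists makes it straightforward to hand them to `subprocess.run` later and to stop the job as soon as one step fails.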
Re: Scrub: no space left on device
On 12/08/15 17:06, Marc MERLIN wrote:
> Label: 'btrfs_pool1' uuid: 5ee24229-2431-448a-868e-2c325d10bfa7
>     Total devices 1 FS bytes used 524.26GiB
>     devid    1 size 615.01GiB used 614.94GiB path /dev/mapper/pool1

This is what I was alluding to. You could have started a -dusage balance
*before* the scrub so that one or several data chunks get freed.
Balancing metadata when you're out of space accomplishes nothing and
will very likely fail, just as you saw. You have ~90GB usable space, but
that space is spread over chunks with low utilisation.

-h
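
The "~90GB usable but nothing left to allocate" situation can be read straight off the `btrfs fi show` figures quoted in this message. The sketch below is a rough illustration under stated assumptions: a single device, sizes printed in GiB, and the hypothetical helper name `space_breakdown`. It also treats "FS bytes used" and the per-device "used" figure as directly comparable, which is only approximately true when metadata is DUP, since the device figure counts both copies.

```python
import re

# Matches the per-device line, with or without whitespace after "devid".
DEVID_RE = re.compile(
    r"devid\s*(?P<id>\d+)\s+size\s+(?P<size>[\d.]+)GiB\s+used\s+(?P<used>[\d.]+)GiB"
)
FS_USED_RE = re.compile(r"FS bytes used\s+(?P<used>[\d.]+)GiB")


def space_breakdown(show_output: str) -> dict[str, float]:
    """Split free space into unallocated vs. allocated-but-unused (GiB)."""
    dev = DEVID_RE.search(show_output)
    fs = FS_USED_RE.search(show_output)
    if dev is None or fs is None:
        raise ValueError("unrecognised `btrfs fi show` output")
    size = float(dev["size"])        # raw device size
    allocated = float(dev["used"])   # space already handed out to chunks
    fs_used = float(fs["used"])      # bytes actually stored in those chunks
    return {
        "unallocated": round(size - allocated, 2),
        "allocated_but_unused": round(allocated - fs_used, 2),
        "free_total": round(size - fs_used, 2),
    }


if __name__ == "__main__":
    sample = (
        "Label: 'btrfs_pool1' uuid: 5ee24229-2431-448a-868e-2c325d10bfa7\n"
        "    Total devices 1 FS bytes used 524.26GiB\n"
        "    devid    1 size 615.01GiB used 614.94GiB path /dev/mapper/pool1\n"
    )
    for name, gib in space_breakdown(sample).items():
        print(f"{name}: {gib} GiB")
```

With the numbers from the thread, nearly all of the free space sits inside already-allocated chunks, and only a few hundredths of a GiB remain unallocated for new chunks.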
Re: Scrub: no space left on device
On Tue, Dec 08, 2015 at 04:46:32PM +0100, Lionel Bouton wrote:
> On 08/12/2015 16:37, Holger Hoffstätte wrote:
> > On 12/08/15 16:06, Marc MERLIN wrote:
> >> Howdy,
> >>
> >> Why would scrub need space and why would it cancel if there isn't enough of
> >> it?
> >> (kernel 4.3)
> >>
> >> /etc/cron.daily/btrfs-scrub:
> >> btrfs scrub start -Bd /dev/mapper/cryptroot
> >> scrub device /dev/mapper/cryptroot (id 1) done
> >>     scrub started at Mon Dec 7 01:35:08 2015 and finished after 258 seconds
> >>     total bytes scrubbed: 130.84GiB with 0 errors
> >> btrfs scrub start -Bd /dev/mapper/pool1
> >> ERROR: scrubbing /dev/mapper/pool1 failed for device id 1 (No space left
> >> on device)
> >> scrub device /dev/mapper/pool1 (id 1) canceled
> > Scrub rewrites metadata (apparently even in -r aka readonly mode), and that
> > can lead to temporary metadata expansion (stuff gets COWed around); it's
> > a bit surprising but makes sense if you think about it.
>
> How long must I think about it until it makes sense? :-)
>
> Sorry I'm not sure why metadata is rewritten if no error is detected.
> I've several theories but lack information: is the fact that no error
> has been detected stored somewhere? is scrub using some kind of internal
> temporary snapshot(s) to avoid interfering with other operations? other
> reason I didn't think about?

Yeah, I was also wondering why metadata should be rewritten on a single
device scrub. Does not make sense to me.

And this is what I got:
legolas:~# btrfs balance start -musage=10 -v /mnt/btrfs_pool1/
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=10
  SYSTEM (flags 0x2): balancing, usage=10
ERROR: error during balancing '/mnt/btrfs_pool1/' - No space left on device
There may be more info in syslog - try dmesg | tail

Ok, that sucks.

legolas:~# btrfs balance start -musage=0 -v /mnt/btrfs_pool1/
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=0
  SYSTEM (flags 0x2): balancing, usage=0
Done, had to relocate 0 out of 618 chunks

This worked. Mmmh, I thought this wouldn't be necessary anymore in 4.3
kernels?

legolas:~# btrfs balance start -musage=10 -v /mnt/btrfs_pool1
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=10
  SYSTEM (flags 0x2): balancing, usage=10
Done, had to relocate 1 out of 618 chunks

And now I'm back in business...

Still, this is a bit disappointing and at the very least very unexpected
in 4.3.

legolas:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=604.88GiB, used=520.09GiB
System, DUP: total=32.00MiB, used=96.00KiB
Metadata, DUP: total=5.00GiB, used=4.17GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
legolas:~# btrfs fi show /mnt/btrfs_pool1
Label: 'btrfs_pool1' uuid: 5ee24229-2431-448a-868e-2c325d10bfa7
    Total devices 1 FS bytes used 524.26GiB
    devid    1 size 615.01GiB used 614.94GiB path /dev/mapper/pool1

Marc
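
To see where the slack sits, the `btrfs fi df` output above can be broken down per block-group type. The following is a minimal sketch, assuming the line format shown in this message; `parse_fi_df` and the field names are hypothetical helpers for illustration, not part of btrfs-progs.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# One line of `btrfs fi df`, e.g. "Metadata, DUP: total=5.00GiB, used=4.17GiB"
LINE_RE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>[KMGT]iB|B), "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMGT]iB|B)$"
)


def parse_fi_df(text: str) -> dict[str, dict]:
    """Return per-type totals, usage and slack (total - used) in bytes."""
    rows = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip prompts and anything that is not a stats line
        total = float(match["total"]) * UNITS[match["tunit"]]
        used = float(match["used"]) * UNITS[match["uunit"]]
        rows[match["kind"]] = {
            "profile": match["profile"],
            "total": total,
            "used": used,
            "slack": total - used,
        }
    return rows


if __name__ == "__main__":
    sample = """\
Data, single: total=604.88GiB, used=520.09GiB
System, DUP: total=32.00MiB, used=96.00KiB
Metadata, DUP: total=5.00GiB, used=4.17GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""
    for kind, row in parse_fi_df(sample).items():
        print(f"{kind:13s} {row['profile']:6s} slack={row['slack'] / 1024**3:.2f} GiB")
```

On these figures the data chunks hold most of the unused space while metadata has well under 1 GiB of slack, which fits the advice elsewhere in the thread to free data chunks with a -dusage balance first.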
Re: Scrub: no space left on device
On 12/08/15 16:46, Lionel Bouton wrote:
> On 08/12/2015 16:37, Holger Hoffstätte wrote:
>> On 12/08/15 16:06, Marc MERLIN wrote:
>>> Howdy,
>>>
>>> Why would scrub need space and why would it cancel if there isn't enough of
>>> it?
>>> (kernel 4.3)
>>>
>>> /etc/cron.daily/btrfs-scrub:
>>> btrfs scrub start -Bd /dev/mapper/cryptroot
>>> scrub device /dev/mapper/cryptroot (id 1) done
>>>     scrub started at Mon Dec 7 01:35:08 2015 and finished after 258 seconds
>>>     total bytes scrubbed: 130.84GiB with 0 errors
>>> btrfs scrub start -Bd /dev/mapper/pool1
>>> ERROR: scrubbing /dev/mapper/pool1 failed for device id 1 (No space left on
>>> device)
>>> scrub device /dev/mapper/pool1 (id 1) canceled
>> Scrub rewrites metadata (apparently even in -r aka readonly mode), and that
>> can lead to temporary metadata expansion (stuff gets COWed around); it's
>> a bit surprising but makes sense if you think about it.
>
> How long must I think about it until it makes sense? :-)
>
> Sorry I'm not sure why metadata is rewritten if no error is detected.
> I've several theories but lack information: is the fact that no error
> has been detected stored somewhere? is scrub using some kind of internal
> temporary snapshot(s) to avoid interfering with other operations? other
> reason I didn't think about?

Well... I have no idea what the historical motivation for this behaviour
was, even though I can make up at least two: rewriting known-good
checksums generally (since you know they are good this very moment), and
in case of error avoiding the area where the block error occurred (read
errors on rust are often clustered and affect entire tracks).

That's really all I know. I agree it's surprising, especially since it
happens by default and also in -r mode, which might be considered a bug.

-h
Re: Scrub: no space left on device
On 2015-12-08 10:06, Marc MERLIN wrote:
> Howdy,
>
> Why would scrub need space and why would it cancel if there isn't enough of
> it?
> (kernel 4.3)

Wild guess here, but maybe scrub unconditionally updates the error
counters, regardless of whether any errors were found or not?
Re: Scrub: no space left on device
On 08/12/2015 16:06, Marc MERLIN wrote:
> Howdy,
>
> Why would scrub need space and why would it cancel if there isn't enough of
> it?
> (kernel 4.3)
>
> /etc/cron.daily/btrfs-scrub:
> btrfs scrub start -Bd /dev/mapper/cryptroot
> scrub device /dev/mapper/cryptroot (id 1) done
>     scrub started at Mon Dec 7 01:35:08 2015 and finished after 258 seconds
>     total bytes scrubbed: 130.84GiB with 0 errors
> btrfs scrub start -Bd /dev/mapper/pool1
> ERROR: scrubbing /dev/mapper/pool1 failed for device id 1 (No space left on
> device)
> scrub device /dev/mapper/pool1 (id 1) canceled

I can't be sure (not-a-dev), but one possibility that comes to mind is
that if an error is detected, writes must be done on the device. The
repair might not be done in place but with CoW, and even if the error is
not repaired for lack of redundancy, IIRC each device tracks the number
of errors detected, so I assume this is written somewhere (most probably
in system or metadata chunks).

Best regards,

Lionel
Re: Scrub: no space left on device
On 08/12/2015 16:37, Holger Hoffstätte wrote:
> On 12/08/15 16:06, Marc MERLIN wrote:
>> Howdy,
>>
>> Why would scrub need space and why would it cancel if there isn't enough of
>> it?
>> (kernel 4.3)
>>
>> /etc/cron.daily/btrfs-scrub:
>> btrfs scrub start -Bd /dev/mapper/cryptroot
>> scrub device /dev/mapper/cryptroot (id 1) done
>>     scrub started at Mon Dec 7 01:35:08 2015 and finished after 258 seconds
>>     total bytes scrubbed: 130.84GiB with 0 errors
>> btrfs scrub start -Bd /dev/mapper/pool1
>> ERROR: scrubbing /dev/mapper/pool1 failed for device id 1 (No space left on
>> device)
>> scrub device /dev/mapper/pool1 (id 1) canceled
> Scrub rewrites metadata (apparently even in -r aka readonly mode), and that
> can lead to temporary metadata expansion (stuff gets COWed around); it's
> a bit surprising but makes sense if you think about it.

How long must I think about it until it makes sense? :-)

Sorry, I'm not sure why metadata is rewritten if no error is detected.
I have several theories but lack information: is the fact that no error
has been detected stored somewhere? Is scrub using some kind of internal
temporary snapshot(s) to avoid interfering with other operations? Some
other reason I didn't think about?

Lionel
Re: Scrub: no space left on device
On 12/08/15 16:06, Marc MERLIN wrote:
> Howdy,
>
> Why would scrub need space and why would it cancel if there isn't enough of
> it?
> (kernel 4.3)
>
> /etc/cron.daily/btrfs-scrub:
> btrfs scrub start -Bd /dev/mapper/cryptroot
> scrub device /dev/mapper/cryptroot (id 1) done
>     scrub started at Mon Dec 7 01:35:08 2015 and finished after 258 seconds
>     total bytes scrubbed: 130.84GiB with 0 errors
> btrfs scrub start -Bd /dev/mapper/pool1
> ERROR: scrubbing /dev/mapper/pool1 failed for device id 1 (No space left on
> device)
> scrub device /dev/mapper/pool1 (id 1) canceled

Scrub rewrites metadata (apparently even in -r aka readonly mode), and
that can lead to temporary metadata expansion (stuff gets COWed around);
it's a bit surprising but makes sense if you think about it.

The fact that you ENOSPCed means that the fs was probably already fully
allocated. If it bothers you, a subsequent balance with -musage=10
should vacuum things up. Alternatively just keep using the filesystem;
eventually the empty metadata chunks should be collected, on the next
remount at the latest.

tl;dr: Never allocate all the chunks.
Yes, this needs more graceful handling.

-h
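
The "never allocate all the chunks" rule can be turned into a simple pre-scrub check. The sketch below is only one way to express it: `check_allocation`, the `AllocationCheck` record and the 1 GiB default reserve are assumptions made for illustration, not values taken from btrfs or from this thread. The sample figures are the device size and allocated space reported for pool1 elsewhere in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocationCheck:
    unallocated_gib: float  # space not yet assigned to any chunk
    allocated_pct: float    # share of the device already allocated
    below_reserve: bool     # True when a balance should run before scrub


def check_allocation(size_gib: float, allocated_gib: float,
                     reserve_gib: float = 1.0) -> AllocationCheck:
    """Flag a filesystem whose unallocated space fell below a chosen reserve.

    reserve_gib is an arbitrary illustrative threshold, not a btrfs constant.
    """
    if size_gib <= 0 or not 0 <= allocated_gib <= size_gib:
        raise ValueError("expected 0 <= allocated <= size and size > 0")
    unallocated = size_gib - allocated_gib
    return AllocationCheck(
        unallocated_gib=round(unallocated, 2),
        allocated_pct=round(100 * allocated_gib / size_gib, 2),
        below_reserve=unallocated < reserve_gib,
    )


if __name__ == "__main__":
    # size and allocated ("used") figures from `btrfs fi show` in the thread
    print(check_allocation(615.01, 614.94))
```

A cron job could skip or postpone the scrub, or run a data balance first, whenever `below_reserve` is true.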