Re: Recovery from full metadata with all device space consumed?

2018-04-19 Thread Drew Bloechl
On Thu, Apr 19, 2018 at 10:43:57PM +, Hugo Mills wrote:
> Given that both data and metadata levels here require paired
> chunks, try adding _two_ temporary devices so that it can allocate a
> new block group.

Thank you very much, that seems to have done the trick:

# fallocate -l 4GiB /var/tmp/btrfs-temp-1
# fallocate -l 4GiB /var/tmp/btrfs-temp-2
# losetup -f /var/tmp/btrfs-temp-1
# losetup -f /var/tmp/btrfs-temp-2
# btrfs device add /dev/loop0 /broken
Performing full device TRIM (4.00GiB) ...
# btrfs device add /dev/loop1 /broken
Performing full device TRIM (4.00GiB) ...
# btrfs balance start -v -dusage=1 /broken
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=1

I'm guessing that'll take a while to complete, but meanwhile, in another
terminal:

# btrfs fi show /broken
Label: 'mon_data'  uuid: 85e52555-7d6d-4346-8b37-8278447eb590
Total devices 6 FS bytes used 69.53GiB
devid    1 size 931.51GiB used 731.02GiB path /dev/sda1
devid    2 size 931.51GiB used 731.02GiB path /dev/sdb1
devid    3 size 931.51GiB used 730.03GiB path /dev/sdc1
devid    4 size 931.51GiB used 730.03GiB path /dev/sdd1
devid    5 size 4.00GiB used 1.00GiB path /dev/loop0
devid    6 size 4.00GiB used 1.00GiB path /dev/loop1

# btrfs fi df /broken
Data, RAID0: total=2.77TiB, used=67.00GiB
System, RAID1: total=8.00MiB, used=192.00KiB
Metadata, RAID1: total=4.00GiB, used=2.49GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
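
The metadata allocation has already grown from 3GiB to 4GiB, with the new
RAID1 chunk pair sitting on the two loop devices. Once the balance has freed
up some unallocated space on the real drives, I assume the temporary devices
can be dropped again along these lines (not yet tested here):

# btrfs device delete /dev/loop0 /broken
# btrfs device delete /dev/loop1 /broken
# losetup -d /dev/loop0
# losetup -d /dev/loop1
# rm /var/tmp/btrfs-temp-1 /var/tmp/btrfs-temp-2

(device delete has to migrate those metadata chunks back onto the main
disks, so it needs to run after the balance, not before.)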

Do I understand correctly that this could require up to 3 extra devices,
if for instance you arrived in this situation with a RAID6 data profile?
Or is the number even higher for profiles like RAID10?


Recovery from full metadata with all device space consumed?

2018-04-19 Thread Drew Bloechl
I've got a btrfs filesystem that I can't seem to get back to a useful
state. The symptom I started with is that rename() operations started
dying with ENOSPC, and it looks like the metadata allocation on the
filesystem is full:

# btrfs fi df /broken
Data, RAID0: total=3.63TiB, used=67.00GiB
System, RAID1: total=8.00MiB, used=224.00KiB
Metadata, RAID1: total=3.00GiB, used=2.50GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
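
For illustration, the failure looks like this (paths anonymized):

# mv /broken/some-file /broken/some-file.new
mv: cannot move '/broken/some-file' to '/broken/some-file.new': No space left on device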

All of the consumable space on the backing devices also seems to be in
use:

# btrfs fi show /broken
Label: 'mon_data'  uuid: 85e52555-7d6d-4346-8b37-8278447eb590
Total devices 4 FS bytes used 69.50GiB
devid    1 size 931.51GiB used 931.51GiB path /dev/sda1
devid    2 size 931.51GiB used 931.51GiB path /dev/sdb1
devid    3 size 931.51GiB used 931.51GiB path /dev/sdc1
devid    4 size 931.51GiB used 931.51GiB path /dev/sdd1
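
For a per-device breakdown, btrfs fi usage reports the unallocated figure
directly; happy to post its full output if that helps:

# btrfs fi usage /broken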

Even the smallest balance operation I can start fails (this doesn't
change even with an extra temporary device added to the filesystem):

# btrfs balance start -v -dusage=1 /broken
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=1
ERROR: error during balancing '/broken': No space left on device
There may be more info in syslog - try dmesg | tail
# dmesg | tail -1
[11554.296805] BTRFS info (device sdc1): 757 enospc errors during balance
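
My understanding of why even usage=1 fails: balance relocates a chunk by
first allocating a fresh chunk to copy into, and with zero unallocated bytes
on every device, that initial allocation is what hits ENOSPC, regardless of
the filter. If that's right, a usage=0 pass might behave differently, since
as far as I know it only deletes completely empty block groups without
relocating anything:

# btrfs balance start -dusage=0 /broken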

The current kernel is 4.15.0 from Debian's stretch-backports
(specifically linux-image-4.15.0-0.bpo.2-amd64), but it was Debian's
4.9.30 when the filesystem got into this state. I upgraded it in the
hopes that a newer kernel would be smarter, but no dice.

btrfs-progs is currently at v4.7.3.

Most of what this filesystem stores is Prometheus 1.8's TSDB for its
metrics, which is written constantly at around 50MB/second. The filesystem
never gets anywhere near full as far as data goes, but the data that is
there churns endlessly, which presumably explains how 3.63TiB of data
chunks came to hold only 67GiB of live data.

Question 1: Are there other steps that can be tried to rescue a
filesystem in this state? I still have it mounted in the same state, and
I'm willing to try other things or extract debugging info.

Question 2: Is there something I could have done to prevent this from
happening in the first place?
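
The only prevention I've come up with on my own is a watchdog that
rebalances lightly-used data chunks before unallocated space hits zero, so
the allocator never gets wedged. A rough sketch (the threshold and the
parsing are my guesses, not established practice):

#!/bin/sh
# Hypothetical cron job: if less than 10GiB of raw space is still
# unallocated, compact data chunks that are under 20% full so some
# space returns to the unallocated pool (where metadata can grab it).
MNT=/broken
unalloc=$(btrfs fi usage -b "$MNT" | awk '/unallocated:/ {print $3; exit}')
if [ "$unalloc" -lt $((10 * 1024 * 1024 * 1024)) ]; then
    btrfs balance start -dusage=20 "$MNT"
fi

But I'd much rather hear that there's a proper knob for this.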

Thanks!


Re: Triple parity and beyond

2013-11-19 Thread Drew
I'm not going to claim any expert status in this discussion (the
theory makes my head spin), but I will say I agree with Andrea in
preferring his implementation for triple parity and beyond.

PSHUFB has been around on the Intel platform since the Core 2 introduced
it as part of SSSE3 back in 2006. The generation of Intel-based
servers that ran pre-Core Xeons is long in the tooth, and this is a
value judgement, but if your data is big enough that you need triple
parity, you probably shouldn't be running it on a ten-year-old platform.
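
For anyone wondering why PSHUFB is the crux: it acts as sixteen parallel
4-bit table lookups, which is what makes multiplying a whole vector of
GF(2^8) bytes by a constant cheap. A rough sketch of the trick (my own
illustration, table setup omitted):

#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 is PSHUFB */

/* Multiply 16 GF(2^8) bytes by a fixed constant at once.
 * 'lo' holds the constant's products with 0x00..0x0f, 'hi' its
 * products with 0x00, 0x10, ..., 0xf0 (precomputed elsewhere). */
static __m128i gf_mul_const(__m128i in, __m128i lo, __m128i hi)
{
    const __m128i mask = _mm_set1_epi8(0x0f);
    __m128i l = _mm_and_si128(in, mask);                    /* low nibbles */
    __m128i h = _mm_and_si128(_mm_srli_epi64(in, 4), mask); /* high nibbles */
    /* PSHUFB: each output byte = table byte picked by the nibble index */
    return _mm_xor_si128(_mm_shuffle_epi8(lo, l),
                         _mm_shuffle_epi8(hi, h));
}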

But that's just me. :-)


-- 
Drew

Nothing in life is to be feared. It is only to be understood.
--Marie Curie

This started out as a hobby and spun horribly out of control.
-Unknown