Re: Recovery from full metadata with all device space consumed?
On Thu, Apr 19, 2018 at 10:43:57PM +, Hugo Mills wrote:
> Given that both data and metadata levels here require paired
> chunks, try adding _two_ temporary devices so that it can allocate a
> new block group.

Thank you very much, that seems to have done the trick:

# fallocate -l 4GiB /var/tmp/btrfs-temp-1
# fallocate -l 4GiB /var/tmp/btrfs-temp-2
# losetup -f /var/tmp/btrfs-temp-1
# losetup -f /var/tmp/btrfs-temp-2
# btrfs device add /dev/loop0 /broken
Performing full device TRIM (4.00GiB) ...
# btrfs device add /dev/loop1 /broken
Performing full device TRIM (4.00GiB) ...
# btrfs balance start -v -dusage=1 /broken
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=1

I'm guessing that'll take a while to complete, but meanwhile, in another terminal:

# btrfs fi show /broken
Label: 'mon_data'  uuid: 85e52555-7d6d-4346-8b37-8278447eb590
        Total devices 6 FS bytes used 69.53GiB
        devid    1 size 931.51GiB used 731.02GiB path /dev/sda1
        devid    2 size 931.51GiB used 731.02GiB path /dev/sdb1
        devid    3 size 931.51GiB used 730.03GiB path /dev/sdc1
        devid    4 size 931.51GiB used 730.03GiB path /dev/sdd1
        devid    5 size 4.00GiB used 1.00GiB path /dev/loop0
        devid    6 size 4.00GiB used 1.00GiB path /dev/loop1

# btrfs fi df /broken
Data, RAID0: total=2.77TiB, used=67.00GiB
System, RAID1: total=8.00MiB, used=192.00KiB
Metadata, RAID1: total=4.00GiB, used=2.49GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Do I understand correctly that this could require up to 3 extra devices, if for instance you arrived in this situation with a RAID6 data profile? Or is the number even higher for profiles like RAID10?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
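If my reading of the chunk allocator is right, the number of temporary devices needed is just the profile's minimum stripe count minus however many existing devices still have unallocated space. A small illustrative sketch (the `MIN_DEVICES` table and the `temp_devices_needed` helper are my own assumptions, not anything from btrfs-progs):

```python
# Minimum number of devices with unallocated space that btrfs needs
# before it can allocate a new block group of a given profile.
# (Illustrative table based on my understanding; not authoritative.)
MIN_DEVICES = {
    "single": 1,
    "dup": 1,
    "raid0": 2,
    "raid1": 2,
    "raid10": 4,
    "raid5": 2,
    "raid6": 3,
}

def temp_devices_needed(profile, devices_with_free_space=0):
    """How many temporary devices to add so a new chunk of `profile`
    can be allocated, given how many existing devices still have
    unallocated space."""
    need = MIN_DEVICES[profile.lower()]
    return max(0, need - devices_with_free_space)

# The situation in this thread: RAID1 metadata, all four devices
# fully allocated, so two temporary devices were required.
print(temp_devices_needed("raid1"))   # -> 2
print(temp_devices_needed("raid6"))   # -> 3
print(temp_devices_needed("raid10"))  # -> 4
```

By that logic RAID6 would indeed need 3 empty devices, and RAID10 would need 4.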
Recovery from full metadata with all device space consumed?
I've got a btrfs filesystem that I can't seem to get back to a useful state. The symptom I started with is that rename() operations started dying with ENOSPC, and it looks like the metadata allocation on the filesystem is full:

# btrfs fi df /broken
Data, RAID0: total=3.63TiB, used=67.00GiB
System, RAID1: total=8.00MiB, used=224.00KiB
Metadata, RAID1: total=3.00GiB, used=2.50GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

All of the consumable space on the backing devices also seems to be in use:

# btrfs fi show /broken
Label: 'mon_data'  uuid: 85e52555-7d6d-4346-8b37-8278447eb590
        Total devices 4 FS bytes used 69.50GiB
        devid    1 size 931.51GiB used 931.51GiB path /dev/sda1
        devid    2 size 931.51GiB used 931.51GiB path /dev/sdb1
        devid    3 size 931.51GiB used 931.51GiB path /dev/sdc1
        devid    4 size 931.51GiB used 931.51GiB path /dev/sdd1

Even the smallest balance operation I can start fails (this doesn't change even with an extra temporary device added to the filesystem):

# btrfs balance start -v -dusage=1 /broken
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=1
ERROR: error during balancing '/broken': No space left on device
There may be more info in syslog - try dmesg | tail

# dmesg | tail -1
[11554.296805] BTRFS info (device sdc1): 757 enospc errors during balance

The current kernel is 4.15.0 from Debian's stretch-backports (specifically linux-image-4.15.0-0.bpo.2-amd64), but it was Debian's 4.9.30 when the filesystem got into this state. I upgraded it in the hopes that a newer kernel would be smarter, but no dice. btrfs-progs is currently at v4.7.3.

Most of what this filesystem stores is Prometheus 1.8's TSDB for its metrics, which are constantly written at around 50MB/second. The filesystem never really gets full as far as data goes, but there's a lot of never-ending churn for what data is there.

Question 1: Are there other steps that can be tried to rescue a filesystem in this state?
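The balance failure makes sense once you notice that a balance must first allocate a new destination chunk, and with every device fully allocated there is nowhere to put one; a single temporary device doesn't help either, because a RAID1 chunk needs unallocated space on two distinct devices. A sketch of that arithmetic using the numbers above (the helper and the 1 GiB threshold are my own illustrative assumptions; real chunk sizes vary):

```python
def can_alloc_raid1_chunk(devices):
    """devices: list of (size_gib, allocated_gib) per device.
    A RAID1 chunk needs unallocated space on at least two distinct
    devices (assumed >= 1 GiB each here, purely for illustration)."""
    free = [size - used for size, used in devices]
    return sum(1 for f in free if f >= 1.0) >= 2

# The broken state from `btrfs fi show`: every device fully allocated.
broken = [(931.51, 931.51)] * 4
print(can_alloc_raid1_chunk(broken))  # -> False

# Adding only one temporary device still isn't enough for RAID1:
one_temp = broken + [(4.00, 0.00)]
print(can_alloc_raid1_chunk(one_temp))  # -> False

# With two temporary devices a new RAID1 chunk fits:
two_temps = broken + [(4.00, 0.00), (4.00, 0.00)]
print(can_alloc_raid1_chunk(two_temps))  # -> True
```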
I still have it mounted in the same state, and I'm willing to try other things or extract debugging info.

Question 2: Is there something I could have done to prevent this from happening in the first place?

Thanks!
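One commonly recommended preventive measure for churn-heavy workloads like this is a periodic filtered balance, which reclaims mostly-empty data chunks before every device becomes fully allocated. A config sketch (the schedule, thresholds, and mount point are illustrative assumptions, not advice from this thread):

```cron
# Weekly filtered balance: compact data/metadata chunks that are
# at most 10% full, returning their space to the unallocated pool.
# m h dom mon dow  command
30 3 * * 0  /usr/bin/btrfs balance start -dusage=10 -musage=10 /broken
```

A low usage filter keeps the balance cheap; it only rewrites nearly-empty chunks rather than shuffling the whole filesystem.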
Re: Triple parity and beyond
I'm not going to claim any expert status on this discussion (the theory makes my head spin), but I will say I agree with Andrea as far as preferring his implementation for triple parity and beyond.

PSHUFB has been around on the Intel platform since the Core 2 introduced it as part of SSSE3 back in 2006. The generation of Intel servers that ran pre-Core Xeons is long in the tooth. This is a value judgement, but if your data is big enough to need triple parity, you probably shouldn't be running it on a ten-year-old platform. But that's just me. :-)

--
Drew

"Nothing in life is to be feared. It is only to be understood." --Marie Curie
"This started out as a hobby and spun horribly out of control." -Unknown