Re: abort device removal?

2014-12-07 Thread Marc MERLIN
On Fri, Dec 05, 2014 at 12:28:57PM -0500, moparisthebest wrote:
> Hello all,
>
> I had a 6-device array I added a 4tb device to last night and ran the
> command to remove a previous 4tb device that still worked fine
> overnight.  Unfortunately, one of the OTHER devices completely failed
> while this was happening, and it *looks* like btrfs did the right thing
> and stopped the move, except it's still marked as 0 space in btrfs fi
> show.  The delete command is still running, though iotop shows it's not
> actually reading or writing anything and no further moving messages in
> dmesg/kern.log seems to indicate that too.
>
> So what I think I *need* to do is re-add the drive it's currently trying
> to remove so I can delete the now non-functioning other drive without
> losing any data.  My fear is a reboot or unmount/remount will fail to
> mount the currently-being-removed drive as well causing me to lose
> everything.

So I didn't try this, but my understanding is that remove actually runs
a rebalance to remove all the data from that drive.
If the rebalance didn't finish, the drive is still good and part of the
array.

Obviously, you'd be better off with a full backup, but my guess is that
you could just shut down, remove the failing drive, and leave all the
other drives.
Then run rebalance and it should recreate the missing data from your
failed drive from the remaining RAID1/RAID10 copies.
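
For reference, a minimal dry-run sketch of what that could look like on
the command line. It is untested on this array and rests on assumptions:
it uses the commonly documented degraded-mount route with "btrfs device
delete missing" rather than a plain rebalance, it takes the device and
mount point names from the original post, and it ignores the
still-pending removal of fourtb3, which may change what is safe here. By
default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# Assumption: setting RUN="" in the environment would execute them for real.
RUN="${RUN-echo}"
DEV=/dev/mapper/fourtb1   # any surviving member of the array (from the post)
MNT=/mnt/completed        # mount point shown in the post

recover_plan() {
    # Mount with one member absent; 'degraded' permits that.
    $RUN mount -o degraded "$DEV" "$MNT"
    # Drop the absent device; btrfs re-creates its chunks on the other drives.
    $RUN btrfs device delete missing "$MNT"
    # Check the resulting device list.
    $RUN btrfs filesystem show "$MNT"
}

recover_plan
```

Keeping the default as echo means nothing touches the filesystem until
the printed plan has been read and agreed on.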

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


abort device removal?

2014-12-05 Thread moparisthebest
Hello all,

I had a 6-device array I added a 4tb device to last night and ran the
command to remove a previous 4tb device that still worked fine
overnight.  Unfortunately, one of the OTHER devices completely failed
while this was happening, and it *looks* like btrfs did the right thing
and stopped the move, except it's still marked as 0 space in btrfs fi
show.  The delete command is still running, though iotop shows it's not
actually reading or writing anything and no further moving messages in
dmesg/kern.log seems to indicate that too.

So what I think I *need* to do is re-add the drive it's currently trying
to remove so I can delete the now non-functioning other drive without
losing any data.  My fear is a reboot or unmount/remount will fail to
mount the currently-being-removed drive as well causing me to lose
everything.

Here is some relevant info from the system:
# uname -a
Linux mytorrentflux1 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13
17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
# btrfs --version
Btrfs v3.17.3
# btrfs fi show
Label: 'completed'  uuid: 0d14bb0f-46cc-408e-9245-f06d50ec2da8
	Total devices 7 FS bytes used 7.60TiB
	devid    1 size 3.64TiB used 3.28TiB path /dev/mapper/fourtb1
	devid    2 size 3.64TiB used 3.29TiB path /dev/mapper/fourtb2
	devid    3 size 2.73TiB used 2.37TiB path /dev/mapper/threetb1
	devid    5 size 1.82TiB used 1.82TiB path /dev/mapper/twotb1
	devid    6 size 0.00B used 1.99TiB path /dev/mapper/fourtb3
	devid    7 size 2.73TiB used 2.22TiB path /dev/mapper/threetb2
	devid    8 size 3.64TiB used 240.29GiB path /dev/mapper/fourtb4

Btrfs v3.17.3
# btrfs fi df /mnt/completed/
Data, RAID10: total=6.26TiB, used=6.26TiB
Data, RAID1: total=1.33TiB, used=1.33TiB
System, RAID10: total=96.00MiB, used=852.00KiB
Metadata, RAID10: total=10.77GiB, used=9.90GiB
Metadata, RAID1: total=5.00GiB, used=4.37GiB

fourtb4 is the new drive I just added; fourtb3 is the functioning drive
I attempted to remove before threetb1 completely failed (smartctl can't
even read anything from it, or rather from the underlying device).
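
A small sketch for spotting the device that is mid-removal in a listing
like the one above: it prints the path of any devid line whose size is
reported as 0.00B. The function name flag_zero_size is made up for
illustration, and treating "size 0.00B" as the marker for a pending
removal is an assumption based only on this output.

```shell
#!/bin/sh
# Hypothetical helper: reads `btrfs fi show` text on stdin and prints the
# path of every device whose reported size is 0.00B.
flag_zero_size() {
    awk '$1 ~ /^devid/ {
        sz = ""; p = ""
        for (i = 1; i < NF; i++) {
            if ($i == "size") sz = $(i + 1)
            if ($i == "path") p = $(i + 1)
        }
        if (sz == "0.00B") print p
    }'
}

# Demo on two lines copied from the listing above.
# Real use would be: btrfs fi show | flag_zero_size
printf '%s\n' \
    'devid    6 size 0.00B used 1.99TiB path /dev/mapper/fourtb3' \
    'devid    7 size 2.73TiB used 2.22TiB path /dev/mapper/threetb2' \
    | flag_zero_size
```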

dmesg/kern.log is too large to attach; here are some important-looking
excerpts (3 lines often repeated):
Dec  5 09:59:35 mytorrentflux1 kernel: [1549876.646751] btrfs: bdev
/dev/mapper/threetb1 errs: wr 17599, rd 973, flush 0, corrupt 0, gen 0
Dec  5 09:59:35 mytorrentflux1 kernel: [1549877.022291] lost page write
due to I/O error on /dev/mapper/threetb1
Dec  5 10:07:08 mytorrentflux1 kernel: [1550329.743294]
btrfs_dev_stat_print_on_error: 264 callbacks suppressed
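
Since the log is too large to attach, here is a rough sketch for boiling
the "errs:" lines down to one line per message with the device and its
write/read error counts. The function name summarize_errs is made up,
and the pattern assumes the message format shown in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and prints
# "<device> <write errors> <read errors>" for each btrfs "errs:" message.
summarize_errs() {
    # Join wrapped lines first, since a message may be split across two lines.
    tr '\n' ' ' \
        | grep -o 'bdev *[^ ]* errs: wr [0-9]*, rd [0-9]*' \
        | awk '{ gsub(",", ""); print $2, $5, $7 }'
}

# Demo on the excerpt above; real use would be: summarize_errs < /var/log/kern.log
printf '%s\n' \
    'Dec  5 09:59:35 mytorrentflux1 kernel: [1549876.646751] btrfs: bdev' \
    '/dev/mapper/threetb1 errs: wr 17599, rd 973, flush 0, corrupt 0, gen 0' \
    | summarize_errs
```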

I appreciate any help or guidance I can get on this issue so I don't
lose data, hopefully.

Thanks much!