Re: Device replace issues and disabling it until they are solved

2016-06-02 Thread Yauhen Kharuzhy
ogical NN" messages after two consecutive device replaces. So, replace is still not usable for RAID5/6. And it is very slow in comparison with 'device add && balance device remove missing' sequence (4x slower). -- Yauhen Kharuzhy -- To unsubscribe from this list: send the line "unsubscr

Re: [PATCH] btrfs: s_bdev is not null after missing replace

2016-05-03 Thread Yauhen Kharuzhy
On Tue, May 03, 2016 at 11:34:50PM +0300, Yauhen Kharuzhy wrote: > On Tue, May 03, 2016 at 07:31:48PM +0200, David Sterba wrote: > > On Thu, Apr 14, 2016 at 06:24:10PM +0800, Anand Jain wrote: > > > Yauhen reported in the ML that s_bdev is null at mount, and > > >

Re: [PATCH] btrfs: s_bdev is not null after missing replace

2016-05-03 Thread Yauhen Kharuzhy
because bdev is null for missing device, things > > gets matched up. Fix this by checking if s_bdev is set. I > > didn't want to completely remove updating s_bdev because > > the future multi device support at vfs layer may need it. > > > > Signed-off-by: Anand Jain <a

Re: RAID6, errors at missing device replacement

2016-05-02 Thread Yauhen Kharuzhy
On Mon, May 02, 2016 at 01:04:30PM -0600, Chris Murphy wrote: > On Mon, May 2, 2016 at 12:43 PM, Yauhen Kharuzhy > <yauhen.kharu...@zavadatar.com> wrote: > > On Sat, Apr 16, 2016 at 07:37:48AM +, Duncan wrote: > >> Yauhen Kharuzhy posted on Fri, 15 Apr 2016

Re: RAID6, errors at missing device replacement

2016-05-02 Thread Yauhen Kharuzhy
On Sat, Apr 16, 2016 at 07:37:48AM +, Duncan wrote: > Yauhen Kharuzhy posted on Fri, 15 Apr 2016 12:49:36 -0700 as excerpted: > > > I have discovered case when replacement of missing devices causes > > metadata corruption. Does anybody know anything about this? > >

RAID56 device replace/rebalance error: bad page state

2016-05-02 Thread Yauhen Kharuzhy
00 48 89 74 24 08 eb 76 40 f6 c7 03 0f 85 d5 00 00 00 44 8b 4f 1c 45 85 c9 <74> 68 41 8d 51 01 48 8d 4f 1c 44 89 c8 f0 0f b1 57 1c 41 39 c1 -- Yauhen Kharuzhy test-replace-balance.sh Description: Bourne shell script

Re: [PATCH v5 00/13] Introduce device state 'failed', spare device and auto replace

2016-04-28 Thread Yauhen Kharuzhy
93] sd 2:0:0:0: [sde] Synchronizing SCSI cache [ 507.665344] sd 2:0:0:0: [sde] Stopping disk [ 511.432359] BTRFS info (device sdf): num_devices 4 rw_devices 2 degraded-option: unset -- Yauhen Kharuzhy

Re: [PATCH v5 00/13] Introduce device state 'failed', spare device and auto replace

2016-04-25 Thread Yauhen Kharuzhy
On Mon, Apr 18, 2016 at 07:31:31PM +0800, Anand Jain wrote: > Thanks for various comments, tests and feedback. Seems to work well for me. -- Yauhen Kharuzhy

RAID6, errors at missing device replacement

2016-04-15 Thread Yauhen Kharuzhy
Hi. I have discovered a case where replacement of missing devices causes metadata corruption. Does anybody know anything about this? I use the 4.4.5 kernel with the latest global spare patches. If we have RAID6 (it may be reproducible on RAID5 too) and try to replace one missing drive with another and after this

Re: [PATCH v4 00/13] Introduce device state 'failed', spare device and auto replace

2016-04-14 Thread Yauhen Kharuzhy
? kthread_create_on_node+0x200/0x200 [ 5375.719697] ----[ cut here ] -- Yauhen Kharuzhy

[PATCH] btrfs-progs: Check if the FSID was seen by comparing full UUID

2016-04-14 Thread Yauhen Kharuzhy
devid    7 size 7.28TiB used 1.57GiB path /dev/sdi To resolve this collision, search for the full FSID in the list of seen filesystems. Signed-off-by: Yauhen Kharuzhy <yauhen.kharu...@zavadatar.com> --- cmds-filesystem.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff
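The patch title says the "already seen" check should compare the full UUID. A minimal userspace sketch of that idea, assuming a fixed-size array of seen FSIDs; `struct seen_fsids`, `fsid_seen()` and `fsid_mark_seen()` are hypothetical names, not the cmds-filesystem.c code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FSID_SIZE 16   /* btrfs FSIDs are 16-byte UUIDs */
#define MAX_SEEN 64    /* hypothetical capacity for this sketch */

/* Hypothetical list of FSIDs already reported by 'filesystem show'. */
struct seen_fsids {
    unsigned char ids[MAX_SEEN][FSID_SIZE];
    size_t count;
};

/* Compare all 16 bytes: FSIDs that share a prefix are still different filesystems. */
static bool fsid_seen(const struct seen_fsids *seen, const unsigned char *fsid)
{
    for (size_t i = 0; i < seen->count; i++)
        if (memcmp(seen->ids[i], fsid, FSID_SIZE) == 0)
            return true;
    return false;
}

/* Record a new FSID; returns false if it was already seen or the list is full. */
static bool fsid_mark_seen(struct seen_fsids *seen, const unsigned char *fsid)
{
    assert(seen->count <= MAX_SEEN);
    if (fsid_seen(seen, fsid) || seen->count == MAX_SEEN)
        return false;
    memcpy(seen->ids[seen->count++], fsid, FSID_SIZE);
    return true;
}
```

Two FSIDs differing only in the last byte are treated as two filesystems, which is the behaviour the patch asks for.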

Re: [PATCH v4 00/13] Introduce device state 'failed', spare device and auto replace

2016-04-14 Thread Yauhen Kharuzhy
On Tue, Apr 12, 2016 at 10:15:50PM +0800, Anand Jain wrote: > Thanks for various comments, tests and feedback. Tested-By: Yauhen Kharuzhy <yauhen.kharu...@zavadatar.com>, for all patches in the series. -- Yauhen Kharuzhy

Re: [PATCH v5 11/13] btrfs: introduce device dynamic state transition to offline or failed

2016-04-14 Thread Yauhen Kharuzhy
(fs_devices->fs_info->sb->s_bdev && > + (fs_devices->fs_info->sb->s_bdev == device->bdev)) > + fs_devices->fs_info->sb->s_bdev = next_device->bdev; > + > + if (device->bdev == fs_devices->latest_bdev) > +

Re: [PATCH v4 00/13] Introduce device state 'failed', spare device and auto replace

2016-04-14 Thread Yauhen Kharuzhy
e closing, s_bdev points to an invalid bdev. 6) umount -> sync_filesystem() -> sync_blockdev(s_bdev) -> OOPS. -- Yauhen Kharuzhy

Re: [PATCH] Btrfs: Set superblock s_bdev field properly at device closing

2016-04-14 Thread Yauhen Kharuzhy
On Thu, Apr 14, 2016 at 02:59:23PM +0800, Anand Jain wrote: > > > Hi Yauhen > > On 04/14/2016 09:15 AM, Yauhen Kharuzhy wrote: > >fs_info->sb->s_bdev field isn't set to any value at mount time > > There were patch to do set it at the vfs layer, or some

[PATCH] Btrfs: Set superblock s_bdev field properly at device closing

2016-04-13 Thread Yauhen Kharuzhy
s code. So, set it at mount time and select a valid device at device closing time. An alternative solution may be to not set s_bdev at all. Signed-off-by: Yauhen Kharuzhy <yauhen.kharu...@zavadatar.com> --- fs/btrfs/super.c | 1 + fs/btrfs/volumes.c | 16 2 files changed, 1
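The "select a valid device at device closing time" step can be illustrated with a userspace sketch. The structures and the `close_dev()`/`pick_next_bdev()` helpers are hypothetical simplifications, not fs/btrfs/volumes.c. When no other device is open, this sketch leaves s_bdev NULL, so any later user (such as the umount sync path) must check for NULL:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct block_dev { int id; };

struct dev {
    struct block_dev *bdev;   /* NULL once the device is closed or missing */
};

struct super { struct block_dev *s_bdev; };

/* Pick any other still-open device to take over the superblock's s_bdev. */
static struct block_dev *pick_next_bdev(struct dev *devs, size_t n,
                                        const struct dev *closing)
{
    for (size_t i = 0; i < n; i++)
        if (&devs[i] != closing && devs[i].bdev)
            return devs[i].bdev;
    return NULL;
}

/* Close one device without leaving sb->s_bdev pointing at its bdev. */
static void close_dev(struct super *sb, struct dev *devs, size_t n,
                      struct dev *closing)
{
    assert(closing >= devs && closing < devs + n);
    if (sb->s_bdev && sb->s_bdev == closing->bdev)
        sb->s_bdev = pick_next_bdev(devs, n, closing);
    closing->bdev = NULL;     /* stand-in for actually releasing the bdev */
}
```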

Re: [PATCH v4 00/13] Introduce device state 'failed', spare device and auto replace

2016-04-13 Thread Yauhen Kharuzhy
nge+0x29/0xf0 [ 374.909672] RSP [ 374.937941] ---[ end trace 2bbc2fd699f402ff ]--- -- Yauhen Kharuzhy

Re: [PATCH v4 00/13] Introduce device state 'failed', spare device and auto replace

2016-04-12 Thread Yauhen Kharuzhy
wap = 381628kB [ 450.314188] Total swap = 418492kB [ 450.317644] 421006 pages RAM [ 450.319573] 0 pages HighMem/MovableOnly [ 450.322100] 21084 pages reserved [ 450.323853] 0 pages hwpoisoned ... -- Yauhen Kharuzhy

Re: [PATCH 01/13] btrfs: Introduce a new function to check if all chunks a OK for degraded mount

2016-04-12 Thread Yauhen Kharuzhy
0; i < map->num_stripes; i++) { > + if (map->stripes[i].dev->missing) > + missing++; > + } > + if (missing > max_tolerated) { > + ret = -EIO; > + btrfs_warn(fs_info,
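The quoted check counts missing stripe devices in a chunk and fails with -EIO when that count exceeds what the chunk can tolerate. A self-contained sketch of the same counting rule; the `enum profile` values and the per-profile tolerance table (RAID1/RAID5: 1, RAID6: 2, single: 0) are assumptions for illustration, not taken from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of chunk profiles for this sketch. */
enum profile { PROFILE_SINGLE, PROFILE_RAID1, PROFILE_RAID5, PROFILE_RAID6 };

struct stripe { bool dev_missing; };

/* Hypothetical simplified chunk map. */
struct chunk_map {
    enum profile profile;
    size_t num_stripes;
    const struct stripe *stripes;
};

/* Assumed tolerance per profile: how many missing devices data survives. */
static int max_tolerated_failures(enum profile p)
{
    switch (p) {
    case PROFILE_RAID1:
    case PROFILE_RAID5:
        return 1;
    case PROFILE_RAID6:
        return 2;
    default:
        return 0;   /* single (and anything unknown) tolerates no missing device */
    }
}

/* Same shape as the quoted loop: count missing stripes, compare, fail with -EIO. */
static int check_chunk_degradable(const struct chunk_map *map)
{
    int missing = 0;
    assert(map != NULL);
    for (size_t i = 0; i < map->num_stripes; i++)
        if (map->stripes[i].dev_missing)
            missing++;
    return missing > max_tolerated_failures(map->profile) ? -EIO : 0;
}
```

With two of four stripes missing, a RAID6 chunk passes and a RAID5 chunk is rejected.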

Re: [PATCH 10/13] btrfs: introduce helper functions to perform hot replace

2016-04-08 Thread Yauhen Kharuzhy
d6e5df 100644 > --- a/fs/btrfs/dev-replace.h > +++ b/fs/btrfs/dev-replace.h > @@ -46,4 +46,5 @@ static inline void btrfs_dev_replace_stats_inc(atomic64_t > *stat_value) > { > atomic64_inc(stat_value); > } > +int btrfs_auto_replace_start(struct btrfs_root *root, struct btrfs_device > *src_device); > #endif > -- > 2.7.0 -- Yauhen Kharuzhy

Re: Missing device handling (was: 'unable to mount btrfs pool...')

2016-04-08 Thread Yauhen Kharuzhy
__u64 missing_devices; /* out */ + __u64 open_devices; /* out */ + __u64 rw_devices; /* out */ + __u64 total_devices; /* out */ + __u64 reserved[118];/* pad to 1k */ }; struc

Re: [PATCH 10/13] btrfs: introduce helper functions to perform hot replace

2016-04-07 Thread Yauhen Kharuzhy
e(tgt_path); > + kfree(src_path); > + > + return 0; > +} > diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h > index e922b42d91df..b918b9d6e5df 100644 > --- a/fs/btrfs/dev-replace.h > +++ b/fs/btrfs/dev-replace.h > @@ -46,4 +46,5 @@ static inline void b

Re: good documentation on btrfs internals and on disk layout

2016-04-05 Thread Yauhen Kharuzhy
2016-04-05 11:56 GMT-07:00 Austin S. Hemmelgarn <ahferro...@gmail.com>: > On 2016-04-05 14:36, Yauhen Kharuzhy wrote: >> >> 2016-04-05 11:15 GMT-07:00 Austin S. Hemmelgarn <ahferro...@gmail.com>: >>> >>> On 2016-04-05 13:53, Yauhen Kharuzhy wrote: &

Re: good documentation on btrfs internals and on disk layout

2016-04-05 Thread Yauhen Kharuzhy
2016-04-05 11:15 GMT-07:00 Austin S. Hemmelgarn <ahferro...@gmail.com>: > On 2016-04-05 13:53, Yauhen Kharuzhy wrote: >> >> Hello, >> >> I try to understand btrfs logic in mounting of multi-device filesystem >> when device generations are different. All

Re: good documentation on btrfs internals and on disk layout

2016-04-05 Thread Yauhen Kharuzhy
Hello, I am trying to understand the btrfs logic for mounting a multi-device filesystem when device generations differ. All my questions relate to the RAID5/6 case for system, metadata, and data. The kernel can mount a FS with different device generations (if a drive was physically removed before the last
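One piece of the behaviour being asked about is which device is treated as current when superblock generations disagree. A minimal sketch, assuming the device with the highest superblock generation is taken as the reference (roughly how the kernel of this era picks `latest_bdev` when opening devices); `struct dev_sb`, `latest_device()` and `count_stale()` are hypothetical and do not model how stale devices are later brought back in sync:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device record built from each device's superblock. */
struct dev_sb {
    uint64_t devid;
    uint64_t generation;   /* transaction id recorded in that device's superblock */
};

/* Index of the device whose superblock has the highest generation. */
static size_t latest_device(const struct dev_sb *devs, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (devs[i].generation > devs[best].generation)
            best = i;
    return best;
}

/* Devices behind the newest generation missed commits (e.g. were unplugged). */
static size_t count_stale(const struct dev_sb *devs, size_t n)
{
    uint64_t newest = devs[latest_device(devs, n)].generation;
    size_t stale = 0;
    for (size_t i = 0; i < n; i++)
        if (devs[i].generation < newest)
            stale++;
    return stale;
}
```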

Re: Global hotspare functionality

2016-04-04 Thread Yauhen Kharuzhy
2016-04-01 18:15 GMT-07:00 Anand Jain : Issue 2. At the start of autoreplacing a drive with the hotspare, the kernel crashes in the transaction handling code (inside btrfs_commit_transaction() called by the routines that initiate autoreplace). I 'fixed' this by removing

Re: Global hotspare functionality

2016-04-01 Thread Yauhen Kharuzhy
On Sat, Apr 02, 2016 at 09:15:56AM +0800, Anand Jain wrote: > > > On 03/30/2016 03:47 AM, Yauhen Kharuzhy wrote: > >On Tue, Mar 29, 2016 at 10:41:36PM +0800, Anand Jain wrote: > >> > >>Hi Yauhen, > >> > > > >>> > >>>

Re: Global hotspare functionality

2016-03-30 Thread Yauhen Kharuzhy
On Tue, Mar 29, 2016 at 10:40:40PM +0300, Yauhen Kharuzhy wrote: > Hi. > > I am testing hotspare v2 on kernel v4.4.5 (I will try latest Chris' tree > later) > now with lockdep debugging enabled. At the start of replacement, a lockdep > warning is displayed, > becaus

Re: [PATCH 12/12] btrfs: check device for critical errors and mark failed

2016-03-29 Thread Yauhen Kharuzhy
nt if the hotspare is added after a drive failure? V1 of the patchset retried autoreplace endlessly until a replacement drive was added. -- Yauhen Kharuzhy

Re: Global hotspare functionality

2016-03-29 Thread Yauhen Kharuzhy
Also reproduced with Mason's for-linus-4.6 branch. 2016-03-29 12:47 GMT-07:00 Yauhen Kharuzhy <yauhen.kharu...@zavadatar.com>: > On Tue, Mar 29, 2016 at 10:41:36PM +0800, Anand Jain wrote: >> >> Hi Yauhen, >> > >> > >> >Issue 2. >> >A

Re: [PATCH 12/12] btrfs: check device for critical errors and mark failed

2016-03-29 Thread Yauhen Kharuzhy
mutex_unlock(>fs_info->casualty_mutex); > + > +sleep: > + if (!try_to_freeze() && !again) { This block was copy-pasted from cleaner_kthread(). The 'again' variable is not actually used, and the use of try_to_freeze() in cleaner_kthread() was eliminated in Mason's 'for-linus-4.6' branc

[PATCH] btrfs: Reset IO error counters before start of device replacing

2016-03-29 Thread Yauhen Kharuzhy
(DEV_REPLACE_START) failed on "/media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/": Input/output error, no error. Reset the num_write_errors and num_uncorrectable_read_errors counters in the dev_replace structure before replacing starts. Signed-off-by: Yauhen Kharuzhy <yauhen.kharu...@zavadatar.com> -
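The point of the patch is that counters left over from an earlier replace must not leak into the next one. A sketch under stated assumptions: C11 atomics stand in for the kernel's atomic64_t, and `replace_begin()`, `record_write_error()` and `replace_result()` are hypothetical helpers. In particular, the rule that any nonzero counter makes the run fail is an assumption of this sketch, not a description of the kernel's replace logic:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical subset of the replace state: only the two counters from the patch. */
struct replace_stats {
    atomic_uint_least64_t num_write_errors;
    atomic_uint_least64_t num_uncorrectable_read_errors;
};

/* Clear counters before a new replace so a previous run's errors do not count. */
static void replace_begin(struct replace_stats *st)
{
    assert(st != NULL);
    atomic_store(&st->num_write_errors, 0);
    atomic_store(&st->num_uncorrectable_read_errors, 0);
}

static void record_write_error(struct replace_stats *st)
{
    atomic_fetch_add(&st->num_write_errors, 1);
}

/* Hypothetical verdict: any error counted during this run fails it with -EIO. */
static int replace_result(struct replace_stats *st)
{
    if (atomic_load(&st->num_write_errors) != 0 ||
        atomic_load(&st->num_uncorrectable_read_errors) != 0)
        return -EIO;
    return 0;
}
```

Without the reset in `replace_begin()`, stale counts from the first run would make a clean second run look failed.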

Re: Global hotspare functionality

2016-03-29 Thread Yauhen Kharuzhy
28 48 c7 c7 6b f8 a3 81 e8 40 09 d9 ff e8 3b 43 31 00 41 c1 ec 09 48 8b 7b 08 45 85 e4 0f 85 13 01 00 00 48 8b 87 f0 00 00 00 <4c> 8b b8 48 05 00 00 4d 85 ff 0f 84 d5 01 00 00 4c 8b af e0 00 [ 1465.750135] RIP [] generic_make_request_checks+0x4d/0x910 [ 1465.776005] RSP [ 1465.79084

Re: Global hotspare functionality

2016-03-29 Thread Yauhen Kharuzhy
ted -- Yauhen Kharuzhy

Re: Global hotspare functionality

2016-03-29 Thread Yauhen Kharuzhy
manager. But as of now btrfs has > already made degraded as non default choice. There is something else > new which is needed and it can be a separate RFC, not part of this > patch set. > > > Please try. V2 sent out. OK, I am going to test it now. -- Yauhen Kharuzhy

Lockdep warning at device replace finishing

2016-03-24 Thread Yauhen Kharuzhy
Hi. Can anybody point me to a possible cause of this lockdep warning and say whether it is actually dangerous? It appeared when I started replacing from the missing drive ('btrfs replace start ). My locking-fu seems to be too weak to resolve this by myself. I use the 4.4.5 kernel with Anand's global

Re: Global hotspare functionality

2016-03-19 Thread Yauhen Kharuzhy
Issue 5: Race between close_ctree() and casualty_kthread():
close_ctree():
    if (fs_info->casualty_kthread)
        kthread_stop(fs_info->casualty_kthread);
casualty_kthread():
out:
    fs_info->casualty_kthread = NULL;
On an SMP system, the kthread_stop() argument can be changed
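The race is a check-then-use on a pointer that the other thread may clear between the test and the kthread_stop() call. A userspace sketch of one way to close that window, using C11 atomics in place of kernel primitives: the stopper claims the pointer with a single exchange, and the worker clears it only if nobody has claimed it yet. `struct worker`, `stop_worker()` and `worker_exit()` are hypothetical, and a real kernel fix must also keep the task alive for kthread_stop(), which this sketch does not model:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the casualty kthread. */
struct worker { int stopped; };

struct fs_state {
    _Atomic(struct worker *) casualty;   /* shared by close path and worker */
};

/* Close path: read and clear the pointer in one atomic step instead of
 * testing it and then reading it again (the window where the race lives). */
static int stop_worker(struct fs_state *fs)
{
    struct worker *w = atomic_exchange(&fs->casualty, NULL);
    if (!w)
        return 0;              /* worker already exited and cleared it */
    w->stopped = 1;            /* stand-in for kthread_stop(w) */
    return 1;
}

/* Worker exit path: clear the pointer only if it still points at this worker. */
static void worker_exit(struct fs_state *fs, struct worker *self)
{
    struct worker *expected = self;
    assert(self != NULL);
    (void)atomic_compare_exchange_strong(&fs->casualty, &expected, NULL);
}
```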

Global hotspare functionality

2016-03-18 Thread Yauhen Kharuzhy
768?at=master and https://bitbucket.org/jekhor/linux-btrfs/commits/be5e2524c10f2b4047da80f9f85b54c6006d4273?at=master -- Yauhen Kharuzhy

Re: [PATCH V4 2/2] btrfs-progs: Introduce device delete by devid

2016-03-14 Thread Yauhen Kharuzhy
es for the 'global hot spare' functionality working; they depend on two other patchsets which were re-sent a few times with changes and introduced new bugs, etc. Btw, Anand, if you have a public git repo with the 'global hotspare' patches, could you please send its location too? -- Yauhen

[PATCH] btrfs-progs: Don't stop scanning of devices at first failed device

2016-03-10 Thread Yauhen Kharuzhy
this by removing the stop at any failure in btrfs_register_all_devices(); just return the error count. btrfs_scan_one_device() already reports every kind of error. Signed-off-by: Yauhen Kharuzhy <yauhen.kharu...@zavadatar.com> --- cmds-device.c | 2 +- utils.c | 15 +++ 2
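The change described here replaces "stop at the first failure" with "try every device and count failures". A minimal sketch of that loop; `register_all_devices()`, the `scan_fn` callback type and `demo_scan()` are hypothetical stand-ins, not the utils.c code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-device scan hook: 0 on success, nonzero on failure.
 * The hook is expected to print its own error message. */
typedef int (*scan_fn)(const char *path);

/* Register every device, counting failures instead of stopping at the first one. */
static int register_all_devices(const char *const *paths, size_t n, scan_fn scan)
{
    int errors = 0;
    assert(paths != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (scan(paths[i]) != 0)
            errors++;          /* keep going: later devices may still be valid */
    return errors;
}

/* Demo scanner for the usage example: fails on paths containing "bad". */
static int demo_scan(const char *path)
{
    return strstr(path, "bad") != NULL;
}
```

The caller gets an error count rather than a single status, so it can still report failure while every valid device has been registered.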

[PATCH] Fix broken 'device scan' arguments parsing

2016-03-08 Thread Yauhen Kharuzhy
commit 52179e4fea41e55f31c92cd033a0b53a5107b4f4 'btrfs-progs: unify argc min/max checking' broke the 'btrfs device scan' command when no arguments are given. Fix this. Signed-off-by: Yauhen Kharuzhy <yauhen.kharu...@zavadatar.com> --- cmds-device.c | 2 +- 1 file changed, 1 insertion(+), 1 de
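The commit message only says that the unified check rejected the no-argument form. The invariant the fix has to restore is that `device scan` accepts zero or more paths, with zero meaning "scan everything". A sketch of an argument-count check that keeps that form legal; `nargs_ok()`, `device_scan_mode()` and `enum scan_mode` are hypothetical and are not btrfs-progs' own helpers:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical argument-count check; max < 0 means "no upper limit". */
static bool nargs_ok(int nargs, int min, int max)
{
    assert(min >= 0);
    return nargs >= min && (max < 0 || nargs <= max);
}

enum scan_mode { SCAN_ALL, SCAN_LISTED, SCAN_BAD_ARGS };

/* 'device scan' takes zero or more paths, so the minimum must be 0:
 * a minimum of 1 would reject the bare 'btrfs device scan' form. */
static enum scan_mode device_scan_mode(int argc, int optind)
{
    int nargs = argc - optind;   /* paths left after option parsing */
    if (!nargs_ok(nargs, 0, -1))
        return SCAN_BAD_ARGS;
    return nargs == 0 ? SCAN_ALL : SCAN_LISTED;
}
```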