[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-04-16 Thread Prakash Surya
Closed #591 via 61cfa6ec9acd6e55a45409f30b8d921e99e20c57.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/591#event-1577258558
--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M8a826661f6c5ed80e653fac9
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-04-05 Thread Matthew Ahrens
I think we can also count @dweeezil and @behlendorf as reviewers.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/591#issuecomment-379005974
--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M7d6f599573055bbb1e0a417c
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-04-03 Thread Matthew Ahrens
@ahrens pushed 2 commits.

68c70cf  Don't clear ranges beyond end of spacemap
5e8c86f  Reasonably limit fault injection mayhem


-- 
You are receiving this because you are subscribed to this thread.
View it on GitHub:
https://github.com/openzfs/openzfs/pull/591/files/b5b707ed3a762ea25cf8ca4025517ff54719ffec..5e8c86fa9ecbead2dd52e4a84d23a3aacef64062

--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M714f091b7efbb3fb7e098011
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-03-29 Thread Matthew Ahrens
Some minor changes to address feedback from @grwilson, @prashks, and Sara 
Hartse.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/591#issuecomment-377409083
--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M173108f58e3b33d8c1f823b8
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-03-29 Thread Matthew Ahrens
@ahrens pushed 2 commits.

998ce25  comments
b5b707e  removing vdev can be resilvered


-- 
You are receiving this because you are subscribed to this thread.
View it on GitHub:
https://github.com/openzfs/openzfs/pull/591/files/2558c24957d84e095309f9699d3a6b268672d6c6..b5b707ed3a762ea25cf8ca4025517ff54719ffec

--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M2c97337f23d9de4b1e40fa37
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-03-27 Thread Brian Behlendorf
behlendorf approved this pull request.

I like it, thanks.



-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/591#pullrequestreview-107505918
--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-Mb4a487362186887163b0c5eb
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-03-27 Thread Matthew Ahrens
I added 2 more commits, which address feedback from @sdimitro and @behlendorf.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/591#issuecomment-376707957
--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M1b3296a67f53e43502683fd7
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-03-27 Thread Matthew Ahrens
@ahrens pushed 2 commits.

de82530  Fix zdb hang when running zloop
2558c24  serapheim feedback


-- 
You are receiving this because you are subscribed to this thread.
View it on GitHub:
https://github.com/openzfs/openzfs/pull/591/files/501473238cfd705ee2ed4b6b3f8c2c0983399eb7..2558c24957d84e095309f9699d3a6b268672d6c6

--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M260708a65c438744d1b0d580
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-03-27 Thread Brian Behlendorf
behlendorf requested changes on this pull request.



> + * vdevs.  In either case, we try every combination.  This ensures that if
+ * a mirror has small silent errors on all of its children, we can still
+ * reconstruct the correct data, as long as those errors are at
+ * sufficiently-separated offsets (specifically, separated by the largest
+ * block size - default of 128KB, but up to 16MB).
+ */
+static void
+vdev_indirect_reconstruct_io_done(zio_t *zio)
+{
+   indirect_vsd_t *iv = zio->io_vsd;
+
+   for (;;) {
+   /* copy data from splits to main zio */
+   int ret;
+   for (indirect_split_t *is = list_head(&iv->iv_splits);
+   is != NULL; is = list_next(&iv->iv_splits, is)) {

When testing with `zloop.sh` I observed that when there are a large number of 
segment copies and the damaged copy is one of the "high bits" then this 
function can't be expected to check everything.  This manifested itself as a 
`zdb` hang in my case and a few suspended pools due to stalling the IO pipeline.

I'm proposing 
https://github.com/behlendorf/zfs/commit/2ad01af1b4a50117779aab7e47c7006a7f2451e6
 as a fix . It slightly reworks the logic in 
`vdev_indirect_reconstruct_io_done` to adopt a random sampling apprach to 
reconstruction when there are a large number of segments, otherwise we use the 
existing logic and check everything.  We could further optimize this if we 
decide it's needed.


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/591#pullrequestreview-107483336
--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M78532ea4d3a8c8e84252d238
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-03-26 Thread Matthew Ahrens
@ahrens pushed 1 commit.

5014732  post event


-- 
You are receiving this because you are subscribed to this thread.
View it on GitHub:
https://github.com/openzfs/openzfs/pull/591/files/1f9502999027b1be7e252690a2f179f0e5c59ff4..501473238cfd705ee2ed4b6b3f8c2c0983399eb7

--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-Me71fd3a38b2bc979c52abbee
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-03-26 Thread Matthew Ahrens
I've added another commit to this which addresses an additional problem with 
device removal and mirrors:

Now that we support device removal with mirrors, it's reasonable to change the 
pool config while a removal is in progress. E.g. to "zpool attach", "zpool 
detach", or automatic spare in / out events. However, these operations cause 
panics because we save the vdev_t that is being removed in svr_vdev, but this 
vdev_t can be freed (e.g. if a detach changes the top-level vdev from a mirror 
to a plain disk).

The solution is to store the vdev ID in the spa_vdev_removal_t, rather than the 
vdev_t. Most places that the svr_vdev is used will have to use 
vdev_lookup_top() to obtain the vdev_t (in some cases adding additional calls 
to spa_config_enter/exit).



-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/591#issuecomment-376265987
--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M73ff51e700772a1d7f257fbc
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-03-19 Thread Matthew Ahrens
@ahrens pushed 1 commit.

0e2e20f  no repair when readonly


-- 
You are receiving this because you are subscribed to this thread.
View it on GitHub:
https://github.com/openzfs/openzfs/pull/591/files/51e5ea3e26b27f2007876718ae5b6664fa017e2c..0e2e20fd09a283f1e4157f229fd82f462caa411e

--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M730c36d0c4fd689b04247416
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-03-19 Thread Matthew Ahrens
ahrens commented on this pull request.



> + if (is->is_child[is->is_good_child].ic_data == NULL) {
+   ret = EIO;
+   goto next;
+   }
+
+   abd_copy_off(zio->io_abd,
+   is->is_child[is->is_good_child].ic_data,
+   is->is_split_offset, 0, is->is_size);
+   }
+
+   /* See if this checksum matches. */
+   zio_bad_cksum_t zbc;
+   ret = zio_checksum_error(zio, &zbc);
+   if (ret == 0) {
+   /* Found a matching checksum.  Issue repair i/os. */
+   vdev_indirect_repair(zio);

Sorry about that, I found that problem too but forgot to update the PR.  I 
fixed it in a slightly different, but equivalent way.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/591#discussion_r175498158
--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M6a1b69cdb2cfb676c5177327
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-03-18 Thread Tim Chase
dweeezil commented on this pull request.

So far, looks good (in the port to ZoL) except for the repair writes happening 
for read-only pools.

> + if (is->is_child[is->is_good_child].ic_data == NULL) {
+   ret = EIO;
+   goto next;
+   }
+
+   abd_copy_off(zio->io_abd,
+   is->is_child[is->is_good_child].ic_data,
+   is->is_split_offset, 0, is->is_size);
+   }
+
+   /* See if this checksum matches. */
+   zio_bad_cksum_t zbc;
+   ret = zio_checksum_error(zio, &zbc);
+   if (ret == 0) {
+   /* Found a matching checksum.  Issue repair i/os. */
+   vdev_indirect_repair(zio);

I added this here:
```
diff --git a/module/zfs/vdev_indirect.c b/module/zfs/vdev_indirect.c
index f794b92..7d88b4c 100644
--- a/module/zfs/vdev_indirect.c
+++ b/module/zfs/vdev_indirect.c
@@ -1416,7 +1416,8 @@ vdev_indirect_reconstruct_io_done(zio_t *zio)
ret = zio_checksum_error(zio, &zbc);
if (ret == 0) {
/* Found a matching checksum.  Issue repair i/os. */
-   vdev_indirect_repair(zio);
+   if (spa_writeable(zio->io_spa))
+   vdev_indirect_repair(zio);
zio_checksum_verified(zio);
return;
}
```
to keep zdb from triggering ```VERIFY(zio->io_type != ZIO_TYPE_WRITE || 
spa_writeable(spa));``` in ```zio_vdev_io_start()```.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/591#pullrequestreview-104804765
--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-Md05bf4c0d4eec4182d1a04ae
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-03-17 Thread Tim Chase
@ahrens I'll look it over and try to get this into the ZoL device evacuation 
port this weekend.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/591#issuecomment-373924421
--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M4e1cee75acc449f4649dfa31
Delivery options: https://openzfs.topicbox.com/groups


[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)

2018-03-16 Thread Matthew Ahrens
@allanjude @dweeezil Thanks for looking at the previous iteration of this.  The 
new and improved fix is much more comprehensive but the code is almost entirely 
different.  Could I ask that you take a look at this too?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/591#issuecomment-373787529
--
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M97d4b0fced53455ebac09989
Delivery options: https://openzfs.topicbox.com/groups