[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
Closed #591 via 61cfa6ec9acd6e55a45409f30b8d921e99e20c57. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/openzfs/openzfs/pull/591#event-1577258558 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M8a826661f6c5ed80e653fac9 Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
I think we can also count @dweeezil and @behlendorf as reviewers. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/openzfs/openzfs/pull/591#issuecomment-379005974 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M7d6f599573055bbb1e0a417c Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
@ahrens pushed 2 commits. 68c70cf Don't clear ranges beyond end of spacemap 5e8c86f Reasonably limit fault injection mayhem -- You are receiving this because you are subscribed to this thread. View it on GitHub: https://github.com/openzfs/openzfs/pull/591/files/b5b707ed3a762ea25cf8ca4025517ff54719ffec..5e8c86fa9ecbead2dd52e4a84d23a3aacef64062 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M714f091b7efbb3fb7e098011 Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
Some minor changes to address feedback from @grwilson, @prashks, and Sara Hartse. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/openzfs/openzfs/pull/591#issuecomment-377409083 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M173108f58e3b33d8c1f823b8 Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
@ahrens pushed 2 commits. 998ce25 comments b5b707e removing vdev can be resilvered -- You are receiving this because you are subscribed to this thread. View it on GitHub: https://github.com/openzfs/openzfs/pull/591/files/2558c24957d84e095309f9699d3a6b268672d6c6..b5b707ed3a762ea25cf8ca4025517ff54719ffec -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M2c97337f23d9de4b1e40fa37 Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
behlendorf approved this pull request. I like it, thanks. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/openzfs/openzfs/pull/591#pullrequestreview-107505918 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-Mb4a487362186887163b0c5eb Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
I added 2 more commits, which address feedback from @sdimitro and @behlendorf. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/openzfs/openzfs/pull/591#issuecomment-376707957 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M1b3296a67f53e43502683fd7 Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
@ahrens pushed 2 commits. de82530 Fix zdb hang when running zloop 2558c24 serapheim feedback -- You are receiving this because you are subscribed to this thread. View it on GitHub: https://github.com/openzfs/openzfs/pull/591/files/501473238cfd705ee2ed4b6b3f8c2c0983399eb7..2558c24957d84e095309f9699d3a6b268672d6c6 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M260708a65c438744d1b0d580 Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
behlendorf requested changes on this pull request. > + * vdevs. In either case, we try every combination. This ensures that if + * a mirror has small silent errors on all of its children, we can still + * reconstruct the correct data, as long as those errors are at + * sufficiently-separated offsets (specifically, separated by the largest + * block size - default of 128KB, but up to 16MB). + */ +static void +vdev_indirect_reconstruct_io_done(zio_t *zio) +{ + indirect_vsd_t *iv = zio->io_vsd; + + for (;;) { + /* copy data from splits to main zio */ + int ret; + for (indirect_split_t *is = list_head(&iv->iv_splits); + is != NULL; is = list_next(&iv->iv_splits, is)) { When testing with `zloop.sh` I observed that when there are a large number of segment copies and the damaged copy is one of the "high bits" then this function can't be expected to check everything. This manifested itself as a `zdb` hang in my case and a few suspended pools due to stalling the IO pipeline. I'm proposing https://github.com/behlendorf/zfs/commit/2ad01af1b4a50117779aab7e47c7006a7f2451e6 as a fix . It slightly reworks the logic in `vdev_indirect_reconstruct_io_done` to adopt a random sampling apprach to reconstruction when there are a large number of segments, otherwise we use the existing logic and check everything. We could further optimize this if we decide it's needed. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/openzfs/openzfs/pull/591#pullrequestreview-107483336 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M78532ea4d3a8c8e84252d238 Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
@ahrens pushed 1 commit. 5014732 post event -- You are receiving this because you are subscribed to this thread. View it on GitHub: https://github.com/openzfs/openzfs/pull/591/files/1f9502999027b1be7e252690a2f179f0e5c59ff4..501473238cfd705ee2ed4b6b3f8c2c0983399eb7 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-Me71fd3a38b2bc979c52abbee Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
I've added another commit to this which addresses an additional problem with device removal and mirrors: Now that we support device removal with mirrors, it's reasonable to change the pool config while a removal is in progress. E.g. to "zpool attach", "zpool detach", or automatic spare in / out events. However, these operations cause panics because we save the vdev_t that is being removed in svr_vdev, but this vdev_t can be freed (e.g. if a detach changes the top-level vdev from a mirror to a plain disk). The solution is to store the vdev ID in the spa_vdev_removal_t, rather than the vdev_t. Most places that the svr_vdev is used will have to use vdev_lookup_top() to obtain the vdev_t (in some cases adding additional calls to spa_config_enter/exit). -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/openzfs/openzfs/pull/591#issuecomment-376265987 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M73ff51e700772a1d7f257fbc Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
@ahrens pushed 1 commit. 0e2e20f no repair when readonly -- You are receiving this because you are subscribed to this thread. View it on GitHub: https://github.com/openzfs/openzfs/pull/591/files/51e5ea3e26b27f2007876718ae5b6664fa017e2c..0e2e20fd09a283f1e4157f229fd82f462caa411e -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M730c36d0c4fd689b04247416 Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
ahrens commented on this pull request. > + if (is->is_child[is->is_good_child].ic_data == NULL) { + ret = EIO; + goto next; + } + + abd_copy_off(zio->io_abd, + is->is_child[is->is_good_child].ic_data, + is->is_split_offset, 0, is->is_size); + } + + /* See if this checksum matches. */ + zio_bad_cksum_t zbc; + ret = zio_checksum_error(zio, &zbc); + if (ret == 0) { + /* Found a matching checksum. Issue repair i/os. */ + vdev_indirect_repair(zio); Sorry about that, I found that problem too but forgot to update the PR. I fixed it in a slightly different, but equivalent way. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/openzfs/openzfs/pull/591#discussion_r175498158 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M6a1b69cdb2cfb676c5177327 Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
dweeezil commented on this pull request. So far, looks good (in the port to ZoL) except for the repair writes happening for read-only pools. > + if (is->is_child[is->is_good_child].ic_data == NULL) { + ret = EIO; + goto next; + } + + abd_copy_off(zio->io_abd, + is->is_child[is->is_good_child].ic_data, + is->is_split_offset, 0, is->is_size); + } + + /* See if this checksum matches. */ + zio_bad_cksum_t zbc; + ret = zio_checksum_error(zio, &zbc); + if (ret == 0) { + /* Found a matching checksum. Issue repair i/os. */ + vdev_indirect_repair(zio); I added this here: ``` diff --git a/module/zfs/vdev_indirect.c b/module/zfs/vdev_indirect.c index f794b92..7d88b4c 100644 --- a/module/zfs/vdev_indirect.c +++ b/module/zfs/vdev_indirect.c @@ -1416,7 +1416,8 @@ vdev_indirect_reconstruct_io_done(zio_t *zio) ret = zio_checksum_error(zio, &zbc); if (ret == 0) { /* Found a matching checksum. Issue repair i/os. */ - vdev_indirect_repair(zio); + if (spa_writeable(zio->io_spa)) + vdev_indirect_repair(zio); zio_checksum_verified(zio); return; } ``` to keep zdb from triggering ```VERIFY(zio->io_type != ZIO_TYPE_WRITE || spa_writeable(spa));``` in ```zio_vdev_io_start()```. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/openzfs/openzfs/pull/591#pullrequestreview-104804765 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-Md05bf4c0d4eec4182d1a04ae Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
@ahrens I'll look it over and try to get this into the ZoL device evacuation port this weekend. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/openzfs/openzfs/pull/591#issuecomment-373924421 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M4e1cee75acc449f4649dfa31 Delivery options: https://openzfs.topicbox.com/groups
[developer] Re: [openzfs/openzfs] 9290 device removal reduces redundancy of mirrors (#591)
@allanjude @dweeezil Thanks for looking at the previous iteration of this. The new and improved fix is much more comprehensive but the code is almost entirely different. Could I ask that you take a look at this too? -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/openzfs/openzfs/pull/591#issuecomment-373787529 -- openzfs: openzfs-developer Permalink: https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M97d4b0fced53455ebac09989 Delivery options: https://openzfs.topicbox.com/groups