[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 ) Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 8 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 06 Nov 2017 23:03:37 + Gerrit-HasComments: No
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 ) Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/8465/8/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/8465/8/src/kudu/tablet/tablet_metadata.cc@420 PS8, Line 420: fs_manager()->block_manager()->NotifyBlockId(max_block_id); > it seems possible that max_block_id is uninitialized at this point if there Yeah, it gets initialized to 0 (see fs/block_id.cc:32) -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 8 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 06 Nov 2017 22:49:21 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 ) Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/8465/8/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/8465/8/src/kudu/tablet/tablet_metadata.cc@420 PS8, Line 420: fs_manager()->block_manager()->NotifyBlockId(max_block_id); it seems possible that max_block_id is uninitialized at this point if there were no blocks. Does it have a well-defined uninitialized value that makes this safe? -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 8 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 06 Nov 2017 22:47:13 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 ) Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. Patch Set 7: (3 comments) http://gerrit.cloudera.org:8080/#/c/8465/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8465/7//COMMIT_MSG@24 PS7, Line 24: the > Double the Done http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/fs/block_manager.h@265 PS7, Line 265: // externally-referenced ids that they may not have previously found. > Getting better. Perhaps it would be good to add an example that will help p Done http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/tablet/tablet_metadata.cc@404 PS7, Line 404: if a disk : // was not read > Can you reword this example so it's more relatable to the relevant Kudu con Done -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 7 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Sat, 04 Nov 2017 05:37:18 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8465 to look at the new patch set (#8). Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. KUDU-2202 avoid block ID reuse for missing dirs The block manager is initially only aware of the blocks it reads when first loading the FS layout. If the opening of a directory fails (e.g. due to a disk failure), the server can still start up, but will fail any tablet configured to use that directory. By not scanning the failed directory, the server will "forget" about some blocks. Since LBM block IDs are sequentially allocated just past the end of the greatest known ID, it's possible for a new tablet to create blocks with IDs already assigned to a failed tablet. If that failed tablet is deleted, the live data belonging to the new tablet will be deleted. To prevent this, a new API has been added to the block managers to grant external components the ability to notify the block manager of block IDs. In loading a tablet's metadata, once all rowsets and orphaned blocks have been seen, the block manager is notified of the highest block ID in the superblock. This patch only targets the LBM, which assigns blocks sequentially, and therefore only needs to know about the largest block ID referenced by tablets. A unit test is added to log_block_manager-test and an integration test is added to ts_recovery-itest to ensure block ID reuse does not happen. Also includes some C++11/stylistic cleanup. Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc --- M src/kudu/fs/block_id.h M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h 15 files changed, 247 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/8 -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 8 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 ) Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. Patch Set 7: (3 comments) Looks good, just some nits on comments. http://gerrit.cloudera.org:8080/#/c/8465/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8465/7//COMMIT_MSG@24 PS7, Line 24: the Double the http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/fs/block_manager.h@265 PS7, Line 265: // externally-referenced ids that they may not have previously found. Getting better. Perhaps it would be good to add an example that will help people understand how the block manager may have previously not found a block? Like "...avoid reusing externally-referenced ids that they may not have previously found (e.g. because those ids' blocks were on a data directory that failed)". http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/tablet/tablet_metadata.cc@404 PS7, Line 404: if a disk : // was not read Can you reword this example so it's more relatable to the relevant Kudu concepts? Like, it should probably talk about data directories rather than disks, and maybe it should explain why a directory would not be "read" (because it failed). -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 7 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Sat, 04 Nov 2017 05:16:04 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 ) Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8465/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8465/5//COMMIT_MSG@16 PS5, Line 16: for new > for a new Missed this, sorry -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Sat, 04 Nov 2017 05:01:30 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8465 to look at the new patch set (#7). Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. KUDU-2202 avoid block ID reuse for missing dirs The block manager is initially only aware of the blocks it reads when first loading the FS layout. If the opening of a directory fails (e.g. due to a disk failure), the server can still start up, but will fail any tablet configured to use that directory. By not scanning the failed directory, the server will "forget" about some blocks. Since LBM block IDs are sequentially allocated just past the end of the greatest known ID, it's possible for a new tablet to create blocks with IDs already assigned to a failed tablet. If that failed tablet is deleted, the live data belonging to the new tablet will be deleted. To prevent this, a new API has been added to the block managers to grant external components the ability to notify the block manager of block IDs. In loading a tablet's metadata, once all rowsets and orphaned blocks have been seen, the block manager is notified of the highest block ID in the the superblock. This patch only targets the LBM, which assigns blocks sequentially, and therefore only needs to know about the largest block ID referenced by tablets. A unit test is added to log_block_manager-test and an integration test is added to ts_recovery-itest to ensure block ID reuse does not happen. Also includes some C++11/stylistic cleanup. Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc --- M src/kudu/fs/block_id.h M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h 15 files changed, 246 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/7 -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 7 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 ) Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/block_manager.h@264 PS5, Line 264: should use > The new wording isn't quite right; it suggests that the block manager itsel Done http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/log_block_manager-test.cc File src/kudu/fs/log_block_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/log_block_manager-test.cc@500 PS5, Line 500: // Simulate a complete reset of the block manager's block ID record, e.g. : // restarting but with all the blocks gone. : bm_->next_block_id_.Store(1); > Rather than simulating it in this way, how about deleting the containers fr I agree that'd be not too complicated: DoGetContainers() and then delete some files. There's a hack in log_block_manager.cc:1706 that makes this not the case. I could maybe add a new block manager option to turn of the seeding, but I'll hold off given we get the same behavior reaching in. http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/integration-tests/ts_recovery-itest.cc File src/kudu/integration-tests/ts_recovery-itest.cc: http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/integration-tests/ts_recovery-itest.cc@149 PS5, Line 149: // Set up a basic server that flushes often so we create blocks quickly. > BTW, it might be useful to wait for some compactions too, or some other ope Done http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc@403 PS5, Line 403: BlockId max_block_id; > Probably worth adding a comment to this block of code (where you aggregate Done http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc@405 PS5, Line 405: if (!block_ids.empty()) { > Is this check needed? With some restructuring, no. But there could be no block_ids, and there would be a segfault a L406. http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc@418 PS5, Line 418: fs_manager()->block_manager()->NotifyBlockId(max_block_id); > What happens if we make it down here and max_block_id is still uninitialize The default is 0 (invalid), so it should no-op. -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Sat, 04 Nov 2017 04:05:20 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8465 to look at the new patch set (#6). Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. KUDU-2202 avoid block ID reuse for missing dirs The block manager is initially only aware of the blocks it reads when first loading the FS layout. If the opening of a directory fails (e.g. due to a disk failure), the server can still start up, but will fail any tablet configured to use that directory. By not scanning the failed directory, the server will "forget" about some blocks. Since LBM block IDs are sequentially allocated just past the end of the greatest known ID, it's possible for new tablet to create blocks with IDs already assigned to a failed tablet. If that failed tablet is deleted, the live data belonging to the second tablet will be deleted. To prevent this, a new API has been added to the block managers to grant external components the ability to notify the block manager of block IDs. In loading a tablet's metadata, once all rowsets and orphaned blocks have been seen, the block manager is notified of the highest block ID in the the superblock. This patch only targets the LBM, which assigns blocks sequentially, and therefore only needs to know about the largest block ID referenced by tablets. A unit test is added to log_block_manager-test and an integration test is added to ts_recovery-itest to ensure block ID reuse does not happen. Also includes some C++11/stylistic cleanup. Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc --- M src/kudu/fs/block_id.h M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h 15 files changed, 246 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/6 -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 ) Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/8465/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8465/5//COMMIT_MSG@16 PS5, Line 16: for new for a new http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/block_manager.h@264 PS5, Line 264: should use The new wording isn't quite right; it suggests that the block manager itself "uses" NotifyBlockId, but it's actually other components that do. http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/log_block_manager-test.cc File src/kudu/fs/log_block_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/log_block_manager-test.cc@500 PS5, Line 500: // Simulate a complete reset of the block manager's block ID record, e.g. : // restarting but with all the blocks gone. : bm_->next_block_id_.Store(1); Rather than simulating it in this way, how about deleting the containers from disk and reloading the block manager? That's a little bit more "real world" without making the test too complicated, I hope. Would let you avoid direct access to next_block_id_ too. http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/integration-tests/ts_recovery-itest.cc File src/kudu/integration-tests/ts_recovery-itest.cc: http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/integration-tests/ts_recovery-itest.cc@149 PS5, Line 149: // Set up a basic server that flushes often so we create blocks quickly. BTW, it might be useful to wait for some compactions too, or some other operation that would delete some blocks, so that some block IDs make it into the orphaned block list. http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc@403 PS5, Line 403: BlockId max_block_id; Probably worth adding a comment to this block of code (where you aggregate all the block IDs and compute the max) explaining the intent. http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc@405 PS5, Line 405: if (!block_ids.empty()) { Is this check needed? http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc@418 PS5, Line 418: fs_manager()->block_manager()->NotifyBlockId(max_block_id); What happens if we make it down here and max_block_id is still uninitialized? What value is stored by the block manager? -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Sat, 04 Nov 2017 02:58:30 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8465 to look at the new patch set (#5). Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. KUDU-2202 avoid block ID reuse for missing dirs The block manager is initially only aware of the blocks it reads when first loading the FS layout. If the opening of a directory fails (e.g. due to a disk failure), the server can still start up, but will fail any tablet configured to use that directory. By not scanning the failed directory, the server will "forget" about some blocks. Since LBM block IDs are sequentially allocated just past the end of the greatest known ID, it's possible for new tablet to create blocks with IDs already assigned to a failed tablet. If that failed tablet is deleted, the live data belonging to the second tablet will be deleted. To prevent this, a new API has been added to the block managers to grant external components the ability to notify the block manager of block IDs. In loading a tablet's metadata, once all rowsets and orphaned blocks have been seen, the block manager is notified of the highest block ID in the its superblock. This patch only targets the LBM, which assigns blocks sequentially, and therefore only needs to know about the largest block ID referenced by tablets. A unit test is added to log_block_manager-test and an integration test is added to ts_recovery-itest to ensure block ID reuse does not happen. Also includes some C++11/stylistic cleanup. Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc --- M src/kudu/fs/block_id.h M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h 15 files changed, 233 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/5 -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8465 to look at the new patch set (#4). Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. KUDU-2202 avoid block ID reuse for missing dirs The block manager is initially only aware of the blocks it reads when first loading the FS layout. If the opening of a directory fails (e.g. due to a disk failure), the server can still start up, but will fail any tablet configured to use that directory. By not scanning the failed directory, the server will "forget" about some blocks. Since LBM block IDs are sequentially allocated just past the end of the greatest known ID, it's possible for new tablet to create blocks with IDs already assigned to a failed tablet. If that failed tablet is deleted, the live data belonging to the second tablet will be deleted. To prevent this, a new API has been added to the block managers to grant external components the ability to notify the block manager of block IDs. In loading a tablet's metadata, once all rowsets and orphaned blocks have been seen, the block manager is notified of the highest block ID in the its superblock. This patch only targets the LBM, which assigns blocks sequentially, and therefore only needs to know about the largest block ID referenced by tablets. A unit test is added to log_block_manager-test and an integration test is added to ts_recovery-itest to ensure block ID reuse does not happen. Also includes some C++11/stylistic cleanup. Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc --- M src/kudu/fs/block_id.h M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h 15 files changed, 240 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/4 -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 ) Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/8465/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8465/1//COMMIT_MSG@10 PS1, Line 10: If the opening of a directory fails (e.g. : due to a disk failure), the server can still start up, but will fail any : tablet configured to use that directory. By not scanning the failed : directory, the server will "forget" about some blocks. : > I'd say that the server will still start up but fail any tablet configured Done http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/block_manager.h@264 PS1, Line 264: to > should Done http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/block_manager.h@266 PS1, Line 266: virtual void NotifyBlockId(BlockId block_id) = 0; > Since BlockIds are basically uint64_ts, we should just pass them by copy. Done http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc File src/kudu/integration-tests/ts_recovery-itest.cc: http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@144 PS1, Line 144: if (GetParam() != "log") { : LOG(INFO) << "Missin > Where's the multi-directory part? Was originally structured to fail a directory. No longer! http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@199 PS1, Line 199: for (const string& dir : ts->data_dirs()) { : const string& data_dir = JoinPathSegments(dir, "data"); : vector children; > This will fail the old tablet, right? Is that worth asserting for? Done http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@207 PS1, Line 207: > You can get this special string from data_dirs.h (or should be able to, any Done http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@208 PS1, Line 208: > ASSERT_OK? Done http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@233 PS1, Line 233: new_block_ids.begin(), new_block_ids.end(), > You could SCOPED_TRACE this (and the first/second sets) to avoid logging th Done http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc@399 PS1, Line 399: next_rowset_idx_ = std::max(next_rowset_idx_, rowset_meta->id() + 1); > I'm curious whether this plumbing the max "down" like this is more efficien Ran a few runs of dense_node-itest and it seems like it doesn't make a huge difference. Good call though, bootstrap is dominated by disk access, the difference in the O(N) copy here is pretty negligible. WiIl go with that approach. http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc@413 PS1, Line 413: ager of blocks it may not be aware of (e.g. if a : // disk was not read). > Not really necessary for this comment. Done -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Sat, 04 Nov 2017 01:13:07 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8465 to look at the new patch set (#3). Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. KUDU-2202 avoid block ID reuse for missing dirs The block manager is initially only aware of the blocks it reads when first loading the FS layout. If the opening of a directory fails (e.g. due to a disk failure), the server can still start up, but will fail any tablet configured to use that directory. By not scanning the failed directory, the server will "forget" about some blocks. Since LBM block IDs are sequentially allocated just past the end of the greatest known ID, it's possible for new tablet to create blocks with IDs already assigned to a failed tablet. If that failed tablet is deleted, the live data belonging to the second tablet will be deleted. To prevent this, a new API has been added to the block managers to grant external components the ability to notify the block manager of block IDs. The largest block ID is kept track of when opening and reading a tablet's metadata. Once all rowsets and orphaned blocks have been seen, the block manager is notified of the highest block ID in the tablet's superblock. This patch only targets the LBM, which assigns blocks sequentially, and therefore only needs to know about the largest block ID referenced by tablets. A unit test is added to log_block_manager-test and an integration test is added to ts_recovery-itest to ensure block ID reuse does not happen. Also includes some C++11/stylistic cleanup. Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc --- M src/kudu/fs/block_id.h M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h 15 files changed, 237 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/3 -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 ) Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/file_block_manager.cc@942 PS1, Line 942: void FileBlockManager::NotifyBlockId(const BlockId& /* block_id */) { > warning: parameter 'block_id' is unused [misc-unused-parameters] Done http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc File src/kudu/integration-tests/ts_recovery-itest.cc: http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@103 PS1, Line 103: using tablet::TabletSuperBlockPB; > warning: using decl 'Substitute' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@104 PS1, Line 104: > warning: using decl 'RowSetDataPB' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/rowset_metadata.cc File src/kudu/tablet/rowset_metadata.cc: http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/rowset_metadata.cc@49 PS1, Line 49: *metadata = std::move(ret); > warning: prefer ptr1 = std::move(ptr2) over ptr1.reset(ptr2.release()) [mis Done http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc@678 PS1, Line 678: AtomicWord rowset_idx = Barrier_AtomicIncrement(_rowset_idx_, 1) - 1; > warning: parameter 'schema' is unused [misc-unused-parameters] Done -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Fri, 03 Nov 2017 23:32:38 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 ) Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. Patch Set 1: (11 comments) I think it would also be nice to have an LBM-level test that ensures block IDs are not reused. It should be far simpler to write. http://gerrit.cloudera.org:8080/#/c/8465/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8465/1//COMMIT_MSG@10 PS1, Line 10: If the opening of a directory fails (e.g. : due to a disk failure), the server can still start up, but it may not : know about some blocks, and may end up assigning them to a new tablet. : If the blocks ID is used by two tablets, we can get into situations : where deleting one will delete data for the other. I'd say that the server will still start up but fail any tablet configured to use that directory. Additionally, by not scanning the failed directory, the server will "forget" about some blocks, and since LBM block IDs are sequentially allocated beginning just past the block with the greatest ID, it's possible for a second tablet to create a block and be allocated a block ID assigned to a failed tablet. Then, if the failed tablet is deleted, it'll also delete live data belonging to that second tablet. You can put this more concisely, but the keys are to talk about how block ID selection works, and to draw a more explicit connection between the reuse by a second tablet and the data corruption when deleting the failed tablet. http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/block_manager.h@264 PS1, Line 264: can should http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/block_manager.h@266 PS1, Line 266: virtual void NotifyBlockId(const BlockId& block_id) = 0; Since BlockIds are basically uint64_ts, we should just pass them by copy. http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc File src/kudu/integration-tests/ts_recovery-itest.cc: http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@144 PS1, Line 144: // Set up a basic multi-directory server with some rows that flushes often so : // we get some blocks. Where's the multi-directory part? http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@199 PS1, Line 199: : // Now empty the data directories of blocks. The server will start up, not : // read any blocks, but should still avoid the old tablet's blocks. This will fail the old tablet, right? Is that worth asserting for? http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@207 PS1, Line 207: "block_manager_instance" You can get this special string from data_dirs.h (or should be able to, anyway). http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@208 PS1, Line 208: WARN_NOT_OK ASSERT_OK? http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@233 PS1, Line 233: LOG(INFO) << "Intersection: " << BlockId::JoinStrings(block_id_intersection); You could SCOPED_TRACE this (and the first/second sets) to avoid logging this except in the event of failure. http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@233 PS1, Line 233: LOG(INFO) << "Intersection: " << BlockId::JoinStrings(block_id_intersection); You could SCOPED_TRACE this (and the first/second sets) to avoid logging this except in the event of failure. http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc@399 PS1, Line 399: RETURN_NOT_OK(RowSetMetadata::Load(this, rowset_pb, _block_id, _meta)); I'm curious whether this plumbing the max "down" like this is more efficient than using RowSetMetadata::GetAllBlocks (or even TabletMetadata::CollectAllBlockIDs) to fetch all the IDs, and selecting the maximum amongst them. If not, the bulk approach is certainly less code. http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc@413 PS1, Line 413: This is only relevant for block managers that assign : // sequential block IDs, Not really necessary for this comment. -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8465 to look at the new patch set (#2). Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. KUDU-2202 avoid block ID reuse for missing dirs The block manager is initially only aware of the blocks it reads when first loading the FS layout. If the opening of a directory fails (e.g. due to a disk failure), the server can still start up, but it may not know about some blocks, and may end up assigning them to a new tablet. If the blocks ID is used by two tablets, we can get into situations where deleting one will delete data for the other. To prevent this, a new API has been added to the block managers to grant external components the ability to notify the block manager of block IDs. The largest block ID is kept track of when opening and reading a tablet's metadata. Once all rowsets and orphaned blocks have been seen, the block manager is notified of the highest block ID in the tablet's superblock. This patch only targets the LBM, which assigns blocks sequentially, and therefore only needs to know about the largest block ID referenced by tablets. An integration test is added to ts_recovery-itest to ensure block ID reuse does not happen. Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc --- M src/kudu/fs/block_id.h M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet-test-util.h M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h 14 files changed, 213 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/2 -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew WongGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs
Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8465 Change subject: KUDU-2202 avoid block ID reuse for missing dirs .. KUDU-2202 avoid block ID reuse for missing dirs The block manager is initially only aware of the blocks it reads when first loading the FS layout. If the opening of a directory fails (e.g. due to a disk failure), the server can still start up, but it may not know about some blocks, and may end up assigning them to a new tablet. If the blocks ID is used by two tablets, we can get into situations where deleting one will delete data for the other. To prevent this, a new API has been added to the block managers to grant external components the ability to notify the block manager of block IDs. The largest block ID is kept track of when opening and reading a tablet's metadata. Once all rowsets and orphaned blocks have been seen, the block manager is notified of the highest block ID in the tablet's superblock. This patch only targets the LBM, which assigns blocks sequentially, and therefore only needs to know about the largest block ID referenced by tablets. An integration test is added to ts_recovery-itest to ensure block ID reuse does not happen. Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc --- M src/kudu/fs/block_id.h M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h M src/kudu/tablet/tablet_metadata.cc 11 files changed, 210 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/1 -- To view, visit http://gerrit.cloudera.org:8080/8465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc Gerrit-Change-Number: 8465 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong