[kudu-CR] master: do not delete unknown tablets
Kudu Jenkins has posted comments on this change. Change subject: master: do not delete unknown tablets .. Patch Set 9: Build Started http://104.196.14.100/job/kudu-gerrit/2602/ -- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] master: do not delete unknown tablets
Hello Dan Burkert, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/3645

to look at the new patch set (#8).

Change subject: master: do not delete unknown tablets
..

master: do not delete unknown tablets

Quoting from docs/design-docs/multi-master-1.0.md:

  "The master and/or tserver must enforce that all actions take effect iff they were sent by the master that is currently the leader. After an exhaustive audit of all master state changes (see appendix A), it was determined that the current protection mechanisms built into each RPC are sufficient to provide fencing. The one exception is orphaned replica deletion done in response to a heartbeat. To protect against that, true orphans (i.e. tablets for which no persistent record exists) will not be deleted at all. As the master retains deleted table/tablet metadata in perpetuity, this should ensure that true orphans appear only under drastic circumstances, such as a tserver that heartbeats to the wrong cluster."

The new test isn't ideal in that it must wait some time to allow the tserver to receive an RPC from the master, but on my laptop it does fail without the fix, and it should fail fairly often in other machines/environments too.

Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
---
M src/kudu/integration-tests/create-table-itest.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/master/catalog_manager.cc
3 files changed, 78 insertions(+), 17 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/3645/8
-- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
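The rule quoted from the design doc can be pictured as a three-way decision on each tablet a tserver reports. The following is only a minimal sketch under stated assumptions, not the code in catalog_manager.cc: the types `TabletRecord`, `ReportAction`, `Catalog` and the function `HandleReportedTablet` are hypothetical names invented for illustration, and the catalog is modeled as a plain in-memory map.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for illustration; these are not Kudu's real types.
enum class TabletRecord { kRunning, kDeleted };
enum class ReportAction { kKeep, kSendDelete, kIgnoreUnknown };

// Every tablet the master has a persistent record for, including deleted
// ones (the design doc says deleted metadata is retained in perpetuity).
using Catalog = std::unordered_map<std::string, TabletRecord>;

// Decide what to do with one tablet that a tserver reported in a heartbeat.
ReportAction HandleReportedTablet(const Catalog& catalog,
                                  const std::string& tablet_id) {
  assert(!tablet_id.empty());  // Precondition: a report always names a tablet.
  auto it = catalog.find(tablet_id);
  if (it == catalog.end()) {
    // True orphan: no persistent record exists, so it is never deleted.
    return ReportAction::kIgnoreUnknown;
  }
  // A record exists, so acting on it is backed by persisted state.
  return it->second == TabletRecord::kDeleted ? ReportAction::kSendDelete
                                              : ReportAction::kKeep;
}
```

In this sketch only a tablet with a persisted "deleted" record ever yields a delete request; a tablet the master has never heard of is left alone, which is the behavior the commit message describes.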
[kudu-CR] master: do not delete unknown tablets
Hello Dan Burkert, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/3645

to look at the new patch set (#7).

Change subject: master: do not delete unknown tablets
..

master: do not delete unknown tablets

Quoting from docs/design-docs/multi-master-1.0.md:

  "The master and/or tserver must enforce that all actions take effect iff they were sent by the master that is currently the leader. After an exhaustive audit of all master state changes (see appendix A), it was determined that the current protection mechanisms built into each RPC are sufficient to provide fencing. The one exception is orphaned replica deletion done in response to a heartbeat. To protect against that, true orphans (i.e. tablets for which no persistent record exists) will not be deleted at all. As the master retains deleted table/tablet metadata in perpetuity, this should ensure that true orphans appear only under drastic circumstances, such as a tserver that heartbeats to the wrong cluster."

The new test isn't ideal in that it must wait some time to allow the tserver to receive an RPC from the master, but on my laptop it does fail without the fix, and it should fail fairly often in other machines/environments too.

Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
---
M src/kudu/integration-tests/create-table-itest.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/master/catalog_manager.cc
3 files changed, 77 insertions(+), 17 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/3645/7
-- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] master: do not delete unknown tablets
Kudu Jenkins has posted comments on this change. Change subject: master: do not delete unknown tablets .. Patch Set 7: Build Started http://104.196.14.100/job/kudu-gerrit/2592/ -- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] master: do not delete unknown tablets
Dan Burkert has posted comments on this change. Change subject: master: do not delete unknown tablets .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] master: do not delete unknown tablets
Kudu Jenkins has posted comments on this change. Change subject: master: do not delete unknown tablets .. Patch Set 6: Build Started http://104.196.14.100/job/kudu-gerrit/2560/ -- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] master: do not delete unknown tablets
Adar Dembo has uploaded a new patch set (#6).

Change subject: master: do not delete unknown tablets
..

master: do not delete unknown tablets

Quoting from docs/design-docs/multi-master-1.0.md:

  "The master and/or tserver must enforce that all actions take effect iff they were sent by the master that is currently the leader. After an exhaustive audit of all master state changes (see appendix A), it was determined that the current protection mechanisms built into each RPC are sufficient to provide fencing. The one exception is orphaned replica deletion done in response to a heartbeat. To protect against that, true orphans (i.e. tablets for which no persistent record exists) will not be deleted at all. As the master retains deleted table/tablet metadata in perpetuity, this should ensure that true orphans appear only under drastic circumstances, such as a tserver that heartbeats to the wrong cluster."

The new test isn't ideal in that it must wait some time to allow the tserver to receive an RPC from the master, but on my laptop it does fail without the fix, and it should fail fairly often in other machines/environments too.

Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
---
M src/kudu/integration-tests/create-table-itest.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/master/catalog_manager.cc
3 files changed, 77 insertions(+), 17 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/3645/6
-- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] master: do not delete unknown tablets
Adar Dembo has posted comments on this change.

Change subject: master: do not delete unknown tablets
..

Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3645/5//COMMIT_MSG
Commit Message:

Line 7: master: do not delete unknown tablets
> I wonder if we now need some tool to "reformat" a tablet server? or to inse

I'm almost positive that a user will one day blow away their master metadata directory and ask us to fix it. IIRC this happened with HDFS in the CDH3 days.

The aggregate of all tablet metadata should be sufficient to reconstruct the master metadata, provided all of the tablets are sufficiently replicated. But, I don't know if it makes sense to spend any time on this problem now, given that no one has run into this and our other priorities.

Reformatting a tserver is pretty easy: just stop it, delete its wal/data directories, and restart it. Or did you mean something else?

I don't really see a use case for dummy "deleted table/tablet" records, apart from the aforementioned recovery case, at which point these aren't "dummy" records.

But, deleting orphaned tablets to reclaim space is a legit use case. The minimal fix is to add a gflag that defaults to off and grants the master permission to delete orphaned tablets. I'll do that here. Beyond that, let me know what JIRAs I should file. :)

Line 9: Quoting from the multi-master design doc:
> nit: can you provide a link or source code path here?

Done

http://gerrit.cloudera.org:8080/#/c/3645/5/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

PS5, Line 1546: INFO)
> WARNING might be more appropriate here? or perhaps INFO if it's an incremen

I changed it to WARNING. It's hard to know in this context whether it's an incremental report or not, and besides, an incremental report only includes a tablet if its state has changed in some way.

-- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
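The "minimal fix" mentioned in the reply above (a flag that defaults to off and grants the master permission to delete orphaned tablets) could look roughly like the sketch below. This is an illustration under assumptions, not the patch itself: the thread does not name the flag, so `g_allow_orphan_deletion` is a hypothetical plain global standing in for a gflag, and `ShouldDeleteOrphan` is an invented helper.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a gflag; the real flag name is not given in
// this thread. Defaults to off, as the reply describes.
static bool g_allow_orphan_deletion = false;

// Returns true only if the reported tablet has no record in 'known_tablets'
// AND the operator has explicitly opted in to orphan deletion.
bool ShouldDeleteOrphan(const std::set<std::string>& known_tablets,
                        const std::string& tablet_id) {
  assert(!tablet_id.empty());  // Precondition: a report always names a tablet.
  const bool is_orphan = known_tablets.count(tablet_id) == 0;
  return is_orphan && g_allow_orphan_deletion;
}
```

With the default value the function never returns true, so orphans are only logged and left in place; an operator who wants to reclaim space would have to turn the flag on deliberately.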
[kudu-CR] master: do not delete unknown tablets
Todd Lipcon has posted comments on this change.

Change subject: master: do not delete unknown tablets
..

Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3645/5//COMMIT_MSG
Commit Message:

Line 7: master: do not delete unknown tablets
I wonder if we now need some tool to "reformat" a tablet server? or to insert a dummy "deleted table" or "deleted tablets" entry into the master? i.e., what if for some reason we do end up with some unknown tablets that are being reported, and we want to delete them to reclaim the space. Right now we'd have to send manual delete_tablet RPCs to every server, which sounds kind of complicated to script up.

Another option would be some kind of "automatically resurrect the tablet based on reports" or something? (thinking about the case here where we have taken a backup of a master and roll back in time, for example).

(of course feel free to backlog this, just curious if we anticipate a support burden here)

Line 9: Quoting from the multi-master design doc:
nit: can you provide a link or source code path here?

http://gerrit.cloudera.org:8080/#/c/3645/5/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

PS5, Line 1546: INFO)
WARNING might be more appropriate here? or perhaps INFO if it's an incremental report, and WARNING if it's a full report? (since the latter indicates a persistent condition whereas the former might be a transient issue as in the race described above?)

-- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] master: do not delete unknown tablets
Adar Dembo has posted comments on this change. Change subject: master: do not delete unknown tablets .. Patch Set 5: Verified+1 More isolate and chrpath() failures. -- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] master: do not delete unknown tablets
Kudu Jenkins has posted comments on this change. Change subject: master: do not delete unknown tablets .. Patch Set 5: Build Started http://104.196.14.100/job/kudu-gerrit/2471/ -- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] master: do not delete unknown tablets
Kudu Jenkins has posted comments on this change. Change subject: master: do not delete unknown tablets .. Patch Set 3: Build Started http://104.196.14.100/job/kudu-gerrit/2443/ -- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] master: do not delete unknown tablets
Adar Dembo has posted comments on this change. Change subject: master: do not delete unknown tablets .. Patch Set 2: Verified+1 One run failed from the same dist-failure as http://gerrit.cloudera.org:8080/3610/6. Another failed when the isolate server started returning code 500. -- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] master: do not delete unknown tablets
Kudu Jenkins has posted comments on this change. Change subject: master: do not delete unknown tablets .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/2415/ -- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] master: do not delete unknown tablets
Adar Dembo has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3645

Change subject: master: do not delete unknown tablets
..

master: do not delete unknown tablets

Quoting from the multi-master design doc:

  "The master and/or tserver must enforce that all actions take effect iff they were sent by the master that is currently the leader. After an exhaustive audit of all master state changes (see appendix A), it was determined that the current protection mechanisms built into each RPC are sufficient to provide fencing. The one exception is orphaned replica deletion done in response to a heartbeat. To protect against that, true orphans (i.e. tablets for which no persistent record exists) will not be deleted at all. As the master retains deleted table/tablet metadata in perpetuity, this should ensure that true orphans appear only under drastic circumstances, such as a tserver that heartbeats to the wrong cluster."

The new test isn't ideal in that it must wait some time to allow the tserver to receive an RPC from the master, but on my laptop it does fail without the fix, and it should fail fairly often in other machines/environments too.

Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
---
M src/kudu/integration-tests/create-table-itest.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/master/catalog_manager.cc
3 files changed, 46 insertions(+), 15 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/3645/1
-- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo
[kudu-CR] master: do not delete unknown tablets
Kudu Jenkins has posted comments on this change. Change subject: master: do not delete unknown tablets .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/2411/ -- To view, visit http://gerrit.cloudera.org:8080/3645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No