[kudu-CR] master: do not delete unknown tablets

2016-07-21 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: do not delete unknown tablets
..


Patch Set 9:

Build Started http://104.196.14.100/job/kudu-gerrit/2602/

-- 
To view, visit http://gerrit.cloudera.org:8080/3645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: do not delete unknown tablets

2016-07-20 Thread Adar Dembo (Code Review)
Hello Dan Burkert, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3645

to look at the new patch set (#8).

Change subject: master: do not delete unknown tablets
..

master: do not delete unknown tablets

Quoting from docs/design-docs/multi-master-1.0.md:

"The master and/or tserver must enforce that all actions take effect
iff they were sent by the master that is currently the leader.

After an exhaustive audit of all master state changes (see appendix A), it
was determined that the current protection mechanisms built into each RPC
are sufficient to provide fencing. The one exception is orphaned replica
deletion done in response to a heartbeat. To protect against that, true
orphans (i.e. tablets for which no persistent record exists) will not be
deleted at all. As the master retains deleted table/tablet metadata in
perpetuity, this should ensure that true orphans appear only under drastic
circumstances, such as a tserver that heartbeats to the wrong cluster."

The new test isn't ideal in that it must wait some time to allow the tserver
to receive an RPC from the master, but on my laptop it does fail without the
fix, and it should fail fairly often on other machines/environments too.

Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
---
M src/kudu/integration-tests/create-table-itest.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/master/catalog_manager.cc
3 files changed, 78 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/3645/8
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] master: do not delete unknown tablets

2016-07-20 Thread Adar Dembo (Code Review)
Hello Dan Burkert, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3645

to look at the new patch set (#7).

Change subject: master: do not delete unknown tablets
..

Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
---
M src/kudu/integration-tests/create-table-itest.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/master/catalog_manager.cc
3 files changed, 77 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/3645/7
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] master: do not delete unknown tablets

2016-07-20 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: do not delete unknown tablets
..


Patch Set 7:

Build Started http://104.196.14.100/job/kudu-gerrit/2592/

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: do not delete unknown tablets

2016-07-20 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: master: do not delete unknown tablets
..


Patch Set 6: Code-Review+1

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: do not delete unknown tablets

2016-07-19 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: do not delete unknown tablets
..


Patch Set 6:

Build Started http://104.196.14.100/job/kudu-gerrit/2560/

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: do not delete unknown tablets

2016-07-19 Thread Adar Dembo (Code Review)
Adar Dembo has uploaded a new patch set (#6).

Change subject: master: do not delete unknown tablets
..

Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
---
M src/kudu/integration-tests/create-table-itest.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/master/catalog_manager.cc
3 files changed, 77 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/3645/6
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] master: do not delete unknown tablets

2016-07-19 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: master: do not delete unknown tablets
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3645/5//COMMIT_MSG
Commit Message:

Line 7: master: do not delete unknown tablets
> I wonder if we now need some tool to "reformat" a tablet server? or to inse
I'm almost positive that a user will one day blow away their master metadata
directory and ask us to fix it. IIRC this happened with HDFS in the CDH3 days.
The aggregate of all tablet metadata should be sufficient to reconstruct the
master metadata, provided all of the tablets are sufficiently replicated. But
I don't know if it makes sense to spend any time on this problem now, given
that no one has run into this and given our other priorities.

Reformatting a tserver is pretty easy: just stop it, delete its wal/data 
directories, and restart it. Or did you mean something else?

I don't really see a use case for dummy "deleted table/tablet" records, apart 
from the aforementioned recovery case, at which point these aren't "dummy" 
records. But, deleting orphaned tablets to reclaim space is a legit use case. 
The minimal fix is to add a gflag that defaults to off and grants the master 
permission to delete orphaned tablets. I'll do that here.
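
A minimal sketch of that policy, for illustration only. The names below
(FLAGS_allow_orphan_deletion, TabletRecord, HandleReportedTablet) are
hypothetical and are not taken from catalog_manager.cc, and a plain bool
stands in for a real gflag. It assumes that replicas of tablets whose
persistent record is marked deleted are still removed, while true orphans are
left alone unless the flag is turned on.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the proposed gflag; defaults to off.
static bool FLAGS_allow_orphan_deletion = false;

// Hypothetical view of what the master has persisted about a tablet.
enum class TabletRecord { kRunning, kDeleted };

// What the master decides to do about a replica reported in a heartbeat.
enum class Action { kNone, kDeleteReplica, kIgnoreOrphan };

// Decide how to react to a reported tablet. `catalog` models the master's
// persistent table/tablet metadata, keyed by tablet ID (an assumption made
// for this sketch).
Action HandleReportedTablet(
    const std::unordered_map<std::string, TabletRecord>& catalog,
    const std::string& tablet_id) {
  auto it = catalog.find(tablet_id);
  if (it == catalog.end()) {
    // True orphan: no persistent record exists. Leave it alone unless the
    // operator has explicitly granted permission to delete orphans.
    return FLAGS_allow_orphan_deletion ? Action::kDeleteReplica
                                       : Action::kIgnoreOrphan;
  }
  if (it->second == TabletRecord::kDeleted) {
    // The master remembers deleting this tablet, so the replica is stale.
    return Action::kDeleteReplica;
  }
  return Action::kNone;
}
```

With the flag at its default, an ID missing from the map yields
Action::kIgnoreOrphan; setting the flag to true turns the same report into
Action::kDeleteReplica.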

Beyond that, let me know what JIRAs I should file. :)


Line 9: Quoting from the multi-master design doc:
> nit: can you provide a link or source code path here?
Done


http://gerrit.cloudera.org:8080/#/c/3645/5/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

PS5, Line 1546: INFO)
> WARNING might be more appropriate here? or perhaps INFO if it's an incremen
I changed it to WARNING. It's hard to know in this context whether it's an 
incremental report or not, and besides, an incremental report only includes a 
tablet if its state has changed in some way.


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] master: do not delete unknown tablets

2016-07-18 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: master: do not delete unknown tablets
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3645/5//COMMIT_MSG
Commit Message:

Line 7: master: do not delete unknown tablets
I wonder if we now need some tool to "reformat" a tablet server? or to insert a
dummy "deleted table" or "deleted tablets" entry into the master? i.e. what if
for some reason we do end up with some unknown tablets that are being reported,
and we want to delete them to reclaim the space? Right now we'd have to send
manual delete_tablet RPCs to every server, which sounds kind of complicated to
script up.

Another option would be some kind of "automatically resurrect the tablet based 
on reports" or something?

(thinking here about the case where we have taken a backup of a master and
rolled back in time, for example).

(of course feel free to backlog this, just curious if we anticipate a support 
burden here)


Line 9: Quoting from the multi-master design doc:
nit: can you provide a link or source code path here?


http://gerrit.cloudera.org:8080/#/c/3645/5/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

PS5, Line 1546: INFO)
WARNING might be more appropriate here? or perhaps INFO if it's an incremental 
report, and WARNING if it's a full report? (since the latter indicates a 
persistent condition whereas the former might be a transient issue as in the 
race described above?)
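
The suggestion above could be sketched roughly as follows. This is only an
illustration under stated assumptions: Severity, TabletReport, and
SeverityForUnknownTablet are hypothetical names, not Kudu or glog APIs, and
the sketch assumes the report type is known at the point where the message is
logged (the reply in this thread notes that it is hard to know in that
context, and the patch ended up logging at WARNING unconditionally).

```cpp
#include <cassert>

// Hypothetical severity levels, standing in for a logging library's levels.
enum class Severity { kInfo, kWarning };

// Hypothetical, minimal view of a tablet report received in a heartbeat.
struct TabletReport {
  bool is_incremental;  // true for an incremental report, false for a full one
};

// Pick a log severity for an unknown tablet found in `report`.
// An incremental report may reflect a transient race, so it logs at INFO;
// a full report indicates a persistent condition, so it logs at WARNING.
Severity SeverityForUnknownTablet(const TabletReport& report) {
  return report.is_incremental ? Severity::kInfo : Severity::kWarning;
}
```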


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] master: do not delete unknown tablets

2016-07-14 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: master: do not delete unknown tablets
..


Patch Set 5: Verified+1

More isolate and chrpath() failures.

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] master: do not delete unknown tablets

2016-07-14 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: do not delete unknown tablets
..


Patch Set 5:

Build Started http://104.196.14.100/job/kudu-gerrit/2471/

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] master: do not delete unknown tablets

2016-07-14 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: do not delete unknown tablets
..


Patch Set 3:

Build Started http://104.196.14.100/job/kudu-gerrit/2443/

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] master: do not delete unknown tablets

2016-07-13 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: master: do not delete unknown tablets
..


Patch Set 2: Verified+1

One run failed from the same dist-failure as 
http://gerrit.cloudera.org:8080/3610/6. Another failed when the isolate server 
started returning code 500.

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] master: do not delete unknown tablets

2016-07-13 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: do not delete unknown tablets
..


Patch Set 2:

Build Started http://104.196.14.100/job/kudu-gerrit/2415/

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] master: do not delete unknown tablets

2016-07-13 Thread Adar Dembo (Code Review)
Adar Dembo has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3645

Change subject: master: do not delete unknown tablets
..

master: do not delete unknown tablets

Quoting from the multi-master design doc:

"The master and/or tserver must enforce that all actions take effect
iff they were sent by the master that is currently the leader.

After an exhaustive audit of all master state changes (see appendix A), it
was determined that the current protection mechanisms built into each RPC
are sufficient to provide fencing. The one exception is orphaned replica
deletion done in response to a heartbeat. To protect against that, true
orphans (i.e. tablets for which no persistent record exists) will not be
deleted at all. As the master retains deleted table/tablet metadata in
perpetuity, this should ensure that true orphans appear only under drastic
circumstances, such as a tserver that heartbeats to the wrong cluster."

The new test isn't ideal in that it must wait some time to allow the tserver
to receive an RPC from the master, but on my laptop it does fail without the
fix, and it should fail fairly often on other machines/environments too.

Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
---
M src/kudu/integration-tests/create-table-itest.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/master/catalog_manager.cc
3 files changed, 46 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/3645/1
-- 

Gerrit-MessageType: newchange
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 


[kudu-CR] master: do not delete unknown tablets

2016-07-13 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: do not delete unknown tablets
..


Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/2411/

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f2d5bb06c38daa7b09854dbb24a7881723551
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No