[kudu-CR] handle disk failures during tablet copies
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7654 )

Change subject: handle disk failures during tablet copies
..

handle disk failures during tablet copies

There are two components in a tablet copy: the copy client (that receives data) and the copy session source (that sends data).

Coarse-grain handling of disk failures during tablet copies is done for the tablet copy client as:
- Before starting a copy client, if no disks are available to place the tablet, simply return (instead of failing a CHECK).
- Before downloading each WAL segments or block, check that the tablet is in a healthy group.

And for the tablet copy session as:
- Before sending a block or log segment, check if the tablet has an error.

Upon returning an error, the tablet copy client will shutdown the replica, leaving it in a failed state.

A test is added to ensure that both copy clients and that source sessions with failed disks will return errors to the copying client.

Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Reviewed-on: http://gerrit.cloudera.org:8080/7654
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy
---
M src/kudu/tserver/tablet_copy-test-base.h
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/tablet_copy_service-test.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/tablet_copy_source_session.h
7 files changed, 124 insertions(+), 7 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Mike Percy: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-Change-Number: 7654
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Tidy Bot
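
The client-side checks described in the merged commit message can be pictured with a minimal sketch. It is not the code from the patch: `Status`, `DirGroup`, and `CopyBlocks` are hypothetical stand-ins written for illustration, and the `fail_disk_before_block` parameter exists only to simulate a disk failing partway through a copy.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a status type; NOT Kudu's real Status class.
class Status {
 public:
  static Status OK() { return Status(true, ""); }
  static Status IOError(std::string msg) { return Status(false, std::move(msg)); }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  Status(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
  bool ok_;
  std::string msg_;
};

// Hypothetical model of the disks backing one tablet's directory group.
struct DirGroup {
  std::vector<bool> disk_healthy;  // one entry per disk in the group

  bool AllHealthy() const {
    for (bool healthy : disk_healthy) {
      if (!healthy) return false;
    }
    return true;
  }
};

// Copies `num_blocks` blocks, re-checking disk health before each one.
// `fail_disk_before_block` simulates a disk failure (-1 means never).
// `*copied` receives the number of blocks copied before any error.
Status CopyBlocks(DirGroup* group, int num_blocks,
                  int fail_disk_before_block, int* copied) {
  assert(group != nullptr && copied != nullptr);
  *copied = 0;
  // Before starting: with no disks to place the tablet, return an error
  // instead of crashing.
  if (group->disk_healthy.empty()) {
    return Status::IOError("no disks available to place the tablet");
  }
  for (int i = 0; i < num_blocks; ++i) {
    if (i == fail_disk_before_block) group->disk_healthy[0] = false;
    // Before each block: stop as soon as the group is no longer healthy.
    if (!group->AllHealthy()) {
      return Status::IOError("tablet is not in a healthy directory group");
    }
    ++*copied;
  }
  return Status::OK();
}
```

In this sketch the failure surfaces at the next block boundary rather than mid-block, which is the coarse-grained behavior the commit message describes.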
[kudu-CR] handle disk failures during tablet copies
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/7654 )

Change subject: handle disk failures during tablet copies
..

Patch Set 9: Code-Review+2

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-Change-Number: 7654
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 22 Nov 2017 05:26:11 +
Gerrit-HasComments: No
[kudu-CR] handle disk failures during tablet copies
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/7654 )

Change subject: handle disk failures during tablet copies
..

Patch Set 8: Code-Review+2

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-Change-Number: 7654
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 21 Nov 2017 22:34:14 +
Gerrit-HasComments: No
[kudu-CR] handle disk failures during tablet copies
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7654 )

Change subject: handle disk failures during tablet copies
..

Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7654/7/src/kudu/tserver/tablet_copy_source_session.cc
File src/kudu/tserver/tablet_copy_source_session.cc:

http://gerrit.cloudera.org:8080/#/c/7654/7/src/kudu/tserver/tablet_copy_source_session.cc@133
PS7, Line 133: ));
> we can remove this now
Done

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-Change-Number: 7654
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 21 Nov 2017 22:30:43 +
Gerrit-HasComments: Yes
[kudu-CR] handle disk failures during tablet copies
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7654

to look at the new patch set (#8).

Change subject: handle disk failures during tablet copies
..

handle disk failures during tablet copies

There are two components in a tablet copy: the copy client (that receives data) and the copy session source (that sends data).

Coarse-grain handling of disk failures during tablet copies is done for the tablet copy client as:
- Before starting a copy client, if no disks are available to place the tablet, simply return (instead of failing a CHECK).
- Before downloading each WAL segments or block, check that the tablet is in a healthy group.

And for the tablet copy session as:
- Before sending a block or log segment, check if the tablet has an error.

Upon returning an error, the tablet copy client will shutdown the replica, leaving it in a failed state.

A test is added to ensure that both copy clients and that source sessions with failed disks will return errors to the copying client.

Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
---
M src/kudu/tserver/tablet_copy-test-base.h
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/tablet_copy_service-test.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/tablet_copy_source_session.h
7 files changed, 124 insertions(+), 7 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/7654/8

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-Change-Number: 7654
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Tidy Bot
[kudu-CR] handle disk failures during tablet copies
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/7654 )

Change subject: handle disk failures during tablet copies
..

Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7654/7/src/kudu/tserver/tablet_copy_source_session.cc
File src/kudu/tserver/tablet_copy_source_session.cc:

http://gerrit.cloudera.org:8080/#/c/7654/7/src/kudu/tserver/tablet_copy_source_session.cc@133
PS7, Line 133: nullptr
we can remove this now

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-Change-Number: 7654
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 21 Nov 2017 20:31:29 +
Gerrit-HasComments: Yes
[kudu-CR] handle disk failures during tablet copies
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7654 )

Change subject: handle disk failures during tablet copies
..

Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7654/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7654/6//COMMIT_MSG@16
PS6, Line 16: WAL
> WAL segment
Done

http://gerrit.cloudera.org:8080/#/c/7654/6/src/kudu/tserver/tablet_copy_source_session.cc
File src/kudu/tserver/tablet_copy_source_session.cc:

http://gerrit.cloudera.org:8080/#/c/7654/6/src/kudu/tserver/tablet_copy_source_session.cc@133
PS6, Line 133: nullptr
> nit: since this is an optional out-param of the function, defaulting it to
Done with default

http://gerrit.cloudera.org:8080/#/c/7654/6/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/7654/6/src/kudu/tserver/ts_tablet_manager.cc@694
PS6, Line 694: Status s = tc_client.FetchAll(replica);
: if (!s.ok()) {
: LOG(WARNING) << LogPrefix(tablet_id) << "Tablet Copy: Unable to fetch data from remote peer "
: << kSrcPeerInfo << ": " << s.ToString();
: r
> There is no need for this; the TabletCopyClient destructor will run Abort()
Done

http://gerrit.cloudera.org:8080/#/c/7654/6/src/kudu/tserver/ts_tablet_manager.cc@992
PS6, Line 992: (elapsed_ms > FLAGS_tablet_start_warn_threshold_ms) {
> that should have already happened above on line 972, right?
Ah, this should be "while starting", although I think this change could be pushed to the "handle failures at runtime" patch, since only then can errors get set in the replica externally.

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-Change-Number: 7654
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 21 Nov 2017 18:17:49 +
Gerrit-HasComments: Yes
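
The point about the destructor running Abort() is an instance of the general C++ pattern where an object cleans up in its destructor unless it was explicitly finished. The following is a minimal sketch of that pattern only. `ReplicaState`, `CopyClient`, `Finish`, and `RunCopy` are hypothetical names invented for illustration and do not reflect the real TabletCopyClient interface.

```cpp
#include <cassert>

// Hypothetical replica states, invented for this sketch.
enum class ReplicaState { kCopying, kRunning, kTombstoned };

// Hypothetical copy client: aborts in its destructor unless Finish() ran.
class CopyClient {
 public:
  explicit CopyClient(ReplicaState* state) : state_(state) {
    assert(state_ != nullptr);
    *state_ = ReplicaState::kCopying;
  }
  ~CopyClient() { Abort(); }

  CopyClient(const CopyClient&) = delete;
  CopyClient& operator=(const CopyClient&) = delete;

  // Pretends to fetch data; fails when `simulate_failure` is true.
  bool FetchAll(bool simulate_failure) { return !simulate_failure; }

  // Marks the copy as successful, turning the later Abort() into a no-op.
  void Finish() {
    *state_ = ReplicaState::kRunning;
    done_ = true;
  }

  // Tombstones the replica if the copy never finished. Safe to call twice.
  void Abort() {
    if (done_) return;
    *state_ = ReplicaState::kTombstoned;
    done_ = true;
  }

 private:
  ReplicaState* state_;
  bool done_ = false;
};

// Caller code needs no explicit cleanup on the failure path.
ReplicaState RunCopy(bool simulate_failure) {
  ReplicaState state = ReplicaState::kCopying;
  {
    CopyClient client(&state);
    if (client.FetchAll(simulate_failure)) {
      client.Finish();
    }
    // On failure, leaving this scope triggers ~CopyClient(), which aborts.
  }
  return state;
}
```

Under these assumptions a separate scoped cleanup at the call site would be redundant, because the destructor already covers every early-exit path.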
[kudu-CR] handle disk failures during tablet copies
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7654

to look at the new patch set (#7).

Change subject: handle disk failures during tablet copies
..

handle disk failures during tablet copies

There are two components in a tablet copy: the copy client (that receives data) and the copy session source (that sends data).

Coarse-grain handling of disk failures during tablet copies is done for the tablet copy client as:
- Before starting a copy client, if no disks are available to place the tablet, simply return (instead of failing a CHECK).
- Before downloading each WAL segments or block, check that the tablet is in a healthy group.

And for the tablet copy session as:
- Before sending a block or log segment, check if the tablet has an error.

Upon returning an error, the tablet copy client will shutdown the replica, leaving it in a failed state.

A test is added to ensure that both copy clients and that source sessions with failed disks will return errors to the copying client.

Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
---
M src/kudu/tserver/tablet_copy-test-base.h
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/tablet_copy_service-test.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/tablet_copy_source_session.h
7 files changed, 124 insertions(+), 7 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/7654/7

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-Change-Number: 7654
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Tidy Bot
[kudu-CR] handle disk failures during tablet copies
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/7654 )

Change subject: handle disk failures during tablet copies
..

Patch Set 6:

(4 comments)

Some of these changes make sense but see my comments about Abort()

http://gerrit.cloudera.org:8080/#/c/7654/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7654/6//COMMIT_MSG@16
PS6, Line 16: WALs
WAL segment

http://gerrit.cloudera.org:8080/#/c/7654/6/src/kudu/tserver/tablet_copy_source_session.cc
File src/kudu/tserver/tablet_copy_source_session.cc:

http://gerrit.cloudera.org:8080/#/c/7654/6/src/kudu/tserver/tablet_copy_source_session.cc@133
PS6, Line 133: nullptr
nit: since this is an optional out-param of the function, defaulting it to nullptr in the header file might be the user-friendlier option. Otherwise, would be helpful to add a comment to document what this is, like:

  RETURN_NOT_OK(CheckHealthyDirGroup(/*error_code=*/ nullptr));

http://gerrit.cloudera.org:8080/#/c/7654/6/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/7654/6/src/kudu/tserver/ts_tablet_manager.cc@694
PS6, Line 694: // In case of failure, shutdown the replica.
: auto failure_cleanup = MakeScopedCleanup([&] {
: replica->SetError(s);
: replica->Shutdown();
: });
There is no need for this; the TabletCopyClient destructor will run Abort() and tombstone the replica if it didn't succeed.

http://gerrit.cloudera.org:8080/#/c/7654/6/src/kudu/tserver/ts_tablet_manager.cc@992
PS6, Line 992: / If the replica was marked failed while bootstrapping, abort.
that should have already happened above on line 972, right?

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-Change-Number: 7654
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 21 Nov 2017 06:03:57 +
Gerrit-HasComments: Yes
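
The nit about defaulting an optional out-param to nullptr in the header can be illustrated with a small sketch. The thread shows only the call `CheckHealthyDirGroup(/*error_code=*/ nullptr)` and not the function's signature, so `CheckGroupHealth`, its `group_healthy` parameter, and the `ErrorCode` enum below are assumptions made up for this example.

```cpp
#include <cassert>

// Hypothetical error codes, invented for this sketch.
enum class ErrorCode { kNone, kIoError };

// Declaration as it might appear in a header. The default argument lives
// here, so callers that do not need the code can omit it entirely.
bool CheckGroupHealth(bool group_healthy, ErrorCode* error_code = nullptr);

// Definition: the default argument is not repeated.
bool CheckGroupHealth(bool group_healthy, ErrorCode* error_code) {
  if (group_healthy) return true;
  // Only write through the out-param when the caller supplied one.
  if (error_code != nullptr) *error_code = ErrorCode::kIoError;
  return false;
}

// Both call styles compile; only the second observes the error code.
void ExampleUsage() {
  assert(CheckGroupHealth(true));
  ErrorCode code = ErrorCode::kNone;
  assert(!CheckGroupHealth(false, &code));
  assert(code == ErrorCode::kIoError);
}
```

The alternative mentioned in the review, an inline `/*error_code=*/` comment at each call site, documents the argument without changing the declaration.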
[kudu-CR] handle disk failures during tablet copies
Hello Tidy Bot, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7654

to look at the new patch set (#6).

Change subject: handle disk failures during tablet copies
..

handle disk failures during tablet copies

There are two components in a tablet copy: the copy client (that receives data) and the copy session source (that sends data).

Coarse-grain handling of disk failures during tablet copies is done for the tablet copy client as:
* Before starting a copy client, if no disks are available to place the tablet, simply return (instead of failing a CHECK).
* Before downloading each WALs or block, check that the tablet is in a healthy group.

And for the tablet copy session as:
* Before sending a block or log segment, check if the tablet has an error.

Upon returning an error, the tablet copy client will shutdown the replica, leaving it in a failed state.

A test is added to ensure that both copy clients and that source sessions with failed disks will return errors to the copying client.

Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
---
M src/kudu/tserver/tablet_copy-test-base.h
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/tablet_copy_service-test.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/tablet_copy_source_session.h
M src/kudu/tserver/ts_tablet_manager.cc
8 files changed, 140 insertions(+), 9 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/7654/6

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-Change-Number: 7654
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
[kudu-CR] handle disk failures during tablet copies
Hello Tidy Bot, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7654

to look at the new patch set (#3).

Change subject: handle disk failures during tablet copies
..

handle disk failures during tablet copies

There are two components in a tablet copy: the copy client (that receives data) and the copy session source (that sends data).

Coarse-grain handling of disk failures during tablet copies is done for the tablet copy client as:
* Before starting a copy client, if no disks are available to place the tablet, simply return (instead of failing a CHECK).
* Before downloading each WALs or block, check that the tablet is in a healthy group.

And for the tablet copy session as:
* Before sending a block or log segment, check if the tablet has an error.

Upon returning an error, the tablet copy client will shutdown the replica, leaving it in a failed state.

A test is added to ensure that both copy clients and that source sessions with failed disks will return errors to the copying client.

Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
---
M src/kudu/tserver/tablet_copy-test-base.h
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/tablet_copy_service-test.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/tablet_copy_source_session.h
M src/kudu/tserver/ts_tablet_manager.cc
8 files changed, 134 insertions(+), 9 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/7654/3

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-Change-Number: 7654
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
[kudu-CR] handle disk failures during tablet copies
Andrew Wong has abandoned this change. ( http://gerrit.cloudera.org:8080/8607 )

Change subject: handle disk failures during tablet copies
..

Abandoned

This is a duplicate

--
To view, visit http://gerrit.cloudera.org:8080/8607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Iacbfe446d01dd523fb2f2f81880e5af2551e979f
Gerrit-Change-Number: 8607
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] handle disk failures during tablet copies
Hello Tidy Bot, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7654

to look at the new patch set (#2).

Change subject: handle disk failures during tablet copies
..

handle disk failures during tablet copies

There are two components in a tablet copy: the copy client (that receives data) and the copy session source (that sends data).

Coarse-grain handling of disk failures during tablet copies is done for the tablet copy client as:
* Before starting a copy client, if no disks are available to place the tablet, simply return (instead of failing a CHECK).
* Before downloading each WALs or block, check that the tablet is in a healthy group.

And for the tablet copy session as:
* Before sending a block or log segment, check if the tablet has an error.

Upon returning an error, the tablet copy client will shutdown the replica, leaving it in a failed state.

A test is added to ensure that both copy clients and that source sessions with failed disks will return errors to the copying client.

Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
---
M src/kudu/tablet/tablet.h
M src/kudu/tserver/tablet_copy-test-base.h
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/tablet_copy_service-test.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/tablet_copy_source_session.h
8 files changed, 119 insertions(+), 8 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/7654/2

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-Change-Number: 7654
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
[kudu-CR] handle disk failures during tablet copies
Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8607 )

Change subject: handle disk failures during tablet copies
..

handle disk failures during tablet copies

There are two components in a tablet copy: the copy client (that receives data) and the copy session source (that sends data).

Coarse-grain handling of disk failures during tablet copies is done for the tablet copy client as:
* Before starting a copy client, if no disks are available to place the tablet, simply return (instead of failing a CHECK).
* Before downloading each WALs or block, check that the tablet is in a healthy group.

And for the tablet copy session as:
* Before sending a block or log segment, check if the tablet has an error.

Upon returning an error, the tablet copy client will shutdown the replica, leaving it in a failed state.

A test is added to ensure that both copy clients and that source sessions with failed disks will return errors to the copying client.

Change-Id: Iacbfe446d01dd523fb2f2f81880e5af2551e979f
---
M src/kudu/tablet/tablet.h
M src/kudu/tserver/tablet_copy-test-base.h
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/tablet_copy_service-test.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/tablet_copy_source_session.h
8 files changed, 119 insertions(+), 8 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/8607/1

--
To view, visit http://gerrit.cloudera.org:8080/8607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iacbfe446d01dd523fb2f2f81880e5af2551e979f
Gerrit-Change-Number: 8607
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong
[kudu-CR] handle disk failures during tablet copies
Adar Dembo has posted comments on this change.

Change subject: handle disk failures during tablet copies
..

Patch Set 1:

(3 comments)

I imagine Mike will do a more thorough review, but overall looks good to me.

http://gerrit.cloudera.org:8080/#/c/7654/1//COMMIT_MSG
Commit Message:

Line 10: receiving data) and the copy session sources (that sending data).
"receive" and "send". Or "are receiving" and "are sending".

http://gerrit.cloudera.org:8080/#/c/7654/1/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

PS1, Line 525: if (group->uuid_indices().size() != valid_uuid_indices.size()) {
: return Status::IOError("Directory group contains a failed directory");
: }
: group_uuid_indices = _uuid_indices;
Unrelated to this patch?

http://gerrit.cloudera.org:8080/#/c/7654/1/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

Line 305:
Leftover from a change since removed? Or is this stylistic?

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes
[kudu-CR] handle disk failures during tablet copies
Andrew Wong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7654

Change subject: handle disk failures during tablet copies
..

handle disk failures during tablet copies

There are two components to tablet copies: the copy clients (that receiving data) and the copy session sources (that sending data).

Coarse-grain handling of disk failures during tablet copies is done as follows.

For tablet copy source sessions:
- if a disk fails in the session (i.e. during a call to ReadFileChunkToBuf, etc.), the error should handle itself at the block layer and return the error to the client
- if a disk fails during the session in some other thread, the next call to GetBlockPiece or GetLogSegmentPiece should return the error that failed the replica

For tablet copy clients:
- when getting next blocks, the client repeatedly gets blocks for the copy. If this fails, the client will fail.
- everything will handle itself at the block layer.

Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
---
M src/kudu/fs/data_dirs.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/ts_disk_failure-test.cc
4 files changed, 94 insertions(+), 4 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/7654/1

--
To view, visit http://gerrit.cloudera.org:8080/7654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic18d93c218ea13f3086f420a4847cb5e29a47bc7
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong
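
The source-session behavior in this first patch set, where a failure recorded by some other thread is returned by the next piece request, can be sketched roughly as follows. `FakeReplica`, `FakeSourceSession`, and `GetPiece` are hypothetical names for this illustration. They only stand in for the idea behind GetBlockPiece and GetLogSegmentPiece and do not reproduce their real signatures.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical replica that can be marked failed from any thread.
class FakeReplica {
 public:
  void SetError(std::string msg) {
    std::lock_guard<std::mutex> l(lock_);
    error_ = std::move(msg);
    failed_ = true;
  }

  // Returns true if the replica has failed, copying the message to `*msg`
  // when `msg` is non-null.
  bool GetError(std::string* msg) const {
    std::lock_guard<std::mutex> l(lock_);
    if (failed_ && msg != nullptr) *msg = error_;
    return failed_;
  }

 private:
  mutable std::mutex lock_;
  bool failed_ = false;
  std::string error_;
};

// Hypothetical source session that checks the replica before every send.
class FakeSourceSession {
 public:
  explicit FakeSourceSession(const FakeReplica* replica) : replica_(replica) {
    assert(replica_ != nullptr);
  }

  // On success fills `*data` and returns true. If the replica has failed,
  // fills `*error` with the recorded message and returns false.
  bool GetPiece(std::string* data, std::string* error) const {
    if (replica_->GetError(error)) return false;
    *data = "piece";
    return true;
  }

 private:
  const FakeReplica* replica_;
};
```

The mutex matters because, in the scenario described, the thread that records the failure is not the thread serving the copy request.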