Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17037 )

Change subject: KUDU-2612 tablet servers automatically register txn participants
......................................................................


Patch Set 9:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

PS9:
> I think it's better to move these new test scenarios into txn_write_ops-ite
Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/integration-tests/txn_write_ops-itest.cc
File src/kudu/integration-tests/txn_write_ops-itest.cc:

PS9:
> Could you also add a test in which we delete a replica that has pending ops
Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/integration-tests/txn_write_ops-itest.cc@446
PS9, Line 446: GetSingleTxnOpDispatcher() {
> nit: indent by 4 spaces
Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/ops/participant_op.cc
File src/kudu/tablet/ops/participant_op.cc:

http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/ops/participant_op.cc@197
PS9, Line 197:       
RETURN_NOT_OK(tablet_replica_->UnregisterTxnOpDispatcher(txn_id()));
> Apply() cannot fail, see L234 for the CHECK_OK(). I suspect we should be ab
Exactly -- as we discussed offline, the proper place for this check is 
ParticipantOp::Prepare()

Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.h@443
PS9, Line 443: let
> nit: give
Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@602
PS9, Line 602: this->RegisteredParticipantCb(txn_id, s);
> Answering my own question, no I don't think it does. That said, we could pr
Ack.


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@602
PS9, Line 602: this->RegisteredParticipantCb(txn_id, s);
> This gets set here and in the ParticipantTxnBeginCallback. Does that mean i
It's being called once, as far as I can see.

Anyway, it was messy due to iterative approach (I tried several options); I 
cleaned this up a bit.  Should be more straightforward now.


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@607
PS9, Line 607: SetRegistrationScheduled
> SetPreliminaryTasksScheduled()?
Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1179
PS9, Line 1179: TxnParticipantBegi
> nit: BeginTxnParticipantOp? Same in the callback?
Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1208
PS9, Line 1208: CancelPendingTxnOps
> Is it possible for CancelPendingTxnOps to be called twice if BEGIN_TXN op f
Nope, it's called only once as I can see.  Even if it was called twice, the 
second time would be a no-op because the queue of pending ops was empty.

Anyways, I cleaned it up bit -- it should be more straightforward now.  I agree 
it was a bit messy in this revision.


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1260
PS9, Line 1260: preliminary_tasks_scheduled_
> If the main purpose of this is to ensure we don't have multiple registratio
It would if it were guaranteed that the caller would
 (1) always try to schedule those tasks
 (2) always succeed in registering tasks if it tries to do so

(1) isn't guaranteed if looking at that from API perspective.  (2) isn't 
guaranteed because TabletReplica::SubmitTxnWrite() can fail with non-OK status. 
 I agree that the current approach doesn't look particularly good but telying 
on the logic of existing call sites would make even worse.

But I might be missing the essence of your question. Probably you meant just to 
call the scheduler functor here and remove that call-back from the upper level 
to set 'preliminary_tasks_scheduled_'?  I indeed removed the extra level of 
indirection, and now the scheduler function is called right from here.


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tablet/tablet_replica.cc@1362
PS9, Line 1362: Cleanup
> nit: maybe call this RespondFailures() or something to make it more obvious
As for the holding the lock, I don't think it's needed here because this method 
isn't supposed to be called while inflight_status_ is to change concurrently.  
But it's a good call: it looks suspicious.  To make it cleaner, I made this 
method static.

Renamed into RespondWithStatus(), added DCHECK() for the status parameter at 
the call sites.


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tserver/ts_tablet_manager.cc@1166
PS9, Line 1166:
              :   if 
(PREDICT_FALSE(FLAGS_txn_participant_begin_op_inject_latency_ms > 0)) {
              :     // This is to simulate a situation when txn participant 
registration
              :     // above takes too long.
              :     SleepFor(MonoDelta::FromMilliseconds(
              :         FLAGS_txn_participant_begin_op_inject_latency_ms));
              :   }
> nit: can use MAYBE_INJECT_FIXED_LATENCY(), same above.
Done


http://gerrit.cloudera.org:8080/#/c/17037/9/src/kudu/tserver/ts_tablet_manager.cc@1180
PS9, Line 1180: d);
> Would it make sense to pass 'cb' as an argument here to pass to Participant
Good call!  Yep, it makes sense.



--
To view, visit http://gerrit.cloudera.org:8080/17037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia383f7afd208c44695c57aab82e3818fa1712ce6
Gerrit-Change-Number: 17037
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 27 Feb 2021 06:20:14 +0000
Gerrit-HasComments: Yes

Reply via email to