Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8023 )
Change subject: IMPALA-4856: Port data stream service to KRPC ...................................................................... Patch Set 3: (34 comments) Another batch of comments... http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h File be/src/runtime/krpc-data-stream-mgr.h: http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@63 PS3, Line 63: ongoing transmission from a client to a server as a 'stream' I don't think that's accurate. see questions in krpc-data-stream-recvr.h about stream definition. http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@93 PS3, Line 93: process it immediately, add it to a fixed-size 'batch queue' for later processing XXX check whether these are really different http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@164 PS3, Line 164: deferred processing is that because the recvr hasn't showed up yet, or because the stream's queue is full, or both? http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@235 PS3, Line 235: node_id dest_node_id? http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@239 PS3, Line 239: Ownership of the receiver is shared between this DataStream mgr instance and the : /// caller. that seems unnecessary but don't change it now. http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@246 PS3, Line 246: 'proto_batch'? http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@248 PS3, Line 248: . 'request'. Also document what 'response' and 'context' are. http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@266 PS3, Line 266: Notifies the receiver is this an RPC handler? I think we should just be explicit about which of these methods are RPC handlers. http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@267 PS3, Line 267: The RPC what RPC is this talking about? If this is a handler, then it's clear. http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@274 PS3, Line 274: Closes Does it close or cancel? (or is there no difference?) http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@284 PS3, Line 284: RPCs which were buffered To be consistent with terminology used in class comment, maybe say "deferred RPCs" http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@340 PS3, Line 340: ragment instance id, 0 what is that saying? is that a misplaced comma or am I reading this wrong? http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@341 PS3, Line 341: instance id changes I don't understand this. it kinda sounds like we're trying to be able to find all instances of this fragment, but then wouldn't we iterate until the fragment id changes (not the instance id)? http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@349 PS3, Line 349: struct EarlySendersList { hmm, I guess we need this now that we can't block the RPC thread? http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@358 PS3, Line 358: Time Monotonic time http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@374 PS3, Line 374: time monotonic time http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@382 PS3, Line 382: boost::unordered_set<RecvrId> closed_stream_cache_; all this parallel startup stuff really needs to be revisited (but not for this change). it's too complex and brittle. http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@386 PS3, Line 386: Deserialize maybe call it DeserializeDeferred() or DeserializeWorker() to make it clearer that this is only for the deferred (slow) path, since the normal path will also have to deserialize (but doesn't use this code). http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@404 PS3, Line 404: void EnqueueRowBatch(DeserializeWorkItem&& payload); how about grouping this with Deferred function above since it's related. Also, I think the name should be less generic. Like maybe EnqueueDeferredBatch() or EnqueueDeferredRpc() (does the response happen before or after this deferred work?) http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@413 PS3, Line 413: block what's that? http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@414 PS3, Line 414: Status I think that status is not getting checked by the caller. I thought Tim made Status warn on unused result -- why is it not catching this? (Or do we still need to annotate each method?). http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@416 PS3, Line 416: inline uint32_t GetHashValue(const TUniqueId& fragment_instance_id, PlanNodeId node_id); let's add a quick comment. http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@421 PS3, Line 421: HandleTimedOutSenders RespondToTimedOutSender() or RespondTimeOutToSender()? Also the code only responds to a single sender, so the comment and name shouldn't be plural. http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h File be/src/runtime/krpc-data-stream-recvr.h: http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h@48 PS3, Line 48: Single receiver of an m:n data streams Either the "an" shouldn't be there or streams shouldn't be plural. But I'm not sure exactly how we define the "stream". Is it one pair of (sender,recvr), or is it the set of all (sender,recvr) pairs, or is it all pairs that contain this receiver (*, recvr)? http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h@116 PS3, Line 116: discarded that doesn't seem accurate http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h@117 PS3, Line 117: payload what's that? http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h@126 PS3, Line 126: Stream mentioned above, we didn't clearly define what a "stream" actually is, so it's hard to know what this means. http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h@141 PS3, Line 141: amount of buffering is this in bytes? http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h@142 PS3, Line 142: we stop acking wouldn't it be more accurate (and consistent with mgr terminology) to say "we defer processing of incoming RPCs once..." http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.h File be/src/runtime/krpc-data-stream-sender.h: http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.h@53 PS3, Line 53: the buffer size in bytes allocated to each channel it's not clear what that means from just reading the header, though i know you just carried this comment so okay to defer. http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.h@59 PS3, Line 59: destinations that's not documented. http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.h@97 PS3, Line 97: WARN_UNUSED_RESULT; should that really be public? seems more like a worker function. http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.h@101 PS3, Line 101: int64_t GetNumDataBytesSent() const; that seems weird. is it for testing? if so, can it be protected instead? http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.h@109 PS3, Line 109: const TDataSink& tsink, RuntimeState* state) WARN_UNUSED_RESULT; why is that protected? is this a testing thing? -- To view, visit http://gerrit.cloudera.org:8080/8023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1 Gerrit-Change-Number: 8023 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Comment-Date: Thu, 12 Oct 2017 04:37:30 +0000 Gerrit-HasComments: Yes
