Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8023 )

Change subject: IMPALA-4856: Port data stream service to KRPC
......................................................................


Patch Set 3:

(34 comments)

Another batch of comments...

http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h
File be/src/runtime/krpc-data-stream-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@63
PS3, Line 63: ongoing transmission from a client to a server as a 'stream'
I don't think that's accurate. see questions in krpc-data-stream-recvr.h about 
stream definition.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@93
PS3, Line 93: process it immediately, add it to a fixed-size 'batch queue' for 
later processing
XXX check whether these are really different


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@164
PS3, Line 164: deferred processing
is that because the recvr hasn't showed up yet, or because the stream's queue 
is full, or both?


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@235
PS3, Line 235: node_id
dest_node_id?


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@239
PS3, Line 239: Ownership of the receiver is shared between this DataStream mgr 
instance and the
             :   /// caller.
that seems unnecessary but don't change it now.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@246
PS3, Line 246:
'proto_batch'?


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@248
PS3, Line 248: .
'request'.

Also document what 'response' and 'context' are.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@266
PS3, Line 266: Notifies the receiver
is this an RPC handler? I think we should just be explicit about which of these 
methods are RPC handlers.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@267
PS3, Line 267: The RPC
what RPC is this talking about? If this is a handler, then it's clear.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@274
PS3, Line 274: Closes
Does it close or cancel? (or is there no difference?)


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@284
PS3, Line 284: RPCs which were buffered
To be consistent with terminology used in class comment, maybe say "deferred 
RPCs"


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@340
PS3, Line 340: ragment instance id, 0
what is that saying? is that a misplaced comma or am I reading this wrong?


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@341
PS3, Line 341: instance id changes
I don't understand this.  it kinda sounds like we're trying to be able to find 
all instances of this fragment, but then wouldn't we iterate until the fragment 
id changes (not the instance id)?


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@349
PS3, Line 349:   struct EarlySendersList {
hmm, I guess we need this now that we can't block the RPC thread?


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@358
PS3, Line 358: Time
Monotonic time


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@374
PS3, Line 374: time
monotonic time


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@382
PS3, Line 382:   boost::unordered_set<RecvrId> closed_stream_cache_;
all this parallel startup stuff really needs to be revisited (but not for this 
change). it's too complex and brittle.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@386
PS3, Line 386: Deserialize
maybe call it DeserializeDeferred() or DeserializeWorker() to make it clearer 
that this is only for the deferred (slow) path, since the normal path will also 
have to deserialize (but doesn't use this code).


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@404
PS3, Line 404:   void EnqueueRowBatch(DeserializeWorkItem&& payload);
how about grouping this with Deferred function above since it's related. Also, 
I think the name should be less generic. Like maybe EnqueueDeferredBatch() or 
EnqueueDeferredRpc() (does the response happen before or after this deferred 
work?)


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@413
PS3, Line 413: block
what's that?


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@414
PS3, Line 414: Status
I think that status is not getting checked by the caller. I thought Tim made 
Status warn on unused result -- why is it not catching this? (Or do we still 
need to annotate each method?).


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@416
PS3, Line 416:   inline uint32_t GetHashValue(const TUniqueId& 
fragment_instance_id, PlanNodeId node_id);
let's add a quick comment.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@421
PS3, Line 421: HandleTimedOutSenders
RespondToTimedOutSender() or RespondTimeOutToSender()?
Also the code only responds to a single sender, so the comment and name 
shouldn't be plural.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h
File be/src/runtime/krpc-data-stream-recvr.h:

http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h@48
PS3, Line 48: Single receiver of an m:n data streams
Either the "an" shouldn't be there or streams shouldn't be plural. But I'm not 
sure exactly how we define the "stream". Is it one pair of (sender,recvr), or 
is it the set of all (sender,recvr) pairs, or is it all pairs that contain this 
receiver (*, recvr)?


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h@116
PS3, Line 116: discarded
that doesn't seem accurate


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h@117
PS3, Line 117: payload
what's that?


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h@126
PS3, Line 126: Stream
mentioned above, we didn't clearly define what a "stream" actually is, so it's 
hard to know what this means.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h@141
PS3, Line 141: amount of buffering
is this in bytes?


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-recvr.h@142
PS3, Line 142: we stop acking
wouldn't it be more accurate (and consistent with mgr terminology) to say "we 
defer processing of incoming RPCs once..."


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.h@53
PS3, Line 53: the buffer size in bytes allocated to each channel
it's not clear what that means from just reading the header, though i know you 
just carried this comment so okay to defer.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.h@59
PS3, Line 59: destinations
that's not documented.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.h@97
PS3, Line 97:       WARN_UNUSED_RESULT;
should that really be public? seems more like a worker function.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.h@101
PS3, Line 101:   int64_t GetNumDataBytesSent() const;
that seems weird. is it for testing? if so, can it be protected instead?


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.h@109
PS3, Line 109:       const TDataSink& tsink, RuntimeState* state) 
WARN_UNUSED_RESULT;
why is that protected? is this a testing thing?



--
To view, visit http://gerrit.cloudera.org:8080/8023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Gerrit-Change-Number: 8023
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Comment-Date: Thu, 12 Oct 2017 04:37:30 +0000
Gerrit-HasComments: Yes

Reply via email to