Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9690 )

Change subject: IMPALA-6685: Improve profiles in KrpcDataStreamRecvr and 
KrpcDataStreamSender
......................................................................


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/9690/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9690/1//COMMIT_MSG@48
PS1, Line 48:    - Deserializ
> Following our in-person discussion, this should probably be cummulative.
Actually, after some more thought, this should stay non-cumulative. The reason 
is that this is the key differentiation for telling whether an exchange node is 
stuck because its parent is not consuming from it or if the producers are not 
sending row batches to it. With cumulative value of DeferredBatches, one really 
cannot tell the difference because it would stay the same in either cases.

If we keep the outstanding DeferredBatches count instead, one can tell the 
parent is not consuming from it if both the bytes dequeued and the number of 
outstanding deferred batches stays flat for a period of time. On the other 
hand, if the producer is not producing, one will see the number of deferred 
batches dwindle and the number of bytes dequeued goes up.

In the long run, we ought to keep the row production rate over time per exec 
node to tell where the bottleneck is.


http://gerrit.cloudera.org:8080/#/c/9690/1/be/src/runtime/krpc-data-stream-recvr.h
File be/src/runtime/krpc-data-stream-recvr.h:

http://gerrit.cloudera.org:8080/#/c/9690/1/be/src/runtime/krpc-data-stream-recvr.h@240
PS1, Line 240:   /// Following counters belong to 'dequeue_profile_'.
> Should we group them in separate structs? That (or encoding in the names) h
I took the suggestion to add comments to break up the comments of the counters.


http://gerrit.cloudera.org:8080/#/c/9690/1/be/src/runtime/krpc-data-stream-recvr.h@263
PS1, Line 263:   RuntimeProfile::Counter* total_eos_received_;
             :
> Should we collect a full histogram instead? You can use a HdrHistogram to t
Looked into using histogram. Seems a bit painful to report the histogram 
periodically as it probably requires custom code in the report exec status path 
to see if an exchange node is in the fragment instance and calls 
RuntimeProfile::AddInfoString() to dump the histogram and it's also a bit hairy 
if you have multiple exchange nodes in a single fragment instance.

An alternate would be to add the histogram at the end of the query but that 
would be not too useful if the query is taking a long time to complete.

That's why I stick with RuntimeProfile::SummaryStatsCounter which seems like a 
reasonable compromise. I am open to suggestion.


http://gerrit.cloudera.org:8080/#/c/9690/1/be/src/runtime/krpc-data-stream-recvr.h@292
PS1, Line 292:   /// Summary stats of time which row batches spent in deferred 
queue before
> maybe also add unit here and to the other _time_ members?
I believe all RuntimeProfile::Counters members which measure time don't 
actually have ns_ suffix as they always have nanosecond resolution due to the 
hardcoded resolution in ADD_TIMER(). So, not sure if I want to break the 
convention here and become inconsistent with the rest of the code base.

'wait_time_start_' below needs the 'ns_' suffix as it's using int64_t as a 
timestamp so may help to have the resolution encoded in its name.


http://gerrit.cloudera.org:8080/#/c/9690/1/be/src/runtime/krpc-data-stream-recvr.h@297
PS1, Line 297: pace impala
> for uint64 values that hold time, it's nice to encode the units in the name
Done


http://gerrit.cloudera.org:8080/#/c/9690/1/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/9690/1/be/src/runtime/krpc-data-stream-recvr.cc@461
PS1, Line 461:           request->tuple_offsets_sidecar_idx(), 
&tuple_offsets).ok())) {
> Should this be a histogram, too?
Please see replies elsewhere.


http://gerrit.cloudera.org:8080/#/c/9690/1/be/src/runtime/krpc-data-stream-recvr.cc@604
PS1, Line 604: ues = is_merging ?
> Are these the other way round? I think receiver is the enqueue profile, sen
Receiver is the dequeuing side of the 'sender_queue'.

Sender is the enqueuing side of the 'sender_queue'. Sender is always the 
inbound direction.

Comment fixed.


http://gerrit.cloudera.org:8080/#/c/9690/1/be/src/runtime/krpc-data-stream-recvr.cc@639
PS1, Line 639: TotalEarlySenders",
> Should we have a separate counter for the early sender batches?
'total_early_senders_' above is for that.


http://gerrit.cloudera.org:8080/#/c/9690/1/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/9690/1/be/src/runtime/krpc-data-stream-sender.cc@635
PS1, Line 635:       profile()->AddDerivedCounter("NetworkThroughput", 
TUnit::BYTES_PER_SECOND,
             :           bind<int64_t>(&RuntimeProfile::UnitsPerSecond, 
bytes_sent_counter_,
             :               total_network_time_));
             :   eos_sent_counter_ = ADD_COUNTER(profile(), "EosSent", TUn
This is actually just measuring the rate in which we are serializing row 
batches. Seems misleading to call it overall_throughput. Also, unclear if we 
care about serialization throughput or we can derive it from the number of 
uncompressed bytes serialized and the serialization time.



--
To view, visit http://gerrit.cloudera.org:8080/9690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ba405921b3df920c1e85b940ce9c8d02fc647cd
Gerrit-Change-Number: 9690
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Mar 2018 07:18:12 +0000
Gerrit-HasComments: Yes

Reply via email to