Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21932 )

Change subject: IMPALA-13509: Copy rows directly to OutboundRowBatch during 
hash partitioning
......................................................................


Patch Set 10:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/21932/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21932/9//COMMIT_MSG@21
PS9, Line 21: PartitionRowCollector's OutboundRowBatch (collector_batch_), which
> typo: OutboundRowBatch
Done


http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender-ir.cc
File be/src/runtime/krpc-data-stream-sender-ir.cc:

http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender-ir.cc@55
PS9, Line 55:   num_rows_++;
> nit: we don't usually have a space between id and '++'.
Done


http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender.cc@194
PS9, Line 194:   // Shutdowns the channel and frees the row batch allocation. 
Any in-flight RPC will
> Is this comment still accurate?
Thanks for spotting, this function is actually no longer used. Reused some 
parts of the comment for PartitionRowCollector's functions.


http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender.cc@602
PS9, Line 602:   DCHECK(rpc_in_flight_batch_ == nullptr);
> Does this count successful sends, or attempts to send (some of which may fa
It counts all TransmitData attempts.
I can rewrite it to RpcSuccess to be consistent with RpcRetry/RpcFailure. The 
reason for adding this counter was to be able to deduce the avg number of rows 
in sent row batches - the distinction didn't matter in this case.


http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender.cc@1240
PS9, Line 1240:     }
> I don't see a reason to add this additional block. They're already in an el
Done (the block was added for an extra timer which turned out to be ~useless)


http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender.cc@1351
PS9, Line 1351: }
> nit: think this should be indented 4 more spaces.
fixed indentation


http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender.cc@1353
PS9, Line 1353: } // namespace impala
> Why don't num_receivers factor in here?
This is only used with HASH a and KUDU partitioning, and in this case there is 
always only 1 receiver for each serialized batch.

Actually I think that this is not the best place to track this stat, it would 
be better to do it in Channel when the data is actually sent. The counter in 
KrpcDataStreamSender::SerializeBatch() can be also incorrect when one of the 
channels is closed by the receiver:
https://github.com/apache/impala/blob/5c9b82854a3397572c4adb3c6cebb2e7d1aaad9a/be/src/runtime/krpc-data-stream-sender.cc#L549
In this case we will track as if the data was sent to the closed channel too.


http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/outbound-row-batch.h
File be/src/runtime/outbound-row-batch.h:

http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/outbound-row-batch.h@77
PS9, Line 77:   Status PrepareForSend(int num_tuples_per_row, TrackedString* 
compression_scratch,
> Maybe default used_append_row=false to avoid impacting existing users? Woul
Done


http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/outbound-row-batch.inline.h
File be/src/runtime/outbound-row-batch.inline.h:

http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/outbound-row-batch.inline.h@53
PS9, Line 53:       // resizing to the exact size, similarly to vector. It 
would be clearer to use a
> typo: It would be clearer
Done


http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/outbound-row-batch.inline.h@75
PS9, Line 75:   uint8_t* dst = reinterpret_cast<uin
> I would agree, it seems to be checking something that couldn't be null. You
Removed it. I think I added this DCHECK during desperate crash hunting and 
actually it doesn't make sense.



--
To view, visit http://gerrit.cloudera.org:8080/21932
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81a16c2f0fcfc1f3adef7077b3932a29a0f15a8f
Gerrit-Change-Number: 21932
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Comment-Date: Tue, 05 Nov 2024 08:02:28 +0000
Gerrit-HasComments: Yes

Reply via email to