Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21932 )
Change subject: IMPALA-13509: Copy rows directly to OutboundRowBatch during hash partitioning ...................................................................... Patch Set 10: (10 comments) http://gerrit.cloudera.org:8080/#/c/21932/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21932/9//COMMIT_MSG@21 PS9, Line 21: PartitionRowCollector's OutboundRowBatch (collector_batch_), which > typo: OutboundRowBatch Done http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender-ir.cc File be/src/runtime/krpc-data-stream-sender-ir.cc: http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender-ir.cc@55 PS9, Line 55: num_rows_++; > nit: we don't usually have a space between id and '++'. Done http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender.cc@194 PS9, Line 194: // Shutdowns the channel and frees the row batch allocation. Any in-flight RPC will > Is this comment still accurate? Thanks for spotting, this function is actually no longer used. Reused some parts of the comment for PartitionRowCollector's functions. http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender.cc@602 PS9, Line 602: DCHECK(rpc_in_flight_batch_ == nullptr); > Does this count successful sends, or attempts to send (some of which may fa It counts all TransmitData attempts. I can rewrite it to RpcSuccess to be consistent with RpcRetry/RpcFailure. The reason for adding this counter was to be able to deduce the avg number of rows in sent row batches - the distinction didn't matter in this case. http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender.cc@1240 PS9, Line 1240: } > I don't see a reason to add this additional block. They're already in an el Done (the block was added for an extra timer which turned out to be ~useless) http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender.cc@1351 PS9, Line 1351: } > nit: think this should be indented 4 more spaces. fixed indentation http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/krpc-data-stream-sender.cc@1353 PS9, Line 1353: } // namespace impala > Why don't num_receivers factor in here? This is only used with HASH a and KUDU partitioning, and in this case there is always only 1 receiver for each serialized batch. Actually I think that this is not the best place to track this stat, it would be better to do it in Channel when the data is actually sent. The counter in KrpcDataStreamSender::SerializeBatch() can be also incorrect when one of the channels is closed by the receiver: https://github.com/apache/impala/blob/5c9b82854a3397572c4adb3c6cebb2e7d1aaad9a/be/src/runtime/krpc-data-stream-sender.cc#L549 In this case we will track as if the data was sent to the closed channel too. http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/outbound-row-batch.h File be/src/runtime/outbound-row-batch.h: http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/outbound-row-batch.h@77 PS9, Line 77: Status PrepareForSend(int num_tuples_per_row, TrackedString* compression_scratch, > Maybe default used_append_row=false to avoid impacting existing users? Woul Done http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/outbound-row-batch.inline.h File be/src/runtime/outbound-row-batch.inline.h: http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/outbound-row-batch.inline.h@53 PS9, Line 53: // resizing to the exact size, similarly to vector. It would be clearer to use a > typo: It would be clearer Done http://gerrit.cloudera.org:8080/#/c/21932/9/be/src/runtime/outbound-row-batch.inline.h@75 PS9, Line 75: uint8_t* dst = reinterpret_cast<uin > I would agree, it seems to be checking something that couldn't be null. You Removed it. I think I added this DCHECK during desperate crash hunting and actually it doesn't make sense. -- To view, visit http://gerrit.cloudera.org:8080/21932 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I81a16c2f0fcfc1f3adef7077b3932a29a0f15a8f Gerrit-Change-Number: 21932 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Comment-Date: Tue, 05 Nov 2024 08:02:28 +0000 Gerrit-HasComments: Yes
