Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21932 )

Change subject: IMPALA-13509: Copy rows directly to OutboundRowBatch during 
hash partitioning
......................................................................


Patch Set 13:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21932/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21932/12//COMMIT_MSG@67
PS12, Line 67: TPCH benchmarks improved significantly
> Nice improvements! Did you also measure TPC-DS?
tpcds also had ~5% improvement


http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/descriptors.h@845
PS12, Line 845:   /// Number of tuples per row. Has IR_NO_INLINE to make it 
replacable with constant
> nit: Could you please add comment to this function?
Done


http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/krpc-data-stream-sender-ir.cc
File be/src/runtime/krpc-data-stream-sender-ir.cc:

http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/krpc-data-stream-sender-ir.cc@35
PS12, Line 35:   // codegend.
             :   int channel_ids[RowBatch::HASH_BATCH_SIZE];
             :   int row_idx = 0;
             :   while (row_idx < num_rows) {
             :     int row_count = 0;
             :     FOREACH_ROW_LIMIT(batch, row_idx, RowBatch::HASH_BATCH_SIZE, 
row_batch_iter) {
             :       TupleRow* row = row_batch_iter.Get();
             :       channel_ids[row_count++] = HashRow(row, 
exchange_hash_seed_) % num_channels;
             :     }
             :     row_count = 0;
             :     FOREACH_ROW_LIMIT(batch, row_idx, RowBatch::HASH_BATCH_SIZE, 
row_batch_iter) {
             :       int channel_id = channel_ids[row_count++];
             :       PartitionRowCollector& collector = 
partition_row_collectors_[channel_id];
             :
> This code is a bit subtle due to the micro-optimization introduced by https
Added comment.
Note that this logic was added before implementing codegen in 
KrpcDataStreamSender, so the things written in IMPALA-6461 may not be true 
anymore. I still think that it makes sense to have these micro batches to allow 
better pipelining in the hashing part which would be ruined by the many 
branches in deep copy, but this is just speculation.


http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/krpc-data-stream-sender.h@184
PS12, Line 184: TransmitData(
> typo
Done


http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/outbound-row-batch.inline.h
File be/src/runtime/outbound-row-batch.inline.h:

http://gerrit.cloudera.org:8080/#/c/21932/12/be/src/runtime/outbound-row-batch.inline.h@76
PS12, Line 76: &tuple_data_.back()) + 1;
> Maybe 'tuple_data_.back()' is simpler.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21932
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81a16c2f0fcfc1f3adef7077b3932a29a0f15a8f
Gerrit-Change-Number: 21932
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Sat, 09 Nov 2024 15:42:56 +0000
Gerrit-HasComments: Yes

Reply via email to