Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21563 )
Change subject: IMPALA-13194: Fast-serialize position delete records
......................................................................


Patch Set 1:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/21563/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21563/1//COMMIT_MSG@13
PS1, Line 13: e
Nit: tuples?


http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.h@308
PS1, Line 308: std::unordered_map<Channel*, std::unique_ptr<IcebergPositionDeleteChannel>>
Could add a comment that describes this variable.


http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@633
PS1, Line 633: unique_ptr
I think it would be cleaner if we returned OutboundRowBatch*. AFAICS KrpcDataStreamSender::Channel::TransmitData() could also take a raw pointer.

http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@797
PS1, Line 797: if (row_count_ == capacity_) {
Can channel_->RowBatchCapacity() ever be 0 at L775? If it can then this check comes too late. If it can't we can add a DCHECK.

http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@857
PS1, Line 857: Ubsan::MemSet
What if 'tuple_data_size' is 0? In this case 'tuple_data' may be a nullptr according to https://en.cppreference.com/w/cpp/container/vector/data and I'm not sure if memset with a nullptr could be undefined behaviour.

http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@943
PS1, Line 943: auto
I think writing the actual type (Channel*) is easier to read.

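A minimal sketch of the guard the Line 857 comment above has in mind, for the case where the size is 0 and vector::data() returns a nullptr. As far as I know, the C standard requires valid pointer arguments for memset even when the length is 0, so skipping the call is the safe option. 'SafeMemSet' and 'ZeroTupleData' are hypothetical names for illustration only; this is not the implementation of Ubsan::MemSet and not code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (NOT Impala's Ubsan::MemSet): skips the call when there
// is nothing to write, so std::memset never receives a null pointer.
inline void* SafeMemSet(void* dst, int value, std::size_t size) {
  if (size == 0) return dst;  // Nothing to do; 'dst' may legitimately be null.
  assert(dst != nullptr);     // A non-zero size requires a real buffer.
  return std::memset(dst, value, size);
}

// Hypothetical caller: zeroes a tuple buffer that may be empty and returns
// the number of bytes that were zeroed.
inline std::size_t ZeroTupleData(std::vector<std::uint8_t>& tuple_data) {
  SafeMemSet(tuple_data.data(), 0, tuple_data.size());
  return tuple_data.size();
}
```

If the wrapper used in the patch already tolerates a nullptr with size 0, a short comment at the call site saying so would be enough.
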
http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@945
PS1, Line 945: row_desc_->tuple_descriptors()[0]
Could extract into a variable before the loop.

http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@1282
PS1, Line 1282: row
Not changed in this patch, but it should be 'tuple'.


http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/string-value.h
File be/src/runtime/string-value.h:

http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/string-value.h@200
PS1, Line 200: inline std::size_t hash_value(const StringValue& v) {
Not changed in this patch, but how does this work with small strings? StringValue::Eq() first converts the values to SimpleStrings to eliminate the difference between small and normal strings. Or if we only use it with non-small strings, we should add a DCHECK.

http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/string-value.h@204
PS1, Line 204: struct StringValueHashWrapper {
Do you think we should consider specialising std::hash for StringValue or should we keep this explicit?

http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/string-value.h@206
PS1, Line 206: impala::
Do we need this qualifier?


--
To view, visit http://gerrit.cloudera.org:8080/21563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6095f318e3d06dedb4197681156b40dd2a326c6f
Gerrit-Change-Number: 21563
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Wed, 10 Jul 2024 11:01:31 +0000
Gerrit-HasComments: Yes
