Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21563 )
Change subject: IMPALA-13194: Fast-serialize position delete records ...................................................................... Patch Set 2: (16 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21563/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21563/1//COMMIT_MSG@13 PS1, Line 13: e > Nit: tuples? Done http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.h File be/src/runtime/krpc-data-stream-sender.h: http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.h@308 PS1, Line 308: /// A mapping from Channel to IcebergPositionDeleteChannel. Only used in DIRECTED mode > Could add a comment that describes this variable. Done http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@234 PS1, Line 234: private: > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@633 PS1, Line 633: > I think it would be cleaner if we returned OutboundRowBatch*. AFAICS KrpcDa TransmitData() requires a unique_ptr so it can swap outbound batches. http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@760 PS1, Line 760: } > Can you add some comments about the role of this class? Added comment. Also moved out most of the contents to IcebergPositionDeleteCollector. http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@797 PS1, Line 797: return Status::OK(); > Can channel_->RowBatchCapacity() ever be 0 at L775? If it can then this che RowBatchCapacity() is always at least 1 (there is a max(1, <expr>) in its body), otherwise the channels couldn't send any rows. http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@855 PS1, Line 855: } > It should compress if the channel is not local. Ah, thanks for catching this. http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@857 PS1, Line 857: (IsDirectedM > What if 'tuple_data_size' is 0? In this case 'tuple_data' may be a nullptr Ubsan::MemSet() handles exactly this case. http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@943 PS1, Line 943: i64 > I think writing the actual type (Channel*) is easier to read. Actual type would be 'unique_ptr<impala::KrpcDataStreamSender::Channel>&', I'm not sure if that would be an improvement in readability in this otherwise trivial case. http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@945 PS1, Line 945: i64 %seed) #49 { > Could extract into a variable before the loop. Done http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/krpc-data-stream-sender.cc@1282 PS1, Line 1282: > Not changed in this patch, but it should be 'tuple'. Done http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/outbound-row-batch.h File be/src/runtime/outbound-row-batch.h: http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/outbound-row-batch.h@72 PS1, Line 72: Status Finalize(int num_tuples_per_row, TrackedString* compression_scratch); > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/row-batch.cc@272 PS1, Line 272: // Switch to using full deduplication in cases where severe size blow-ups are known to > +1 for moving these to another outbound-row-batch.cc Thanks, to make it more safe I've replaced it with an OutboundRowBatch::Finalize() method. http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/string-value.h File be/src/runtime/string-value.h: http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/string-value.h@200 PS1, Line 200: inline std::size_t hash_value(const StringValue& v) { > Not changed in this patch, but how does this work with small strings? Strin Ptr() returns a pointer to the data, and Len() returns the actual length, so it should work well with small strings. SimpleStrings wouldn't make any difference as their 'ptr' and 'len' members would have the same value as Ptr() and Len(). SimpleStrings are only used in cases where they make the code more readable or optimal. http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/string-value.h@204 PS1, Line 204: std::ostream& operator<<(std::ostream& os, const StringValue& string_value); > Do you think we should consider specialising std::hash for StringValue or s Yeah, I think we should. We already do that for boost containers (see hash_value above), so I don't see why we shouldn't for std containers. Done. http://gerrit.cloudera.org:8080/#/c/21563/1/be/src/runtime/string-value.h@206 PS1, Line 206: > Do we need this qualifier? Done -- To view, visit http://gerrit.cloudera.org:8080/21563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6095f318e3d06dedb4197681156b40dd2a326c6f Gerrit-Change-Number: 21563 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Sun, 14 Jul 2024 15:10:48 +0000 Gerrit-HasComments: Yes
