Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20108 )
Change subject: IMPALA-12159: Support ORDER BY for collections of variable length types in select list ...................................................................... Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/descriptors.cc@937 PS5, Line 937: const NonWritableBasicBlock& insert_before > This will be used once sorting collections with varlen slots within structs I don't quite understand, it is already used on L958. http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/sorter.cc@640 PS5, Line 640: // Recurse first so that children come before parents in the list. > Is this an important property? Can you extend the comment to specify whethe Yes, it is important. I expanded the function comment in sorter-internal.h to include the reason and added a reference to that here. Did the same for Tuple:MaterializeExprs(). This order is chosen because of the way we serialise the values in CopyVarLenData() and CopyVarLenDataConvertOffset(): we write the var-len part of 'StringValue's and 'CollectionValue's to the buffer and update the pointers in-place. The order becomes important if these '(String|Collection)Value's are themselves (var-len) children of other 'CollectionValue's. If children are written before their parents, then when the parents are written their pointers have already been updated so they can be written as-is. If parents were written before their children, updating the pointers in-place would not be enough, the already serialised pointers would have to be updated in the byte buffer. Note that to ensure that this order is kept, the 'StringValue's must be serialised before the 'CollectionValue's - strings can be children of collections but not the other way around. http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/sorter.cc@812 PS5, Line 812: e = decltype(value->p > Isn't the original version simpler? Due to the DCHECK above the result must Yes, I changed it back. It turns out that adding 0 to nullptr is valid, I thought it was also UB. http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/tuple.h File be/src/runtime/tuple.h: http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/tuple.h@168 PS5, Line 168: /// - the materialized non-NULL string value slots are returned in : /// 'non_null_string_values', : /// - the materialized non-NULL collection value slots, along with their byte sizes, : /// are returned in 'non_null_collection_values' and > Can you add a comment about the max size of these? What I mean it that if t Done -- To view, visit http://gerrit.cloudera.org:8080/20108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic15b29393f260b572e11a8dbb9deeb8c02981852 Gerrit-Change-Number: 20108 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Mon, 13 Nov 2023 14:48:58 +0000 Gerrit-HasComments: Yes
