Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20108 )

Change subject: IMPALA-12159: Support ORDER BY for collections of variable 
length types in select list
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/descriptors.cc@937
PS5, Line 937:     const NonWritableBasicBlock& insert_before
> This will be used once sorting collections with varlen slots within structs
I don't quite understand, it is already used on L958.


http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/sorter.cc@640
PS5, Line 640:       // Recurse first so that children come before parents in 
the list.
> Is this an important property? Can you extend the comment to specify whethe
Yes, it is important. I expanded the function comment in sorter-internal.h to 
include the reason and added a reference to that here. Did the same for 
Tuple:MaterializeExprs().

This order is chosen because of the way we serialise the values in 
CopyVarLenData() and CopyVarLenDataConvertOffset(): we write the var-len part 
of 'StringValue's and 'CollectionValue's to the buffer and update the pointers 
in-place. The order becomes important if these '(String|Collection)Value's are 
themselves (var-len) children of other 'CollectionValue's. If children are 
written before their parents, then when the parents are written their pointers 
have already been updated so they can be written as-is. If parents were written 
before their children, updating the pointers in-place would not be enough, the 
already serialised pointers would have to be updated in the byte buffer. Note 
that to ensure that this order is kept, the 'StringValue's must be serialised 
before the 'CollectionValue's - strings can be children of collections but not 
the other way around.


http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/sorter.cc@812
PS5, Line 812: e = decltype(value->p
> Isn't the original version simpler? Due to the DCHECK above the result must
Yes, I changed it back. It turns out that adding 0 to nullptr is valid, I 
thought it was also UB.


http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/tuple.h
File be/src/runtime/tuple.h:

http://gerrit.cloudera.org:8080/#/c/20108/5/be/src/runtime/tuple.h@168
PS5, Line 168:   ///  - the materialized non-NULL string value slots are 
returned in
             :   ///    'non_null_string_values',
             :   ///  - the materialized non-NULL collection value slots, along 
with their byte sizes,
             :   ///    are returned in 'non_null_collection_values' and
> Can you add a comment about the max size of these? What I mean it that if t
Done



--
To view, visit http://gerrit.cloudera.org:8080/20108
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic15b29393f260b572e11a8dbb9deeb8c02981852
Gerrit-Change-Number: 20108
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Mon, 13 Nov 2023 14:48:58 +0000
Gerrit-HasComments: Yes

Reply via email to