Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/22058/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22058/2//COMMIT_MSG@7
PS2, Line 7: array
> nit: This may have already been answered before, for my understanding - doe
No, it's not.

Multi-dimensional arrays and other complex data structures require an 
additional layer (dealing with the so-called 'definition level') that's 
orthogonal to this one.  Basically, this work allows for one-dimensional arrays 
and also provides the basis for the so-called 'repetition level' used in the 
representation of nested data structures, as introduced in Dremel and used in 
other projects like Parquet and Arrow.  This post (and related parts) might be 
a good read to get a broader context: 
https://arrow.apache.org/blog/2022/10/08/arrow-parquet-encoding-part-2/


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@156
PS2, Line 156: 1101111
> +1
There isn't any logic -- these bitmaps are completely independent.


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@156
PS2, Line 156: 1101111
The array bitmap and the flattened sequence bitmaps are completely independent.

> Why doesn't the array null bitmap reflect that as well?

The array bitmap provides the information on the nullability of the arrays 
themselves, not of the elements in them.  The bitmaps are independent -- that 
way it's much easier to interpret the contents.

You can think of it like this: first, the full sequence is restored (with 
nulls) using the flattened bitmap.  Then, using the array nullability bitmap 
and the information on the array start indices, the array cells are restored 
from the sequence that now contains null elements as well.
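
The two-step restoration described above can be sketched roughly as follows. 
This is only an illustration, not the actual CFile decoder: the function names 
and the sample data are hypothetical, a '1' bit is assumed to mean "non-null", 
bitmaps are read left to right, and array boundaries are passed as half-open 
[start, end) ranges as in the table under review.

```python
from typing import List, Optional, Tuple


def restore_sequence(values: List[int], elem_bits: str) -> List[Optional[int]]:
    """Step 1: re-insert nulls into the flattened sequence.

    Assumption: a '1' bit marks a non-null element, '0' marks a null.
    """
    it = iter(values)
    return [next(it) if bit == "1" else None for bit in elem_bits]


def restore_arrays(
    sequence: List[Optional[int]],
    ranges: List[Tuple[int, int]],
    array_bits: str,
) -> List[Optional[List[Optional[int]]]]:
    """Step 2: cut the null-inclusive sequence into array cells.

    The array bitmap is consulted independently of the element bitmap:
    a '0' bit makes the whole cell null, whatever its range is.
    """
    cells = []
    for (start, end), bit in zip(ranges, array_bits):
        cells.append(sequence[start:end] if bit == "1" else None)
    return cells


# Hypothetical input: three stored values, one null element, four array cells.
values = [3, 4, 5]
elem_bits = "1101"
ranges = [(0, 0), (0, 0), (0, 2), (2, 4)]
array_bits = "1011"

sequence = restore_sequence(values, elem_bits)
cells = restore_arrays(sequence, ranges, array_bits)
print(cells)
```

Note that the empty array and the null array share the same empty range; only 
the array bitmap tells them apart, while the null element inside the last array 
comes solely from the flattened bitmap.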


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@169
PS2, Line 169: 5,6,7,8
> Can array elements be in random sequence or non-ascending order?
Elements in an array can be in any order -- that's just how they are 
represented in array data blocks, but the representation is always 
deterministic as per the spec documented here.


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@166
PS2, Line 166: | [2, 2) | {} |
             : | [2, 2) | null |
             : | [2, 4) | { 3,4 } |
             : | [4, 8) | { 5,6,7,8 } |
             : | [8, 9) | { null } |
> It would help to add a one liner definition for these (sort of a notation s
Sure, I'll add this one even if it's easily deducible from the former example. 
As one can see, it would be:

| field | value in human readable format for illustration |
| --- | --- |
| flatten sequence | 3,4 |
| flatten value count | 2 |
| flatten null bitmap length | 3 |
| flatten null bitmap | 011 |
| array start indices length | 4 |
| array start indices | 0,0,0,0 |
| array null bitmap | 1011 |



--
To view, visit http://gerrit.cloudera.org:8080/22058
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Comment-Date: Fri, 22 Nov 2024 18:59:57 +0000
Gerrit-HasComments: Yes
