[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-22 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 3: Verified+1

Build Successful

http://jenkins.kudu.apache.org/job/pre_commit/700/ : SUCCESS


--
To view, visit http://gerrit.cloudera.org:8080/22058
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy 
Gerrit-Comment-Date: Sat, 23 Nov 2024 04:03:27 +
Gerrit-HasComments: No


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-22 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 3:

Build Started http://jenkins.kudu.apache.org/job/pre_commit/700/


--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy 
Gerrit-Comment-Date: Sat, 23 Nov 2024 04:03:05 +
Gerrit-HasComments: No


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-22 Thread Alexey Serbin (Code Review)
Hello Mahesh Reddy, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/22058

to look at the new patch set (#3).

Change subject: WIP [docs] add information on nullable array data block
..

WIP [docs] add information on nullable array data block

Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
---
M docs/design-docs/cfile.md
1 file changed, 106 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/22058/3
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy 


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-22 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@120
PS2, Line 120: flatten
> nit: flattened
Done


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@123
PS2, Line 123: similar
> nit: similarly
Done


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@128
PS2, Line 128: flatten
> nit: flattened, likewise for all instances below here and in the examples.
Done


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@148
PS2, Line 148: format
> format for illustration
Done


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@166
PS2, Line 166: | [2, 2) | {} |
 : | [2, 2) | null |
 : | [2, 4) | { 3,4 } |
 : | [4, 8) | { 5,6,7,8 } |
 : | [8, 9) | { null } |
> Sure, I'll add this one even if it's easily deducible from the former examp
Ah, that was a typo in the 'array start indices' field -- it should have been 
'0,1,1,1'.  I fixed that and also changed '3,4' sequence into '4,2' :)



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy 
Gerrit-Comment-Date: Sat, 23 Nov 2024 04:03:05 +
Gerrit-HasComments: Yes


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-22 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/22058/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22058/2//COMMIT_MSG@7
PS2, Line 7: array
> nit: This may have already been answered before, for my understanding - doe
No, it's not.

Multi-dimensional arrays and other complex data structures require an 
additional layer (dealing with the so-called 'definition level') that's 
orthogonal to this one.  Basically, this work allows for one-dimensional arrays 
and also provides the basis for the so-called 'repetition level' in the 
representation of nested data structures introduced in Dremel and used in other 
projects like Parquet and Arrow.  This (and related parts) might be a good read 
to get a broader context: 
https://arrow.apache.org/blog/2022/10/08/arrow-parquet-encoding-part-2/


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@156
PS2, Line 156: 110
> +1
There isn't any logic -- these bitmaps are completely independent.


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@156
PS2, Line 156: 110
The array bitmap and the flattened sequence bitmaps are completely independent.

> Why doesn't the array null bitmap reflect that as well?

The array bitmap provides the information on the nullability of the arrays 
themselves, not of the elements in them.  The bitmaps are independent -- that 
way it's much easier to interpret the contents.

You can think of it like this: first, the full sequence is restored (with 
nulls) using the flattened null bitmap.  Then, using the array null bitmap and 
the information on the array start indices, the array cells are restored from 
the sequence that now contains null elements as well.
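
The two-step restoration described above can be sketched roughly as follows. 
This is only an illustration, not the CFile reader code: the function name 
restore_arrays is hypothetical, bitmaps are modeled as strings read left to 
right, a '1' bit is assumed to mean "not null", and an array is assumed to end 
where the next one starts (or at the end of the restored sequence).

```python
from typing import List, Optional


def restore_arrays(
    values: List[int],
    flattened_null_bitmap: str,
    array_start_indices: List[int],
    array_null_bitmap: str,
) -> List[Optional[List[Optional[int]]]]:
    """Rebuild array cells from the fields of a nullable array data block.

    Assumptions made for this sketch only: '1' means "not null" in both
    bitmaps, and each array ends where the next one starts.
    """
    # Step 1: restore the full flattened sequence, putting nulls back in.
    non_null = iter(values)
    flat = [next(non_null) if bit == "1" else None
            for bit in flattened_null_bitmap]

    # Step 2: cut the restored sequence into cells using the start indices,
    # consulting the (independent) array null bitmap for null arrays.
    ends = array_start_indices[1:] + [len(flat)]
    cells: List[Optional[List[Optional[int]]]] = []
    for start, end, bit in zip(array_start_indices, ends, array_null_bitmap):
        cells.append(flat[start:end] if bit == "1" else None)
    return cells


# Four cells: an array holding one null element, a null array,
# an empty array, and a two-element array.
cells = restore_arrays([3, 4], "011", [0, 1, 1, 1], "1011")
```

Note that the null array and the empty array both occupy an empty range in the 
flattened sequence; only the array null bitmap tells them apart.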


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@169
PS2, Line 169: 5,6,7,8
> Can array elements be in random sequence or non-ascending order?
Elements in an array can be in any order -- they are stored in array data 
blocks exactly as given, and the representation is always deterministic as per 
the documented spec here.


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@166
PS2, Line 166: | [2, 2) | {} |
 : | [2, 2) | null |
 : | [2, 4) | { 3,4 } |
 : | [4, 8) | { 5,6,7,8 } |
 : | [8, 9) | { null } |
> It would help to add a one liner definition for these (sort of a notation s
Sure, I'll add this one even if it's easily deducible from the former example. 
As one can see, it would be:

| field | value in human readable format for illustration |
| --- | --- |
| flatten sequence | 3,4 |
| flatten value count | 2 |
| flatten null bitmap length | 3 |
| flatten null bitmap | 011 |
| array start indices length | 4 |
| array start indices | 0,0,0,0 |
| array null bitmap | 1011 |



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy 
Gerrit-Comment-Date: Fri, 22 Nov 2024 18:59:57 +
Gerrit-HasComments: Yes


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-20 Thread Ashwani Raina (Code Review)
Ashwani Raina has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/22058/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22058/2//COMMIT_MSG@7
PS2, Line 7: array
nit: This may have already been answered before, but for my understanding: does 
this also cover multi-dimensional arrays?
If yes, it would be great if you could add an example to cover that case as well.


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@148
PS2, Line 148: format
format for illustration


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@156
PS2, Line 156: 110
> Does this imply that only the third array is null? Meanwhile the flattened
+1
If there is some bitwise operation logic applied here, it would help to 
document it (at least in the comments section here) for posterity.


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@169
PS2, Line 169: 5,6,7,8
Can array elements be in random sequence or non-ascending order?


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@166
PS2, Line 166: | [2, 2) | {} |
 : | [2, 2) | null |
 : | [2, 4) | { 3,4 } |
 : | [4, 8) | { 5,6,7,8 } |
 : | [8, 9) | { null } |
It would help to add a one liner definition for these (sort of a notation 
section):
{}
null
{ null }
{ 3,4 }



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy 
Gerrit-Comment-Date: Wed, 20 Nov 2024 15:48:11 +
Gerrit-HasComments: Yes


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-19 Thread Mahesh Reddy (Code Review)
Mahesh Reddy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@120
PS2, Line 120: flatten
nit: flattened


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@123
PS2, Line 123: similar
nit: similarly


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@128
PS2, Line 128: flatten
nit: flattened, likewise for all instances below here and in the examples.


http://gerrit.cloudera.org:8080/#/c/22058/2/docs/design-docs/cfile.md@156
PS2, Line 156: 110
Does this imply that only the third array is null? Meanwhile the flattened null 
bitmap implies that only the second to last value is null. Why doesn't the 
array null bitmap reflect that as well? Likewise, why doesn't the flattened 
null bitmap show that the 4th value is null? Maybe I'm missing something but 
just wondering.



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy 
Gerrit-Comment-Date: Tue, 19 Nov 2024 18:45:37 +
Gerrit-HasComments: Yes


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-14 Thread Abhishek Chennaka (Code Review)
Abhishek Chennaka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/22058/1/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

http://gerrit.cloudera.org:8080/#/c/22058/1/docs/design-docs/cfile.md@151
PS1, Line 151: flatten value count
> That's a good point.  I think we can consider this option to express coordi
Agreed, thanks for the explanation



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 14 Nov 2024 20:14:01 +
Gerrit-HasComments: Yes


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-13 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/22058/1/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

http://gerrit.cloudera.org:8080/#/c/22058/1/docs/design-docs/cfile.md@180
PS1, Line 180: null arrays
> nit: "array null"
Done



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 14 Nov 2024 03:28:14 +
Gerrit-HasComments: Yes


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-13 Thread Alexey Serbin (Code Review)
Hello Kudu Jenkins, Abhishek Chennaka,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/22058

to look at the new patch set (#2).

Change subject: WIP [docs] add information on nullable array data block
..

WIP [docs] add information on nullable array data block

Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
---
M docs/design-docs/cfile.md
1 file changed, 81 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/22058/2
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-13 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 2: Verified+1

Build Successful

http://jenkins.kudu.apache.org/job/pre_commit/680/ : SUCCESS


--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 14 Nov 2024 03:28:40 +
Gerrit-HasComments: No


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-13 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 2:

Build Started http://jenkins.kudu.apache.org/job/pre_commit/680/


--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 14 Nov 2024 03:28:19 +
Gerrit-HasComments: No


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-13 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/22058/1/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

http://gerrit.cloudera.org:8080/#/c/22058/1/docs/design-docs/cfile.md@151
PS1, Line 151: array start indices
> When storing the starting indices of each array, we might end up storing la
That's a good point.  I think we can consider this option to express 
coordinates of array cells as well.

Originally, I was thinking of expressing the offsets in "absolute" rather than 
"relative" coordinates.  That seems better because of convenient access to a 
particular cell in the flattened sequence (i.e. fetching the data of a 
particular array): there is no need to walk through the array cell coordinates 
from the very beginning to find the position in the flattened sequence.

I recall Arrow uses a similar notation 
(https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout) 
in offset buffers for variable-sized lists, and I thought to apply a similar 
approach here as well.  However, so far it seems using sizes instead of 
indices/offsets should work in our case as well.
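
To make the trade-off between the two notations concrete, here is a small 
sketch.  All function names are hypothetical and nothing here is taken from the 
Kudu code base; it only shows that the two notations carry the same 
information, and that locating one cell is a direct lookup with start indices 
but needs a running sum with sizes.

```python
from typing import List, Tuple


def starts_to_sizes(starts: List[int], total: int) -> List[int]:
    """Convert absolute start indices into per-array sizes.

    'total' is the length of the flattened sequence (nulls included).
    """
    ends = starts[1:] + [total]
    return [end - start for start, end in zip(starts, ends)]


def sizes_to_starts(sizes: List[int]) -> List[int]:
    """Convert per-array sizes back into absolute start indices."""
    starts: List[int] = []
    position = 0
    for size in sizes:
        starts.append(position)
        position += size
    return starts


def cell_range_from_starts(starts: List[int], total: int, i: int) -> Tuple[int, int]:
    """Half-open range [start, end) of the i-th array: a direct lookup."""
    end = starts[i + 1] if i + 1 < len(starts) else total
    return starts[i], end


def cell_range_from_sizes(sizes: List[int], i: int) -> Tuple[int, int]:
    """Same range, but all preceding sizes have to be summed first."""
    start = sum(sizes[:i])
    return start, start + sizes[i]


starts = [0, 1, 1, 1]                   # absolute start indices
sizes = starts_to_sizes(starts, 3)      # [1, 0, 0, 2]
```

A reader that decodes the whole block sequentially pays nothing extra with 
sizes, since the running sum comes for free during the scan.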

BTW, there shouldn't be very large numbers for array indices with the default 
limit for block size in a CFile of 256KiByte.  Since the values are 
LEB128-encoded, a 4-byte integer index takes just 2 bytes after the encoding 
for indices below 16Ki, and 3 bytes for indices close to 64Ki.  Yes, that would 
be about 2 times more compared with storing array sizes if the majority of 
arrays contain fewer than 128 elements.  In absolute terms, for an extreme case 
of single-element arrays we are talking about ~10KiByte vs ~20KiByte.
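
As a rough check of these numbers, below is a minimal unsigned LEB128 encoder 
with a size comparison.  The helper names are made up for this illustration, 
and the count of 10 * 1024 single-element arrays is an assumption picked to 
land in the ballpark mentioned above; it is not derived from the actual block 
layout.

```python
from typing import Iterable


def uleb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128.

    Each output byte carries 7 payload bits; the high bit is set on every
    byte except the last one.
    """
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte
            return bytes(out)


def encoded_size(values: Iterable[int]) -> int:
    """Total number of bytes needed to LEB128-encode all the values."""
    return sum(len(uleb128_encode(v)) for v in values)


# Hypothetical extreme case: 10 * 1024 arrays with one element each.
num_arrays = 10 * 1024
bytes_for_sizes = encoded_size([1] * num_arrays)        # every size is 1
bytes_for_indices = encoded_size(range(num_arrays))     # start indices 0..n-1
```

With 7 payload bits per byte, values up to 127 take one byte, values up to 
16383 take two bytes, and anything from 16384 up to 2097151 takes three.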



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 13 Nov 2024 22:24:49 +
Gerrit-HasComments: Yes


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-13 Thread Abhishek Chennaka (Code Review)
Abhishek Chennaka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/22058/1/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

http://gerrit.cloudera.org:8080/#/c/22058/1/docs/design-docs/cfile.md@151
PS1, Line 151: array start indices
When storing the starting indices of each array, we might end up storing large 
values if the number of arrays in the block is big. Instead, how about storing 
the length of each array (which would introduce some latency)?

If the length of the 100th array is 3 elements,
Current way would store it as [...100,102...]
Instead we could store it as [<99th array length>,3,<101st array length>]

There are definitely pros and cons to each way, but I'm not sure whether that 
was already thought through before arriving here. If so, I'm curious to know 
the thoughts.


http://gerrit.cloudera.org:8080/#/c/22058/1/docs/design-docs/cfile.md@180
PS1, Line 180: null arrays
nit: "array null"



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 13 Nov 2024 20:05:14 +
Gerrit-HasComments: Yes


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-12 Thread Alexey Serbin (Code Review)
Alexey Serbin has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/22058 )


Change subject: WIP [docs] add information on nullable array data block
..

WIP [docs] add information on nullable array data block

Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
---
M docs/design-docs/cfile.md
1 file changed, 78 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/22058/1
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-12 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 1: Verified+1

Build Successful

http://jenkins.kudu.apache.org/job/pre_commit/677/ : SUCCESS


--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Nov 2024 23:48:50 +
Gerrit-HasComments: No


[kudu-CR] WIP [docs] add information on nullable array data block

2024-11-12 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22058 )

Change subject: WIP [docs] add information on nullable array data block
..


Patch Set 1:

Build Started http://jenkins.kudu.apache.org/job/pre_commit/677/


--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8972b3791d155e102240c80012e2b87192914cd1
Gerrit-Change-Number: 22058
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Nov 2024 23:48:28 +
Gerrit-HasComments: No