Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]

2026-03-05 Thread via GitHub


rok commented on PR #438:
URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006883571

   Yes, but the performance gains are likely indicative of what would be possible 
here. I suppose we had best let the FIXED_SIZE_LIST debate play out before 
continuing here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]

2026-03-05 Thread via GitHub


pitrou commented on PR #438:
URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006760847

   This is off-topic as this PR is for VARIABLE_SIZE_LIST, not FIXED_SIZE_LIST.





Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]

2026-03-05 Thread via GitHub


rahil-c commented on PR #438:
URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006641061

   @pitrou @rok These are the details of the experiment I tried locally, 
writing some vectors to a Parquet file as a LIST of FLOAT versus backing them 
with a FIXED_LEN_BYTE_ARRAY, and trying different encodings and 
compressions:
   https://lists.apache.org/thread/q9b2lbz8h9loodpzso98wnj1x2tcr20h
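
   For readers unfamiliar with the second layout, here is a minimal sketch of what backing a vector with a FIXED_LEN_BYTE_ARRAY could look like. It is an illustration only, not the code used in the experiment: the helper names `pack_vector` and `unpack_vector` are hypothetical, and the little-endian float32 layout is an assumption, since the bytes of a FIXED_LEN_BYTE_ARRAY are opaque to Parquet and the layout is up to the writer.

```python
import struct


def pack_vector(vec):
    """Pack a list of floats into one fixed-length byte string.

    Assumed layout: consecutive little-endian float32 values, so a
    vector of n floats always occupies exactly 4 * n bytes.
    """
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(buf):
    """Inverse of pack_vector: recover the float32 values from the bytes."""
    count = len(buf) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", buf))


vector = [1.0, 0.5, -2.0, 0.25]
blob = pack_vector(vector)
print(len(blob))            # byte width the column would declare
print(unpack_vector(blob))  # values read back from the opaque bytes
```

   Because every value has the same byte width, a reader can locate the i-th vector by multiplication, but the individual floats are no longer visible to Parquet's encodings or statistics.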





Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]

2026-03-05 Thread via GitHub


rok commented on PR #438:
URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006509358

   @rahil-c posted some performance [findings on the 
ML](https://lists.apache.org/thread/q9b2lbz8h9loodpzso98wnj1x2tcr20h), e.g. 
[this table] (I think it's all about fixed-size lists). It would be nice to 
have them in a vector-proposal-like form.





Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]

2026-03-05 Thread via GitHub


pitrou commented on PR #438:
URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006426542

   The question is more whether the upsides are worth it. This hasn't been 
demonstrated.





Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]

2026-03-05 Thread via GitHub


rok commented on PR #438:
URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006414592

   Right. This would make the format less optimizable at the element level. What 
would the other downsides be?





Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]

2026-03-05 Thread via GitHub


pitrou commented on PR #438:
URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006286995

   Well, yes, that's how Parquet works. Trying to stuff lists of opaque byte 
arrays doesn't sound like a tremendous idea to me.





Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]

2026-03-05 Thread via GitHub


rok commented on PR #438:
URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4005365519

   Even with required elements, LIST still needs repetition levels, and offsets 
must be derived by decoding those levels (at least over the target range), 
rather than read directly?
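
   To make the question concrete, here is a rough sketch of deriving offsets from levels. It is an illustration under stated assumptions, not code from any Parquet implementation: the function name `offsets_from_levels` is hypothetical, the list is not nested (maximum repetition level 1), and elements are REQUIRED, so an entry at the maximum definition level always carries one value.

```python
def offsets_from_levels(rep_levels, def_levels, max_def_level):
    """Rebuild Arrow-style list offsets from repetition/definition levels.

    Assumes a single, non-nested list column with required elements.
    """
    offsets = [0]
    for rep, definition in zip(rep_levels, def_levels):
        if rep == 0:
            # Repetition level 0 marks the start of a new row (a new list),
            # which begins where the previous list ended.
            offsets.append(offsets[-1])
        if definition == max_def_level:
            # A fully defined entry contributes one element to the current list.
            offsets[-1] += 1
    return offsets


# Rows [[a, b], [], [c]] for a required list of required elements
# (max definition level 1): the empty list shows up as a level entry
# with definition level 0 and no value.
rep_levels = [0, 1, 0, 0]
def_levels = [1, 1, 0, 1]
print(offsets_from_levels(rep_levels, def_levels, max_def_level=1))  # [0, 2, 2, 3]
```

   The loop has to visit every level entry before the target row, which is the cost being asked about: the offsets are computed, not stored.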





Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]

2026-03-05 Thread via GitHub


pitrou commented on PR #438:
URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4004877406

   > We would want a VECTOR-like design that would allow variable-size lists 
without per-element definition levels.
   
   I think that's already possible if you have a LIST group node whose child 
node is REQUIRED.
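
   As a small illustration of why that is (a sketch only; the helper `max_levels` is hypothetical), the maximum definition level of a column is the number of non-REQUIRED nodes on its path and the maximum repetition level is the number of REPEATED nodes, so making the element REQUIRED removes the level that would otherwise encode element nullness:

```python
def max_levels(path):
    """Return (max_definition_level, max_repetition_level) for a column.

    `path` lists the repetition type of each node from the top-level
    field down to the leaf: "required", "optional" or "repeated".
    """
    max_def = sum(1 for kind in path if kind != "required")
    max_rep = sum(1 for kind in path if kind == "repeated")
    return max_def, max_rep


# Standard three-level LIST layout: outer group, repeated "list" group, element.
print(max_levels(["optional", "repeated", "optional"]))  # (3, 1) nullable list, nullable elements
print(max_levels(["optional", "repeated", "required"]))  # (2, 1) nullable list, required elements
print(max_levels(["required", "repeated", "required"]))  # (1, 1) required list, required elements
```

   Even in the last case one definition level remains, since it is what distinguishes an empty list from a list with elements.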





Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]

2026-03-05 Thread via GitHub


rok commented on PR #438:
URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4004854839

   We would want a VECTOR-like design that would allow variable-size lists 
without per-element definition levels.





Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]

2026-03-05 Thread via GitHub


pitrou commented on PR #438:
URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4004817999

   > The intent was to define a variable sized list column type without 
repetition/definition levels
   
   Why would it be any better than a LIST column? VECTOR is presumably for 
fixed-size lists...





Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]

2026-03-05 Thread via GitHub


rok commented on PR #438:
URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4004794696

   The intent was to define a variable sized list column type without 
repetition/definition levels. I suppose [vector repetition 
level](https://lists.apache.org/thread/soqd69k8y7b6z0sxbmgrbxcwxbvlj353) would 
address exactly this. We could reuse this PR for the purpose or just close it.





Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]

2026-03-05 Thread via GitHub


pitrou commented on PR #438:
URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4003779026

   What's the point of this?

