Re: [PR] GH-437: [Format] Specify VARIABLE_SIZE_LIST Logical type [parquet-format]
rok commented on PR #438: URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006883571 Yes, but the performance gains are likely indicative of what would be possible here. I suppose we had best let the FIXED_SIZE_LIST debate play out first before continuing here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
pitrou commented on PR #438: URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006760847 This is off-topic as this PR is for VARIABLE_SIZE_LIST, not FIXED_SIZE_LIST.
rahil-c commented on PR #438: URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006641061 @pitrou @rok These were the details of the experiment I tried locally when writing some vectors to a Parquet file with LIST of FLOAT versus having it backed by a FIXED_LEN_BYTE_ARRAY, as well as playing around with different encodings and compressions. https://lists.apache.org/thread/q9b2lbz8h9loodpzso98wnj1x2tcr20h
rok commented on PR #438: URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006509358 @rahil-c posted some performance [findings on the ML](https://lists.apache.org/thread/q9b2lbz8h9loodpzso98wnj1x2tcr20h), e.g. [this table] (I think it's all about fixed size lists). It would be nice to have this in a vector-proposal-like form.
pitrou commented on PR #438: URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006426542 The question is more whether the upsides are worth it. This hasn't been demonstrated.
rok commented on PR #438: URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006414592 Right. This would make the format less optimizable at the element level. What would the other downsides be?
pitrou commented on PR #438: URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4006286995 Well, yes, that's how Parquet works. Trying to stuff lists of opaque byte arrays doesn't sound like a tremendous idea to me.
rok commented on PR #438: URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4005365519 Even with required elements, LIST still needs repetition levels, and offsets must be derived by decoding those levels (at least over the target range), rather than read directly?
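To illustrate the point about offsets, here is a minimal sketch of how list offsets could be derived from already-decoded levels for a single-level LIST column whose element is REQUIRED. It is not taken from any Parquet implementation: the function name `offsets_from_levels` and its parameters are hypothetical, and it assumes the repetition and definition levels have already been decoded into plain integer sequences. A repetition level of 0 marks the start of a new row, and an entry carries an element value only when its definition level equals the column's maximum definition level (so empty or null lists contribute no elements).

```python
from typing import List, Sequence


def offsets_from_levels(
    rep_levels: Sequence[int],
    def_levels: Sequence[int],
    max_def_level: int,
) -> List[int]:
    """Derive list offsets from decoded levels (hypothetical helper).

    Assumes one level of nesting (max repetition level 1) and a REQUIRED
    element, so an entry holds a value iff def_level == max_def_level.
    Null lists, if the outer group is optional, are treated like empty lists.
    """
    offsets: List[int] = []
    n_values = 0
    for rep, dfn in zip(rep_levels, def_levels):
        if rep == 0:
            # A repetition level of 0 starts a new row (a new list).
            offsets.append(n_values)
        if dfn == max_def_level:
            # Only fully defined entries correspond to a stored element.
            n_values += 1
    # Closing offset: total number of element values seen.
    offsets.append(n_values)
    return offsets


# Rows [[1, 2], [], [3]] with a required outer list (max definition level 1).
rep = [0, 1, 0, 0]
dfn = [1, 1, 0, 1]
offsets = offsets_from_levels(rep, dfn, max_def_level=1)
```

The loop has to visit every level entry up to the target row, which is the cost being contrasted with reading offsets directly.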
pitrou commented on PR #438: URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4004877406 > We would want a VECTOR-like design that would allow variable-size lists without per-element definition levels. I think that's already possible if you have a LIST group node whose child node is REQUIRED.
rok commented on PR #438: URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4004854839 We would want a VECTOR-like design that would allow variable-size lists without per-element definition levels.
pitrou commented on PR #438: URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4004817999 > The intent was to define a variable sized list column type without repetition/definition levels Why would it be any better than a LIST column? VECTOR is presumably for fixed-size lists...
rok commented on PR #438: URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4004794696 The intent was to define a variable sized list column type without repetition/definition levels. I suppose [vector repetition level](https://lists.apache.org/thread/soqd69k8y7b6z0sxbmgrbxcwxbvlj353) would address exactly this. We could reuse this PR for the purpose or just close it.
pitrou commented on PR #438: URL: https://github.com/apache/parquet-format/pull/438#issuecomment-4003779026 What's the point of this?
