[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes
[ https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314618#comment-17314618 ]

Dongjoon Hyun commented on PARQUET-1143:

Hi, [~rdblue]. Could you set the Fix Version, please?

> Update Java for format 2.4.0 changes
>
> Key: PARQUET-1143
> URL: https://issues.apache.org/jira/browse/PARQUET-1143
> Project: Parquet
> Issue Type: Task
> Components: parquet-mr
> Affects Versions: 1.9.0, 1.8.2
> Reporter: Ryan Blue
> Assignee: Ryan Blue
> Priority: Major

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes
[ https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814281#comment-16814281 ]

Yuming Wang commented on PARQUET-1143:

[~rdblue] Should we update the *Fix Version/s*?
[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes
[ https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420676#comment-16420676 ]

ASF GitHub Bot commented on PARQUET-1143:

rdblue commented on issue #430: PARQUET-1143: Update to Parquet format 2.4.0.
URL: https://github.com/apache/parquet-mr/pull/430#issuecomment-377564457

@scottcarey, you don't need to update Spark; I have a branch with it updated that we're already running in production.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes
[ https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420243#comment-16420243 ]

ASF GitHub Bot commented on PARQUET-1143:

scottcarey commented on issue #430: PARQUET-1143: Update to Parquet format 2.4.0.
URL: https://github.com/apache/parquet-mr/pull/430#issuecomment-377463522

Yeah, I looked a little further into what is needed on the Spark side too. Partway through modifying the vectorized readers to use method signatures that take a ByteBufferInputStream rather than (byte[], offset), I hit a spot where they called back into code here that did not take a ByteBufferInputStream. It looks like changes on both sides are needed.

I think that whole area of code would work better if coded against a DataInput interface instead. You can wrap a ByteBufferInputStream in a DataInputStream and get free (and decently efficient but not amazing) tools for reading little-endian ints, etc. DataInputStream will be quite a bit faster than calling read() 4 times in a row and constructing the int by hand, though its technique of maintaining a small buffer for reading primitives can be emulated.
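For reference, the read strategies mentioned in the comment above could be sketched roughly as follows. This is a minimal illustration using only the JDK, not code from parquet-mr or Spark: the class and method names are hypothetical, and a plain ByteArrayInputStream stands in for ByteBufferInputStream. One caveat worth noting: java.io.DataInputStream.readInt() decodes big-endian, so the sketch assumes a byte swap is needed to obtain a little-endian value.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch class; not part of parquet-mr or Spark.
public class LittleEndianReadSketch {

    // Calls read() four times and assembles the little-endian int by hand.
    static int readIntByHand(InputStream in) throws IOException {
        int b0 = in.read();
        int b1 = in.read();
        int b2 = in.read();
        int b3 = in.read();
        if ((b0 | b1 | b2 | b3) < 0) {
            // read() returns -1 at end of stream
            throw new EOFException();
        }
        return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
    }

    // DataInputStream.readInt() is big-endian, so the bytes are swapped afterwards.
    static int readIntViaDataInput(DataInputStream in) throws IOException {
        return Integer.reverseBytes(in.readInt());
    }

    // A ByteBuffer can decode little-endian directly.
    // Note: order(...) changes the byte order of the buffer that is passed in.
    static int readIntViaByteBuffer(ByteBuffer buf) {
        return buf.order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) throws IOException {
        // 0x12345678 encoded in little-endian byte order
        byte[] bytes = {0x78, 0x56, 0x34, 0x12};

        int byHand = readIntByHand(new ByteArrayInputStream(bytes));
        int viaDataInput = readIntViaDataInput(
            new DataInputStream(new ByteArrayInputStream(bytes)));
        int viaBuffer = readIntViaByteBuffer(ByteBuffer.wrap(bytes));

        System.out.println(Integer.toHexString(byHand));
        System.out.println(Integer.toHexString(viaDataInput));
        System.out.println(Integer.toHexString(viaBuffer));
    }
}
```

All three helpers decode the same four bytes to the same value; which one is fastest in the vectorized readers would have to be measured rather than assumed.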
[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes
[ https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419833#comment-16419833 ]

ASF GitHub Bot commented on PARQUET-1143:

scottcarey commented on issue #430: PARQUET-1143: Update to Parquet format 2.4.0.
URL: https://github.com/apache/parquet-mr/pull/430#issuecomment-377383897

FWIW, I tested out the current master code, overriding the version in my Spark projects. I could not output Zstandard Parquet files because spark-sql's `ParquetOptions` class intercepts the config strings and maps them to a `CompressionCodecName` in parquet-hadoop, rather than just delegating the name lookup to parquet-hadoop, so it does not understand the string 'zstd'.

This coupling means that using this from Spark will require a new version of spark-sql. Honestly, the code here should be responsible for converting from a simple name to the codec, not Spark. Then one could upgrade only the Parquet version and gain access to new compression codecs without recompiling/releasing Spark.
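The coupling described in the comment above can be illustrated with a small self-contained sketch. Everything in it is hypothetical: the `Codec` enum stands in for the library's codec enum, and the two lookup methods stand in for a caller-side name table and a library-side lookup. It is not the actual `ParquetOptions` or `CompressionCodecName` code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch class; not part of parquet-mr or Spark.
public class CodecLookupSketch {

    // Stand-in for the library's codec enum, after a new codec (ZSTD) was added.
    public enum Codec { UNCOMPRESSED, SNAPPY, GZIP, ZSTD }

    // Caller-side table, fixed when the caller was compiled.
    // It predates ZSTD, so it has no entry for "zstd".
    static final Map<String, Codec> CALLER_TABLE = new HashMap<>();
    static {
        CALLER_TABLE.put("none", Codec.UNCOMPRESSED);
        CALLER_TABLE.put("snappy", Codec.SNAPPY);
        CALLER_TABLE.put("gzip", Codec.GZIP);
    }

    // Lookup owned by the caller: new codecs stay invisible until the caller is rebuilt.
    static Codec lookupInCaller(String name) {
        Codec codec = CALLER_TABLE.get(name.toLowerCase(Locale.ROOT));
        if (codec == null) {
            throw new IllegalArgumentException("Unknown codec: " + name);
        }
        return codec;
    }

    // Lookup delegated to the library: any codec the enum knows is accepted.
    static Codec lookupInLibrary(String name) {
        return Codec.valueOf(name.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(lookupInLibrary("zstd"));
        try {
            lookupInCaller("zstd");
        } catch (IllegalArgumentException e) {
            System.out.println("caller-side lookup failed: " + e.getMessage());
        }
    }
}
```

With the delegated lookup, upgrading only the library is enough for the new name to resolve; with the caller-side table, the caller has to be changed and released as well.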
[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes
[ https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417340#comment-16417340 ]

Mark Marsh commented on PARQUET-1143:

I'm also happy to test release candidates against my use case if it will help get 1.10.0 out. I'm currently developing against the master branch, but it will be problematic to push that through QA...
[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes
[ https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416235#comment-16416235 ]

ASF GitHub Bot commented on PARQUET-1143:

scottcarey commented on issue #430: PARQUET-1143: Update to Parquet format 2.4.0.
URL: https://github.com/apache/parquet-mr/pull/430#issuecomment-376672804

Anything that I can help with to get 1.10.0 out? I'll be happy to test out any RCs on my use case. I'd rather spend time helping with 1.10.0 than testing a custom built version, but I may be forced to build and test a custom version if an official release with zstd available takes too long.
[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes
[ https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397914#comment-16397914 ]

ASF GitHub Bot commented on PARQUET-1143:

rdblue commented on issue #430: PARQUET-1143: Update to Parquet format 2.4.0.
URL: https://github.com/apache/parquet-mr/pull/430#issuecomment-372866727

I'd like to get 1.10.0 out in the next week or two.
[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes
[ https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397790#comment-16397790 ]

ASF GitHub Bot commented on PARQUET-1143:

scottcarey commented on issue #430: PARQUET-1143: Update to Parquet format 2.4.0.
URL: https://github.com/apache/parquet-mr/pull/430#issuecomment-372844823

This is great! I would love to test out writing some Parquet files using zstd compression. It appears, however, that I cannot do so without a Parquet release containing this work. Am I mistaken? Is there a way to manually supply parquet-format 2.4 and combine it with released versions of parquet-avro/mr/etc. and Spark to output zstd files?

If not, what is the rough ETA on a 1.9.1 or 1.10.0 release of Parquet that would unlock zstd compression?