[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes

2021-04-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314618#comment-17314618
 ] 

Dongjoon Hyun commented on PARQUET-1143:


Hi, [~rdblue]. Could you set the Fix Version, please?

> Update Java for format 2.4.0 changes
>
> Key: PARQUET-1143
> URL: https://issues.apache.org/jira/browse/PARQUET-1143
> Project: Parquet
> Issue Type: Task
> Components: parquet-mr
> Affects Versions: 1.9.0, 1.8.2
> Reporter: Ryan Blue
> Assignee: Ryan Blue
> Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes

2019-04-10 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814281#comment-16814281
 ] 

Yuming Wang commented on PARQUET-1143:

[~rdblue] Should we update the *Fix Version/s*?





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420676#comment-16420676
 ] 

ASF GitHub Bot commented on PARQUET-1143:

rdblue commented on issue #430: PARQUET-1143: Update to Parquet format 2.4.0.
URL: https://github.com/apache/parquet-mr/pull/430#issuecomment-377564457
 
 
   @scottcarey, you don't need to update Spark; I have a branch with it updated
that we're already running in production.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420243#comment-16420243
 ] 

ASF GitHub Bot commented on PARQUET-1143:

scottcarey commented on issue #430: PARQUET-1143: Update to Parquet format 2.4.0.
URL: https://github.com/apache/parquet-mr/pull/430#issuecomment-377463522
 
 
   Yeah, I looked a little further into what is needed on the Spark side too.
Partway through modifying the vectorized readers to use method signatures that
take a ByteBufferInputStream rather than (byte[], offset), I hit a spot where
they called back into code here that did not take a ByteBufferInputStream.
   
   It looks like changes on both sides are needed.
   
   I think that whole area of code would work better if coded against the
DataInput interface instead.  You can wrap a ByteBufferInputStream in a
DataInputStream and get free (and decently efficient but not amazing) tools for
reading ints, etc.  Note that DataInputStream decodes big-endian, so
little-endian values still need a byte swap such as Integer.reverseBytes.
DataInputStream will be quite a bit faster than calling read() 4 times in a row
and constructing the int by hand, though its technique of maintaining a small
buffer for reading primitives can be emulated.
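
   As a rough illustration of the wrapping idea, here is a minimal sketch that
reads one little-endian int through a DataInputStream.  It assumes the source
is any java.io.InputStream (a ByteArrayInputStream stands in for
ByteBufferInputStream here), and the class and method names are hypothetical,
not part of parquet-mr.  Because DataInputStream.readInt() decodes big-endian,
the sketch swaps the bytes with Integer.reverseBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not parquet-mr code.
final class LittleEndianReadSketch {

  // Reads 4 bytes from the stream and interprets them as a little-endian int.
  // DataInputStream.readInt() is big-endian, so the result is byte-swapped.
  static int readIntLittleEndian(InputStream in) {
    try {
      DataInputStream data = new DataInputStream(in);
      return Integer.reverseBytes(data.readInt());
    } catch (IOException e) {
      // Wrapped so callers of this sketch need no checked-exception handling.
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // ByteArrayInputStream is an assumed stand-in for ByteBufferInputStream.
    byte[] bytes = {0x78, 0x56, 0x34, 0x12};
    int value = readIntLittleEndian(new ByteArrayInputStream(bytes));
    System.out.println(Integer.toHexString(value));
  }
}
```

   In real reader code the DataInputStream would be created once per stream
rather than per value; it is created inside the method here only to keep the
sketch short.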
   
   
   




[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419833#comment-16419833
 ] 

ASF GitHub Bot commented on PARQUET-1143:

scottcarey commented on issue #430: PARQUET-1143: Update to Parquet format 2.4.0.
URL: https://github.com/apache/parquet-mr/pull/430#issuecomment-377383897
 
 
   FWIW, I tested the current master code by overriding the Parquet version in
my Spark projects.  I could not write Zstandard Parquet files because
spark-sql's `ParquetOptions` class intercepts the config strings and maps them
to a `CompressionCodecName` in parquet-hadoop itself, rather than delegating
the name lookup to parquet-hadoop, so it does not understand the string 'zstd'.
   
   This coupling means that using this from Spark will require a new version of
spark-sql.  Honestly, the code here should be responsible for converting from a
simple name to the codec, not Spark.  Then one could upgrade only the Parquet
version and gain access to new compression codecs without recompiling or
releasing Spark.
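
   To make the suggested split of responsibilities concrete, here is a minimal
sketch of a library-owned name lookup.  Everything in it is hypothetical:
`CodecLookupSketch`, `CodecName`, and `fromName` are illustration-only names
and are not the parquet-hadoop or spark-sql API.  The point is only that when
the library resolves the raw config string, adding a codec to the library's
enum makes it reachable without any change on the caller's side.

```java
import java.util.Locale;

// Hypothetical illustration; not the real parquet-hadoop classes.
final class CodecLookupSketch {

  // Stand-in for a codec enum owned by the library.
  enum CodecName { UNCOMPRESSED, SNAPPY, GZIP, LZO, ZSTD }

  // Library-side lookup: the caller passes the raw config string through
  // unchanged instead of keeping its own name-to-codec table.
  static CodecName fromName(String name) {
    if (name == null) {
      return CodecName.UNCOMPRESSED; // assumed default for this sketch
    }
    // valueOf throws IllegalArgumentException for names the library lacks.
    return CodecName.valueOf(name.trim().toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    // A caller such as an engine option parser would just delegate.
    System.out.println(fromName("zstd"));
  }
}
```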




[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes

2018-03-28 Thread Mark Marsh (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417340#comment-16417340
 ] 

Mark Marsh commented on PARQUET-1143:

I'm also happy to test release candidates against my use case if it will help
get 1.10.0 out.

I'm currently developing against the master branch but it will be problematic 
to push that through QA...



[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes

2018-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416235#comment-16416235
 ] 

ASF GitHub Bot commented on PARQUET-1143:

scottcarey commented on issue #430: PARQUET-1143: Update to Parquet format 2.4.0.
URL: https://github.com/apache/parquet-mr/pull/430#issuecomment-376672804
 
 
   Is there anything I can help with to get 1.10.0 out?  I'll be happy to test out 
any RCs on my use case.  I'd rather spend time helping with 1.10.0 than testing 
a custom built version, but I may be forced to build and test a custom version 
if an official release with zstd available takes too long.




[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397914#comment-16397914
 ] 

ASF GitHub Bot commented on PARQUET-1143:

rdblue commented on issue #430: PARQUET-1143: Update to Parquet format 2.4.0.
URL: https://github.com/apache/parquet-mr/pull/430#issuecomment-372866727
 
 
   I'd like to get 1.10.0 out in the next week or two.




[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397790#comment-16397790
 ] 

ASF GitHub Bot commented on PARQUET-1143:

scottcarey commented on issue #430: PARQUET-1143: Update to Parquet format 2.4.0.
URL: https://github.com/apache/parquet-mr/pull/430#issuecomment-372844823
 
 
   This is great!   I would love to test out writing some parquet files using 
zstd compression.
   
   However, it appears I cannot do so without a Parquet release containing this
work.
   
   Am I mistaken? Is there a way to manually supply parquet-format 2.4 and 
combine it with released versions of parquet-avro/mr/etc and spark and output 
zstd files?
   
   If not, what is the rough ETA on a 1.9.1 or 1.10.0 release of parquet that 
would unlock zstd compression?

