[jira] [Commented] (PARQUET-2022) ZstdDecompressorStream should close `zstdInputStream`

2021-04-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322570#comment-17322570
 ] 

ASF GitHub Bot commented on PARQUET-2022:
-

dongjoon-hyun commented on pull request #889:
URL: https://github.com/apache/parquet-mr/pull/889#issuecomment-820866086


   Thank you so much, @ggershinsky !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ZstdDecompressorStream should close `zstdInputStream`
> -
>
> Key: PARQUET-2022
> URL: https://issues.apache.org/jira/browse/PARQUET-2022
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.12.0
> Reporter: Dongjoon Hyun
> Priority: Major
>
> `ZstdDecompressorStream` should close its own resource, because
> `CompressionInputStream.close` closes only the inner stream.
> {code}
> public class ZstdDecompressorStream extends CompressionInputStream {
>   private ZstdInputStream zstdInputStream;
>   public ZstdDecompressorStream(InputStream stream) throws IOException {
>     super(stream);
>     zstdInputStream = new ZstdInputStream(stream);
>   }
> }
> {code}
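
A minimal sketch of the kind of fix the issue asks for, written against stand-in classes so it runs without Hadoop or zstd-jni on the classpath. `BaseStream` plays the role of `CompressionInputStream` (its `close` only closes the stream it was given) and `TrackingDecoder` plays the role of `ZstdInputStream`. All class and method names below are hypothetical and are not taken from parquet-mr; the actual change is the one in pull request #889.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CloseInnerStreamSketch {

  // Hypothetical stand-in for ZstdInputStream: records whether close() ran.
  static class TrackingDecoder extends FilterInputStream {
    boolean closed = false;

    TrackingDecoder(InputStream in) {
      super(in);
    }

    @Override
    public void close() throws IOException {
      closed = true; // a real decoder would release its own buffers here
      super.close();
    }
  }

  // Hypothetical stand-in for CompressionInputStream: close() only closes
  // the raw stream handed to the constructor.
  static class BaseStream extends InputStream {
    protected final InputStream in;

    BaseStream(InputStream in) {
      this.in = in;
    }

    @Override
    public int read() throws IOException {
      return in.read();
    }

    @Override
    public void close() throws IOException {
      in.close();
    }
  }

  // The wrapper owns a second stream, so it overrides close() to release it.
  static class DecompressorSketch extends BaseStream {
    private final TrackingDecoder decoder;

    DecompressorSketch(InputStream raw) {
      super(raw);
      decoder = new TrackingDecoder(raw);
    }

    @Override
    public int read() throws IOException {
      return decoder.read();
    }

    @Override
    public void close() throws IOException {
      try {
        decoder.close(); // release the decoder first
      } finally {
        super.close(); // then let the base class close the raw stream
      }
    }

    boolean decoderClosed() {
      return decoder.closed;
    }
  }

  public static void main(String[] args) throws IOException {
    DecompressorSketch stream =
        new DecompressorSketch(new ByteArrayInputStream(new byte[] {1, 2, 3}));
    System.out.println("first byte: " + stream.read());
    stream.close();
    System.out.println("decoder closed: " + stream.decoderClosed());
  }
}
```

The `try`/`finally` in `close` is a design choice of this sketch: it makes sure the base class still closes the raw stream even if closing the decoder throws.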



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-2022) ZstdDecompressorStream should close `zstdInputStream`

2021-04-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322401#comment-17322401
 ] 

ASF GitHub Bot commented on PARQUET-2022:
-

ggershinsky commented on pull request #889:
URL: https://github.com/apache/parquet-mr/pull/889#issuecomment-820654860


   will do





Re: [RESULT] Release Apache Parquet Format 2.9.0 RC0

2021-04-15 Thread Antoine Pitrou


Thank you very much, Uwe!

I have done the following steps:
* updated the Web site
* sent an announcement to dev@parquet.a.o and announce@a.o
* marked format-2.9.0 as released on JIRA
* created format-2.10.0 on JIRA

I don't think there's anything else remaining to do.

Best regards

Antoine.


On Thu, 15 Apr 2021 09:55:28 +0200
"Uwe L. Korn"  wrote:

> Published the release. 
> 
> On Wed, Apr 14, 2021, at 6:30 PM, Driesprong, Fokko wrote:
> > Yes, you'll need PMC permissions to do that.
> > 
> > A PMC could fetch the artifacts from
> > https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.9.0-rc0/
> > and push them into svn as described below :)
> > 
> > Cheers, Fokko
> > 
> > Op wo 14 apr. 2021 om 17:38 schreef Antoine Pitrou :
> >   
> > > On Wed, 14 Apr 2021 17:30:34 +0200
> > > Antoine Pitrou  wrote:
> > >  
> > > > Ok, it seems PMC intervention is needed for the step
> > > > "3. Copy the release artifacts in SVN into releases" outlined in
> > > > https://parquet.apache.org/documentation/how-to-release/ .
> > > >
> > > > Basically, the `apache-parquet-format-2.9.0-rc0` directory from the SVN
> > > > dev/parquet repository should be copied as
> > > > `apache-parquet-format-2.9.0` to the SVN release/parquet repository.
> > > >
> > > > Could a PMC member do that?  
> > >
> > > AFAICT, the required steps are the following (the last one is rejected
> > > for me):
> > >
> > >   $ svn co https://dist.apache.org/repos/dist/dev/parquet candidates
> > >   $ svn co https://dist.apache.org/repos/dist/release/parquet releases
> > >   $ svn cp candidates/apache-parquet-format-2.9.0-rc0/ releases/apache-parquet-format-2.9.0
> > >   $ cd releases/
> > >   $ svn ci -m "Parquet Format: Add release 2.9.0"
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > >  
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > >
> > > > On Wed, 14 Apr 2021 12:10:11 +0200
> > > > Antoine Pitrou  wrote:  
> > > > > Hello,
> > > > >
> > > > > The vote to release 2.9.0 RC0 as Apache Parquet Format 2.9.0 is PASSED
> > > > > with the required three +1 binding votes.
> > > > >
> > > > > I will try to finalize the release myself, but I may need help from a
> > > > > PMC member.
> > > > >
> > > > > Best regards
> > > > >
> > > > > Antoine.
> > > > >
> > > > >
> > > > >
> > > > > On Wed, 7 Apr 2021 15:10:42 +0200
> > > > > Antoine Pitrou  wrote:
> > > > >  
> > > > > > Hi everyone,
> > > > > >
> > > > > > I propose the following RC to be released as official Apache Parquet
> > > > > > Format 2.9.0 release.
> > > > > >
> > > > > > The commit id is b4f0c0a643a6ec1a7def37115dd6967ba9346df7
> > > > > > * This corresponds to the tag: apache-parquet-format-2.9.0-rc0
> > > > > > *  
> > > https://github.com/apache/parquet-format/tree/b4f0c0a643a6ec1a7def37115dd6967ba9346df7
> > >   
> > > > > >
> > > > > > The release tarball, signature, and checksums are here:
> > > > > > *  
> > > https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.9.0-rc0/
> > >   
> > > > > >
> > > > > > You can find the KEYS file here:
> > > > > > * https://downloads.apache.org/parquet/KEYS
> > > > > >
> > > > > > Binary artifacts are staged in Nexus here:
> > > > > > *  
> > > https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-format/
> > >   
> > > > > >
> > > > > > > This release includes the following important fixes and improvements:
> > > > > > >
> > > > > > > * PARQUET-1996 - [Format] Add interoperable LZ4 codec, deprecate existing LZ4 codec
> > > > > > > * PARQUET-2013 - [Format] Mention that converted types are deprecated
> > > > > >
> > > > > > ...among other changes (see CHANGES.md for full list).
> > > > > >
> > > > > > Please download, verify, and test.
> > > > > >
> > > > > > Please vote in the next 72 hours.
> > > > > >
> > > > > > [ ] +1 Release this as Apache Parquet 2.9.0
> > > > > > [ ] +0
> > > > > > [ ] -1 Do not release this because...
> > > > > >
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Antoine.
> > > > > >
> > > > > >
> > > > > >  
> > > > >
> > > > >
> > > > >
> > > > >  
> > > >
> > > >
> > > >
> > > >  
> > >
> > >
> > >
> > >  
> >   
> 





[jira] [Commented] (PARQUET-2022) ZstdDecompressorStream should close `zstdInputStream`

2021-04-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322240#comment-17322240
 ] 

ASF GitHub Bot commented on PARQUET-2022:
-

dongjoon-hyun commented on pull request #889:
URL: https://github.com/apache/parquet-mr/pull/889#issuecomment-820489701


   Could you review this, @ggershinsky ?





[ANNOUNCE] Apache Parquet Format release 2.9.0

2021-04-15 Thread Antoine Pitrou



Hello,

I'm pleased to announce the release of the Apache Parquet Format version 
2.9.0.


Parquet is a general-purpose columnar file format for nested data. It 
uses space-efficient encodings and a compressed and splittable structure 
for processing frameworks like Hadoop.


Version 2.9.0 makes minor improvements and fixes to the Parquet format 
specification. A significant improvement is the addition of the LZ4_RAW 
compression algorithm, which deprecates the unfortunately underspecified 
LZ4 algorithm.


The full list of changes for this release is available below:
https://github.com/apache/parquet-format/blob/apache-parquet-format-2.9.0/CHANGES.md

This release can be downloaded from: https://parquet.apache.org/downloads/

Thanks to everyone for contributing!

Regards,

Antoine Pitrou.


Re: [RESULT] Release Apache Parquet Format 2.9.0 RC0

2021-04-15 Thread Uwe L. Korn


Published the release. 

On Wed, Apr 14, 2021, at 6:30 PM, Driesprong, Fokko wrote:
> Yes, you'll need PMC permissions to do that.
> 
> A PMC could fetch the artifacts from
> https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.9.0-rc0/
> and push them into svn as described below :)
> 
> Cheers, Fokko
> 
> Op wo 14 apr. 2021 om 17:38 schreef Antoine Pitrou :
> 
> > On Wed, 14 Apr 2021 17:30:34 +0200
> > Antoine Pitrou  wrote:
> >
> > > Ok, it seems PMC intervention is needed for the step
> > > "3. Copy the release artifacts in SVN into releases" outlined in
> > > https://parquet.apache.org/documentation/how-to-release/ .
> > >
> > > Basically, the `apache-parquet-format-2.9.0-rc0` directory from the SVN
> > > dev/parquet repository should be copied as
> > > `apache-parquet-format-2.9.0` to the SVN release/parquet repository.
> > >
> > > Could a PMC member do that?
> >
> > AFAICT, the required steps are the following (the last one is rejected
> > for me):
> >
> >   $ svn co https://dist.apache.org/repos/dist/dev/parquet candidates
> >   $ svn co https://dist.apache.org/repos/dist/release/parquet releases
> >   $ svn cp candidates/apache-parquet-format-2.9.0-rc0/ releases/apache-parquet-format-2.9.0
> >   $ cd releases/
> >   $ svn ci -m "Parquet Format: Add release 2.9.0"
> >
> > Regards
> >
> > Antoine.
> >
> >
> >
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > >
> > > On Wed, 14 Apr 2021 12:10:11 +0200
> > > Antoine Pitrou  wrote:
> > > > Hello,
> > > >
> > > > The vote to release 2.9.0 RC0 as Apache Parquet Format 2.9.0 is PASSED
> > > > with the required three +1 binding votes.
> > > >
> > > > I will try to finalize the release myself, but I may need help from a
> > > > PMC member.
> > > >
> > > > Best regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > >
> > > > On Wed, 7 Apr 2021 15:10:42 +0200
> > > > Antoine Pitrou  wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I propose the following RC to be released as official Apache Parquet
> > > > > Format 2.9.0 release.
> > > > >
> > > > > The commit id is b4f0c0a643a6ec1a7def37115dd6967ba9346df7
> > > > > * This corresponds to the tag: apache-parquet-format-2.9.0-rc0
> > > > > * https://github.com/apache/parquet-format/tree/b4f0c0a643a6ec1a7def37115dd6967ba9346df7
> > > > >
> > > > > The release tarball, signature, and checksums are here:
> > > > > * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.9.0-rc0/
> > > > >
> > > > > You can find the KEYS file here:
> > > > > * https://downloads.apache.org/parquet/KEYS
> > > > >
> > > > > Binary artifacts are staged in Nexus here:
> > > > > * https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-format/
> > > > >
> > > > > This release includes the following important fixes and improvements:
> > > > >
> > > > > * PARQUET-1996 - [Format] Add interoperable LZ4 codec, deprecate existing LZ4 codec
> > > > > * PARQUET-2013 - [Format] Mention that converted types are deprecated
> > > > >
> > > > > ...among other changes (see CHANGES.md for full list).
> > > > >
> > > > > Please download, verify, and test.
> > > > >
> > > > > Please vote in the next 72 hours.
> > > > >
> > > > > [ ] +1 Release this as Apache Parquet 2.9.0
> > > > > [ ] +0
> > > > > [ ] -1 Do not release this because...
> > > > >
> > > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
> 


[jira] [Commented] (PARQUET-2025) Bump snappy to 1.1.8.3 to support Mac m1

2021-04-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321941#comment-17321941
 ] 

ASF GitHub Bot commented on PARQUET-2025:
-

cyraid commented on pull request #893:
URL: https://github.com/apache/parquet-mr/pull/893#issuecomment-820155961


   May I ask why there are so many open pull requests from years ago? Is this 
repository dead?




> Bump snappy to 1.1.8.3 to support Mac m1
> 
>
> Key: PARQUET-2025
> URL: https://issues.apache.org/jira/browse/PARQUET-2025
> Project: Parquet
> Issue Type: Bug
> Reporter: Junjie Chen
> Priority: Minor
>
> When running unit tests of iceberg on Mac m1, it throws:
>
> Caused by:
>   java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
>     at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)
>     at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
>     at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
>     at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:165)
>     at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:122)
>     at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:53)
>     at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:315)
>     at org.apache.parquet.column.impl.ColumnWriteStoreBase.flush(ColumnWriteStoreBase.java:152)
>     at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:27)
>     at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
>     at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:114)
>     at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:165)
>     at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetOutputWriter.scala:42)
>     at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:57)
>     at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:74)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
>     at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
>     ... 10 more
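
A "Could not initialize class" `NoClassDefFoundError` like the one in the trace above is what the JVM reports when an earlier attempt to run that class's static initializer already failed. The following is a small diagnostic sketch, not part of parquet-mr or snappy-java, that tries to initialize a class by name and prints the platform properties next to the result. `NativeCodecProbe` and `canInitialize` are hypothetical names; the only name taken from the report is `org.xerial.snappy.Snappy`, and it is looked up reflectively so the sketch compiles without the library.

```java
public class NativeCodecProbe {

  // Returns true if the named class can be loaded and statically initialized.
  static boolean canInitialize(String className) {
    try {
      // The one-argument Class.forName also runs static initializers,
      // which is where a native library would typically be loaded.
      Class.forName(className);
      return true;
    } catch (ClassNotFoundException | Error e) {
      // Error is caught on purpose in this probe: it covers
      // NoClassDefFoundError, ExceptionInInitializerError and
      // UnsatisfiedLinkError, plus any Error thrown by the initializer.
      System.err.println(className + " is not usable: " + e);
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("os.name = " + System.getProperty("os.name"));
    System.out.println("os.arch = " + System.getProperty("os.arch"));
    System.out.println(
        "snappy usable: " + canInitialize("org.xerial.snappy.Snappy"));
  }
}
```

Running it on the affected machine with the project's classpath would show whether the failure comes from class initialization on that architecture rather than from Parquet's own code.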




