Re: Re: [Announce] new committer: Gidon Gershinsky

2021-04-12 Thread Maya Anderson
Congratulations, Gidon!
Very well deserved!

Regards,
Maya


> From:"Driesprong, Fokko" 
> To:dev@parquet.apache.org
> Cc:emkornfi...@gmail.com
> Date:07/04/2021 21:26
> Subject:[EXTERNAL] Re: [Announce] new committer: Gidon Gershinsky
> --
>
>
>
> Congrats Gidon, well deserved :)
>
> On Wed, Apr 7, 2021 at 18:11, Dongjoon Hyun wrote:
>
> > Congrats, Gidon! :)
> >
> > Bests,
> > Dongjoon.
> >
> > On Wed, Apr 7, 2021 at 9:06 AM Chao Sun  wrote:
> >
> > > Congrats Gidon!
> > >
> > > On Wed, Apr 7, 2021 at 8:27 AM Micah Kornfield wrote:
> > >
> > > > Congrats Gidon, well deserved.
> > > >
> > > > On Wed, Apr 7, 2021 at 5:10 AM Nándor Kollár wrote:
> > > >
> > > > > Congrats Gidon!
> > > > >
> > > > > On 2021/04/07 11:55:45, Gabor Szadovszky  wrote:
> > > > > > The Project Management Committee (PMC) for Apache Parquet
> > > > > > has invited Gidon Gershinsky to become a committer and we are
> > > > > > pleased to announce that he has accepted.
> > > > > >
> > > > > > Welcome Gidon!
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>



[jira] [Commented] (PARQUET-2024) Remove KEYS file from parquet-mr repo

2021-04-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319501#comment-17319501
 ] 

ASF GitHub Bot commented on PARQUET-2024:
-

gszadovszky opened a new pull request #891:
URL: https://github.com/apache/parquet-mr/pull/891


   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Parquet 
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references 
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
 - https://issues.apache.org/jira/browse/PARQUET-XXX
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove KEYS file from parquet-mr repo
> -
>
>                  Key: PARQUET-2024
>                  URL: https://issues.apache.org/jira/browse/PARQUET-2024
>              Project: Parquet
>           Issue Type: Bug
>           Components: parquet-mr
>             Reporter: Gabor Szadovszky
>             Assignee: Gabor Szadovszky
>             Priority: Major
>
> The official KEYS file is maintained in the release svn repo. The others 
> shall be removed to avoid confusion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [parquet-mr] gszadovszky opened a new pull request #891: PARQUET-2024: Remove KEYS file from parquet-mr repo

2021-04-12 Thread GitBox


gszadovszky opened a new pull request #891:
URL: https://github.com/apache/parquet-mr/pull/891








[jira] [Created] (PARQUET-2024) Remove KEYS file from parquet-mr repo

2021-04-12 Thread Gabor Szadovszky (Jira)
Gabor Szadovszky created PARQUET-2024:
-

             Summary: Remove KEYS file from parquet-mr repo
                 Key: PARQUET-2024
                 URL: https://issues.apache.org/jira/browse/PARQUET-2024
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
            Reporter: Gabor Szadovszky
            Assignee: Gabor Szadovszky


The official KEYS file is maintained in the release svn repo. The others shall 
be removed to avoid confusion.





Re: [VOTE] Release Apache Parquet Format 2.9.0 RC0

2021-04-12 Thread Uwe L. Korn
+1 (binding) 

Verified the signature and checksum on the artifact; the tests passed on
macOS 11 (ARM64) with:

mamba create -p $(pwd)/../env maven thrift-cpp=0.13
conda activate $(pwd)/../env
mvn test
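
For anyone repeating the verification, the checksum part can be sketched as below. This is a minimal sketch, not the project's release procedure: `verify_sha512` is a hypothetical helper, `sha512sum` is assumed to be available (GNU coreutils), and the file names in the comments are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the SHA-512 digest of file $1
# equals the expected hex digest passed as $2.
verify_sha512() {
  actual="$(sha512sum "$1" | cut -d' ' -f1)"
  [ "$actual" = "$2" ]
}

# Signature check, assuming the KEYS file and a detached .asc signature
# were downloaded next to the artifact (placeholder file names):
#   gpg --import KEYS
#   gpg --verify artifact.tar.gz.asc artifact.tar.gz
```

The function returns a non-zero status on a mismatch, so it can be used directly in an `if` or chained with `&&` before running the tests.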

On Fri, Apr 9, 2021, at 10:27 AM, Gabor Szadovszky wrote:
> Thanks, Wes. If this is the case I am happy to make this final step after
> the vote passes.
> 
> On Fri, Apr 9, 2021 at 3:54 AM Wes McKinney  wrote:
> 
> > hi Gabor — I think you may need to be a PMC member? I'm not sure though.
> >
> > +1 (binding), verified signature and checksum on the artifact
> >
> > On Wed, Apr 7, 2021 at 10:19 AM Gabor Szadovszky  wrote:
> > >
> > > I've updated the KEYS file with your public key in the release repo
> > > (downloads.apache.org is updated already). Please keep in mind that you
> > > will still need write access to the release repo to finalize the release
> > > after the vote passes. Guys, any idea how to request write access to a repo?
> > >
> > > Verified checksum and signature; unit tests pass; parquet-mr builds with
> > > the new RC.
> > > +1(binding)
> > >
> > >
> > >
> > >
> > > On Wed, Apr 7, 2021 at 4:51 PM Antoine Pitrou wrote:
> > >
> > > >
> > > > Ok, I've tried multiple variations and I still can't commit to the
> > > > release repository.
> > > >
> > > > May I ask you to commit the following patch:
> > > > https://gist.github.com/pitrou/0f9f1ffe280cfb48ea9427ebec19b65e
> > > >
> > > > You can check that the key block matches the one I added in the dev
> > > > repo.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > > On Wed, 7 Apr 2021 16:35:16 +0200
> > > > Gabor Szadovszky wrote:
> > > > > I don't have too much experience in svn. I usually follow the commands
> > > > > listed in the how to release doc and it works for me. (Don't remember
> > > > > if I've had to do some initial steps.) As a committer you should have
> > > > > write access to all the repositories of the Parquet community.
> > > > >
> > > > > On Wed, Apr 7, 2021 at 4:18 PM Antoine Pitrou wrote:
> > > > >
> > > > > >
> > > > > > Ah!  It seems I can't push to that repo:
> > > > > >
> > > > > > Sending        KEYS
> > > > > > Transmitting file data .
> > > > > > svn: E195023: Commit failed (details follow):
> > > > > > svn: E195023: Changing file '/home/antoine/apache/parquet-release/KEYS' is
> > > > > > forbidden by the server
> > > > > > svn: E175013: Access to
> > > > > > '/repos/dist/!svn/txr/46918-13e8/release/parquet/KEYS' forbidden
> > > > > >
> > > > > >
> > > > > > The URL I used for checkout is
> > > > > > https://apit...@dist.apache.org/repos/dist/release/parquet
> > > > > > Should I use another one?
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Antoine.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, 7 Apr 2021 16:00:26 +0200
> > > > > > Gabor Szadovszky wrote:
> > > > > > > Sorry, I missed that you updated the dev repo. The downloads page
> > > > > > > mirrors the release repo. Yet another place (besides the
> > > > > > > parquet-format and parquet-mr repos) where we store a KEYS file for
> > > > > > > whatever reason. Please update the one in the release repo.
> > > > > > >
> > > > > > > On Wed, Apr 7, 2021 at 3:47 PM Gabor Szadovszky <
> > > > > > > gabor.szadovs...@cloudera.com> wrote:
> > > > > > >
> > > > > > > > I guess it only requires some time to sync. Last time the release
> > > > > > > > tarball required ~1 hour to sync.
> > > > > > > >
> > > > > > > > On Wed, Apr 7, 2021 at 3:42 PM Antoine Pitrou <anto...@python.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >>
> > > > > > > >> Hi Gabor,
> > > > > > > >>
> > > > > > > >> Ok, I updated the KEYS file in the Parquet SVN repository.
> > > > > > > >> The changes do appear in
> > > > > > > >> https://dist.apache.org/repos/dist/dev/parquet/KEYS -- but not in
> > > > > > > >> https://downloads.apache.org/parquet/KEYS .  Is there any additional
> > > > > > > >> step I should perform?
> > > > > > > >>
> > > > > > > >> Regards
> > > > > > > >>
> > > > > > > >> Antoine.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Wed, 7 Apr 2021 15:19:24 +0200
> > > > > > > >> Gabor Szadovszky  wrote:
> > > > > > > >>
> > > > > > > >> > Hi Antoine,
> > > > > > > >> >
> > > > > > > >> > Thanks for initiating this release! You need to update the listed
> > > > > > > >> > KEYS file with your public key, otherwise we cannot validate the
> > > > > > > >> > signature. (To do that you need to update the releases svn repo.
> > > > > > > >> > See details in the how to release doc about the publishing.)
> > > > > > > >> >
> > > > > > > >> > Regards,
> > > > > > > >> > Gabor
> > > > > > > >> >
> > > > > > > >> > On Wed, Apr 7, 2021 at 3:10 PM Antoine Pitrou <anto...@python.org>
> > > > > > > >> 

[jira] [Commented] (PARQUET-1851) ParquetMetadataConverter throws NPE in an Iceberg unit test

2021-04-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319441#comment-17319441
 ] 

ASF GitHub Bot commented on PARQUET-1851:
-

gszadovszky commented on a change in pull request #852:
URL: https://github.com/apache/parquet-mr/pull/852#discussion_r611595861



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java
##
@@ -860,6 +860,10 @@ public void endColumn() throws IOException {
* @throws IOException if there is an error while writing
*/
   public void endBlock() throws IOException {
+if (currentRecordCount == 0) {
+  throw new IOException("End block with zero record");

Review comment:
   @vdiravka, you might want to create a separate jira about this topic so 
we can discuss it in a more open way. Please, also describe what the empty 
parquet files are used for.






> ParquetMetadataConverter throws NPE in an Iceberg unit test
> --
>
>                  Key: PARQUET-1851
>                  URL: https://issues.apache.org/jira/browse/PARQUET-1851
>              Project: Parquet
>           Issue Type: Bug
>     Affects Versions: 1.11.0
>             Reporter: Junjie Chen
>             Assignee: Junjie Chen
>             Priority: Major
>              Fix For: 1.12.0
>
>
> When writing data to Parquet in an Iceberg unit test, it throws an NPE as shown below:
> {code:java}
> java.lang.NullPointerException
>     at org.apache.parquet.format.converter.ParquetMetadataConverter.addRowGroup(ParquetMetadataConverter.java:476)
>     at org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetMetadata(ParquetMetadataConverter.java:177)
>     at org.apache.parquet.hadoop.ParquetFileWriter.serializeFooter(ParquetFileWriter.java:914)
>     at org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:864)
>     at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:206)
>     at org.apache.iceberg.data.TestLocalScan.writeFile(TestLocalScan.java:429)
> {code}
>  





[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #852: PARQUET-1851: fix parquet metadata converter NPE

2021-04-12 Thread GitBox


gszadovszky commented on a change in pull request #852:
URL: https://github.com/apache/parquet-mr/pull/852#discussion_r611595861



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java
##
@@ -860,6 +860,10 @@ public void endColumn() throws IOException {
* @throws IOException if there is an error while writing
*/
   public void endBlock() throws IOException {
+if (currentRecordCount == 0) {
+  throw new IOException("End block with zero record");

Review comment:
   @vdiravka, you might want to create a separate jira about this topic so 
we can discuss it in a more open way. Please, also describe what the empty 
parquet files are used for.








[jira] [Commented] (PARQUET-1942) Bump Apache Arrow 2.0.0

2021-04-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319127#comment-17319127
 ] 

ASF GitHub Bot commented on PARQUET-1942:
-

martin-g commented on pull request #840:
URL: https://github.com/apache/parquet-mr/pull/840#issuecomment-817582630


   @gszadovszky Fixed with https://github.com/apache/parquet-mr/pull/890




> Bump Apache Arrow 2.0.0
> ---
>
>                  Key: PARQUET-1942
>                  URL: https://issues.apache.org/jira/browse/PARQUET-1942
>              Project: Parquet
>           Issue Type: Improvement
>           Components: parquet-mr
>     Affects Versions: 1.11.0
>             Reporter: Fokko Driesprong
>             Assignee: Fokko Driesprong
>             Priority: Major
>






[GitHub] [parquet-mr] martin-g commented on pull request #840: PARQUET-1942: Bump Apache Arrow to 2.0.0

2021-04-12 Thread GitBox


martin-g commented on pull request #840:
URL: https://github.com/apache/parquet-mr/pull/840#issuecomment-817582630


   @gszadovszky Fixed with https://github.com/apache/parquet-mr/pull/890






[jira] [Commented] (PARQUET-2023) TravisCI builds do not fail even when there is a compilation error

2021-04-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319126#comment-17319126
 ] 

ASF GitHub Bot commented on PARQUET-2023:
-

martin-g opened a new pull request #890:
URL: https://github.com/apache/parquet-mr/pull/890


   Piping to pv loses the exit status of mvn, so the builds never fail.
   
   Run mvn in quiet mode to prevent a Travis error caused by too much output.
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   https://issues.apache.org/jira/browse/PARQUET-2023
   
   ### Tests
   
   No modifications to the source code. Only to the CI config.
   
   ### Commits
   
   - [ X ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   No new functionality!




> TravisCI builds do not fail even when there is a compilation error
> --
>
>                  Key: PARQUET-2023
>                  URL: https://issues.apache.org/jira/browse/PARQUET-2023
>              Project: Parquet
>           Issue Type: Improvement
>             Reporter: Martin Tzvetanov Grigorov
>             Priority: Minor
>
> As noticed at [https://github.com/apache/parquet-mr/pull/840], the build at
> TravisCI didn't fail despite the compilation errors.
>  
> The reason is the piping to 'pv' for progress monitoring. The exit status of
> the 'mvn' command is ignored and the exit status of 'pv' is 0, i.e. success.
> Currently the build at TravisCI takes around 40 minutes
> ([https://travis-ci.org/github/apache/parquet-mr/builds/765496614]), so there
> are 10 more minutes before the upper limit at TravisCI.
> Maven will be run with the '--quiet' option to prevent another error: too much
> output.





[GitHub] [parquet-mr] martin-g opened a new pull request #890: PARQUET-2023 Do not pipe the mvn output to pv

2021-04-12 Thread GitBox


martin-g opened a new pull request #890:
URL: https://github.com/apache/parquet-mr/pull/890


   Piping to pv loses the exit status of mvn, so the builds never fail.
   
   Run mvn in quiet mode to prevent a Travis error caused by too much output.
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   https://issues.apache.org/jira/browse/PARQUET-2023
   
   ### Tests
   
   No modifications to the source code. Only to the CI config.
   
   ### Commits
   
   - [ X ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   No new functionality!






[jira] [Created] (PARQUET-2023) TravisCI builds do not fail even when there is a compilation error

2021-04-12 Thread Martin Tzvetanov Grigorov (Jira)
Martin Tzvetanov Grigorov created PARQUET-2023:
--

             Summary: TravisCI builds do not fail even when there is a compilation error
                 Key: PARQUET-2023
                 URL: https://issues.apache.org/jira/browse/PARQUET-2023
             Project: Parquet
          Issue Type: Improvement
            Reporter: Martin Tzvetanov Grigorov


As noticed at [https://github.com/apache/parquet-mr/pull/840], the build at
TravisCI didn't fail despite the compilation errors.

The reason is the piping to 'pv' for progress monitoring. The exit status of
the 'mvn' command is ignored and the exit status of 'pv' is 0, i.e. success.

Currently the build at TravisCI takes around 40 minutes
([https://travis-ci.org/github/apache/parquet-mr/builds/765496614]), so there
are 10 more minutes before the upper limit at TravisCI.

Maven will be run with the '--quiet' option to prevent another error: too much
output.
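
The behaviour described above can be reproduced in a few lines of shell. The sketch below is illustrative only: `false` stands in for a failing `mvn` and `cat` for `pv` (both are assumptions, not the project's Travis configuration), and the helper function names are hypothetical. It also shows `set -o pipefail`, a Bash option that makes a pipeline report a failing stage. That is an alternative to, not a description of, the fix in pull request #890, which stops piping the output to `pv`.

```shell
#!/usr/bin/env bash
# Stand-ins (assumptions): `false` plays a failing `mvn`, `cat` plays `pv`.

# Default behaviour: the pipeline's exit status is that of its last
# command, so the failure of the first stage is lost.
plain_status() {
  bash -c 'false | cat; echo "$?"'
}

# With pipefail, the pipeline's exit status is that of the last stage
# that exited non-zero, so the failure is reported.
pipefail_status() {
  bash -c 'set -o pipefail; false | cat; echo "$?"'
}

echo "without pipefail: $(plain_status)"
echo "with pipefail:    $(pipefail_status)"
```

Run as written, the first line reports status 0 and the second reports status 1, which is why a CI step that pipes its build command into a progress tool can pass even when the build fails.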





[jira] [Commented] (PARQUET-1942) Bump Apache Arrow 2.0.0

2021-04-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319084#comment-17319084
 ] 

ASF GitHub Bot commented on PARQUET-1942:
-

martin-g commented on pull request #840:
URL: https://github.com/apache/parquet-mr/pull/840#issuecomment-817545488


   @gszadovszky Checking ...




> Bump Apache Arrow 2.0.0
> ---
>
>                  Key: PARQUET-1942
>                  URL: https://issues.apache.org/jira/browse/PARQUET-1942
>              Project: Parquet
>           Issue Type: Improvement
>           Components: parquet-mr
>     Affects Versions: 1.11.0
>             Reporter: Fokko Driesprong
>             Assignee: Fokko Driesprong
>             Priority: Major
>






[GitHub] [parquet-mr] martin-g commented on pull request #840: PARQUET-1942: Bump Apache Arrow to 2.0.0

2021-04-12 Thread GitBox


martin-g commented on pull request #840:
URL: https://github.com/apache/parquet-mr/pull/840#issuecomment-817545488


   @gszadovszky Checking ...

