[jira] [Updated] (PARQUET-1892) CRC comment modification in Thrift

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-1892:

Fix Version/s: format-2.9.0

> CRC comment modification in Thrift
> --
>
> Key: PARQUET-1892
> URL: https://issues.apache.org/jira/browse/PARQUET-1892
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
> Fix For: format-2.9.0
>
>
> Mention that CRC is calculated after compression and encryption
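
The ordering the issue asks to document can be sketched in a few lines of Python: the checksum is taken over the page bytes as they are stored, that is, after compression (and after encryption, when it is used). This is only an illustration under stated assumptions, not the parquet-mr or parquet-cpp code: the helper names `page_crc` and `verify_page` are hypothetical, zlib's DEFLATE stands in for whatever codec the column uses, zlib's CRC-32 is assumed to be the intended CRC variant, and the encryption step is left out.

```python
import zlib

def page_crc(raw_page: bytes) -> tuple:
    """Compress a page body, then checksum the bytes that would be stored."""
    compressed = zlib.compress(raw_page)  # stand-in for the column's codec
    # An encryption step, if any, would go here, before the CRC is taken.
    crc = zlib.crc32(compressed) & 0xFFFFFFFF  # unsigned 32-bit CRC
    return compressed, crc

def verify_page(stored: bytes, expected_crc: int) -> bytes:
    """Check the CRC on the stored bytes first, then decompress."""
    if zlib.crc32(stored) & 0xFFFFFFFF != expected_crc:
        raise ValueError("page CRC mismatch")
    return zlib.decompress(stored)
```

Checking the CRC before decompressing means a reader can reject a corrupted page without running the decompressor on bad input.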



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (PARQUET-1777) add Parquet logo vector files to repo

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-1777:

Fix Version/s: format-2.9.0

> add Parquet logo vector files to repo
> -
>
> Key: PARQUET-1777
> URL: https://issues.apache.org/jira/browse/PARQUET-1777
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>Priority: Major
>  Labels: pull-request-available
> Fix For: format-2.9.0
>
>






[jira] [Updated] (PARQUET-1862) A mistake of Parquet Format Thrift definition file's comment

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-1862:

Fix Version/s: format-2.9.0

> A mistake of Parquet Format Thrift definition file's comment
> 
>
> Key: PARQUET-1862
> URL: https://issues.apache.org/jira/browse/PARQUET-1862
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Liam Su
>Assignee: Liam Su
>Priority: Minor
> Fix For: format-2.9.0
>
>
> A comment of *DataPageHeaderV2* in the src/main/thrift/parquet.thrift is 
> wrong.
>  
> {code:java}
>   /** optional statistics for this column chunk */
>   8: optional Statistics statistics;
> {code}
>  
>  should be
> {code:java}
>   /** optional statistics for the data in this page */
>   8: optional Statistics statistics;
> {code}
>  
>  





[jira] [Updated] (PARQUET-1892) Explain CRC computation better

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-1892:

Summary: Explain CRC computation better  (was: CRC comment modification in 
Thrift)

> Explain CRC computation better
> --
>
> Key: PARQUET-1892
> URL: https://issues.apache.org/jira/browse/PARQUET-1892
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
> Fix For: format-2.9.0
>
>
> Mention that CRC is calculated after compression and encryption





[jira] [Updated] (PARQUET-1777) Add Parquet logo vector files to repo

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-1777:

Summary: Add Parquet logo vector files to repo  (was: add Parquet logo 
vector files to repo)

> Add Parquet logo vector files to repo
> -
>
> Key: PARQUET-1777
> URL: https://issues.apache.org/jira/browse/PARQUET-1777
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>Priority: Major
>  Labels: pull-request-available
> Fix For: format-2.9.0
>
>






[jira] [Updated] (PARQUET-1969) Migrate CI to Github Actions

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-1969:

Summary: Migrate CI to Github Actions  (was: Test by GithubAction)

> Migrate CI to Github Actions
> 
>
> Key: PARQUET-1969
> URL: https://issues.apache.org/jira/browse/PARQUET-1969
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format, parquet-mr
>Affects Versions: 1.12.0
>Reporter: Yuming Wang
>Assignee: Gabor Szadovszky
>Priority: Major
> Fix For: 1.12.0, format-2.9.0
>
>






[jira] [Assigned] (PARQUET-2015) [Format] Update changelog for 0.29.0

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned PARQUET-2015:
---

Assignee: Antoine Pitrou

> [Format] Update changelog for 0.29.0
> 
>
> Key: PARQUET-2015
> URL: https://issues.apache.org/jira/browse/PARQUET-2015
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Blocker
> Fix For: format-2.9.0
>
>






[jira] [Updated] (PARQUET-1862) Fix comment on statistics field in Thrift file

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-1862:

Summary: Fix comment on statistics field in Thrift file  (was: A mistake of 
Parquet Format Thrift definition file's comment)

> Fix comment on statistics field in Thrift file
> --
>
> Key: PARQUET-1862
> URL: https://issues.apache.org/jira/browse/PARQUET-1862
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Liam Su
>Assignee: Liam Su
>Priority: Minor
> Fix For: format-2.9.0
>
>
> A comment of *DataPageHeaderV2* in the src/main/thrift/parquet.thrift is 
> wrong.
>  
> {code:java}
>   /** optional statistics for this column chunk */
>   8: optional Statistics statistics;
> {code}
>  
>  should be
> {code:java}
>   /** optional statistics for the data in this page */
>   8: optional Statistics statistics;
> {code}
>  
>  





[jira] [Commented] (PARQUET-2015) [Format] Update changelog for 0.29.0

2021-04-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316127#comment-17316127
 ] 

ASF GitHub Bot commented on PARQUET-2015:
-

pitrou opened a new pull request #172:
URL: https://github.com/apache/parquet-format/pull/172


   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Parquet 
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references 
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
 - https://issues.apache.org/jira/browse/PARQUET-XXX
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Format] Update changelog for 0.29.0
> 
>
> Key: PARQUET-2015
> URL: https://issues.apache.org/jira/browse/PARQUET-2015
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Blocker
> Fix For: format-2.9.0
>
>






[jira] [Updated] (PARQUET-1930) Bump Apache Thrift to 0.13.0

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-1930:

Component/s: parquet-format

> Bump Apache Thrift to 0.13.0
> 
>
> Key: PARQUET-1930
> URL: https://issues.apache.org/jira/browse/PARQUET-1930
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Affects Versions: 1.11.0
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: 1.12.0
>
>






[jira] [Updated] (PARQUET-1930) Bump Apache Thrift to 0.13.0

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-1930:

Affects Version/s: (was: 1.11.0)

> Bump Apache Thrift to 0.13.0
> 
>
> Key: PARQUET-1930
> URL: https://issues.apache.org/jira/browse/PARQUET-1930
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: format-2.9.0
>
>






[jira] [Updated] (PARQUET-1930) Bump Apache Thrift to 0.13.0

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-1930:

Fix Version/s: (was: 1.12.0)
   format-2.9.0

> Bump Apache Thrift to 0.13.0
> 
>
> Key: PARQUET-1930
> URL: https://issues.apache.org/jira/browse/PARQUET-1930
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Affects Versions: 1.11.0
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: format-2.9.0
>
>






[jira] [Resolved] (PARQUET-1892) CRC comment modification in Thrift

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved PARQUET-1892.
-
Resolution: Fixed

Fixed by PR https://github.com/apache/parquet-format/pull/160

> CRC comment modification in Thrift
> --
>
> Key: PARQUET-1892
> URL: https://issues.apache.org/jira/browse/PARQUET-1892
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
> Fix For: format-2.9.0
>
>
> Mention that CRC is calculated after compression and encryption





[GitHub] [parquet-format] pitrou opened a new pull request #172: PARQUET-2015: Update changelog for 0.29.0

2021-04-07 Thread GitBox


pitrou opened a new pull request #172:
URL: https://github.com/apache/parquet-format/pull/172








Re: New parquet-format release?

2021-04-07 Thread Gabor Szadovszky
To be honest, I did not have time to work on it. There are a couple of things
to be finalized. I would like to list all the optional fields in the thrift
file that are practically required, at least in certain cases.
We also have a couple of open questions:
- V1 vs V2 pages
- unsigned integers
- supported encodings

Feel free to comment in the PR or if you think a topic requires a wider
audience we might start a separate discussion here in the dev list.
(Meanwhile, we still have the issue that this update heavily impacts
parquet implementations that may not be part of the parquet community.)

Cheers,
Gabor


On Sat, Apr 3, 2021 at 5:33 AM Micah Kornfield 
wrote:

> >
> > "Core features" is clearly not in a shape to be finalized soon so we
> > can postpone it to the release after.
>
>
> What do we think we need to do to get it to a releasable state?
>
> On Tue, Mar 30, 2021 at 6:44 AM Gabor Szadovszky
>  wrote:
>
> > Thanks a lot, Antoine for the summary and heads up. #166 is merged
> > already. The others do not seem to be crucial for the next release but
> > I am fine waiting a bit for the authors' response. (parquet-format
> > thrift bump is not really important because even though we are
> > releasing the generated java classes we are not using them in
> > parquet-mr so this is mainly a testing issue.)
> > "Core features" is clearly not in a shape to be finalized soon so we
> > can postpone it to the release after.
> >
> > Cheers,
> > Gabor
> >
> > On Tue, Mar 30, 2021 at 12:58 PM Antoine Pitrou 
> > wrote:
> > >
> > >
> > > Hi Gabor,
> > >
> > > Ok, I went through the open PRs.  The following PRs seem basically
> > > ready, just waiting for final feedback (and possible updates) from
> > > the submitters:
> > >
> > > * https://github.com/apache/parquet-format/pull/166
> > >   (PARQUET-1969: Migrate testing from Travis-CI to Github Actions)
> > >
> > > * https://github.com/apache/parquet-format/pull/158
> > >   (PARQUET-1779: Update merge script)
> > >
> > > The following PR needs polishing; I'll wait for feedback from the
> > > submitter and if there is none, will probably push an update myself:
> > >
> > > * https://github.com/apache/parquet-format/pull/162
> > >   (PARQUET-1930: Bump Apache Thrift to 0.13.0)
> > >
> > > Of the remaining PRs:
> > >
> > > * https://github.com/apache/parquet-format/pull/164 looks desirable
> > >   but is still in draft state, and I assume will require a bit more
> > >   massaging and/or a final agreement (and perhaps a formal vote?)
> > >   (PARQUET-1950: Define core features)
> > >
> > > * there are a couple of proposed format additions which don't seem to
> > >   have gathered a lot of interest, and are therefore most probably out
> > >   of scope for a forthcoming release
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > >
> > > On Tue, 30 Mar 2021 12:07:44 +0200
> > > Gabor Szadovszky  wrote:
> > > > Hi Antoine,
> > > >
> > > > There are a couple of ongoing PRs in the parquet-format repo.
> > > > However, some may take very long (e.g. core features) but some are
> > > > only waiting for review (e.g. #166).
> > > > I agree that solving the current situation of LZ4 is worth a
> > > > parquet-format release but the ready PRs should also be included.
> > > >
> > > > Practically any committer can work on a release. (See
> > > > http://parquet.apache.org/documentation/how-to-release/ for
> > > > details.)
> > > > As per the process PMC members are only required to vote on the
> > > > release.
> > > >
> > > > Regards,
> > > > Gabor
> > > >
> > >
> > >
> > >
> >
>


[GitHub] [parquet-format] pitrou merged pull request #162: PARQUET-1930: Bump Apache Thrift to 0.13

2021-04-07 Thread GitBox


pitrou merged pull request #162:
URL: https://github.com/apache/parquet-format/pull/162


   






[jira] [Commented] (PARQUET-1930) Bump Apache Thrift to 0.13.0

2021-04-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316113#comment-17316113
 ] 

ASF GitHub Bot commented on PARQUET-1930:
-

pitrou merged pull request #162:
URL: https://github.com/apache/parquet-format/pull/162


   




> Bump Apache Thrift to 0.13.0
> 
>
> Key: PARQUET-1930
> URL: https://issues.apache.org/jira/browse/PARQUET-1930
> Project: Parquet
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: 1.12.0
>
>






[jira] [Created] (PARQUET-2015) [Format] Update changelog for 0.29.0

2021-04-07 Thread Antoine Pitrou (Jira)
Antoine Pitrou created PARQUET-2015:
---

 Summary: [Format] Update changelog for 0.29.0
 Key: PARQUET-2015
 URL: https://issues.apache.org/jira/browse/PARQUET-2015
 Project: Parquet
  Issue Type: Task
  Components: parquet-format
Reporter: Antoine Pitrou
 Fix For: format-2.9.0








[jira] [Commented] (PARQUET-1222) Specify a well-defined sorting order for float and double types

2021-04-07 Thread Gabor Szadovszky (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316180#comment-17316180
 ] 

Gabor Szadovszky commented on PARQUET-1222:
---

[~apitrou], I guess what you've described is the write path of the statistics.
Because you cannot control other writers, I would suggest following the [spec
for the read
path|https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L892-L899].
Meanwhile, I've done some investigation in the parquet-mr code and the format,
and there are issues related to this topic.
* We created the ColumnOrder object and the related field in the format to
specify the ordering of the columns and to prepare for a potential solution
to this (and similar) issues. We reference this field in the Statistics
object used for row-group level stats, but we do not reference it in
the column indexes. So for column indexes it is not clear which sort order
we want to use or how to handle cases like this. How is it implemented in
parquet-cpp?
* Based on the referenced workaround, we handle the special floating point
values at row-group level in parquet-mr, but only on the read path. On the
write path we still write these values.
* For column indexes we handle these values, but only on the write path and not
on the read path.

So we have a couple of issues around this topic, and it would be great to
have a final and well-defined solution for it.
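
For readers who want to see what "following the spec for the read path" could look like, here is a small Python sketch. It assumes the compatibility rules in the linked parquet.thrift comment are: ignore a min or max that is NaN, treat a min of +0 as possibly covering -0, and treat a max of -0 as possibly covering +0. The function name `usable_float_bounds` is hypothetical, and this is not the parquet-mr or parquet-cpp implementation.

```python
import math

def usable_float_bounds(min_v: float, max_v: float) -> tuple:
    """Return (lower, upper) bounds that are safe for filtering.

    A bound is None when the stored statistic must be ignored.
    """
    if math.isnan(min_v):
        lower = None           # a NaN min carries no ordering information
    elif min_v == 0.0:
        lower = -0.0           # a +0 min may hide -0 values
    else:
        lower = min_v

    if math.isnan(max_v):
        upper = None           # a NaN max carries no ordering information
    elif max_v == 0.0:
        upper = 0.0            # a -0 max may hide +0 values
    else:
        upper = max_v

    return lower, upper
```

A reader applying something like this can still skip row groups safely even when the writer used a different comparison for -0, +0, and NaN.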

> Specify a well-defined sorting order for float and double types
> ---
>
> Key: PARQUET-1222
> URL: https://issues.apache.org/jira/browse/PARQUET-1222
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Zoltan Ivanfi
>Priority: Critical
>
> Currently parquet-format specifies the sort order for floating point numbers 
> as follows:
> {code:java}
>*   FLOAT - signed comparison of the represented value
>*   DOUBLE - signed comparison of the represented value
> {code}
> The problem is that the comparison of floating point numbers is only a 
> partial ordering with strange behaviour in specific corner cases. For 
> example, according to IEEE 754, -0 is neither less nor more than \+0 and 
> comparing NaN to anything always returns false. This ordering is not suitable 
> for statistics. Additionally, the Java implementation already uses a 
> different (total) ordering that handles these cases correctly but differently 
> than the C\+\+ implementations, which leads to interoperability problems.
> TypeDefinedOrder for doubles and floats should be deprecated and a new 
> TotalFloatingPointOrder should be introduced. The default for writing doubles 
> and floats would be the new TotalFloatingPointOrder. This ordering should be 
> effective and easy to implement in all programming languages.
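
To make the problem and one possible fix concrete, here is a short Python sketch. It orders doubles by their bit patterns so that -0 sorts before +0 and NaN gets a fixed position. This is one well-known way to build a total order and is only an assumption about what a TotalFloatingPointOrder might look like; the issue does not define it, and the name `total_order_key` is hypothetical.

```python
import math
import struct

def total_order_key(x: float) -> int:
    """Map a double to an int whose natural order is a total order."""
    # Reinterpret the 64 raw bits of the double as a signed integer.
    (bits,) = struct.unpack(">q", struct.pack(">d", x))
    # Negative floats have the sign bit set. Flip their other 63 bits so
    # that larger magnitudes sort lower. Non-negative floats already sort
    # correctly by their bit pattern.
    return bits ^ 0x7FFFFFFFFFFFFFFF if bits < 0 else bits

values = [1.0, math.nan, 0.0, -0.0, -math.inf, -1.0, math.inf]
ordered = sorted(values, key=total_order_key)
```

With this key, -0.0 sorts strictly before +0.0, and a NaN lands outside the infinities (above +inf when its sign bit is clear, below -inf when it is set). Plain IEEE 754 comparison cannot do either: `math.nan < 1.0` and `math.nan > 1.0` are both false.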





[jira] [Commented] (PARQUET-1976) Use net.alchim31.maven:scala-maven-plugin instead of org.scala-tools:maven-scala-plugin

2021-04-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316269#comment-17316269
 ] 

ASF GitHub Bot commented on PARQUET-1976:
-

martin-g commented on pull request #866:
URL: https://github.com/apache/parquet-mr/pull/866#issuecomment-814857164


   >  I think it is better, in the end, to inherit versions from the Parent 
Apache POM. WDYT?
   
   Let me know if this is what the Parquet team prefers and I will update the PR!




> Use net.alchim31.maven:scala-maven-plugin instead of 
> org.scala-tools:maven-scala-plugin
> ---
>
> Key: PARQUET-1976
> URL: https://issues.apache.org/jira/browse/PARQUET-1976
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Martin Tzvetanov Grigorov
>Priority: Minor
>
> org.scala-tools:maven-scala-plugin has not been maintained for a long time.
> [net.alchim31.maven:scala-maven-plugin|https://github.com/davidB/scala-maven-plugin]
>  is the replacement.
> Also, the Scala version could be upgraded from 2.12.8 to 2.12.13.
> A few other Maven plugins could also be upgraded.
>  





[jira] [Updated] (PARQUET-2015) [Format] Update changelog for 2.9.0

2021-04-07 Thread Gabor Szadovszky (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky updated PARQUET-2015:
--
Summary: [Format] Update changelog for 2.9.0  (was: [Format] Update 
changelog for 0.29.0)

> [Format] Update changelog for 2.9.0
> ---
>
> Key: PARQUET-2015
> URL: https://issues.apache.org/jira/browse/PARQUET-2015
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Blocker
> Fix For: format-2.9.0
>
>






[GitHub] [parquet-format] pitrou merged pull request #172: PARQUET-2015: Update changelog for 2.9.0

2021-04-07 Thread GitBox


pitrou merged pull request #172:
URL: https://github.com/apache/parquet-format/pull/172


   






[jira] [Commented] (PARQUET-2015) [Format] Update changelog for 2.9.0

2021-04-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316278#comment-17316278
 ] 

ASF GitHub Bot commented on PARQUET-2015:
-

pitrou merged pull request #172:
URL: https://github.com/apache/parquet-format/pull/172


   




> [Format] Update changelog for 2.9.0
> ---
>
> Key: PARQUET-2015
> URL: https://issues.apache.org/jira/browse/PARQUET-2015
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Blocker
> Fix For: format-2.9.0
>
>






[jira] [Created] (PARQUET-2018) ParquetThriftWriter uses deprecated constructors

2021-04-07 Thread Aaron Blake Niskode-Dossett (Jira)
Aaron Blake Niskode-Dossett created PARQUET-2018:


 Summary: ParquetThriftWriter uses deprecated constructors
 Key: PARQUET-2018
 URL: https://issues.apache.org/jira/browse/PARQUET-2018
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-thrift
Affects Versions: 1.12.0
Reporter: Aaron Blake Niskode-Dossett


ParquetThriftWriter only has constructors that rely on deprecated ParquetWriter
constructors.  It should implement a builder by extending ParquetWriter.Builder,
as other ParquetWriter extensions do.

At some point in the future, this would be a blocker for 2.0.0.





Re: [VOTE] Release Apache Parquet Format 2.9.0 RC0

2021-04-07 Thread Gabor Szadovszky
Sorry, I missed that you had updated the dev repo. The downloads page mirrors
the release repo, which is yet another place (besides the parquet-format and
parquet-mr repos) where we store a KEYS file for whatever reason. Please
update the one in the release repo.

On Wed, Apr 7, 2021 at 3:47 PM Gabor Szadovszky <
gabor.szadovs...@cloudera.com> wrote:

> I guess it only requires some time to sync. Last time the release tarball
> required ~1 hour to sync.
>
> On Wed, Apr 7, 2021 at 3:42 PM Antoine Pitrou  wrote:
>
>>
>> Hi Gabor,
>>
>> Ok, I updated the KEYS file in the Parquet SVN repository.
>> The changes do appear in
>> https://dist.apache.org/repos/dist/dev/parquet/KEYS -- but not in
>> https://downloads.apache.org/parquet/KEYS .  Is there any additional
>> step I should perform?
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On Wed, 7 Apr 2021 15:19:24 +0200
>> Gabor Szadovszky  wrote:
>>
>> > Hi Antoine,
>> >
>> > Thanks for initiating this release! You need to update the listed KEYS
>> > file with your public key, otherwise we cannot validate the signature.
>> > (To do that you need to update the releases svn repo. See details in
>> > the how to release doc about the publishing.)
>> >
>> > Regards,
>> > Gabor
>> >
>> > On Wed, Apr 7, 2021 at 3:10 PM Antoine Pitrou wrote:
>> >
>> > >
>> > > Hi everyone,
>> > >
>> > > I propose the following RC to be released as official Apache Parquet
>> > > Format 2.9.0 release.
>> > >
>> > > The commit id is b4f0c0a643a6ec1a7def37115dd6967ba9346df7
>> > > * This corresponds to the tag: apache-parquet-format-2.9.0-rc0
>> > > * https://github.com/apache/parquet-format/tree/b4f0c0a643a6ec1a7def37115dd6967ba9346df7
>> > >
>> > > The release tarball, signature, and checksums are here:
>> > > * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.9.0-rc0/
>> > >
>> > > You can find the KEYS file here:
>> > > * https://downloads.apache.org/parquet/KEYS
>> > >
>> > > Binary artifacts are staged in Nexus here:
>> > > * https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-format/
>> > >
>> > > This release includes the following important fixes and improvements:
>> > >
>> > > * PARQUET-1996 - [Format] Add interoperable LZ4 codec, deprecate
>> > >   existing LZ4 codec
>> > > * PARQUET-2013 - [Format] Mention that converted types are deprecated
>> > >
>> > > ...among other changes (see CHANGES.md for full list).
>> > >
>> > > Please download, verify, and test.
>> > >
>> > > Please vote in the next 72 hours.
>> > >
>> > > [ ] +1 Release this as Apache Parquet 2.9.0
>> > > [ ] +0
>> > > [ ] -1 Do not release this because...
>> > >
>> > >
>> > > Regards
>> > >
>> > > Antoine.
>> > >
>> > >
>> > >
>> >
>>
>>
>>
>>
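
For the "download, verify, and test" step in the vote email above, the checksum part can be sketched in Python as follows. This rests on assumptions: that the published checksum is a SHA-512 digest stored as a hex string (optionally followed by a file name), and that the helper names `sha512_of` and `matches_checksum` are purely illustrative. Signature verification against the KEYS file needs an OpenPGP tool and is not shown.

```python
import hashlib

def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path: str, checksum_text: str) -> bool:
    """Compare a file against the first token of a checksum file's text."""
    expected = checksum_text.split()[0].lower()
    return sha512_of(path) == expected
```

Reading in chunks keeps memory use flat regardless of the tarball size. A passing checksum only shows the download is intact, which is why the signature and the KEYS file still matter for the vote.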


Re: [VOTE] Release Apache Parquet Format 2.9.0 RC0

2021-04-07 Thread Antoine Pitrou


Ah!  It seems I can't push to that repo:

Sending        KEYS
Transmitting file data .svn: E195023: Commit failed (details follow):
svn: E195023: Changing file '/home/antoine/apache/parquet-release/KEYS' is forbidden by the server
svn: E175013: Access to '/repos/dist/!svn/txr/46918-13e8/release/parquet/KEYS' forbidden


The URL I used for checkout is
https://apit...@dist.apache.org/repos/dist/release/parquet
Should I use another one?

Regards

Antoine.



On Wed, 7 Apr 2021 16:00:26 +0200
Gabor Szadovszky

wrote:
> Sorry, I've missed you updated the dev repo. The downloads page mirrors the
> release repo. Yet another place (besides the parquet-format and parquet-mr
> repos) where we store a KEYS file for whatever reason. Please update the
> one in the release repo.
> 
> On Wed, Apr 7, 2021 at 3:47 PM Gabor Szadovszky <
> gabor.szadovs...@cloudera.com> wrote:
> 
> > I guess it only requires some time to sync. Last time the release tarball
> > required ~1hour to sync.
> >
> > On Wed, Apr 7, 2021 at 3:42 PM Antoine Pitrou  wrote:
> >  
> >>
> >> Hi Gabor,
> >>
> >> Ok, I updated the KEYS file in the Parquet SVN repository.
> >> The changes do appear in
> >> https://dist.apache.org/repos/dist/dev/parquet/KEYS -- but not in
> >> https://downloads.apache.org/parquet/KEYS .  Is there any additional
> >> step I should perform?
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> On Wed, 7 Apr 2021 15:19:24 +0200
> >> Gabor Szadovszky  wrote:
> >>  
> >> > Hi Antoine,
> >> >
> >> > Thanks for initiating this release! You need to update the listed KEYS  
> >> file  
> >> > with your public key otherwise we cannot validate the signature. (To do
> >> > that you need to update the releases svn repo. See details in the how to
> >> > release doc about the publishing.)
> >> >
> >> > Regards,
> >> > Gabor
> >> >
> >> > On Wed, Apr 7, 2021 at 3:10 PM Antoine Pitrou   
> >> wrote:  
> >> >  
> >> > >
> >> > > Hi everyone,
> >> > >
> >> > > I propose the following RC to be released as official Apache Parquet
> >> > > Format 2.9.0 release.
> >> > >
> >> > > The commit id is b4f0c0a643a6ec1a7def37115dd6967ba9346df7
> >> > > * This corresponds to the tag: apache-parquet-format-2.9.0-rc0
> >> > > *
> >> > >  
> >> https://github.com/apache/parquet-format/tree/b4f0c0a643a6ec1a7def37115dd6967ba9346df7
> >>   
> >> > >
> >> > > The release tarball, signature, and checksums are here:
> >> > > *
> >> > >  
> >> https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.9.0-rc0/
> >>   
> >> > >
> >> > > You can find the KEYS file here:
> >> > > * https://downloads.apache.org/parquet/KEYS
> >> > >
> >> > > Binary artifacts are staged in Nexus here:
> >> > > * https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-format/
> >> > >
> >> > > This release includes the following important fixes and improvements:
> >> > >
> >> > > * PARQUET-1996 - [Format] Add interoperable LZ4 codec, deprecate
> >> > > existing LZ4 codec
> >> > > * PARQUET-2013 - [Format] Mention that converted types are deprecated
> >> > >
> >> > > ...among other changes (see CHANGES.md for full list).
> >> > >
> >> > > Please download, verify, and test.
> >> > >
> >> > > Please vote in the next 72 hours.
> >> > >
> >> > > [ ] +1 Release this as Apache Parquet 2.9.0
> >> > > [ ] +0
> >> > > [ ] -1 Do not release this because...
> >> > >
> >> > >
> >> > > Regards
> >> > >
> >> > > Antoine.
> >> > >
> >> > >
> >> > >  
> >> >  
> >>
> >>
> >>
> >>  
> 





[jira] [Commented] (PARQUET-2019) [Format] Outdated KEYS file in git repo

2021-04-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316440#comment-17316440
 ] 

Antoine Pitrou commented on PARQUET-2019:
-

I've submitted a PR for parquet-format. I'll let someone else handle parquet-mr.

> [Format] Outdated KEYS file in git repo
> ---
>
> Key: PARQUET-2019
> URL: https://issues.apache.org/jira/browse/PARQUET-2019
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Reporter: Antoine Pitrou
> Priority: Trivial
>
> For some reason, the parquet-format git repo has a KEYS file that's outdated 
> compared to the one stored in the SVN repo. Unless there's a reason to keep a 
> KEYS file in git, I would suggest removing it.





[Announce] new committer: Gidon Gershinsky

2021-04-07 Thread Gabor Szadovszky
The Project Management Committee (PMC) for Apache Parquet
has invited Gidon Gershinsky to become a committer and we are pleased
to announce that he has accepted.

Welcome Gidon!


[jira] [Resolved] (PARQUET-2015) [Format] Update changelog for 2.9.0

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved PARQUET-2015.
-
Resolution: Fixed

Fixed by Github PR https://github.com/apache/parquet-format/pull/172

> [Format] Update changelog for 2.9.0
> ---
>
> Key: PARQUET-2015
> URL: https://issues.apache.org/jira/browse/PARQUET-2015
> Project: Parquet
> Issue Type: Task
> Components: parquet-format
> Reporter: Antoine Pitrou
> Assignee: Antoine Pitrou
> Priority: Blocker
> Fix For: format-2.9.0
>
>






Re: [VOTE] Release Apache Parquet Format 2.9.0 RC0

2021-04-07 Thread Antoine Pitrou


Ok, I've tried multiple variations and I still can't commit to the
release repository.

May I ask you to commit the following patch:
https://gist.github.com/pitrou/0f9f1ffe280cfb48ea9427ebec19b65e

You can check that the key block matches the one I added in the dev
repo.
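
One way to check that a key block is present in both KEYS files is a plain
textual comparison of the ASCII-armored blocks. The following is only a
minimal sketch: the function names are hypothetical, and it assumes both
files contain the key as the same armored export (the same key exported
twice with different options would not compare equal).

```python
import re

# Matches one ASCII-armored public key block, including its BEGIN/END markers.
KEY_BLOCK = re.compile(
    r"-----BEGIN PGP PUBLIC KEY BLOCK-----.*?-----END PGP PUBLIC KEY BLOCK-----",
    re.DOTALL,
)


def key_blocks(keys_text):
    """Return the set of armored key blocks in a KEYS file's text.

    Each block is normalized by stripping per-line whitespace and dropping
    empty lines, so that indentation or line-ending differences do not matter.
    """
    blocks = set()
    for match in KEY_BLOCK.finditer(keys_text):
        lines = [line.strip() for line in match.group(0).splitlines()]
        blocks.add("\n".join(line for line in lines if line))
    return blocks


def missing_blocks(dev_keys_text, release_keys_text):
    """Key blocks found in the dev KEYS text but not in the release KEYS text."""
    return key_blocks(dev_keys_text) - key_blocks(release_keys_text)
```

An empty result from `missing_blocks` would mean every block in the dev file
also appears, textually, in the release file.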

Regards

Antoine.


On Wed, 7 Apr 2021 16:35:16 +0200
Gabor Szadovszky

wrote:
> I don't have too much experience in svn. I usually follow the commands
> listed in the how to release doc and it works for me. (Don't remember if
> I've had to do some initial steps.) As a committer you should have write
> access to all the repositories of the Parquet community.
> 
> On Wed, Apr 7, 2021 at 4:18 PM Antoine Pitrou  wrote:
> 
> >
> > Ah!  It seems I can't push to that repo:
> >
> > Sending        KEYS
> > Transmitting file data .svn: E195023: Commit failed (details follow):
> > svn: E195023: Changing file '/home/antoine/apache/parquet-release/KEYS' is
> > forbidden by the server
> > svn: E175013: Access to
> > '/repos/dist/!svn/txr/46918-13e8/release/parquet/KEYS' forbidden
> >
> >
> > The URL I used for checkout is
> > https://apit...@dist.apache.org/repos/dist/release/parquet
> > Should I use another one?
> >
> > Regards
> >
> > Antoine.
> >
> >





Re: [VOTE] Release Apache Parquet Format 2.9.0 RC0

2021-04-07 Thread Gabor Szadovszky
I've updated the KEYS file with your public key in the release repo (
downloads.apache.org is updated already). Please keep in mind that you will
still need write access to the release repo to finalize the release after
the vote passes. Guys, any idea how to request write access to a repo?

Verified checksum and signature; unit tests pass; parquet-mr builds with
the new RC.
+1 (binding)




On Wed, Apr 7, 2021 at 4:51 PM Antoine Pitrou  wrote:

>
> Ok, I've tried multiple variations and I still can't commit to the
> release repository.
>
> May I ask you to commit the following patch:
> https://gist.github.com/pitrou/0f9f1ffe280cfb48ea9427ebec19b65e
>
> You can check that the key block matches the one I added in the dev
> repo.
>
> Regards
>
> Antoine.
>
>
> On Wed, 7 Apr 2021 16:35:16 +0200
> Gabor Szadovszky
> 
> wrote:
> > I don't have too much experience in svn. I usually follow the commands
> > listed in the how to release doc and it works for me. (Don't remember if
> > I've had to do some initial steps.) As a committer you should have write
> > access to all the repositories of the Parquet community.
> >
> > On Wed, Apr 7, 2021 at 4:18 PM Antoine Pitrou wrote:
> >
> > >
> > > Ah!  It seems I can't push to that repo:
> > >
> > > Sending        KEYS
> > > Transmitting file data .svn: E195023: Commit failed (details follow):
> > > svn: E195023: Changing file '/home/antoine/apache/parquet-release/KEYS' is
> > > forbidden by the server
> > > svn: E175013: Access to
> > > '/repos/dist/!svn/txr/46918-13e8/release/parquet/KEYS' forbidden
> > >
> > >
> > > The URL I used for checkout is
> > > https://apit...@dist.apache.org/repos/dist/release/parquet
> > > Should I use another one?
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > >

Re: [Announce] new committer: Gidon Gershinsky

2021-04-07 Thread Nándor Kollár
Congrats Gidon!

On 2021/04/07 11:55:45, Gabor Szadovszky  wrote: 
> The Project Management Committee (PMC) for Apache Parquet
> has invited Gidon Gershinsky to become a committer and we are pleased
> to announce that he has accepted.
> 
> Welcome Gidon!
> 


Re: [VOTE] Release Apache Parquet Format 2.9.0 RC0

2021-04-07 Thread Antoine Pitrou


Hi Gabor,

Ok, I updated the KEYS file in the Parquet SVN repository.
The changes do appear in
https://dist.apache.org/repos/dist/dev/parquet/KEYS -- but not in
https://downloads.apache.org/parquet/KEYS .  Is there any additional
step I should perform?
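
For context, the URLs in this thread suggest that downloads.apache.org
serves what is committed under the "release" tree of the dist repository,
not the "dev" tree. A minimal sketch of that mapping, with a hypothetical
function name and with the path correspondence taken as an assumption
inferred from the URLs quoted in this thread:

```python
# Prefixes as they appear in the URLs quoted in this thread.
DIST_RELEASE_PREFIX = "https://dist.apache.org/repos/dist/release/"
DOWNLOADS_PREFIX = "https://downloads.apache.org/"


def downloads_url(dist_url):
    """Return the downloads.apache.org URL assumed to mirror dist_url.

    Only paths under the release tree are assumed to be mirrored; anything
    else (for example the dev tree) yields None.
    """
    if dist_url.startswith(DIST_RELEASE_PREFIX):
        return DOWNLOADS_PREFIX + dist_url[len(DIST_RELEASE_PREFIX):]
    return None
```

Under that assumption a change committed only to the dev tree has no
counterpart on the downloads server, which would match the behaviour
described above.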

Regards

Antoine.


On Wed, 7 Apr 2021 15:19:24 +0200
Gabor Szadovszky  wrote:

> Hi Antoine,
> 
> Thanks for initiating this release! You need to update the listed KEYS file
> with your public key otherwise we cannot validate the signature. (To do
> that you need to update the releases svn repo. See details in the how to
> release doc about the publishing.)
> 
> Regards,
> Gabor
> 
> On Wed, Apr 7, 2021 at 3:10 PM Antoine Pitrou  wrote:
> 
> >
> > Hi everyone,
> >
> > I propose the following RC to be released as official Apache Parquet
> > Format 2.9.0 release.
> >
> > The commit id is b4f0c0a643a6ec1a7def37115dd6967ba9346df7
> > * This corresponds to the tag: apache-parquet-format-2.9.0-rc0
> > *
> > https://github.com/apache/parquet-format/tree/b4f0c0a643a6ec1a7def37115dd6967ba9346df7
> >
> > The release tarball, signature, and checksums are here:
> > *
> > https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.9.0-rc0/
> >
> > You can find the KEYS file here:
> > * https://downloads.apache.org/parquet/KEYS
> >
> > Binary artifacts are staged in Nexus here:
> > *
> > https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-format/
> >
> > This release includes the following important fixes and improvements:
> >
> > * PARQUET-1996 - [Format] Add interoperable LZ4 codec, deprecate existing
> > LZ4 codec
> > * PARQUET-2013 - [Format] Mention that converted types are deprecated
> >
> > ...among other changes (see CHANGES.md for full list).
> >
> > Please download, verify, and test.
> >
> > Please vote in the next 72 hours.
> >
> > [ ] +1 Release this as Apache Parquet 2.9.0
> > [ ] +0
> > [ ] -1 Do not release this because...
> >
> >
> > Regards
> >
> > Antoine.
> >
> >
> >  
> 





[jira] [Commented] (PARQUET-2016) Reference column_order field from column indexes

2021-04-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316296#comment-17316296
 ] 

ASF GitHub Bot commented on PARQUET-2016:
-

gszadovszky opened a new pull request #173:
URL: https://github.com/apache/parquet-format/pull/173


   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Parquet 
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references 
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
 - https://issues.apache.org/jira/browse/PARQUET-XXX
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
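
The mechanical parts of the commit-message checklist above (blank line after
the subject, 50-character subject excluding the Jira reference, no trailing
period, 72-character body lines) can be sketched as a small checker. This is
an illustrative sketch only: the function and its `jira_prefix` parameter are
hypothetical, and the imperative-mood and "what and why" items are not
checked because they need human judgment.

```python
def check_commit_message(message, jira_prefix=""):
    """Return a list of guideline violations for a commit message (empty if none)."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""

    # The checklist excludes the Jira issue reference from the 50-character limit.
    if jira_prefix and subject.startswith(jira_prefix):
        counted = subject[len(jira_prefix):]
    else:
        counted = subject
    if len(counted.strip()) > 50:
        problems.append("subject longer than 50 characters")

    if subject.rstrip().endswith("."):
        problems.append("subject ends with a period")

    # The second line, when present, must be blank to separate subject and body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between subject and body")

    # Everything after the separator line is treated as the body.
    if any(len(line) > 72 for line in lines[2:]):
        problems.append("body line longer than 72 characters")

    return problems
```

For example, `check_commit_message("PARQUET-1234: My Parquet PR", "PARQUET-1234: ")`
returns an empty list.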
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reference column_order field from column indexes
> 
>
> Key: PARQUET-2016
> URL: https://issues.apache.org/jira/browse/PARQUET-2016
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Reporter: Gabor Szadovszky
> Assignee: Gabor Szadovszky
> Priority: Major
>
> We have created the field column_order to specify the ordering of a primitive 
> type. This is used for the row group level statistics but we never referenced 
> this from the column indexes feature while in both cases we heavily rely on 
> the ordering.





[GitHub] [parquet-format] gszadovszky opened a new pull request #173: PARQUET-2016: Reference column_order field from column indexes

2021-04-07 Thread GitBox


gszadovszky opened a new pull request #173:
URL: https://github.com/apache/parquet-format/pull/173


   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Parquet 
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references 
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
 - https://issues.apache.org/jira/browse/PARQUET-XXX
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   






[VOTE] Release Apache Parquet Format 2.9.0 RC0

2021-04-07 Thread Antoine Pitrou


Hi everyone,

I propose the following RC to be released as official Apache Parquet
Format 2.9.0 release.

The commit id is b4f0c0a643a6ec1a7def37115dd6967ba9346df7
* This corresponds to the tag: apache-parquet-format-2.9.0-rc0
* 
https://github.com/apache/parquet-format/tree/b4f0c0a643a6ec1a7def37115dd6967ba9346df7

The release tarball, signature, and checksums are here:
* 
https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.9.0-rc0/

You can find the KEYS file here:
* https://downloads.apache.org/parquet/KEYS

Binary artifacts are staged in Nexus here:
* 
https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-format/

This release includes the following important fixes and improvements:

* PARQUET-1996 - [Format] Add interoperable LZ4 codec, deprecate existing LZ4 
codec
* PARQUET-2013 - [Format] Mention that converted types are deprecated

...among other changes (see CHANGES.md for full list).

Please download, verify, and test.
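
As an illustration of the checksum part of verification, here is a minimal
sketch. It assumes the checksum file is a SHA-512 digest in the common
"<hex digest>  <file name>" layout; the email above does not state the
algorithm or layout, so both are assumptions, and the function names are
hypothetical. Signature verification against the KEYS file is a separate
step that this sketch does not cover.

```python
import hashlib


def sha512_hex(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact_path, checksum_text):
    """True if the first token of checksum_text equals the artifact's SHA-512.

    checksum_text is the content of the checksum file; only its first
    whitespace-separated token (assumed to be the hex digest) is compared.
    """
    tokens = checksum_text.split()
    return bool(tokens) and tokens[0].lower() == sha512_hex(artifact_path)
```

A caller would read the downloaded checksum file as text and pass it in
together with the path of the downloaded tarball.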

Please vote in the next 72 hours.

[ ] +1 Release this as Apache Parquet 2.9.0
[ ] +0
[ ] -1 Do not release this because...


Regards

Antoine.




Re: [VOTE] Release Apache Parquet Format 2.9.0 RC0

2021-04-07 Thread Gabor Szadovszky
I don't have much experience with svn. I usually follow the commands
listed in the how-to-release doc, and that works for me. (I don't remember
whether I had to do some initial steps.) As a committer you should have write
access to all the repositories of the Parquet community.

On Wed, Apr 7, 2021 at 4:18 PM Antoine Pitrou  wrote:

>
> Ah!  It seems I can't push to that repo:
>
> Sending        KEYS
> Transmitting file data .svn: E195023: Commit failed (details follow):
> svn: E195023: Changing file '/home/antoine/apache/parquet-release/KEYS' is
> forbidden by the server
> svn: E175013: Access to
> '/repos/dist/!svn/txr/46918-13e8/release/parquet/KEYS' forbidden
>
>
> The URL I used for checkout is
> https://apit...@dist.apache.org/repos/dist/release/parquet
> Should I use another one?
>
> Regards
>
> Antoine.
>
>
>
>


[jira] [Commented] (PARQUET-2019) [Format] Outdated KEYS file in git repo

2021-04-07 Thread Gabor Szadovszky (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316421#comment-17316421
 ] 

Gabor Szadovszky commented on PARQUET-2019:
---

+1 to remove KEYS from the parquet-format repo and from the parquet-mr repo as 
well. I am not aware that we have ever used these, and we clearly do not 
advertise them for validating the release. For release validation the official 
source of the KEYS file is https://downloads.apache.org/parquet/KEYS (via the 
svn repo).
We might want to remove the KEYS file from the dev repo as well, to keep only 
the one that is synced to the downloads server.

> [Format] Outdated KEYS file in git repo
> ---
>
> Key: PARQUET-2019
> URL: https://issues.apache.org/jira/browse/PARQUET-2019
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Reporter: Antoine Pitrou
> Priority: Trivial
>
> For some reason, the parquet-format git repo has a KEYS file that's outdated 
> compared to the one stored in the SVN repo. Unless there's a reason to keep a 
> KEYS file in git, I would suggest removing it.





[jira] [Commented] (PARQUET-2019) [Format] Outdated KEYS file in git repo

2021-04-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316439#comment-17316439
 ] 

ASF GitHub Bot commented on PARQUET-2019:
-

pitrou opened a new pull request #174:
URL: https://github.com/apache/parquet-format/pull/174


   The reference KEYS files are in the SVN Parquet repositories ("dev" and 
"release").
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Parquet 
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references 
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
 - https://issues.apache.org/jira/browse/PARQUET-XXX
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   




> [Format] Outdated KEYS file in git repo
> ---
>
> Key: PARQUET-2019
> URL: https://issues.apache.org/jira/browse/PARQUET-2019
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Reporter: Antoine Pitrou
> Priority: Trivial
>
> For some reason, the parquet-format git repo has a KEYS file that's outdated 
> compared to the one stored in the SVN repo. Unless there's a reason to keep a 
> KEYS file in git, I would suggest removing it.





[GitHub] [parquet-format] pitrou opened a new pull request #174: PARQUET-2019: Remove outdate KEYS file

2021-04-07 Thread GitBox


pitrou opened a new pull request #174:
URL: https://github.com/apache/parquet-format/pull/174


   The reference KEYS files are in the SVN Parquet repositories ("dev" and 
"release").
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Parquet 
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references 
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
 - https://issues.apache.org/jira/browse/PARQUET-XXX
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   






[GitHub] [parquet-mr] martin-g commented on pull request #866: PARQUET-1976: Update Scala and Maven plugin versions

2021-04-07 Thread GitBox


martin-g commented on pull request #866:
URL: https://github.com/apache/parquet-mr/pull/866#issuecomment-814857164


   >  I think it is better, in the end, to inherit versions from the Parent 
Apache POM. WDYT?
   
   Let me know if this is what Parquet team prefers and I will update the PR!






Re: [VOTE] Release Apache Parquet Format 2.9.0 RC0

2021-04-07 Thread Gabor Szadovszky
Hi Antoine,

Thanks for initiating this release! You need to update the listed KEYS file
with your public key; otherwise we cannot validate the signature. (To do
that, you need to update the releases svn repo. See the publishing details
in the how-to-release doc.)

Regards,
Gabor

On Wed, Apr 7, 2021 at 3:10 PM Antoine Pitrou  wrote:

>
> Hi everyone,
>
> I propose the following RC to be released as official Apache Parquet
> Format 2.9.0 release.
>
> The commit id is b4f0c0a643a6ec1a7def37115dd6967ba9346df7
> * This corresponds to the tag: apache-parquet-format-2.9.0-rc0
> *
> https://github.com/apache/parquet-format/tree/b4f0c0a643a6ec1a7def37115dd6967ba9346df7
>
> The release tarball, signature, and checksums are here:
> *
> https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.9.0-rc0/
>
> You can find the KEYS file here:
> * https://downloads.apache.org/parquet/KEYS
>
> Binary artifacts are staged in Nexus here:
> *
> https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-format/
>
> This release includes the following important fixes and improvements:
>
> * PARQUET-1996 - [Format] Add interoperable LZ4 codec, deprecate existing
> LZ4 codec
> * PARQUET-2013 - [Format] Mention that converted types are deprecated
>
> ...among other changes (see CHANGES.md for full list).
>
> Please download, verify, and test.
>
> Please vote in the next 72 hours.
>
> [ ] +1 Release this as Apache Parquet 2.9.0
> [ ] +0
> [ ] -1 Do not release this because...
>
>
> Regards
>
> Antoine.
>
>
>
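
The checksum part of the "download, verify" step in the quoted vote mail can
be sketched in code. This is only a minimal illustration, not the project's
official procedure: it computes the SHA-512 digest of a downloaded file so it
can be compared with the published checksum. The class and method names are
hypothetical, and the use of SHA-512 as the checksum format is an assumption.
Verifying the signature against the KEYS file is a separate step that this
sketch does not cover.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of any Parquet tooling.
final class ChecksumCheck {

  private static MessageDigest sha512() {
    try {
      return MessageDigest.getInstance("SHA-512");
    } catch (NoSuchAlgorithmException e) {
      // SHA-512 is a standard JDK algorithm, so this is not expected.
      throw new IllegalStateException(e);
    }
  }

  // Lowercase hex encoding of a digest.
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  static String sha512Hex(byte[] data) {
    return toHex(sha512().digest(data));
  }

  // Streams the file through the digest so large tarballs are not
  // loaded into memory at once.
  static String sha512Hex(Path file) {
    MessageDigest md = sha512();
    try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
      byte[] buf = new byte[8192];
      while (in.read(buf) != -1) {
        // reading drives the digest; nothing else to do
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return toHex(md.digest());
  }

  // Compares two hex digests, ignoring case and surrounding whitespace.
  static boolean matches(String computedHex, String publishedHex) {
    return computedHex.trim().equalsIgnoreCase(publishedHex.trim());
  }

  public static void main(String[] args) {
    // args[0]: path to the downloaded file, args[1]: published hex digest
    String computed = sha512Hex(Paths.get(args[0]));
    System.out.println(matches(computed, args[1]) ? "checksum OK" : "checksum MISMATCH");
  }
}
```

The published checksum file may contain more than the bare hex digest (for
example a file name), in which case the digest has to be extracted before
calling `matches`.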


Re: [VOTE] Release Apache Parquet Format 2.9.0 RC0

2021-04-07 Thread Gabor Szadovszky
I guess it only requires some time to sync. Last time the release tarball
required ~1 hour to sync.

On Wed, Apr 7, 2021 at 3:42 PM Antoine Pitrou wrote:

>
> Hi Gabor,
>
> Ok, I updated the KEYS file in the Parquet SVN repository.
> The changes do appear in
> https://dist.apache.org/repos/dist/dev/parquet/KEYS -- but not in
> https://downloads.apache.org/parquet/KEYS .  Is there any additional
> step I should perform?
>
> Regards
>
> Antoine.
>
>


[jira] [Commented] (PARQUET-2000) build failed on AArch64, Fedora 33

2021-04-07 Thread Lutz Weischer (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316367#comment-17316367
 ] 

Lutz Weischer commented on PARQUET-2000:


wget 
https://github.com/apache/parquet-testing/raw/40379b3/data/encrypt_columns_and_footer.parquet.encrypted

works. 

LC_ALL=C mvn clean install

which includes the tests, works, using Java 16. 

> build failed on AArch64, Fedora 33 
> ---
>
> Key: PARQUET-2000
> URL: https://issues.apache.org/jira/browse/PARQUET-2000
> Project: Parquet
> Issue Type: Bug
> Reporter: Lutz Weischer
> Priority: Major
>
> Apache Thrift 0.12.0 is required. Building it reports unsupported .NET, etc. 
> Installing 0.13.0 using yum results in an error on mvn package. 





Re: [Announce] new committer: Gidon Gershinsky

2021-04-07 Thread Micah Kornfield
Congrats Gidon, well deserved.

On Wed, Apr 7, 2021 at 5:10 AM Nándor Kollár wrote:

> Congrats Gidon!
>
> On 2021/04/07 11:55:45, Gabor Szadovszky wrote:
> > The Project Management Committee (PMC) for Apache Parquet
> > has invited Gidon Gershinsky to become a committer and we are pleased
> > to announce that he has accepted.
> >
> > Welcome Gidon!
> >
>


Re: [Announce] new committer: Gidon Gershinsky

2021-04-07 Thread Dongjoon Hyun
Congrats, Gidon! :)

Bests,
Dongjoon.

On Wed, Apr 7, 2021 at 9:06 AM Chao Sun wrote:

> Congrats Gidon!
>
> On Wed, Apr 7, 2021 at 8:27 AM Micah Kornfield wrote:
>
> > Congrats Gidon, well deserved.
> >
> > On Wed, Apr 7, 2021 at 5:10 AM Nándor Kollár wrote:
> >
> > > Congrats Gidon!
> > >
> > > On 2021/04/07 11:55:45, Gabor Szadovszky wrote:
> > > > The Project Management Committee (PMC) for Apache Parquet
> > > > has invited Gidon Gershinsky to become a committer and we are pleased
> > > > to announce that he has accepted.
> > > >
> > > > Welcome Gidon!
> > > >
> > >
> >
>


Re: [Announce] new committer: Gidon Gershinsky

2021-04-07 Thread Chao Sun
Congrats Gidon!

On Wed, Apr 7, 2021 at 8:27 AM Micah Kornfield wrote:

> Congrats Gidon, well deserved.
>
> On Wed, Apr 7, 2021 at 5:10 AM Nándor Kollár wrote:
>
> > Congrats Gidon!
> >
> > On 2021/04/07 11:55:45, Gabor Szadovszky wrote:
> > > The Project Management Committee (PMC) for Apache Parquet
> > > has invited Gidon Gershinsky to become a committer and we are pleased
> > > to announce that he has accepted.
> > >
> > > Welcome Gidon!
> > >
> >
>


[jira] [Commented] (PARQUET-1222) Specify a well-defined sorting order for float and double types

2021-04-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316469#comment-17316469
 ] 

Antoine Pitrou commented on PARQUET-1222:
-

Some answers after looking through the code:
* parquet-cpp neither reads nor writes ColumnIndex
* our handling of min_value and max_value on the read path is naive. We use the
same comparisons regardless of whether ColumnOrder is present or not. In
particular, we use the native type-specific greater-or-equal comparison (e.g.
floating-point comparison), which is bound to fail with NaNs (but will succeed
with signed zeros).
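
To illustrate the second point, here is a minimal Java sketch of a range check
built on the native comparison operators. It is not the parquet-cpp code (which
is C++ and not shown in this thread); the class and method names are made up
for illustration. Any comparison involving NaN is false, so a NaN in min or max
makes the check reject every value, while -0.0 and +0.0 compare as equal and
therefore pass.

```java
// Hypothetical illustration; not taken from parquet-cpp or parquet-mr.
final class NaiveStatsFilter {

  // Returns true if, according to [min, max], a row group might contain
  // `value`. Uses only the native >= and <= operators.
  static boolean mightContain(double min, double max, double value) {
    return value >= min && value <= max;
  }

  public static void main(String[] args) {
    // Ordinary statistics behave as expected.
    System.out.println(mightContain(1.0, 3.0, 2.0));        // true
    // A NaN in the statistics makes every comparison false, so the
    // row group would be skipped although it may contain 2.0.
    System.out.println(mightContain(Double.NaN, 3.0, 2.0)); // false
    System.out.println(mightContain(1.0, Double.NaN, 2.0)); // false
    // Signed zeros compare as equal, so the check still succeeds.
    System.out.println(mightContain(0.0, 5.0, -0.0));       // true
  }
}
```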



> Specify a well-defined sorting order for float and double types
> ---
>
> Key: PARQUET-1222
> URL: https://issues.apache.org/jira/browse/PARQUET-1222
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Reporter: Zoltan Ivanfi
> Priority: Critical
>
> Currently parquet-format specifies the sort order for floating point numbers 
> as follows:
> {code:java}
>*   FLOAT - signed comparison of the represented value
>*   DOUBLE - signed comparison of the represented value
> {code}
> The problem is that the comparison of floating point numbers is only a 
> partial ordering with strange behaviour in specific corner cases. For 
> example, according to IEEE 754, -0 is neither less nor more than +0 and 
> comparing NaN to anything always returns false. This ordering is not suitable 
> for statistics. Additionally, the Java implementation already uses a 
> different (total) ordering that handles these cases correctly but differently 
> than the C++ implementations, which leads to interoperability problems.
> TypeDefinedOrder for doubles and floats should be deprecated and a new 
> TotalFloatingPointOrder should be introduced. The default for writing doubles 
> and floats would be the new TotalFloatingPointOrder. This ordering should be 
> effective and easy to implement in all programming languages.
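
As an illustration of what a total ordering looks like in practice, the sketch
below uses Java's `Double.compare`, which orders -0.0 before +0.0 and treats
NaN as equal to itself and greater than every other value. This is only an
example of a total order; it is not the proposed TotalFloatingPointOrder, whose
exact definition is not given in this issue, and the class and method names are
hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of min/max tracking under a total order.
final class TotalOrderDemo {

  // Double.compare is a total order: -0.0 < +0.0, and NaN is equal to
  // itself and greater than everything else, including +Infinity.
  static int totalCompare(double a, double b) {
    return Double.compare(a, b);
  }

  // Returns {min, max} of a non-empty array under totalCompare.
  static double[] minMax(double[] values) {
    double min = values[0];
    double max = values[0];
    for (double v : values) {
      if (totalCompare(v, min) < 0) {
        min = v;
      }
      if (totalCompare(v, max) > 0) {
        max = v;
      }
    }
    return new double[] {min, max};
  }

  public static void main(String[] args) {
    double[] stats = minMax(new double[] {0.0, Double.NaN, -0.0, 1.0});
    System.out.println(Arrays.toString(stats)); // [-0.0, NaN]
  }
}
```

Unlike the native operators, this gives the same min and max regardless of the
order in which the values are seen. Whether NaN should sort last, or be
excluded from statistics altogether, is a design decision the issue leaves
open.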





[jira] [Resolved] (PARQUET-2019) [Format] Outdated KEYS file in git repo

2021-04-07 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved PARQUET-2019.
-
  Assignee: Antoine Pitrou
Resolution: Fixed

Fixed by GitHub PR https://github.com/apache/parquet-format/pull/174

> [Format] Outdated KEYS file in git repo
> ---
>
> Key: PARQUET-2019
> URL: https://issues.apache.org/jira/browse/PARQUET-2019
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Reporter: Antoine Pitrou
> Assignee: Antoine Pitrou
> Priority: Trivial
>
> For some reason, the parquet-format git repo has a KEYS file that's outdated 
> compared to the one stored in the SVN repo. Unless there's a reason to keep a 
> KEYS file in git, I would suggest removing it.





[jira] [Commented] (PARQUET-2019) [Format] Outdated KEYS file in git repo

2021-04-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316485#comment-17316485
 ] 

ASF GitHub Bot commented on PARQUET-2019:
-

pitrou merged pull request #174:
URL: https://github.com/apache/parquet-format/pull/174


   




> [Format] Outdated KEYS file in git repo
> ---
>
> Key: PARQUET-2019
> URL: https://issues.apache.org/jira/browse/PARQUET-2019
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Reporter: Antoine Pitrou
> Priority: Trivial
>
> For some reason, the parquet-format git repo has a KEYS file that's outdated 
> compared to the one stored in the SVN repo. Unless there's a reason to keep a 
> KEYS file in git, I would suggest removing it.





[GitHub] [parquet-format] pitrou merged pull request #174: PARQUET-2019: Remove outdate KEYS file

2021-04-07 Thread GitBox


pitrou merged pull request #174:
URL: https://github.com/apache/parquet-format/pull/174


   






Re: [Announce] new committer: Gidon Gershinsky

2021-04-07 Thread Driesprong, Fokko
Congrats Gidon, well deserved :)

On Wed, Apr 7, 2021 at 18:11, Dongjoon Hyun wrote:

> Congrats, Gidon! :)
>
> Bests,
> Dongjoon.
>
> On Wed, Apr 7, 2021 at 9:06 AM Chao Sun wrote:
>
> > Congrats Gidon!
> >
> > On Wed, Apr 7, 2021 at 8:27 AM Micah Kornfield wrote:
> >
> > > Congrats Gidon, well deserved.
> > >
> > > On Wed, Apr 7, 2021 at 5:10 AM Nándor Kollár wrote:
> > >
> > > > Congrats Gidon!
> > > >
> > > > On 2021/04/07 11:55:45, Gabor Szadovszky wrote:
> > > > > The Project Management Committee (PMC) for Apache Parquet
> > > > > has invited Gidon Gershinsky to become a committer and we are pleased
> > > > > to announce that he has accepted.
> > > > >
> > > > > Welcome Gidon!
> > > > >
> > > >
> > >
> >
>


[GitHub] [parquet-mr] saravanb-msft opened a new pull request #887: Fix block size issue

2021-04-07 Thread GitBox


saravanb-msft opened a new pull request #887:
URL: https://github.com/apache/parquet-mr/pull/887


   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Parquet 
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references 
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
 - https://issues.apache.org/jira/browse/PARQUET-XXX
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   






[jira] [Created] (PARQUET-2016) Reference column_order field from column indexes

2021-04-07 Thread Gabor Szadovszky (Jira)
Gabor Szadovszky created PARQUET-2016:
-

Summary: Reference column_order field from column indexes
Key: PARQUET-2016
URL: https://issues.apache.org/jira/browse/PARQUET-2016
Project: Parquet
Issue Type: Bug
Components: parquet-format
Reporter: Gabor Szadovszky
Assignee: Gabor Szadovszky


We have created the column_order field to specify the ordering of a primitive
type. It is used for the row-group-level statistics, but we never referenced
it from the column indexes feature, even though both heavily rely on the
ordering.





[jira] [Created] (PARQUET-2017) Handle special values of floating point statistics

2021-04-07 Thread Gabor Szadovszky (Jira)
Gabor Szadovszky created PARQUET-2017:
-

Summary: Handle special values of floating point statistics
Key: PARQUET-2017
URL: https://issues.apache.org/jira/browse/PARQUET-2017
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Reporter: Gabor Szadovszky
Assignee: Gabor Szadovszky


Based on PARQUET-1251 we have implemented the suggested workaround, but it
does not cover all situations:
* We handle the special floating point values at row-group level in parquet-mr 
but only for the read path. For the write path we still write these values.
* For column indexes we handle these values but only for the write path and not 
for the read path.

We should implement the workaround for both read and write paths for all cases 
so we not only handle potentially invalid values but also don't write them to 
the file.
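
A minimal sketch of what applying the workaround on both paths could look like
is given below. It assumes the rules suggested in PARQUET-1251 are roughly:
never use a NaN as min or max, treat a zero min as -0.0, and treat a zero max
as +0.0. These rules are an assumption, since they are not spelled out in this
message, and the class and method names are hypothetical rather than parquet-mr
APIs.

```java
import java.util.Optional;

// Hypothetical helper for illustration; not a parquet-mr API.
final class FloatStatsWorkaround {

  // Returns the {min, max} pair that is safe to write to the file or to
  // use for filtering on read, or empty if the statistics must be
  // dropped (write path) or ignored (read path).
  static Optional<double[]> sanitize(double min, double max) {
    if (Double.isNaN(min) || Double.isNaN(max)) {
      return Optional.empty();
    }
    // `== 0.0` is true for both -0.0 and +0.0, so a zero bound is
    // widened to cover both signed zeros.
    double safeMin = (min == 0.0) ? -0.0 : min;
    double safeMax = (max == 0.0) ? 0.0 : max;
    return Optional.of(new double[] {safeMin, safeMax});
  }

  public static void main(String[] args) {
    System.out.println(sanitize(Double.NaN, 1.0).isPresent()); // false
    double[] r = sanitize(0.0, -0.0).get();
    System.out.println(r[0] + " " + r[1]); // -0.0 0.0
  }
}
```

Using the same function for row-group statistics and column indexes, on both
the read and the write path, is one way to avoid the asymmetry described above.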


