[parquet-hadoop] - Hadoop Illegal reflective access

2020-07-06 Thread Zubair Uddin Farooqui
Hi Team, I am using parquet-hadoop for parsing Hadoop records with JDK-11. I am getting the following warnings when I try to open a file. WARNING: An illegal reflective access operation has occurred WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil

Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-07-06 Thread Wes McKinney
On Mon, Jul 6, 2020 at 11:08 AM Antoine Pitrou wrote: > > > On 06/07/2020 at 17:57, Steve Kim wrote: > > The Parquet format specification is ambiguous about the exact details of > > LZ4 compression. However, the *de facto* reference implementation in Java > > (parquet-mr) uses the Hadoop LZ4 cod

Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-07-06 Thread Steve Kim
> Would that keep compatibility with existing files produced by Parquet C++? Changing the lz4 implementation to be compatible with parquet-mr/hadoop would break compatibility with any existing files that were written by Parquet C++ using lz4 compression. I believe that it is not possible to reliab

Re: Subject: [VOTE] Release Apache Parquet 1.11.1 RC0

2020-07-06 Thread Ryan Blue
Gabor, is it possible to add PARQUET-1853 to this patch release? That fixes the problem that caused the parquet-avro Jar to be huge. Sorry for not suggesting this on the discuss thread for the release, I must have missed it. rb On Mon, Jul 6, 2020 at 2:36 AM Gabor Szadovszky wrote: > Hi everyo

Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-07-06 Thread Antoine Pitrou
On 06/07/2020 at 17:57, Steve Kim wrote: > The Parquet format specification is ambiguous about the exact details of > LZ4 compression. However, the *de facto* reference implementation in Java > (parquet-mr) uses the Hadoop LZ4 codec. > > I think that it is important for Parquet c++ to have com

Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-07-06 Thread Steve Kim
The Parquet format specification is ambiguous about the exact details of LZ4 compression. However, the *de facto* reference implementation in Java (parquet-mr) uses the Hadoop LZ4 codec. I think that it is important for Parquet c++ to have compatibility and feature parity with parquet-mr when poss

Subject: [VOTE] Release Apache Parquet 1.11.1 RC0

2020-07-06 Thread Gabor Szadovszky
Hi everyone, I propose the following RC to be released as official Apache Parquet 1.11.1 release. The commit id is 1796c55d6bfb614b78ff497984078430837b7b07 * This corresponds to the tag: apache-parquet-1.11.1-rc0 * https://github.com/apache/parquet-mr/tree/1796c55d6bfb614b78ff497984078430837b7b07

[jira] [Resolved] (PARQUET-1864) How to generate a file with UUID as a Logical type

2020-07-06 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-1864. --- Fix Version/s: (was: 1.11.1) Assignee: Gabor Szadovszky Resoluti

[jira] [Commented] (PARQUET-1739) Make Spark SQL support Column indexes

2020-07-06 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151859#comment-17151859 ] Gabor Szadovszky commented on PARQUET-1739: --- Removed 1.11.1 as target release

[jira] [Updated] (PARQUET-1739) Make Spark SQL support Column indexes

2020-07-06 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky updated PARQUET-1739: -- Fix Version/s: (was: 1.11.1) > Make Spark SQL support Column indexes > --

[GitHub] [parquet-mr] gszadovszky merged pull request #615: PARQUET-1373: Encryption key tools

2020-07-06 Thread GitBox
gszadovszky merged pull request #615: URL: https://github.com/apache/parquet-mr/pull/615 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[jira] [Commented] (PARQUET-1373) Encryption key management tools

2020-07-06 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151856#comment-17151856 ] ASF GitHub Bot commented on PARQUET-1373: - gszadovszky merged pull request #615

[jira] [Assigned] (PARQUET-1879) Apache Arrow can not read a Parquet File written with Parqet-Avro 1.11.0 with a Map field

2020-07-06 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-1879: - Assignee: Matthew McMahon > Apache Arrow can not read a Parquet File written w

[GitHub] [parquet-mr] gszadovszky merged pull request #798: PARQUET-1879 MapKeyValue is not a valid Logical Type

2020-07-06 Thread GitBox
gszadovszky merged pull request #798: URL: https://github.com/apache/parquet-mr/pull/798 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[jira] [Commented] (PARQUET-1879) Apache Arrow can not read a Parquet File written with Parqet-Avro 1.11.0 with a Map field

2020-07-06 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151837#comment-17151837 ] ASF GitHub Bot commented on PARQUET-1879: - gszadovszky merged pull request #798