Re: Current status of Data Page V2?

2020-10-21 Thread Micah Kornfield
I've created https://github.com/apache/parquet-format/pull/163 to try to document these (note I really don't have historical context here so please review carefully). I would appreciate it if someone could point me to a reference on what the current status of V2 is? What is left unsettled? When

[jira] [Commented] (PARQUET-1933) [Format] Clarify encodings and data page guidance.

2020-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218722#comment-17218722 ] ASF GitHub Bot commented on PARQUET-1933: - emkornfield opened a new pull request #163: URL:

[GitHub] [parquet-format] emkornfield opened a new pull request #163: PARQUET-1933: Clarify encodings relative to data page usage.

2020-10-21 Thread GitBox
emkornfield opened a new pull request #163: URL: https://github.com/apache/parquet-format/pull/163 Make sure you have checked _all_ steps below. ### Jira - [X] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET-1933) issues and

[jira] [Created] (PARQUET-1933) [Format] Clarify encodings and data page guidance.

2020-10-21 Thread Micah Kornfield (Jira)
Micah Kornfield created PARQUET-1933: Summary: [Format] Clarify encodings and data page guidance. Key: PARQUET-1933 URL: https://issues.apache.org/jira/browse/PARQUET-1933 Project: Parquet

[GitHub] [parquet-mr] shangxinli commented on pull request #808: Parquet-1396: Cryptodata Interface for Schema Activation of Parquet E…

2020-10-21 Thread GitBox
shangxinli commented on pull request #808: URL: https://github.com/apache/parquet-mr/pull/808#issuecomment-714205196 I pushed prematurely; this is not ready for review yet. I will rework it soon and let you know when it is ready for review.

[GitHub] [parquet-mr] shangxinli commented on a change in pull request #808: Parquet-1396: Cryptodata Interface for Schema Activation of Parquet E…

2020-10-21 Thread GitBox
shangxinli commented on a change in pull request #808: URL: https://github.com/apache/parquet-mr/pull/808#discussion_r509863658 ## File path: parquet-hadoop/src/test/java/org/apache/parquet/crypto/propertiesfactory/SchemaCryptoPropertiesFactory.java ## @@ -0,0 +1,135 @@ +/* +

[jira] [Commented] (PARQUET-1396) Example of using EncryptionPropertiesFactory and DecryptionPropertiesFactory

2020-10-21 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218717#comment-17218717 ] Xinli Shang commented on PARQUET-1396: -- Most of the functionality of this Jira has been addressed

[jira] [Updated] (PARQUET-1396) Example of using EncryptionPropertiesFactory and DecryptionPropertiesFactory

2020-10-21 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinli Shang updated PARQUET-1396: - Summary: Example of using EncryptionPropertiesFactory and DecryptionPropertiesFactory (was:

[GitHub] [parquet-mr] shangxinli commented on a change in pull request #808: Parquet-1396: Cryptodata Interface for Schema Activation of Parquet E…

2020-10-21 Thread GitBox
shangxinli commented on a change in pull request #808: URL: https://github.com/apache/parquet-mr/pull/808#discussion_r509848961 ## File path: parquet-hadoop/src/test/java/org/apache/parquet/crypto/propertiesfactory/SchemaControlEncryptionTest.java ## @@ -0,0 +1,248 @@ +/* + *

[GitHub] [parquet-mr] shangxinli commented on a change in pull request #808: Parquet-1396: Cryptodata Interface for Schema Activation of Parquet E…

2020-10-21 Thread GitBox
shangxinli commented on a change in pull request #808: URL: https://github.com/apache/parquet-mr/pull/808#discussion_r509848811 ## File path: parquet-column/src/main/java/org/apache/parquet/schema/Type.java ## @@ -362,5 +362,4 @@ void checkContains(Type subType) { *

[jira] [Commented] (PARQUET-1929) Bump Snappy to 1.1.8

2020-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218656#comment-17218656 ] ASF GitHub Bot commented on PARQUET-1929: - maropu commented on pull request #833: URL:

[GitHub] [parquet-mr] maropu commented on pull request #833: [PARQUET-1929] Bump Snappy to 1.1.8

2020-10-21 Thread GitBox
maropu commented on pull request #833: URL: https://github.com/apache/parquet-mr/pull/833#issuecomment-714013870 Thanks~, @Fokko This is an automated message from the Apache Git Service. To respond to the message, please

[jira] [Commented] (PARQUET-1932) Bump Fastutil to 8.4.2

2020-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218576#comment-17218576 ] ASF GitHub Bot commented on PARQUET-1932: - Fokko opened a new pull request #836: URL:

[GitHub] [parquet-mr] Fokko opened a new pull request #836: [PARQUET-1932] Bump Fastutil to 8.4.2

2020-10-21 Thread GitBox
Fokko opened a new pull request #836: URL: https://github.com/apache/parquet-mr/pull/836 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in

[jira] [Created] (PARQUET-1932) Bump Fastutil to 8.4.2

2020-10-21 Thread Fokko Driesprong (Jira)
Fokko Driesprong created PARQUET-1932: - Summary: Bump Fastutil to 8.4.2 Key: PARQUET-1932 URL: https://issues.apache.org/jira/browse/PARQUET-1932 Project: Parquet Issue Type: Improvement

[jira] [Commented] (PARQUET-1900) Run mvn clean in CI

2020-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218571#comment-17218571 ] ASF GitHub Bot commented on PARQUET-1900: - Fokko closed pull request #812: URL:

[GitHub] [parquet-mr] Fokko closed pull request #812: PARQUET-1900: Add mvn clean to CI

2020-10-21 Thread GitBox
Fokko closed pull request #812: URL: https://github.com/apache/parquet-mr/pull/812

[GitHub] [parquet-mr] Fokko commented on pull request #812: PARQUET-1900: Add mvn clean to CI

2020-10-21 Thread GitBox
Fokko commented on pull request #812: URL: https://github.com/apache/parquet-mr/pull/812#issuecomment-713851121 Thanks @qinghui-xu for following up. Let me know if you find out where the problem lies.

[jira] [Commented] (PARQUET-1910) Parquet-cli is broken after TransCompressionCommand was added

2020-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218566#comment-17218566 ] ASF GitHub Bot commented on PARQUET-1910: - Fokko merged pull request #814: URL:

[jira] [Resolved] (PARQUET-1910) Parquet-cli is broken after TransCompressionCommand was added

2020-10-21 Thread Fokko Driesprong (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved PARQUET-1910. --- Fix Version/s: 1.12.0 Resolution: Fixed > Parquet-cli is broken after

[GitHub] [parquet-mr] Fokko merged pull request #814: PARQUET-1910: Fix broken parquet-cli

2020-10-21 Thread GitBox
Fokko merged pull request #814: URL: https://github.com/apache/parquet-mr/pull/814

[jira] [Resolved] (PARQUET-1924) Do not Instantiate a New LongHashFunction

2020-10-21 Thread Fokko Driesprong (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved PARQUET-1924. --- Fix Version/s: 1.12.0 Resolution: Fixed > Do not Instantiate a New

[jira] [Commented] (PARQUET-1924) Do not Instantiate a New LongHashFunction

2020-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218565#comment-17218565 ] ASF GitHub Bot commented on PARQUET-1924: - Fokko merged pull request #827: URL:

[GitHub] [parquet-mr] Fokko merged pull request #827: PARQUET-1924: Do not Instantiate a New LongHashFunction

2020-10-21 Thread GitBox
Fokko merged pull request #827: URL: https://github.com/apache/parquet-mr/pull/827

[jira] [Commented] (PARQUET-1931) Bump Junit 4.13.1

2020-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218556#comment-17218556 ] ASF GitHub Bot commented on PARQUET-1931: - Fokko opened a new pull request #835: URL:

[jira] [Created] (PARQUET-1931) Bump Junit 4.13.1

2020-10-21 Thread Fokko Driesprong (Jira)
Fokko Driesprong created PARQUET-1931: - Summary: Bump Junit 4.13.1 Key: PARQUET-1931 URL: https://issues.apache.org/jira/browse/PARQUET-1931 Project: Parquet Issue Type: Improvement

[GitHub] [parquet-mr] Fokko opened a new pull request #835: [PARQUET-1931] Bump Junit to 4.13.1

2020-10-21 Thread GitBox
Fokko opened a new pull request #835: URL: https://github.com/apache/parquet-mr/pull/835 Looks like there is a vulnerability in Junit: https://github.com/Fokko/parquet-mr/pull/44 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the

[jira] [Commented] (PARQUET-1930) Bump Apache Thrift to 0.13

2020-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218553#comment-17218553 ] ASF GitHub Bot commented on PARQUET-1930: - Fokko opened a new pull request #834: URL:

[GitHub] [parquet-mr] Fokko opened a new pull request #834: [PARQUET-1930] Bump Apache Thrift to 0.13

2020-10-21 Thread GitBox
Fokko opened a new pull request #834: URL: https://github.com/apache/parquet-mr/pull/834 Make sure you have checked _all_ steps below. Changelog: https://github.com/apache/thrift/blob/master/CHANGES.md#0130 Thrift 0.13.0 takes the first steps toward compatibility with Java 9+, and

[jira] [Commented] (PARQUET-1930) Bump Apache Thrift to 0.13

2020-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218549#comment-17218549 ] ASF GitHub Bot commented on PARQUET-1930: - Fokko opened a new pull request #162: URL:

[GitHub] [parquet-format] Fokko opened a new pull request #162: [PARQUET-1930] Bump Apache Thrift to 0.13

2020-10-21 Thread GitBox
Fokko opened a new pull request #162: URL: https://github.com/apache/parquet-format/pull/162 Make sure you have checked _all_ steps below. Changelog: https://github.com/apache/thrift/blob/master/CHANGES.md#0130 ### Jira - [ ] My PR addresses the following [Parquet

[jira] [Created] (PARQUET-1930) Bump Apache Thrift to 0.13

2020-10-21 Thread Fokko Driesprong (Jira)
Fokko Driesprong created PARQUET-1930: - Summary: Bump Apache Thrift to 0.13 Key: PARQUET-1930 URL: https://issues.apache.org/jira/browse/PARQUET-1930 Project: Parquet Issue Type:

[GitHub] [parquet-mr] Fokko opened a new pull request #833: [PARQUET-1929] Bump Snappy to 1.1.8

2020-10-21 Thread GitBox
Fokko opened a new pull request #833: URL: https://github.com/apache/parquet-mr/pull/833 The released snappy-java bundles the latest Snappy v1.1.8 binaries, which bring small performance improvements. - snappy-java release note:

[jira] [Commented] (PARQUET-1929) Bump Snappy to 1.1.8

2020-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218511#comment-17218511 ] ASF GitHub Bot commented on PARQUET-1929: - Fokko opened a new pull request #833: URL:

[jira] [Created] (PARQUET-1929) Bump Snappy to 1.1.8

2020-10-21 Thread Fokko Driesprong (Jira)
Fokko Driesprong created PARQUET-1929: - Summary: Bump Snappy to 1.1.8 Key: PARQUET-1929 URL: https://issues.apache.org/jira/browse/PARQUET-1929 Project: Parquet Issue Type: Improvement

Re: Create a parquet-protobuf JIRA component

2020-10-21 Thread Aaron Niskode-Dossett
Gabor -- is there an active parquet committer who works in the protobuf module? There are several open PRs (mostly from David, one from me, perhaps others) that would constitute nice improvements to that module. Thanks, Aaron On Wed, Oct 21, 2020 at 7:39 AM Aaron Niskode-Dossett <

[jira] [Updated] (PARQUET-1917) [parquet-proto] default values are stored in oneOf fields that aren't set

2020-10-21 Thread Aaron Blake Niskode-Dossett (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Blake Niskode-Dossett updated PARQUET-1917: - Component/s: parquet-protobuf > [parquet-proto] default values

[jira] [Commented] (PARQUET-1927) ColumnIndex should provide number of records skipped

2020-10-21 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218325#comment-17218325 ] Xinli Shang commented on PARQUET-1927: -- The workaround I can think of is to apply ColumnIndex to

Re: Create a parquet-protobuf JIRA component

2020-10-21 Thread Aaron Niskode-Dossett
Wonderful, thank you! My company hopes to use proto+parquet a lot and I look forward to contributing! On Wed, Oct 21, 2020 at 2:54 AM Gabor Szadovszky wrote: > Sorry, I've missed this thread. Just created the component. Feel free to > use it. > > On Tue, Oct 20, 2020 at 4:27 PM Aaron

[GitHub] [parquet-mr] ggershinsky commented on a change in pull request #808: Parquet-1396: Cryptodata Interface for Schema Activation of Parquet E…

2020-10-21 Thread GitBox
ggershinsky commented on a change in pull request #808: URL: https://github.com/apache/parquet-mr/pull/808#discussion_r509206279 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java ## @@ -279,6 +279,11 @@ public ParquetWriter(Path file,

[GitHub] [parquet-mr] ggershinsky commented on a change in pull request #808: Parquet-1396: Cryptodata Interface for Schema Activation of Parquet E…

2020-10-21 Thread GitBox
ggershinsky commented on a change in pull request #808: URL: https://github.com/apache/parquet-mr/pull/808#discussion_r509200667 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java ## @@ -279,6 +279,11 @@ public ParquetWriter(Path file,

[GitHub] [parquet-mr] ggershinsky commented on pull request #808: Parquet-1396: Cryptodata Interface for Schema Activation of Parquet E…

2020-10-21 Thread GitBox
ggershinsky commented on pull request #808: URL: https://github.com/apache/parquet-mr/pull/808#issuecomment-713498654 > I have some comments otherwise I am fine with this change. > > Meanwhile, the title and maybe the jira description do not fit this change. I think, they should be

[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #808: Parquet-1396: Cryptodata Interface for Schema Activation of Parquet E…

2020-10-21 Thread GitBox
gszadovszky commented on a change in pull request #808: URL: https://github.com/apache/parquet-mr/pull/808#discussion_r509113750 ## File path: parquet-hadoop/src/test/java/org/apache/parquet/crypto/propertiesfactory/SchemaControlEncryptionTest.java ## @@ -0,0 +1,248 @@ +/* +

[jira] [Commented] (PARQUET-1927) ColumnIndex should provide number of records skipped

2020-10-21 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218176#comment-17218176 ] Gabor Szadovszky commented on PARQUET-1927: --- I think it is fine to extend the current API if

Re: Create a parquet-protobuf JIRA component

2020-10-21 Thread Gabor Szadovszky
Sorry, I've missed this thread. Just created the component. Feel free to use it. On Tue, Oct 20, 2020 at 4:27 PM Aaron Niskode-Dossett wrote: > Hi, just bumping this request for a parquet-protobuf JIRA component again. > > On Fri, Oct 2, 2020 at 9:03 AM David wrote: > > > Hello Gang, > > > > I