Re: Files with inconsistent num_rows and num_values?
Hi Gang,

For writes I'm seeing "parquet-mr version 1.11.1" and "parquet-mr version 1.10.1". I need to look more into the page headers to check for consistency. At the column level, in some cases the number of values read by pyarrow is consistent with num_rows and in some cases it is consistent with num_values. I don't see any discernable pattern based on schema or types.

It looks like the parquet files might have been written with avro ("parquet.avro.schema" key and a corresponding schema are present in their metadata).

Thanks,
Micah

On Tue, Nov 28, 2023 at 6:30 PM Gang Wu wrote:

> Hi Micah,
>
> Does the FileMetaData.version [1] provide any information about
> the writer? What about the num_values in each page header? Is
> the actual number of values consistent with num_values in the
> ColumnMetaData?
>
> [1] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L1108
>
> Best,
> Gang
>
> On Wed, Nov 29, 2023 at 2:22 AM Micah Kornfield wrote:
>
> > We've recently encountered files that have inconsistencies between the
> > number of rows specified in the row group [1] and the total number of
> > values in a column [2] for non-repeated columns (within a file there is
> > inconsistency between columns but all counts appear to be greater than or
> > equal to the number of rows).
> >
> > Two questions:
> > 1. Is anyone aware of parquet implementations that might generate files
> > like this?
> > 2. Does anyone have an opinion on the correct interpretation of these
> > files? Should the files be treated as corrupt, or should the number of
> > rows be treated as authoritative and any additional data in a column be
> > truncated?
> >
> > It appears different engines make different choices in this case. Arrow
> > treats this as corruption. Spark seems to allow reading the data.
> >
> > Thanks,
> > Micah
> >
> > [1] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L895
> > [2] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L786
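Both readings discussed in this thread (treat the file as corrupt, or treat num_rows as authoritative and truncate) start from the same comparison. Below is a minimal Java sketch of that comparison, assuming the counts have already been read from the footer. `RowGroupCountCheck`, `classify` and `mismatchedColumns` are hypothetical names for illustration, not parquet-mr or Arrow APIs, and the check only applies to non-repeated columns, where one value per row is expected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch only: counts are plain numbers here; a real check would read them from the file footer. */
final class RowGroupCountCheck {

  /** How a column's num_values relates to the row group's num_rows. */
  enum Verdict { CONSISTENT, EXTRA_VALUES, MISSING_VALUES }

  /** Compares a non-repeated column's num_values against the row group's num_rows. */
  static Verdict classify(long numRows, long numValues) {
    if (numValues == numRows) {
      return Verdict.CONSISTENT;
    }
    return numValues > numRows ? Verdict.EXTRA_VALUES : Verdict.MISSING_VALUES;
  }

  /** Returns the names of columns whose num_values differs from numRows, in iteration order. */
  static List<String> mismatchedColumns(long numRows, Map<String, Long> numValuesByColumn) {
    List<String> mismatched = new ArrayList<>();
    for (Map.Entry<String, Long> entry : numValuesByColumn.entrySet()) {
      if (classify(numRows, entry.getValue()) != Verdict.CONSISTENT) {
        mismatched.add(entry.getKey());
      }
    }
    return mismatched;
  }
}
```

A reader that rejects such files would fail on any non-empty result from `mismatchedColumns`; a reader that trusts num_rows would only truncate columns classified as `EXTRA_VALUES`, which is the case reported above.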
[jira] [Commented] (PARQUET-2396) Refactor `ColumnIndexBuilder`
[ https://issues.apache.org/jira/browse/PARQUET-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790918#comment-17790918 ]

ASF GitHub Bot commented on PARQUET-2396:
-----------------------------------------

zhangjiashen commented on code in PR #1219:
URL: https://github.com/apache/parquet-mr/pull/1219#discussion_r1408788644

## parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java: ##

@@ -298,24 +295,22 @@ public <T extends Comparable<T>> PrimitiveIterator.OfInt visit(NotEq<T> notEq) {
 public <T extends Comparable<T>> PrimitiveIterator.OfInt visit(In<T> in) {
   Set<T> values = in.getValues();
   IntSet matchingIndexesForNull = new IntOpenHashSet(); // for null
-  Iterator<T> it = values.iterator();
-  while(it.hasNext()) {
-    T value = it.next();
-    if (value == null) {
-      if (nullCounts == null) {
-        // Searching for nulls so if we don't have null related statistics we have to return all pages
-        return IndexIterator.all(getPageCount());
-      } else {
-        for (int i = 0; i < nullCounts.length; i++) {
-          if (nullCounts[i] > 0) {
-            matchingIndexesForNull.add(i);
+  for (T value : values) {
+    if (value == null) {

Review Comment:
Nit: Let's modify the indent spaces to 2 and ensure consistency and similar to changes below?

> Refactor `ColumnIndexBuilder`
> -----------------------------
>
> Key: PARQUET-2396
> URL: https://issues.apache.org/jira/browse/PARQUET-2396
> Project: Parquet
> Issue Type: Improvement
> Affects Versions: 1.13.1
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: Major
> Fix For: 1.14.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
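The hunk under review swaps an explicit `Iterator` driven by a `while` loop for an enhanced `for` loop. The sketch below shows the two shapes side by side on a plain `Set<String>`, assuming nothing about `ColumnIndexBuilder` beyond what the diff shows; `ForEachRefactorSketch` and its methods are hypothetical and simply count nulls so that the equivalence is easy to check.

```java
import java.util.Iterator;
import java.util.Set;

/** Sketch only: the same traversal written with an explicit Iterator and with an enhanced for loop. */
final class ForEachRefactorSketch {

  /** Original shape: explicit Iterator driven by a while loop. */
  static int countNullsWithIterator(Set<String> values) {
    int nulls = 0;
    Iterator<String> it = values.iterator();
    while (it.hasNext()) {
      String value = it.next();
      if (value == null) {
        nulls++;
      }
    }
    return nulls;
  }

  /** Refactored shape: enhanced for loop visiting the same elements. */
  static int countNullsWithForEach(Set<String> values) {
    int nulls = 0;
    for (String value : values) {
      if (value == null) {
        nulls++;
      }
    }
    return nulls;
  }
}
```

The enhanced `for` loop calls `iterator()`, `hasNext()` and `next()` under the hood, so the change is about readability rather than behavior, as long as the loop body never needs `Iterator.remove()`.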
Re: Files with inconsistent num_rows and num_values?
Hi Micah,

Does the FileMetaData.version [1] provide any information about the writer? What about the num_values in each page header? Is the actual number of values consistent with num_values in the ColumnMetaData?

[1] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L1108

Best,
Gang

On Wed, Nov 29, 2023 at 2:22 AM Micah Kornfield wrote:

> We've recently encountered files that have inconsistencies between the
> number of rows specified in the row group [1] and the total number of
> values in a column [2] for non-repeated columns (within a file there is
> inconsistency between columns but all counts appear to be greater than or
> equal to the number of rows).
>
> Two questions:
> 1. Is anyone aware of parquet implementations that might generate files
> like this?
> 2. Does anyone have an opinion on the correct interpretation of these
> files? Should the files be treated as corrupt, or should the number of
> rows be treated as authoritative and any additional data in a column be
> truncated?
>
> It appears different engines make different choices in this case. Arrow
> treats this as corruption. Spark seems to allow reading the data.
>
> Thanks,
> Micah
>
> [1] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L895
> [2] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L786
[jira] [Commented] (PARQUET-2386) More consistent code style in parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790840#comment-17790840 ]

ASF GitHub Bot commented on PARQUET-2386:
-----------------------------------------

wgtmac commented on PR #1209:
URL: https://github.com/apache/parquet-mr/pull/1209#issuecomment-1831045581

Thanks for the improvement! Could you please take a look at this? @shangxinli @gszadovszky @Fokko

> More consistent code style in parquet-mr
> ----------------------------------------
>
> Key: PARQUET-2386
> URL: https://issues.apache.org/jira/browse/PARQUET-2386
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Atour Mousavi Gourabi
> Assignee: Atour Mousavi Gourabi
> Priority: Major
>
> The code style conventions used in parquet-mr are generally inconsistent and
> unenforced. We might want to consider using linters such as Spotless and a
> more extensive .editorconfig configuration.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (PARQUET-2397) Make use of `isEmpty`
[ https://issues.apache.org/jira/browse/PARQUET-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790822#comment-17790822 ]

ASF GitHub Bot commented on PARQUET-2397:
-----------------------------------------

Fokko opened a new pull request, #1220:
URL: https://github.com/apache/parquet-mr/pull/1220

Make sure you have checked _all_ steps below.

### Jira

- [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
  - https://issues.apache.org/jira/browse/PARQUET-2397
  - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).

### Tests

- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits

- [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation

- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain Javadoc that explain what it does

> Make use of `isEmpty`
> ---------------------
>
> Key: PARQUET-2397
> URL: https://issues.apache.org/jira/browse/PARQUET-2397
> Project: Parquet
> Issue Type: Improvement
> Affects Versions: 1.13.1
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: Major
> Fix For: 1.14.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
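The notification does not say which call sites PR #1220 touches. As an illustration of the idiom in the issue title only, here is a small Java sketch that assumes the change replaces `size() == 0` style checks with `isEmpty()`; `IsEmptySketch` and its method names are hypothetical.

```java
import java.util.Collection;

/** Sketch only: two ways to ask whether a collection has no elements. */
final class IsEmptySketch {

  /** Before: compares the size against zero. */
  static boolean hasNoElementsViaSize(Collection<?> collection) {
    return collection.size() == 0;
  }

  /** After: states the intent directly and gives the same answer. */
  static boolean hasNoElementsViaIsEmpty(Collection<?> collection) {
    return collection.isEmpty();
  }
}
```

Both methods agree for every `Collection`; `isEmpty()` reads better and avoids a potentially costly `size()` on collections that have to count their elements.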
[jira] [Created] (PARQUET-2397) Make use of `isEmpty`
Fokko Driesprong created PARQUET-2397:
--------------------------------------

Summary: Make use of `isEmpty`
Key: PARQUET-2397
URL: https://issues.apache.org/jira/browse/PARQUET-2397
Project: Parquet
Issue Type: Improvement
Affects Versions: 1.13.1
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong
Fix For: 1.14.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (PARQUET-2396) Refactor `ColumnIndexBuilder`
Fokko Driesprong created PARQUET-2396:
--------------------------------------

Summary: Refactor `ColumnIndexBuilder`
Key: PARQUET-2396
URL: https://issues.apache.org/jira/browse/PARQUET-2396
Project: Parquet
Issue Type: Improvement
Affects Versions: 1.13.1
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong
Fix For: 1.14.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[PR] PARQUET-2396: Refactor `ColumnIndexBuilder` [parquet-mr]
Fokko opened a new pull request, #1219:
URL: https://github.com/apache/parquet-mr/pull/1219

Small refactor to improve readability

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[jira] [Commented] (PARQUET-2395) Prefer `singletonList` over `asList`
[ https://issues.apache.org/jira/browse/PARQUET-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790815#comment-17790815 ]

ASF GitHub Bot commented on PARQUET-2395:
-----------------------------------------

Fokko opened a new pull request, #1218:
URL: https://github.com/apache/parquet-mr/pull/1218

> Prefer `singletonList` over `asList`
> ------------------------------------
>
> Key: PARQUET-2395
> URL: https://issues.apache.org/jira/browse/PARQUET-2395
> Project: Parquet
> Issue Type: Improvement
> Affects Versions: 1.13.1
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: Major
> Fix For: 1.14.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
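For readers unfamiliar with the distinction in the issue title, the following Java sketch contrasts the two JDK factory methods for a one-element list. It is a general illustration, not taken from PR #1218; `SingletonListSketch` and its helpers are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch only: two JDK ways to build a one-element list. */
final class SingletonListSketch {

  /** Arrays.asList: fixed-size list backed by an array; set() is allowed, add() is not. */
  static List<String> viaAsList(String only) {
    return Arrays.asList(only);
  }

  /** Collections.singletonList: immutable one-element list with no backing array. */
  static List<String> viaSingletonList(String only) {
    return Collections.singletonList(only);
  }

  /** Returns true when the list refuses set(0, ...), i.e. its element cannot be replaced. */
  static boolean rejectsSet(List<String> list) {
    try {
      list.set(0, "changed");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }
}
```

The two lists compare equal, so callers that only read the list should not notice the switch; the difference shows up only if some caller relies on `set()`.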
[jira] [Created] (PARQUET-2395) Prefer `singletonList`
Fokko Driesprong created PARQUET-2395:
--------------------------------------

Summary: Prefer `singletonList`
Key: PARQUET-2395
URL: https://issues.apache.org/jira/browse/PARQUET-2395
Project: Parquet
Issue Type: Improvement
Affects Versions: 1.13.1
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong
Fix For: 1.14.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (PARQUET-2395) Prefer `singletonList` over `asList`
[ https://issues.apache.org/jira/browse/PARQUET-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong updated PARQUET-2395:
--------------------------------------
Summary: Prefer `singletonList` over `asList` (was: Prefer `singletonList`)

> Prefer `singletonList` over `asList`
> ------------------------------------
>
> Key: PARQUET-2395
> URL: https://issues.apache.org/jira/browse/PARQUET-2395
> Project: Parquet
> Issue Type: Improvement
> Affects Versions: 1.13.1
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: Major
> Fix For: 1.14.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (PARQUET-2394) Use `computeIfAbsent` in `MessageColumnIO`
Fokko Driesprong created PARQUET-2394:
--------------------------------------

Summary: Use `computeIfAbsent` in `MessageColumnIO`
Key: PARQUET-2394
URL: https://issues.apache.org/jira/browse/PARQUET-2394
Project: Parquet
Issue Type: Improvement
Affects Versions: 1.13.1
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong
Fix For: 1.14.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (PARQUET-2394) Use `computeIfAbsent` in `MessageColumnIO`
[ https://issues.apache.org/jira/browse/PARQUET-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790812#comment-17790812 ]

ASF GitHub Bot commented on PARQUET-2394:
-----------------------------------------

Fokko opened a new pull request, #1217:
URL: https://github.com/apache/parquet-mr/pull/1217

> Use `computeIfAbsent` in `MessageColumnIO`
> ------------------------------------------
>
> Key: PARQUET-2394
> URL: https://issues.apache.org/jira/browse/PARQUET-2394
> Project: Parquet
> Issue Type: Improvement
> Affects Versions: 1.13.1
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: Major
> Fix For: 1.14.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
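The notification carries no diff, so the sketch below only illustrates the general `Map.computeIfAbsent` idiom named in the issue title. It assumes a map of lists filled on demand; `ComputeIfAbsentSketch`, `addVerbose` and `addCompact` are hypothetical and do not mirror the `MessageColumnIO` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch only: get-or-create on a map, written out by hand and with computeIfAbsent. */
final class ComputeIfAbsentSketch {

  /** Before: look up, create the bucket on a miss, store it, then use it. */
  static void addVerbose(Map<String, List<Integer>> index, String key, int value) {
    List<Integer> bucket = index.get(key);
    if (bucket == null) {
      bucket = new ArrayList<>();
      index.put(key, bucket);
    }
    bucket.add(value);
  }

  /** After: computeIfAbsent creates and stores the bucket only when the key is missing. */
  static void addCompact(Map<String, List<Integer>> index, String key, int value) {
    index.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
  }
}
```

Both methods leave the map in the same state; the compact form removes the explicit null check and the separate `put`.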
[jira] [Commented] (PARQUET-2393) Make `ColumnIOCreatorVisitor` static
[ https://issues.apache.org/jira/browse/PARQUET-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790810#comment-17790810 ]

ASF GitHub Bot commented on PARQUET-2393:
-----------------------------------------

Fokko opened a new pull request, #1216:
URL: https://github.com/apache/parquet-mr/pull/1216

> Make `ColumnIOCreatorVisitor` static
> ------------------------------------
>
> Key: PARQUET-2393
> URL: https://issues.apache.org/jira/browse/PARQUET-2393
> Project: Parquet
> Issue Type: Improvement
> Affects Versions: 1.13.1
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: Major
> Fix For: 1.14.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (PARQUET-2393) Make `ColumnIOCreatorVisitor` static
Fokko Driesprong created PARQUET-2393:
--------------------------------------

Summary: Make `ColumnIOCreatorVisitor` static
Key: PARQUET-2393
URL: https://issues.apache.org/jira/browse/PARQUET-2393
Project: Parquet
Issue Type: Improvement
Affects Versions: 1.13.1
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong
Fix For: 1.14.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
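The issue gives no rationale, so the sketch below only illustrates the general Java distinction between an inner class and a static nested class, assuming the visitor does not need state from its enclosing instance. `StaticNestedSketch`, `InnerVisitor` and `NestedVisitor` are hypothetical names, not the parquet-mr classes.

```java
/** Sketch only: an inner class versus a static nested class. */
final class StaticNestedSketch {

  private final String name = "outer";

  /** Inner class: each instance is tied to an enclosing StaticNestedSketch and can read its fields. */
  class InnerVisitor {
    String describe() {
      return "inner of " + name;
    }
  }

  /** Static nested class: can be created without any enclosing instance. */
  static class NestedVisitor {
    String describe() {
      return "nested";
    }
  }
}
```

An `InnerVisitor` can only be created through an outer instance (`outer.new InnerVisitor()`) and keeps that instance reachable, whereas `new StaticNestedSketch.NestedVisitor()` stands alone.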
[jira] [Commented] (PARQUET-2392) Remove StringBuilder in `LogicalTypeAnnotation`
[ https://issues.apache.org/jira/browse/PARQUET-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790807#comment-17790807 ] ASF GitHub Bot commented on PARQUET-2392: - Fokko opened a new pull request, #1215: URL: https://github.com/apache/parquet-mr/pull/1215 StringBuilder only makes sense when you concatenate in a loop. > Remove StringBuilder in `LogicalTypeAnnotation` > --- > > Key: PARQUET-2392 > URL: https://issues.apache.org/jira/browse/PARQUET-2392 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.13.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] PARQUET-2392: Remove StringBuilder in `LogicalTypeAnnotation` [parquet-mr]
Fokko opened a new pull request, #1215: URL: https://github.com/apache/parquet-mr/pull/1215 StringBuilder only makes sense when you concatenate in a loop.
[jira] [Created] (PARQUET-2392) Remove StringBuilder in `LogicalTypeAnnotation`
Fokko Driesprong created PARQUET-2392: - Summary: Remove StringBuilder in `LogicalTypeAnnotation` Key: PARQUET-2392 URL: https://issues.apache.org/jira/browse/PARQUET-2392 Project: Parquet Issue Type: Improvement Affects Versions: 1.13.1 Reporter: Fokko Driesprong Assignee: Fokko Driesprong Fix For: 1.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2391) Remove unnecessary unboxing
[ https://issues.apache.org/jira/browse/PARQUET-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790805#comment-17790805 ] ASF GitHub Bot commented on PARQUET-2391: - Fokko opened a new pull request, #1214: URL: https://github.com/apache/parquet-mr/pull/1214 > Remove unnecessary unboxing > --- > > Key: PARQUET-2391 > URL: https://issues.apache.org/jira/browse/PARQUET-2391 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.13.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] PARQUET-2391: Remove unnecessary unboxing [parquet-mr]
Fokko opened a new pull request, #1214: URL: https://github.com/apache/parquet-mr/pull/1214
[jira] [Created] (PARQUET-2391) Remove unnecessary unboxing
Fokko Driesprong created PARQUET-2391: - Summary: Remove unnecessary unboxing Key: PARQUET-2391 URL: https://issues.apache.org/jira/browse/PARQUET-2391 Project: Parquet Issue Type: Improvement Affects Versions: 1.13.1 Reporter: Fokko Driesprong Assignee: Fokko Driesprong Fix For: 1.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2390) Replace anonymouse functions with lambda's
[ https://issues.apache.org/jira/browse/PARQUET-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790800#comment-17790800 ] ASF GitHub Bot commented on PARQUET-2390: - Fokko opened a new pull request, #1213: URL: https://github.com/apache/parquet-mr/pull/1213 They are easier to read > Replace anonymouse functions with lambda's > -- > > Key: PARQUET-2390 > URL: https://issues.apache.org/jira/browse/PARQUET-2390 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.13.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] PARQUET-2390: Replace anonymouse functions with lambdas [parquet-mr]
Fokko opened a new pull request, #1213: URL: https://github.com/apache/parquet-mr/pull/1213 They are easier to read
[jira] [Created] (PARQUET-2390) Replace anonymouse functions with lambda's
Fokko Driesprong created PARQUET-2390: - Summary: Replace anonymouse functions with lambda's Key: PARQUET-2390 URL: https://issues.apache.org/jira/browse/PARQUET-2390 Project: Parquet Issue Type: Improvement Affects Versions: 1.13.1 Reporter: Fokko Driesprong Assignee: Fokko Driesprong Fix For: 1.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] PARQUET-2389: Remove redundant initializers [parquet-mr]
Fokko opened a new pull request, #1212: URL: https://github.com/apache/parquet-mr/pull/1212 Just some cleanup
[jira] [Created] (PARQUET-2389) Remove redundant initializers
Fokko Driesprong created PARQUET-2389: - Summary: Remove redundant initializers Key: PARQUET-2389 URL: https://issues.apache.org/jira/browse/PARQUET-2389 Project: Parquet Issue Type: Improvement Affects Versions: 1.13.1 Reporter: Fokko Driesprong Assignee: Fokko Driesprong Fix For: 1.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2389) Remove redundant initializers
[ https://issues.apache.org/jira/browse/PARQUET-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790797#comment-17790797 ] ASF GitHub Bot commented on PARQUET-2389: - Fokko opened a new pull request, #1212: URL: https://github.com/apache/parquet-mr/pull/1212 Just some cleanup > Remove redundant initializers > - > > Key: PARQUET-2389 > URL: https://issues.apache.org/jira/browse/PARQUET-2389 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.13.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2388) Deprecate `CHARSETS` on `PlainValuesWriter`
[ https://issues.apache.org/jira/browse/PARQUET-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790795#comment-17790795 ] ASF GitHub Bot commented on PARQUET-2388: - Fokko opened a new pull request, #1211: URL: https://github.com/apache/parquet-mr/pull/1211 Not being used > Deprecate `CHARSETS` on `PlainValuesWriter` > --- > > Key: PARQUET-2388 > URL: https://issues.apache.org/jira/browse/PARQUET-2388 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.13.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] PARQUET-2388: Deprecate `CHARSETS` on `PlainValuesWriter` [parquet-mr]
Fokko opened a new pull request, #1211: URL: https://github.com/apache/parquet-mr/pull/1211 Not being used
[jira] [Created] (PARQUET-2388) Deprecate `CHARSETS` on `PlainValuesWriter`
Fokko Driesprong created PARQUET-2388: - Summary: Deprecate `CHARSETS` on `PlainValuesWriter` Key: PARQUET-2388 URL: https://issues.apache.org/jira/browse/PARQUET-2388 Project: Parquet Issue Type: Improvement Affects Versions: 1.13.1 Reporter: Fokko Driesprong Assignee: Fokko Driesprong Fix For: 1.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2387) Simplify `hasFieldsIgnored` expression
[ https://issues.apache.org/jira/browse/PARQUET-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790793#comment-17790793 ] ASF GitHub Bot commented on PARQUET-2387: - Fokko opened a new pull request, #1210: URL: https://github.com/apache/parquet-mr/pull/1210 > Simplify `hasFieldsIgnored` expression > -- > > Key: PARQUET-2387 > URL: https://issues.apache.org/jira/browse/PARQUET-2387 > Project: Parquet > Issue Type: Improvement > Components: parquet-thrift >Affects Versions: 1.13.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] PARQUET-2387: Simplify `hasFieldsIgnored` expression [parquet-mr]
Fokko opened a new pull request, #1210: URL: https://github.com/apache/parquet-mr/pull/1210
[jira] [Created] (PARQUET-2387) Simplify `hasFieldsIgnored` expression
Fokko Driesprong created PARQUET-2387: - Summary: Simplify `hasFieldsIgnored` expression Key: PARQUET-2387 URL: https://issues.apache.org/jira/browse/PARQUET-2387 Project: Parquet Issue Type: Improvement Components: parquet-thrift Affects Versions: 1.13.1 Reporter: Fokko Driesprong Assignee: Fokko Driesprong Fix For: 1.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2344) Bump to Thirft 0.19.0
[ https://issues.apache.org/jira/browse/PARQUET-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790791#comment-17790791 ] ASF GitHub Bot commented on PARQUET-2344: - Fokko commented on PR #1192: URL: https://github.com/apache/parquet-mr/pull/1192#issuecomment-1830859619 @wgtmac Thanks for splitting out the format upgrade. Always a good idea to make PRs smaller. I finally fixed all the tests, and this looks good to go to me 👍 > Bump to Thirft 0.19.0 > - > > Key: PARQUET-2344 > URL: https://issues.apache.org/jira/browse/PARQUET-2344 > Project: Parquet > Issue Type: Bug > Components: parquet-format, parquet-mr >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: format-2.10.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] PARQUET-2344: Bump to Thrift 0.19.0 [parquet-mr]
Fokko commented on PR #1192: URL: https://github.com/apache/parquet-mr/pull/1192#issuecomment-1830859619 @wgtmac Thanks for splitting out the format upgrade. Always a good idea to make PRs smaller. I finally fixed all the tests, and this looks good to go to me 👍
[jira] [Resolved] (PARQUET-2296) Bump easymock from 3.4 to 5.1.0
[ https://issues.apache.org/jira/browse/PARQUET-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved PARQUET-2296. --- Resolution: Fixed > Bump easymock from 3.4 to 5.1.0 > --- > > Key: PARQUET-2296 > URL: https://issues.apache.org/jira/browse/PARQUET-2296 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.13.0 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PARQUET-2300) Update jackson-core 2.13.4 to a version without CVE PRISMA-2023-0067
[ https://issues.apache.org/jira/browse/PARQUET-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved PARQUET-2300. --- Assignee: Gang Wu Resolution: Fixed > Update jackson-core 2.13.4 to a version without CVE PRISMA-2023-0067 > > > Key: PARQUET-2300 > URL: https://issues.apache.org/jira/browse/PARQUET-2300 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.13.0 >Reporter: Gianluca Vagnoni >Assignee: Gang Wu >Priority: Major > Fix For: 1.14.0 > > > The library "parquet-jackson", versions 1.13.0 and 1.13.1, contains the > vulnerability PRISMA-2023-0067 > (https://github.com/FasterXML/jackson-core/pull/827) > (https://github.com/IBM/ibm-cos-sdk-java/issues/58). > Please upgrade the shaded library to jackson-core version 2.15.0 to fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PARQUET-2336) Add caching key to CodecFactory
[ https://issues.apache.org/jira/browse/PARQUET-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved PARQUET-2336. --- Resolution: Fixed > Add caching key to CodecFactory > --- > > Key: PARQUET-2336 > URL: https://issues.apache.org/jira/browse/PARQUET-2336 > Project: Parquet > Issue Type: Bug > Components: parquet-hadoop >Affects Versions: 1.13.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PARQUET-2384) Mark toOriginalType as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved PARQUET-2384. --- Resolution: Fixed > Mark toOriginalType as deprecated > - > > Key: PARQUET-2384 > URL: https://issues.apache.org/jira/browse/PARQUET-2384 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.13.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PARQUET-2368) Update japicmp to 1.18.1
[ https://issues.apache.org/jira/browse/PARQUET-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved PARQUET-2368. --- Resolution: Fixed > Update japicmp to 1.18.1 > > > Key: PARQUET-2368 > URL: https://issues.apache.org/jira/browse/PARQUET-2368 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.13.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PARQUET-2382) Remove the deprecated OriginalType
[ https://issues.apache.org/jira/browse/PARQUET-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated PARQUET-2382: -- Fix Version/s: 2.0.0 (was: 1.14.0) > Remove the deprecated OriginalType > -- > > Key: PARQUET-2382 > URL: https://issues.apache.org/jira/browse/PARQUET-2382 > Project: Parquet > Issue Type: Improvement >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2384) Mark toOriginalType as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790790#comment-17790790 ] ASF GitHub Bot commented on PARQUET-2384: - Fokko merged PR #1202: URL: https://github.com/apache/parquet-mr/pull/1202 > Mark toOriginalType as deprecated > - > > Key: PARQUET-2384 > URL: https://issues.apache.org/jira/browse/PARQUET-2384 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.13.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] Bump org.codehaus.mojo:exec-maven-plugin from 3.1.0 to 3.1.1 [parquet-mr]
Fokko merged PR #1206: URL: https://github.com/apache/parquet-mr/pull/1206
[jira] [Commented] (PARQUET-2384) Mark toOriginalType as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790789#comment-17790789 ] ASF GitHub Bot commented on PARQUET-2384: - Fokko commented on PR #1202: URL: https://github.com/apache/parquet-mr/pull/1202#issuecomment-1830855336 Thanks for the review @wgtmac > Mark toOriginalType as deprecated > - > > Key: PARQUET-2384 > URL: https://issues.apache.org/jira/browse/PARQUET-2384 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.13.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.14.0
Re: [PR] PARQUET-2384: Mark `toOriginalType` as deprecated [parquet-mr]
Fokko merged PR #1202: URL: https://github.com/apache/parquet-mr/pull/1202
Re: [PR] PARQUET-2384: Mark `toOriginalType` as deprecated [parquet-mr]
Fokko commented on PR #1202: URL: https://github.com/apache/parquet-mr/pull/1202#issuecomment-1830855336 Thanks for the review @wgtmac
Files with inconsistent num_rows and num_values?
We've recently encountered files that have inconsistencies between the number of rows specified in the row group [1] and the total number of values in a column [2] for non-repeated columns (within a file, the counts are inconsistent across columns, but all of them appear to be greater than or equal to the number of rows). Two questions: 1. Is anyone aware of Parquet implementations that might generate files like this? 2. Does anyone have an opinion on the correct interpretation of these files? Should the files be treated as corrupt, or should the number of rows be treated as authoritative and any additional data in a column be truncated? It appears that different engines make different choices in this case: Arrow treats this as corruption, while Spark seems to allow reading the data. Thanks, Micah [1] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L895 [2] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L786
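For anyone who wants to check their own files for the mismatch described above, here is a minimal sketch rather than a definitive tool. It walks footer metadata shaped like pyarrow's `FileMetaData` and reports non-repeated column chunks whose `num_values` differs from the row group's `num_rows`. The names `CountMismatch` and `find_count_mismatches` are made up for this illustration. Skipping columns with `max_repetition_level > 0` is an assumption about what "non-repeated" should mean here, and the check assumes that `num_values` counts nulls as well.

```python
from typing import List, NamedTuple


class CountMismatch(NamedTuple):
    """One column chunk whose value count disagrees with its row group (hypothetical helper type)."""
    row_group: int
    column: str
    num_rows: int
    num_values: int


def find_count_mismatches(metadata) -> List[CountMismatch]:
    """Return non-repeated column chunks where num_values != num_rows.

    `metadata` is assumed to expose the same attributes as
    pyarrow.parquet.FileMetaData: num_row_groups, row_group(i), and
    schema.column(j).max_repetition_level.
    """
    mismatches = []
    for rg_index in range(metadata.num_row_groups):
        row_group = metadata.row_group(rg_index)
        for col_index in range(row_group.num_columns):
            # Repeated columns can legitimately hold more values than rows, so skip them.
            if metadata.schema.column(col_index).max_repetition_level > 0:
                continue
            chunk = row_group.column(col_index)
            if chunk.num_values != row_group.num_rows:
                mismatches.append(
                    CountMismatch(
                        rg_index,
                        chunk.path_in_schema,
                        row_group.num_rows,
                        chunk.num_values,
                    )
                )
    return mismatches
```

With pyarrow installed, something like `find_count_mismatches(pyarrow.parquet.read_metadata(path))` should list the affected column chunks, provided the footer itself can be read. The function only inspects the footer, so it says nothing about whether the page headers agree with either count.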
[jira] [Commented] (PARQUET-2386) More consistent code style in parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790682#comment-17790682 ] ASF GitHub Bot commented on PARQUET-2386: - amousavigourabi commented on PR #1209: URL: https://github.com/apache/parquet-mr/pull/1209#issuecomment-1830328306 The `.editorconfig` has been expanded for IntelliJ and is mostly compliant with the Spotless configuration. IntelliJ refactoring and Spotless have some minor disagreements on continuation indents sometimes, which cannot really be resolved at the moment. As it is included in the Maven lifecycle, the Spotless configuration would of course be leading. > More consistent code style in parquet-mr > > > Key: PARQUET-2386 > URL: https://issues.apache.org/jira/browse/PARQUET-2386 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Reporter: Atour Mousavi Gourabi >Assignee: Atour Mousavi Gourabi >Priority: Major > > The code style conventions used in parquet-mr are generally inconsistent and > unenforced. We might want to consider using linters such as Spotless and a > more extensive .editorconfig configuration.
Re: [PR] PARQUET-2386: More consistent code style in parquet-mr [parquet-mr]
amousavigourabi commented on PR #1209: URL: https://github.com/apache/parquet-mr/pull/1209#issuecomment-1830328306 The `.editorconfig` has been expanded for IntelliJ and is mostly compliant with the Spotless configuration. IntelliJ refactoring and Spotless have some minor disagreements on continuation indents sometimes, which cannot really be resolved at the moment. As it is included in the Maven lifecycle, the Spotless configuration would of course be leading.
[jira] [Commented] (PARQUET-2386) More consistent code style in parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790681#comment-17790681 ] ASF GitHub Bot commented on PARQUET-2386: - amousavigourabi commented on PR #1209: URL: https://github.com/apache/parquet-mr/pull/1209#issuecomment-1830320974 @wgtmac > More consistent code style in parquet-mr > > > Key: PARQUET-2386 > URL: https://issues.apache.org/jira/browse/PARQUET-2386 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Reporter: Atour Mousavi Gourabi >Assignee: Atour Mousavi Gourabi >Priority: Major > > The code style conventions used in parquet-mr are generally inconsistent and > unenforced. We might want to consider using linters such as Spotless and a > more extensive .editorconfig configuration.
Re: [PR] PARQUET-2386: More consistent code style in parquet-mr [parquet-mr]
amousavigourabi commented on PR #1209: URL: https://github.com/apache/parquet-mr/pull/1209#issuecomment-1830320974 @wgtmac
[jira] [Commented] (PARQUET-2386) More consistent code style in parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790680#comment-17790680 ] ASF GitHub Bot commented on PARQUET-2386: - amousavigourabi opened a new pull request, #1209: URL: https://github.com/apache/parquet-mr/pull/1209 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR" - https://issues.apache.org/jira/browse/PARQUET-XXX - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: - This PR only refactors style, no logic is added or removed in any way shape or form ### Commits - [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does - Adds note in the PR template on the style checks --- This PR contains two commits, the first adds the style checks and configurations, the second applies these changes to the repository. 
> More consistent code style in parquet-mr > > > Key: PARQUET-2386 > URL: https://issues.apache.org/jira/browse/PARQUET-2386 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Reporter: Atour Mousavi Gourabi >Assignee: Atour Mousavi Gourabi >Priority: Major > > The code style conventions used in parquet-mr are generally inconsistent and > unenforced. We might want to consider using linters such as Spotless and a > more extensive .editorconfig configuration.
[PR] PARQUET-2386: More consistent code style in parquet-mr [parquet-mr]
amousavigourabi opened a new pull request, #1209: URL: https://github.com/apache/parquet-mr/pull/1209 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR" - https://issues.apache.org/jira/browse/PARQUET-XXX - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: - This PR only refactors style, no logic is added or removed in any way shape or form ### Commits - [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does - Adds note in the PR template on the style checks --- This PR contains two commits, the first adds the style checks and configurations, the second applies these changes to the repository.