[jira] [Commented] (PARQUET-1901) Add filter null check for ColumnIndex
[ https://issues.apache.org/jira/browse/PARQUET-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183481#comment-17183481 ]

Ryan Blue commented on PARQUET-1901:
------------------------------------

It isn't clear to me how a filter implementation would handle the filter itself being null. It could return a default value to accept/read, but that runs into issues when filters like {{not(null)}} are passed in. So I agree with Gabor that it makes sense for a null filter to be an exceptional case in the filter implementations themselves. But I would expect a method like {{calculateRowRanges}} to correctly return the default {{RowRanges.createSingle(rowCount)}} if that method were passed a null value, since it is not actually processing the filter.

For Iceberg, I'm wondering if it wouldn't be easier to write our own filter implementation that produces row ranges and passes them in. That's how we filter row groups, and I think it has been much easier not needing to convert to Parquet filters, which are difficult to work with.

> Add filter null check for ColumnIndex
> -------------------------------------
>
>                 Key: PARQUET-1901
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1901
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.11.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>             Fix For: 1.12.0
>
> This Jira is opened to discuss whether we should add a null check for the
> filter when ColumnIndex is enabled.
> In the ColumnIndexFilter#calculateRowRanges() method, the input parameter
> 'filter' is assumed to be non-null without checking. It throws an NPE when
> ColumnIndex is enabled (the default) but no filter is set in the
> ParquetReadOptions. The call stack is as below.
> java.lang.NullPointerException
>   at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81)
>   at org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:961)
>   at org.apache.parquet.hadoop.ParquetFileReader.readNextFilteredRowGroup(ParquetFileReader.java:891)
> If we don't add the check, the user has to choose between readNextRowGroup()
> and readNextFilteredRowGroup() depending on whether a filter exists.
> Thoughts?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
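
The behaviour suggested in the comment above (an entry point that returns the full row range when it is handed a null filter) could look roughly like the sketch below. It is only an illustration: `NullFilterSketch`, `RowRange`, and `RowFilter` are hypothetical stand-ins invented for this example, not the parquet-mr `ColumnIndexFilter` or `RowRanges` classes.

```java
/** Hypothetical stand-ins for illustration; these are not the parquet-mr classes. */
class NullFilterSketch {

    /** A half-open range of row indexes [from, to). */
    static final class RowRange {
        final long from;
        final long to;

        RowRange(long from, long to) {
            this.from = from;
            this.to = to;
        }

        long count() {
            return to - from;
        }
    }

    /** A filter that narrows the rows of a row group to one range. */
    interface RowFilter {
        RowRange apply(long rowCount);
    }

    /**
     * Entry point that does not itself evaluate the filter: a null filter
     * is treated as "no filtering", so the single full range is returned.
     */
    static RowRange calculateRowRanges(RowFilter filter, long rowCount) {
        if (filter == null) {
            return new RowRange(0, rowCount);
        }
        return filter.apply(rowCount);
    }

    public static void main(String[] args) {
        // No filter: all 100 rows are kept.
        System.out.println(calculateRowRanges(null, 100).count());
        // A filter keeping the first half of the rows.
        System.out.println(calculateRowRanges(n -> new RowRange(0, n / 2), 100).count());
    }
}
```

The null check sits only in the entry point; a filter implementation such as the lambda above never sees a null filter, which matches the split described in the comment.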
[jira] [Commented] (PARQUET-1901) Add filter null check for ColumnIndex
[ https://issues.apache.org/jira/browse/PARQUET-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183352#comment-17183352 ]

Xinli Shang commented on PARQUET-1901:
--------------------------------------

Hi [~rdblue], please comment on this if you have a different opinion. This was found during the ColumnIndex integration into Iceberg. We would need to handle the null check in Iceberg anyway before Parquet 1.12.0.
[jira] [Commented] (PARQUET-1901) Add filter null check for ColumnIndex
[ https://issues.apache.org/jira/browse/PARQUET-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183312#comment-17183312 ]

Gabor Szadovszky commented on PARQUET-1901:
-------------------------------------------

It is clear we shall handle this case properly. I've quickly checked the other filters ({{DictionaryFilter}}, {{StatisticsFilter}} and {{BloomFilterImpl}}) and none of them handles the filter being {{null}} (they all throw an NPE). So I would vote against checking for a {{null}} filter in {{ColumnIndexFilter}}. Instead, the places where it is invoked should handle a {{null}} filter, like [here|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L870-L872].
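
The alternative proposed in this comment, where the invoking code guards against a null filter and the filter implementation stays strict, might be sketched as follows. `CallSiteGuardSketch` is a hypothetical reader invented for this example; it only borrows the method names `readNextRowGroup` and `readNextFilteredRowGroup` from the issue text and does not reproduce the `ParquetFileReader` code behind the link.

```java
import java.util.function.LongPredicate;

/** Hypothetical reader for illustration; not the ParquetFileReader API. */
class CallSiteGuardSketch {

    private final LongPredicate filter; // null means no filter was configured
    private final long rowCount;

    CallSiteGuardSketch(LongPredicate filter, long rowCount) {
        this.filter = filter;
        this.rowCount = rowCount;
    }

    /** Unfiltered path: returns the number of rows read. */
    long readNextRowGroup() {
        return rowCount;
    }

    /** Filtered path: the null check lives here, where the filter is invoked. */
    long readNextFilteredRowGroup() {
        if (filter == null) {
            return readNextRowGroup();
        }
        return countMatching(filter, rowCount);
    }

    /** Stand-in for a filter implementation; it assumes a non-null filter. */
    private static long countMatching(LongPredicate filter, long rowCount) {
        long kept = 0;
        for (long row = 0; row < rowCount; row++) {
            if (filter.test(row)) {
                kept++;
            }
        }
        return kept;
    }
}
```

With this arrangement a caller can always use the filtered path, for example `new CallSiteGuardSketch(null, 10).readNextFilteredRowGroup()`, without first checking whether a filter exists.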
[jira] [Commented] (PARQUET-1455) [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183236#comment-17183236 ]

ASF GitHub Bot commented on PARQUET-1455:
-----------------------------------------

gszadovszky commented on pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561#issuecomment-679108339

   Trying to re-trigger Travis by close-reopen

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-1455
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1455
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Qinghui Xu
>            Assignee: Qinghui Xu
>            Priority: Major
>              Labels: pull-request-available
>
> Background -
> In protobuf, an enum behaves more like an integer than a string, and is
> encoded as an integer on the wire.
> Each enum value is associated with a number (integer), and an enum field can
> be set using a number directly, regardless of whether that number is
> associated with an enum value. When an enum field is set to a number that
> does not match any enum value defined in the schema, reading the field
> through the protobuf reflection API (as parquet-protobuf does) yields a
> label "UNKNOWN_ENUM__" generated by protobuf reflection.
> Thus parquet-protobuf writes the string "UNKNOWN_ENUM__" into the enum
> column whenever its protobuf schema does not recognize the number.
>
> Problems -
> There are two cases of unknown enums when using parquet-protobuf:
> 1. The protobuf message already contains an unknown enum when we write it to
> Parquet (sometimes people manipulate enums using numbers), so a label
> "UNKNOWN_ENUM_*" is written as a string in Parquet. When we read it back
> into protobuf, we find this "true" unknown value.
> 2. The protobuf message contains a valid value when written to Parquet, but
> the reader uses an outdated proto schema that is missing some enum values.
> The enum values that are not in the old schema are "unknown" to the reader.
> The current behavior of the parquet-proto reader is to reject both cases
> with a runtime exception. This does not make sense in case 1: the write path
> respects protobuf enum behavior while the read path does not. Case 2 should
> also be handled if the protobuf user is interested in the number rather than
> the label.
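
As an illustration of the two cases, the sketch below shows one possible reader-side fallback that resolves a stored label to an enum number instead of throwing. It is not taken from the pull request. `UnknownEnumSketch` and `resolveNumber` are hypothetical names, and the sketch assumes that a generated unknown label starts with `UNKNOWN_ENUM_` and ends with an underscore followed by the decimal number, which the description does not spell out.

```java
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical helper for illustration; not part of parquet-protobuf. */
class UnknownEnumSketch {

    static final String UNKNOWN_PREFIX = "UNKNOWN_ENUM_";

    /**
     * Resolves a label read from Parquet to an enum number.
     * knownValues maps the labels of the reader's proto schema to their numbers.
     */
    static OptionalInt resolveNumber(String label, Map<String, Integer> knownValues) {
        Integer known = knownValues.get(label);
        if (known != null) {
            return OptionalInt.of(known);
        }
        if (label.startsWith(UNKNOWN_PREFIX)) {
            // Case 1: the writer stored a generated label. ASSUMPTION: the
            // number is the text after the last underscore.
            String tail = label.substring(label.lastIndexOf('_') + 1);
            try {
                return OptionalInt.of(Integer.parseInt(tail));
            } catch (NumberFormatException e) {
                return OptionalInt.empty();
            }
        }
        // Case 2: a valid label that the reader's older schema does not know.
        // The label alone does not carry the number, so nothing is returned.
        return OptionalInt.empty();
    }
}
```

In case 2 the number would have to come from somewhere other than the label, for instance from schema information stored with the file; the sketch leaves that open and returns an empty result rather than raising an exception.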
[jira] [Commented] (PARQUET-1896) [Maven] parquet-tools build is broken
[ https://issues.apache.org/jira/browse/PARQUET-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183239#comment-17183239 ]

ASF GitHub Bot commented on PARQUET-1896:
-----------------------------------------

qinghui-xu commented on pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809#issuecomment-679108861

   Yes, @gszadovszky.
   After some tests with Travis, it seems my analysis was not correct (at least partially wrong about Travis reusing sandboxes/caches). The build in my fork still skips compiling `parquet-tools` even after I added `mvn clean` (https://github.com/apache/parquet-mr/pull/812#issuecomment-679106056).
   We can continue the discussion there.

> [Maven] parquet-tools build is broken
> -------------------------------------
>
>                 Key: PARQUET-1896
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1896
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Qinghui Xu
>            Assignee: Qinghui Xu
>            Priority: Major
>
> There is a compilation error when running `mvn clean install` on the
> parquet-mr project:
> Environment: macos 10.14.6 (Darwin Kernel Version 18.7.0), maven 3.6.3
> {code:java}
> [ERROR] COMPILATION ERROR :
> [INFO] -------------------------------------------------------------
> [ERROR] /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleMapRecord.java:[21,43] package com.fasterxml.jackson.databind.node does not exist
> [ERROR] /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleRecord.java:[29,38] package com.fasterxml.jackson.databind does not exist
> [ERROR] /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleRecord.java:[30,43] package com.fasterxml.jackson.databind.node does not exist
> [ERROR] /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/json/JsonRecordFormatter.java:[22,38] package com.fasterxml.jackson.databind does not exist
> [ERROR] /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleRecord.java:[72,23] cannot find symbol
>   symbol:   class BinaryNode
>   location: class org.apache.parquet.tools.read.SimpleRecord
> [ERROR] /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleRecord.java:[108,5] cannot find symbol
>   symbol:   class ObjectMapper
>   location: class org.apache.parquet.tools.read.SimpleRecord
> [ERROR] /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleRecord.java:[108,31] cannot find symbol
>   symbol:   class ObjectMapper
>   location: class org.apache.parquet.tools.read.SimpleRecord
> [ERROR] /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleRecord.java:[125,18] cannot find symbol
>   symbol:   class BinaryNode
>   location: class org.apache.parquet.tools.read.SimpleRecord
> [ERROR] /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleMapRecord.java:[59,20] cannot find symbol
>   symbol:   class BinaryNode
>   location: class org.apache.parquet.tools.read.SimpleMapRecord
> [ERROR] /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/json/JsonRecordFormatter.java:[116,7] cannot find symbol
>   symbol:   class ObjectMapper
>   location: class org.apache.parquet.tools.json.JsonRecordFormatter.JsonGroupFormatter
> [ERROR] /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/json/JsonRecordFormatter.java:[116,33] cannot find symbol
>   symbol:   class ObjectMapper
>   location: class org.apache.parquet.tools.json.JsonRecordFormatter.JsonGroupFormatter
> {code}
[jira] [Commented] (PARQUET-1455) [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183237#comment-17183237 ]

ASF GitHub Bot commented on PARQUET-1455:
-----------------------------------------

qinghui-xu opened a new pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561
[jira] [Commented] (PARQUET-1455) [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183235#comment-17183235 ]

ASF GitHub Bot commented on PARQUET-1455:
-----------------------------------------

gszadovszky closed pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561
[GitHub] [parquet-mr] qinghui-xu commented on pull request #809: PARQUET-1896: Fix parquet-tools build
qinghui-xu commented on pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809#issuecomment-679108861

   Yes, @gszadovszky.
   After some tests with Travis, it seems my analysis was not correct (at least partially wrong about Travis reusing sandboxes/caches). The build in my fork still skips compiling `parquet-tools` even after I added `mvn clean` (https://github.com/apache/parquet-mr/pull/812#issuecomment-679106056).
   We can continue the discussion there.
[GitHub] [parquet-mr] qinghui-xu opened a new pull request #561: PARQUET-1455: [parquet-protobuf] Handle protobuf enum schema evolution and unknown enum value
qinghui-xu opened a new pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561
[GitHub] [parquet-mr] gszadovszky commented on pull request #561: PARQUET-1455: [parquet-protobuf] Handle protobuf enum schema evolution and unknown enum value
gszadovszky commented on pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561#issuecomment-679108339

   Trying to re-trigger Travis by close-reopen
[GitHub] [parquet-mr] gszadovszky closed pull request #561: PARQUET-1455: [parquet-protobuf] Handle protobuf enum schema evolution and unknown enum value
gszadovszky closed pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561
[jira] [Commented] (PARQUET-1900) Run mvn clean in CI
[ https://issues.apache.org/jira/browse/PARQUET-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183233#comment-17183233 ]

ASF GitHub Bot commented on PARQUET-1900:
-----------------------------------------

qinghui-xu commented on pull request #812:
URL: https://github.com/apache/parquet-mr/pull/812#issuecomment-679106056

   Hey, yes, I noticed that. Even adding `mvn clean` does not seem to trigger a recompile of `parquet-tools`.
   I've tried it [here](https://github.com/qinghui-xu/parquet-mr/tree/newbranch) in my fork as a test, and Travis still skipped compiling it: https://travis-ci.org/github/qinghui-xu/parquet-mr/jobs/719733598
   Checking the logs, I don't see the Maven clean plugin being invoked. There must be a problem elsewhere.

> Run mvn clean in CI
> -------------------
>
>                 Key: PARQUET-1900
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1900
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Qinghui Xu
>            Assignee: Qinghui Xu
>            Priority: Major
>
> Currently the parquet-mr CI does not run `mvn clean`; modules without changes
> are not recompiled each time.
[GitHub] [parquet-mr] qinghui-xu commented on pull request #812: PARQUET-1900: Add mvn clean to CI
qinghui-xu commented on pull request #812:
URL: https://github.com/apache/parquet-mr/pull/812#issuecomment-679106056

   Hey, yes, I noticed that. Even adding `mvn clean` does not seem to trigger a recompile of `parquet-tools`.
   I've tried it [here](https://github.com/qinghui-xu/parquet-mr/tree/newbranch) in my fork as a test, and Travis still skipped compiling it: https://travis-ci.org/github/qinghui-xu/parquet-mr/jobs/719733598
   Checking the logs, I don't see the Maven clean plugin being invoked. There must be a problem elsewhere.
[jira] [Resolved] (PARQUET-1902) Invoke mvn clean in Travis
[ https://issues.apache.org/jira/browse/PARQUET-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Szadovszky resolved PARQUET-1902.
---------------------------------------
    Resolution: Duplicate

> Invoke mvn clean in Travis
> --------------------------
>
>                 Key: PARQUET-1902
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1902
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Gabor Szadovszky
>            Priority: Major
>
> Currently we do not invoke {{mvn clean}} in the Travis build, which may cause
> undetected issues in our CI. (See PR
> [#809|https://github.com/apache/parquet-mr/pull/809] for details.)
[jira] [Commented] (PARQUET-1896) [Maven] parquet-tools build is broken
[ https://issues.apache.org/jira/browse/PARQUET-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183230#comment-17183230 ]

ASF GitHub Bot commented on PARQUET-1896:
-----------------------------------------

gszadovszky commented on pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809#issuecomment-679104802

   Just realized you've already created JIRA-1900.
[GitHub] [parquet-mr] gszadovszky commented on pull request #809: PARQUET-1896: Fix parquet-tools build
gszadovszky commented on pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809#issuecomment-679104802

   Just realized you've already created JIRA-1900.
[jira] [Resolved] (PARQUET-1896) [Maven] parquet-tools build is broken
[ https://issues.apache.org/jira/browse/PARQUET-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Szadovszky resolved PARQUET-1896.
---------------------------------------
    Resolution: Fixed
[jira] [Commented] (PARQUET-1896) [Maven] parquet-tools build is broken
[ https://issues.apache.org/jira/browse/PARQUET-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183223#comment-17183223 ]

ASF GitHub Bot commented on PARQUET-1896:
-----------------------------------------

gszadovszky merged pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809
[jira] [Commented] (PARQUET-1896) [Maven] parquet-tools build is broken
[ https://issues.apache.org/jira/browse/PARQUET-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183222#comment-17183222 ]

ASF GitHub Bot commented on PARQUET-1896:

gszadovszky commented on pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809#issuecomment-679102521

@qinghui-xu, thanks a lot for investigating and fixing this issue. I think we should add `mvn clean` to the Travis configuration to ensure that issues like this one do not occur again. Created [PARQUET-1902](https://issues.apache.org/jira/browse/PARQUET-1902) to track this. Feel free to comment on the Jira.
[jira] [Commented] (PARQUET-1455) [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183220#comment-17183220 ]

ASF GitHub Bot commented on PARQUET-1455:

qinghui-xu commented on pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561#issuecomment-679101925

@Fokko It's already on top of the master branch. Could you retrigger the CI by hand somehow?

> [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf
>
> Key: PARQUET-1455
> URL: https://issues.apache.org/jira/browse/PARQUET-1455
> Project: Parquet
> Issue Type: Bug
> Reporter: Qinghui Xu
> Assignee: Qinghui Xu
> Priority: Major
> Labels: pull-request-available
>
> Background
> In protobuf, an enum behaves more like an integer than a string, and it is encoded as an integer on the wire.
> In Protobuf, each enum value is associated with a number (integer), and people can set an enum field using a number directly, regardless of whether the number is associated with an enum value or not. When an enum field is set with a number that does not match any enum value defined in the schema, reading the enum field through the protobuf reflection API (as parquet-protobuf does) returns a label "UNKNOWN_ENUM__" generated by protobuf reflection.
> Thus parquet-protobuf will write the string "UNKNOWN_ENUM__" into the enum column whenever its protobuf schema does not recognize the number.
>
> Problem
> There are two cases of unknown enums when using parquet-protobuf:
> 1. The protobuf message already contains an unknown enum when we write it to Parquet (sometimes people manipulate enums using numbers), so a label "UNKNOWN_ENUM_*" is written as a string in Parquet. When we read it back into protobuf, we find this "true" unknown value.
> 2. The protobuf message contains a valid value when it is written to Parquet, but the reader uses an outdated proto schema that is missing some enum values, so the not-in-old-schema enum values are "unknown" to the reader.
> The current behavior of the parquet-proto reader is to reject both cases with a runtime exception. This does not make sense in case 1: the write path respects protobuf enum behavior while the read path does not. Case 2 should also be handled when the protobuf user is interested in the number rather than the label.
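The two cases in the PARQUET-1455 description can be illustrated with a small plain-Java sketch. It does not use the protobuf or parquet-protobuf APIs, and it is not the change made in pull request #561. The class `UnknownEnumSketch`, its methods, and the label format `UNKNOWN_ENUM_` followed by the number are all hypothetical and chosen only for illustration. The schema is modelled as a simple number-to-label map.

```java
import java.util.Map;
import java.util.OptionalInt;

public class UnknownEnumSketch {
    // Hypothetical label prefix, assumed only for this sketch.
    static final String UNKNOWN_PREFIX = "UNKNOWN_ENUM_";

    /** Writer side: use the schema label, or a synthetic label for an unknown number. */
    static String labelFor(Map<Integer, String> schema, int number) {
        String label = schema.get(number);
        return label != null ? label : UNKNOWN_PREFIX + number;
    }

    /** Reader side: try to recover the number from a stored label instead of throwing. */
    static OptionalInt numberFor(Map<Integer, String> schema, String label) {
        // Known label: look it up in the reader's schema.
        for (Map.Entry<Integer, String> entry : schema.entrySet()) {
            if (entry.getValue().equals(label)) {
                return OptionalInt.of(entry.getKey());
            }
        }
        // Case 1: the writer stored a synthetic "unknown" label that carries the number.
        if (label.startsWith(UNKNOWN_PREFIX)) {
            try {
                return OptionalInt.of(Integer.parseInt(label.substring(UNKNOWN_PREFIX.length())));
            } catch (NumberFormatException e) {
                return OptionalInt.empty();
            }
        }
        // Case 2: a valid label from a newer schema; the label alone does not reveal the number.
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        Map<Integer, String> oldSchema = Map.of(0, "RED", 1, "GREEN");
        String written = labelFor(oldSchema, 7);                 // number 7 is not in the schema
        System.out.println(written);
        System.out.println(numberFor(oldSchema, written));       // case 1: number recovered
        System.out.println(numberFor(oldSchema, "BLUE"));        // case 2: label unknown to reader
    }
}
```

In this sketch, case 1 is recoverable because the synthetic label carries the number. Case 2 is not recoverable from the label alone, which suggests that a reader with an outdated schema would need some additional label-to-number information from the writer in order to return the number.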
[GitHub] [parquet-mr] gszadovszky merged pull request #809: PARQUET-1896: Fix parquet-tools build
gszadovszky merged pull request #809: URL: https://github.com/apache/parquet-mr/pull/809
[GitHub] [parquet-mr] gszadovszky commented on pull request #809: PARQUET-1896: Fix parquet-tools build
gszadovszky commented on pull request #809: URL: https://github.com/apache/parquet-mr/pull/809#issuecomment-679102521 @qinghui-xu, thanks a lot for investigating and fixing this issue. I think we should add `mvn clean` to the Travis configuration to ensure that issues like this one do not occur again. Created [PARQUET-1902](https://issues.apache.org/jira/browse/PARQUET-1902) to track this. Feel free to comment on the Jira.
[jira] [Created] (PARQUET-1902) Invoke mvn clean in Travis
Gabor Szadovszky created PARQUET-1902:

Summary: Invoke mvn clean in Travis
Key: PARQUET-1902
URL: https://issues.apache.org/jira/browse/PARQUET-1902
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Reporter: Gabor Szadovszky

Currently we do not invoke {{mvn clean}} in the Travis build, which may cause undetected issues in our CI. (See PR [#809|https://github.com/apache/parquet-mr/pull/809] for details.)
[GitHub] [parquet-mr] qinghui-xu commented on pull request #561: PARQUET-1455: [parquet-protobuf] Handle protobuf enum schema evolution and unknown enum value
qinghui-xu commented on pull request #561: URL: https://github.com/apache/parquet-mr/pull/561#issuecomment-679101925 @Fokko It's already on top of the master branch. Could you retrigger the CI by hand somehow?
[jira] [Commented] (PARQUET-1900) Run mvn clean in CI
[ https://issues.apache.org/jira/browse/PARQUET-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183016#comment-17183016 ]

ASF GitHub Bot commented on PARQUET-1900:

Fokko commented on pull request #812:
URL: https://github.com/apache/parquet-mr/pull/812#issuecomment-678961933

AFAIK there is no caching from previous runs, so each run should be clean 🤔

> Run mvn clean in CI
>
> Key: PARQUET-1900
> URL: https://issues.apache.org/jira/browse/PARQUET-1900
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Qinghui Xu
> Assignee: Qinghui Xu
> Priority: Major
>
> Currently the parquet-mr CI does not run `mvn clean`, so modules without changes are not recompiled each time.
[GitHub] [parquet-mr] Fokko commented on pull request #812: PARQUET-1900: Add mvn clean to CI
Fokko commented on pull request #812: URL: https://github.com/apache/parquet-mr/pull/812#issuecomment-678961933 AFAIK there is no caching from previous runs, so each run should be clean 🤔
[jira] [Commented] (PARQUET-1455) [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183015#comment-17183015 ]

ASF GitHub Bot commented on PARQUET-1455:

Fokko commented on pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561#issuecomment-678960902

@qinghui-xu The CI is experiencing some connectivity issues. Could you rebase against master to retrigger the build?
[GitHub] [parquet-mr] Fokko commented on pull request #561: PARQUET-1455: [parquet-protobuf] Handle protobuf enum schema evolution and unknown enum value
Fokko commented on pull request #561: URL: https://github.com/apache/parquet-mr/pull/561#issuecomment-678960902 @qinghui-xu The CI is experiencing some connectivity issues. Could you rebase against master to retrigger the build?