[jira] [Commented] (PARQUET-1901) Add filter null check for ColumnIndex

2020-08-24 Thread Ryan Blue (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183481#comment-17183481
 ] 

Ryan Blue commented on PARQUET-1901:


It isn't clear to me how a filter implementation would handle the filter itself 
being null. It could return a default value that accepts (reads) everything, but 
that runs into issues when filters like {{not(null)}} are passed in. So I agree 
with Gabor that it makes sense for a null filter to be an exceptional case in the 
filter implementations themselves.
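To make the {{not(null)}} concern concrete, here is a minimal Java sketch built on a hypothetical toy predicate type (`NullFilterDefaults`, `Pred` and the `lenient*`/`strict*` helpers are illustrative names, not parquet-mr API). If a null filter silently defaulted to "keep every row", wrapping it in a negation would flip that default and drop every row; failing fast surfaces the mistake instead.

```java
import java.util.Objects;

// Hypothetical toy predicate model, NOT the parquet-mr FilterPredicate API.
final class NullFilterDefaults {

  // A row-level predicate: true means "keep this row".
  interface Pred {
    boolean keep(long value);
  }

  // Lenient option: a null predicate silently defaults to "keep every row".
  static boolean lenientKeep(Pred p, long value) {
    return p == null || p.keep(value);
  }

  // not(child) built on the lenient default: not(null) becomes "drop every row".
  static Pred lenientNot(Pred child) {
    return value -> !lenientKeep(child, value);
  }

  // Strict option: a null child is rejected up front as a programming error.
  static Pred strictNot(Pred child) {
    Objects.requireNonNull(child, "filter must not be null");
    return value -> !child.keep(value);
  }
}
```

With the lenient default, `lenientNot(null)` rejects every row, which is the opposite of what "no filter" should mean; the strict variant throws a `NullPointerException` at construction time.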

But I would expect a method like {{calculateRowRanges}} to correctly return the 
default {{RowRanges.createSingle(rowCount)}} if that method were passed a null 
value, since it is not actually processing the filter.
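A minimal sketch of that default, using hypothetical stand-ins (`RowRangesSketch`, `Range`, `Filter`) rather than the real parquet-mr classes: when no filter is passed, the method returns a single range covering every row, in the spirit of `RowRanges.createSingle(rowCount)`, instead of dereferencing null.

```java
// Hypothetical stand-ins; the real RowRanges and ColumnIndexFilter live in
// org.apache.parquet.internal.filter2.columnindex and have richer APIs.
final class RowRangesSketch {

  // An inclusive range of row indexes [from, to].
  static final class Range {
    final long from;
    final long to;

    Range(long from, long to) {
      this.from = from;
      this.to = to;
    }
  }

  // Toy filter: a real one would consult the column/offset indexes.
  interface Filter {
    Range apply(long rowCount);
  }

  // Same idea as RowRanges.createSingle(rowCount): all rows of the row group.
  static Range createSingle(long rowCount) {
    return new Range(0, rowCount - 1);
  }

  static Range calculateRowRanges(Filter filter, long rowCount) {
    if (filter == null) {
      // Nothing to evaluate, so nothing can be dropped: keep every row.
      return createSingle(rowCount);
    }
    return filter.apply(rowCount);
  }
}
```

The design choice is that "no filter" is not an error at this level, because the method only computes which rows to read and the full range is always a correct answer.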

For Iceberg, I'm wondering whether it would be easier to implement our own filter 
that produces row ranges and pass them in. That's how we filter row groups, and 
I think it has been much easier not needing to convert to Parquet filters, which 
are difficult to work with.

> Add filter null check for ColumnIndex  
> ---
>
> Key: PARQUET-1901
> URL: https://issues.apache.org/jira/browse/PARQUET-1901
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Xinli Shang
>Assignee: Xinli Shang
>Priority: Major
> Fix For: 1.12.0
>
>
> This Jira is opened to discuss whether we should add a null check for the 
> filter when ColumnIndex is enabled. 
> In the ColumnIndexFilter#calculateRowRanges() method, the input parameter 
> 'filter' is assumed to be non-null and is never checked. It throws an NPE when 
> ColumnIndex is enabled (the default) but no filter is set in the 
> ParquetReadOptions. The call stack is below. 
> java.lang.NullPointerException
> at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81)
> at org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:961)
> at org.apache.parquet.hadoop.ParquetFileReader.readNextFilteredRowGroup(ParquetFileReader.java:891)
> If we don't add it, the user might need to call either readNextRowGroup() or 
> readNextFilteredRowGroup() depending on whether a filter exists. 
> Thoughts?  
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1901) Add filter null check for ColumnIndex

2020-08-24 Thread Xinli Shang (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183352#comment-17183352
 ] 

Xinli Shang commented on PARQUET-1901:
--

Hi [~rdblue], please comment on this if you have a different opinion. This was 
found during the ColumnIndex integration into Iceberg. We would need to handle 
the null check in Iceberg anyway before Parquet 1.12.0.  



[jira] [Commented] (PARQUET-1901) Add filter null check for ColumnIndex

2020-08-24 Thread Gabor Szadovszky (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183312#comment-17183312
 ] 

Gabor Szadovszky commented on PARQUET-1901:
---

It is clear we should handle this case properly. I've quickly checked the other 
filters ({{DictionaryFilter}}, {{StatisticsFilter}} and {{BloomFilterImpl}}) 
and none of them handles the case of the filter being {{null}} (meaning they all 
throw an NPE). So, I would vote for not checking for the filter being {{null}} in 
{{ColumnIndexFilter}}. Instead, the places where it is invoked should handle the 
case of a {{null}} filter, like 
[here|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L870-L872].
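The caller-side alternative proposed here can be sketched as follows. Everything in it is hypothetical (`CallerSideGuard`, its `NOOP` marker, and the string return values that stand in for real row-group reads); it is not the code behind the linked `ParquetFileReader` lines. The filter implementation keeps treating {{null}} as an error, while the reader checks for a missing or no-op filter and falls back to the unfiltered path.

```java
import java.util.Objects;

// Hypothetical sketch of a reader that guards the null/no-op filter case itself,
// so the filter implementation can keep treating null as a programming error.
final class CallerSideGuard {

  interface Filter {}

  // Marker for "no filtering requested", similar in spirit to FilterCompat.NOOP.
  static final Filter NOOP = new Filter() {};

  static boolean isFilteringRequired(Filter filter) {
    return filter != null && filter != NOOP;
  }

  // Filter implementation: fails fast on null, like the other filters effectively do.
  static String calculateRowRanges(Filter filter, long rowCount) {
    Objects.requireNonNull(filter, "filter");
    return "filtered(" + rowCount + ")";
  }

  // Stand-in for reading a whole row group without any filtering.
  static String readNextRowGroup(long rowCount) {
    return "all(" + rowCount + ")";
  }

  static String readNextFilteredRowGroup(Filter filter, long rowCount) {
    if (!isFilteringRequired(filter)) {
      // Caller-side check: no usable filter, so take the unfiltered path.
      return readNextRowGroup(rowCount);
    }
    return calculateRowRanges(filter, rowCount);
  }
}
```

With this split, users can always call the filtered read method, and the null handling lives in one place instead of in every filter implementation.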



[jira] [Commented] (PARQUET-1455) [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf

2020-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183236#comment-17183236
 ] 

ASF GitHub Bot commented on PARQUET-1455:
-

gszadovszky commented on pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561#issuecomment-679108339


   Trying to re-trigger Travis by closing and reopening the PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf
> 
>
> Key: PARQUET-1455
> URL: https://issues.apache.org/jira/browse/PARQUET-1455
> Project: Parquet
>  Issue Type: Bug
>Reporter: Qinghui Xu
>Assignee: Qinghui Xu
>Priority: Major
>  Labels: pull-request-available
>
> Background - 
> In protobuf, an enum behaves more like an integer than a string, and it is 
> encoded as an integer on the wire.
> In Protobuf, each enum value is associated with a number (integer), and 
> people can set an enum field using a number directly, regardless of whether the 
> number is associated with an enum value or not. When an enum field is set with a 
> number that does not match any enum value defined in the schema, reading the enum 
> field through the protobuf reflection API (as parquet-protobuf does) yields 
> a label "UNKNOWN_ENUM__" generated by protobuf reflection. 
> Thus parquet-protobuf will write the string "UNKNOWN_ENUM__" 
> into the enum column whenever its protobuf schema does not recognize the 
> number.
>  
> Problem -
> There are two cases of unknown enum values when using parquet-protobuf:
>  1. The protobuf message already contains an unknown enum value when we write it 
> to Parquet (sometimes people manipulate enums using numbers), so a label 
> "UNKNOWN_ENUM_*" is written as a string in Parquet. When we read it back into 
> protobuf, we find this "true" unknown value.
>  2. The protobuf message contains a valid value when written to Parquet, but the 
> reader uses an outdated proto schema that is missing some enum values. So the 
> enum values not in the old schema are "unknown" to the reader.
> The current behavior of the parquet-protobuf reader is to reject both cases with a 
> runtime exception. This does not make sense in case 1: the write path respects 
> protobuf enum behavior while the read path does not. Case 2 should also be 
> handled if the protobuf user is interested in the number rather than the 
> label.
>  
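One way to picture a lenient read path for case 2 is the following Java sketch. It is hypothetical and not parquet-protobuf's implementation: `LenientEnumLookup` resolves a label read from the Parquet string column against the reader's enum schema (modeled as a plain name-to-number map) and, instead of throwing, returns the raw label when the name is unknown. Recovering the original number for such a label would need extra information, such as the writer's name-to-number mapping, which this sketch does not model.

```java
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical sketch, not parquet-protobuf code.
final class LenientEnumLookup {

  // Either a resolved enum number or just the raw label the reader did not recognize.
  static final class Resolved {
    final OptionalInt number;
    final String rawLabel;

    Resolved(OptionalInt number, String rawLabel) {
      this.number = number;
      this.rawLabel = rawLabel;
    }

    boolean isKnown() {
      return number.isPresent();
    }
  }

  // readerSchema maps enum value names to numbers, as declared in the reader's .proto.
  static Resolved resolve(Map<String, Integer> readerSchema, String label) {
    Integer number = readerSchema.get(label);
    if (number != null) {
      return new Resolved(OptionalInt.of(number), label);
    }
    // Unknown to this (possibly outdated) schema: keep the label instead of failing.
    return new Resolved(OptionalInt.empty(), label);
  }
}
```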





[jira] [Commented] (PARQUET-1896) [Maven] parquet-tools build is broken

2020-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183239#comment-17183239
 ] 

ASF GitHub Bot commented on PARQUET-1896:
-

qinghui-xu commented on pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809#issuecomment-679108861


   Yes, @gszadovszky 
   And after some tests with Travis, it seems my analysis was not correct 
(at least partially wrong about Travis reusing the sandbox/caches).
   The build in my fork still skips compiling `parquet-tools` even after I 
added `mvn clean`. 
(https://github.com/apache/parquet-mr/pull/812#issuecomment-679106056)
   We can continue the discussion there.





> [Maven] parquet-tools build is broken
> -
>
> Key: PARQUET-1896
> URL: https://issues.apache.org/jira/browse/PARQUET-1896
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.12.0
>Reporter: Qinghui Xu
>Assignee: Qinghui Xu
>Priority: Major
>
> There is a compilation error when running `mvn clean install` on the 
> parquet-mr project:
> Environment: macOS 10.14.6 (Darwin Kernel Version 18.7.0), Maven 3.6.3
> {code:java}
> [ERROR] COMPILATION ERROR :
> [INFO] -
> [ERROR] 
> /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleMapRecord.java:[21,43]
>  package com.fasterxml.jackson.databind.node does not exist
> [ERROR] 
> /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleRecord.java:[29,38]
>  package com.fasterxml.jackson.databind does not exist
> [ERROR] 
> /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleRecord.java:[30,43]
>  package com.fasterxml.jackson.databind.node does not exist
> [ERROR] 
> /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/json/JsonRecordFormatter.java:[22,38]
>  package com.fasterxml.jackson.databind does not exist
> [ERROR] 
> /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleRecord.java:[72,23]
>  cannot find symbol
>   symbol:   class BinaryNode
>   location: class org.apache.parquet.tools.read.SimpleRecord
> [ERROR] 
> /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleRecord.java:[108,5]
>  cannot find symbol
>   symbol:   class ObjectMapper
>   location: class org.apache.parquet.tools.read.SimpleRecord
> [ERROR] 
> /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleRecord.java:[108,31]
>  cannot find symbol
>   symbol:   class ObjectMapper
>   location: class org.apache.parquet.tools.read.SimpleRecord
> [ERROR] 
> /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleRecord.java:[125,18]
>  cannot find symbol
>   symbol:   class BinaryNode
>   location: class org.apache.parquet.tools.read.SimpleRecord
> [ERROR] 
> /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/read/SimpleMapRecord.java:[59,20]
>  cannot find symbol
>   symbol:   class BinaryNode
>   location: class org.apache.parquet.tools.read.SimpleMapRecord
> [ERROR] 
> /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/json/JsonRecordFormatter.java:[116,7]
>  cannot find symbol
>   symbol:   class ObjectMapper
>   location: class 
> org.apache.parquet.tools.json.JsonRecordFormatter.JsonGroupFormatter
> [ERROR] 
> /Users/q.xu/Sources/thirdparty/parquet-mr/parquet-tools/src/main/java/org/apache/parquet/tools/json/JsonRecordFormatter.java:[116,33]
>  cannot find symbol
>   symbol:   class ObjectMapper
>   location: class 
> org.apache.parquet.tools.json.JsonRecordFormatter.JsonGroupFormatter {code}





[jira] [Commented] (PARQUET-1455) [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf

2020-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183237#comment-17183237
 ] 

ASF GitHub Bot commented on PARQUET-1455:
-

qinghui-xu opened a new pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561


   





[jira] [Commented] (PARQUET-1455) [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf

2020-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183235#comment-17183235
 ] 

ASF GitHub Bot commented on PARQUET-1455:
-

gszadovszky closed pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561


   





[GitHub] [parquet-mr] qinghui-xu commented on pull request #809: PARQUET-1896: Fix parquet-tools build

2020-08-24 Thread GitBox


qinghui-xu commented on pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809#issuecomment-679108861


   Yes, @gszadovszky 
   And after some tests with Travis, it seems my analysis was not correct 
(at least partially wrong about Travis reusing the sandbox/caches).
   The build in my fork still skips compiling `parquet-tools` even after I 
added `mvn clean`. 
(https://github.com/apache/parquet-mr/pull/812#issuecomment-679106056)
   We can continue the discussion there.







[GitHub] [parquet-mr] qinghui-xu opened a new pull request #561: PARQUET-1455: [parquet-protobuf] Handle protobuf enum schema evolution and unknown enum value

2020-08-24 Thread GitBox


qinghui-xu opened a new pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561


   







[GitHub] [parquet-mr] gszadovszky commented on pull request #561: PARQUET-1455: [parquet-protobuf] Handle protobuf enum schema evolution and unknown enum value

2020-08-24 Thread GitBox


gszadovszky commented on pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561#issuecomment-679108339


   Trying to re-trigger Travis by closing and reopening the PR.







[GitHub] [parquet-mr] gszadovszky closed pull request #561: PARQUET-1455: [parquet-protobuf] Handle protobuf enum schema evolution and unknown enum value

2020-08-24 Thread GitBox


gszadovszky closed pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561


   







[jira] [Commented] (PARQUET-1900) Run mvn clean in CI

2020-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183233#comment-17183233
 ] 

ASF GitHub Bot commented on PARQUET-1900:
-

qinghui-xu commented on pull request #812:
URL: https://github.com/apache/parquet-mr/pull/812#issuecomment-679106056


   Hey, yes, I noticed that.
   And even adding `mvn clean` does not seem to trigger recompilation of 
`parquet-tools`. I've made the change 
[here](https://github.com/qinghui-xu/parquet-mr/tree/newbranch) in my fork for 
testing purposes, and Travis did skip compiling it: 
https://travis-ci.org/github/qinghui-xu/parquet-mr/jobs/719733598
   And by checking the logs, I don't see the Maven clean plugin being invoked.
   There must be some problem elsewhere.





> Run mvn clean in CI
> ---
>
> Key: PARQUET-1900
> URL: https://issues.apache.org/jira/browse/PARQUET-1900
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Qinghui Xu
>Assignee: Qinghui Xu
>Priority: Major
>
> Currently parquet-mr CI does not run `mvn clean`, so modules without changes are 
> not recompiled each time.





[GitHub] [parquet-mr] qinghui-xu commented on pull request #812: PARQUET-1900: Add mvn clean to CI

2020-08-24 Thread GitBox


qinghui-xu commented on pull request #812:
URL: https://github.com/apache/parquet-mr/pull/812#issuecomment-679106056


   Hey, yes, I noticed that.
   And even adding `mvn clean` does not seem to trigger recompilation of 
`parquet-tools`. I've made the change 
[here](https://github.com/qinghui-xu/parquet-mr/tree/newbranch) in my fork for 
testing purposes, and Travis did skip compiling it: 
https://travis-ci.org/github/qinghui-xu/parquet-mr/jobs/719733598
   And by checking the logs, I don't see the Maven clean plugin being invoked.
   There must be some problem elsewhere.







[jira] [Resolved] (PARQUET-1902) Invoke mvn clean in Travis

2020-08-24 Thread Gabor Szadovszky (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky resolved PARQUET-1902.
---
Resolution: Duplicate

> Invoke mvn clean in Travis
> --
>
> Key: PARQUET-1902
> URL: https://issues.apache.org/jira/browse/PARQUET-1902
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Gabor Szadovszky
>Priority: Major
>
> Currently we do not invoke {{mvn clean}} in the Travis build, which may let 
> issues go undetected in our CI. (See PR 
> [#809|https://github.com/apache/parquet-mr/pull/809] for details.)





[jira] [Commented] (PARQUET-1896) [Maven] parquet-tools build is broken

2020-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183230#comment-17183230
 ] 

ASF GitHub Bot commented on PARQUET-1896:
-

gszadovszky commented on pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809#issuecomment-679104802


   Just realized you've already created PARQUET-1900. 





[GitHub] [parquet-mr] gszadovszky commented on pull request #809: PARQUET-1896: Fix parquet-tools build

2020-08-24 Thread GitBox


gszadovszky commented on pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809#issuecomment-679104802


   Just realized you've already created PARQUET-1900. 







[jira] [Resolved] (PARQUET-1896) [Maven] parquet-tools build is broken

2020-08-24 Thread Gabor Szadovszky (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky resolved PARQUET-1896.
---
Resolution: Fixed



[jira] [Commented] (PARQUET-1896) [Maven] parquet-tools build is broken

2020-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183223#comment-17183223
 ] 

ASF GitHub Bot commented on PARQUET-1896:
-

gszadovszky merged pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Maven] parquet-tools build is broken
> -
>
> Key: PARQUET-1896
> URL: https://issues.apache.org/jira/browse/PARQUET-1896
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.12.0
>Reporter: Qinghui Xu
>Assignee: Qinghui Xu
>Priority: Major
>





[jira] [Commented] (PARQUET-1896) [Maven] parquet-tools build is broken

2020-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183222#comment-17183222
 ] 

ASF GitHub Bot commented on PARQUET-1896:
-

gszadovszky commented on pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809#issuecomment-679102521


   @qinghui-xu, thanks a lot for the investigation and fixing this issue.
   I think we should add `mvn clean` to the Travis config to ensure issues like 
this one do not occur again. I created 
[PARQUET-1902](https://issues.apache.org/jira/browse/PARQUET-1902) to track 
this. Feel free to comment on the jira.










[jira] [Commented] (PARQUET-1455) [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf

2020-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183220#comment-17183220
 ] 

ASF GitHub Bot commented on PARQUET-1455:
-

qinghui-xu commented on pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561#issuecomment-679101925


   @Fokko It's already on top of the master branch. Could you retrigger the CI 
by hand somehow?





> [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf
> 
>
> Key: PARQUET-1455
> URL: https://issues.apache.org/jira/browse/PARQUET-1455
> Project: Parquet
>  Issue Type: Bug
>Reporter: Qinghui Xu
>Assignee: Qinghui Xu
>Priority: Major
>  Labels: pull-request-available
>
> Background - 
> In protobuf, an enum behaves more like an integer than a string, and it is 
> encoded as an integer on the wire.
> Each enum value is associated with a number (integer), and people can set an 
> enum field using a number directly, regardless of whether the number is 
> associated with an enum value or not. When an enum field is set to a number 
> that does not match any enum value defined in the schema, reading the field 
> through the protobuf reflection API (as parquet-protobuf does) returns a 
> label "UNKNOWN_ENUM__" generated by protobuf reflection. 
> Thus parquet-protobuf writes the string "UNKNOWN_ENUM__" 
> into the enum column whenever its protobuf schema does not recognize the 
> number.
>  
> Problems -
> There are two cases of unknown enum values when using parquet-protobuf:
>  1. The protobuf message already contains an unknown enum value when we write 
> it to parquet (sometimes people manipulate enums using numbers), so a label 
> "UNKNOWN_ENUM_*" is written as a string in parquet. When we read it back to 
> protobuf, we find this "true" unknown value.
>  2. The protobuf message contains a valid value when written to parquet, but 
> the reader uses an outdated proto schema that is missing some enum values, so 
> the enum values absent from the old schema are "unknown" to the reader.
> The current behavior of the parquet-protobuf reader is to reject both cases 
> with a runtime exception. This does not make sense in case 1: the write path 
> respects protobuf enum behavior while the read path does not. Case 2 should 
> be handled if the protobuf user is interested in the number rather than the 
> label.
>  
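A minimal, self-contained sketch of case 2 above, contrasting a strict label lookup (reject anything the reader's schema does not define, like the current reader behavior described) with a tolerant one that reports "unknown" and lets the caller decide. The `EnumLookupSketch` class, the labels RED/GREEN/BLUE and their numbers are all hypothetical; this is plain Java, not protobuf's API and not the parquet-protobuf implementation proposed in PR #561.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

public class EnumLookupSketch {
  // Hypothetical outdated reader schema: label -> number.
  // A newer writer schema might also define BLUE = 2, which this reader lacks.
  static final Map<String, Integer> OLD_SCHEMA = new HashMap<>();
  static {
    OLD_SCHEMA.put("RED", 0);
    OLD_SCHEMA.put("GREEN", 1);
  }

  // Strict behavior: throw on any label the reader's schema does not define.
  public static int strictLookup(String label) {
    Integer number = OLD_SCHEMA.get(label);
    if (number == null) {
      throw new IllegalArgumentException("Unknown enum label: " + label);
    }
    return number;
  }

  // Tolerant behavior: return an empty OptionalInt instead of failing.
  public static OptionalInt tolerantLookup(String label) {
    Integer number = OLD_SCHEMA.get(label);
    return number == null ? OptionalInt.empty() : OptionalInt.of(number);
  }

  public static void main(String[] args) {
    System.out.println(tolerantLookup("GREEN")); // OptionalInt[1]
    System.out.println(tolerantLookup("BLUE"));  // OptionalInt.empty
    try {
      strictLookup("BLUE");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());        // Unknown enum label: BLUE
    }
  }
}
```

The tolerant variant only avoids the exception; recovering the actual number for a label the old schema lacks would need extra information (for example, a label-to-number mapping stored by the writer), which this sketch does not model.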





[GitHub] [parquet-mr] gszadovszky merged pull request #809: PARQUET-1896: Fix parquet-tools build

2020-08-24 Thread GitBox


gszadovszky merged pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809


   







[GitHub] [parquet-mr] gszadovszky commented on pull request #809: PARQUET-1896: Fix parquet-tools build

2020-08-24 Thread GitBox


gszadovszky commented on pull request #809:
URL: https://github.com/apache/parquet-mr/pull/809#issuecomment-679102521


   @qinghui-xu, thanks a lot for the investigation and fixing this issue.
   I think we should add `mvn clean` to the Travis config to ensure issues like 
this one do not occur again. I created 
[PARQUET-1902](https://issues.apache.org/jira/browse/PARQUET-1902) to track 
this. Feel free to comment on the jira.







[jira] [Created] (PARQUET-1902) Invoke mvn clean in Travis

2020-08-24 Thread Gabor Szadovszky (Jira)
Gabor Szadovszky created PARQUET-1902:
-

 Summary: Invoke mvn clean in Travis
 Key: PARQUET-1902
 URL: https://issues.apache.org/jira/browse/PARQUET-1902
 Project: Parquet
  Issue Type: Bug
  Components: parquet-mr
Reporter: Gabor Szadovszky


Currently we do not invoke {{mvn clean}} in the Travis build, which may leave 
issues undetected in our CI. (See PR 
[#809|https://github.com/apache/parquet-mr/pull/809] for details.)





[GitHub] [parquet-mr] qinghui-xu commented on pull request #561: PARQUET-1455: [parquet-protobuf] Handle protobuf enum schema evolution and unknown enum value

2020-08-24 Thread GitBox


qinghui-xu commented on pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561#issuecomment-679101925


   @Fokko It's already on top of the master branch. Could you retrigger the CI 
by hand somehow?







[jira] [Commented] (PARQUET-1900) Run mvn clean in CI

2020-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183016#comment-17183016
 ] 

ASF GitHub Bot commented on PARQUET-1900:
-

Fokko commented on pull request #812:
URL: https://github.com/apache/parquet-mr/pull/812#issuecomment-678961933


   AFAIK there is no caching from previous runs, so each run should be clean 🤔 





> Run mvn clean in CI
> ---
>
> Key: PARQUET-1900
> URL: https://issues.apache.org/jira/browse/PARQUET-1900
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Qinghui Xu
>Assignee: Qinghui Xu
>Priority: Major
>
> Currently parquet-mr CI does not run `mvn clean`, so modules without changes 
> are not recompiled each time.





[GitHub] [parquet-mr] Fokko commented on pull request #812: PARQUET-1900: Add mvn clean to CI

2020-08-24 Thread GitBox


Fokko commented on pull request #812:
URL: https://github.com/apache/parquet-mr/pull/812#issuecomment-678961933


   AFAIK there is no caching from previous runs, so each run should be clean 🤔 







[jira] [Commented] (PARQUET-1455) [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf

2020-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183015#comment-17183015
 ] 

ASF GitHub Bot commented on PARQUET-1455:
-

Fokko commented on pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561#issuecomment-678960902


   @qinghui-xu The CI is experiencing some connectivity issues; could you 
rebase against master to retrigger the build?










[GitHub] [parquet-mr] Fokko commented on pull request #561: PARQUET-1455: [parquet-protobuf] Handle protobuf enum schema evolution and unknown enum value

2020-08-24 Thread GitBox


Fokko commented on pull request #561:
URL: https://github.com/apache/parquet-mr/pull/561#issuecomment-678960902


   @qinghui-xu The CI is experiencing some connectivity issues; could you 
rebase against master to retrigger the build?


