[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-07-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546633#comment-16546633
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on issue #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-405589356
 
 
   @xiexingguang Please create a new JIRA. A PR is the place to discuss code
   modifications.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Upgrade Parquet MR dependencies
> ---
>
> Key: DRILL-6353
> URL: https://issues.apache.org/jira/browse/DRILL-6353
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
> Attachments: alltypes_optional.json, fixedlenDecimal.json
>
>
> Upgrade from a custom build {{1.8.1-drill-r0}} to Apache release {{1.10.0}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-07-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545947#comment-16545947
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

xiexingguang commented on issue #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-405437843
 
 
   @arina-ielchiieva varchar push down seems important for us. Do you have a
   plan to support it fully? If you do, please let us know. Thanks.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-07-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545071#comment-16545071
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

arina-ielchiieva commented on issue #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-405213446
 
 
   @xiexingguang no, support for varchar push down is not fully implemented yet.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-07-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544832#comment-16544832
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

xiexingguang commented on issue #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-405155695
 
 
   Does this mean that once the Parquet MR dependencies are upgraded, filter
   push down can support the varchar type?




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511965#comment-16511965
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

ilooner closed pull request #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/contrib/storage-hive/hive-exec-shade/pom.xml b/contrib/storage-hive/hive-exec-shade/pom.xml
index 6f511adf71..98fd4b8150 100644
--- a/contrib/storage-hive/hive-exec-shade/pom.xml
+++ b/contrib/storage-hive/hive-exec-shade/pom.xml
@@ -31,6 +31,20 @@
   jar
   contrib/hive-storage-plugin/hive-exec-shaded
 
+  
+1.8.3
+  
+
+  
+
+  
+org.apache.parquet
+parquet-hadoop-bundle
+${hive.parquet.version}
+  
+
+  
+
   
 
   org.apache.hive
@@ -68,11 +82,6 @@
 
   
 
-
-
-  org.apache.parquet
-  parquet-column
-
   
 
   
@@ -83,7 +92,7 @@
   
 
   org.apache.hive:hive-exec
-  org.apache.parquet:parquet-column
+  org.apache.parquet:parquet-hadoop-bundle
   commons-codec:commons-codec
   com.fasterxml.jackson.core:jackson-databind
   com.fasterxml.jackson.core:jackson-annotations
@@ -117,6 +126,10 @@
   org.apache.parquet.
   hive.org.apache.parquet.
 
+
+  shaded.parquet.
+  hive.shaded.parquet.
+
 
   org.apache.avro.
   hive.org.apache.avro.
diff --git a/exec/java-exec/pom.xml b/exec/java-exec/pom.xml
index 2205c2f4cd..7701e76165 100644
--- a/exec/java-exec/pom.xml
+++ b/exec/java-exec/pom.xml
@@ -249,92 +249,17 @@
 
   org.apache.parquet
   parquet-hadoop
-  ${parquet.version}
   
-
-  org.apache.hadoop
-  hadoop-client
-
-
-  org.apache.hadoop
-  hadoop-common
-
   
 
 
   org.apache.parquet
   parquet-format
-  2.3.0-incubating
-  
-
-  org.apache.hadoop
-  hadoop-client
-
-
-  org.apache.hadoop
-  hadoop-common
-
-  
 
 
   org.apache.parquet
   parquet-common
   ${parquet.version}
-  
-
-  org.apache.hadoop
-  hadoop-client
-
-
-  org.apache.hadoop
-  hadoop-common
-
-  
-
-
-  org.apache.parquet
-  parquet-jackson
-  ${parquet.version}
-  
-
-  org.apache.hadoop
-  hadoop-client
-
-
-  org.apache.hadoop
-  hadoop-common
-
-  
-
-
-  org.apache.parquet
-  parquet-encoding
-  ${parquet.version}
-  
-
-  org.apache.hadoop
-  hadoop-client
-
-
-  org.apache.hadoop
-  hadoop-common
-
-  
-
-
-  org.apache.parquet
-  parquet-generator
-  ${parquet.version}
-  
-
-  org.apache.hadoop
-  hadoop-client
-
-
-  org.apache.hadoop
-  hadoop-common
-
-  
 
 
   javax.inject
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetComparisonPredicate.java b/exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetComparisonPredicate.java
index 9e561ad364..ebceefb435 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetComparisonPredicate.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetComparisonPredicate.java
@@ -113,7 +113,7 @@ public boolean canDrop(RangeExprEvaluator evaluator) {
   // can drop when left's max < right's min, or right's max < left's min
   final C leftMin = leftStat.genericGetMin();
   final C rightMin = rightStat.genericGetMin();
-  return leftStat.genericGetMax().compareTo(rightMin) < 0 || rightStat.genericGetMax().compareTo(leftMin) < 0;
+  return (leftStat.compareMaxToValue(rightMin) < 0) || (rightStat.compareMaxToValue(leftMin) < 0);
 }) {
   @Override
   public String toString() {
@@ -132,7 +132,7 @@ public String toString() {
 return new ParquetComparisonPredicate(left, right, (leftStat, rightStat) -> {
   // can drop when left's max <= right's min.
   final C rightMin = rightStat.genericGetMin();
-  return leftStat.genericGetMax().compareTo(rightMin) <= 0;
+  return leftStat.compareMaxToValue(rightMin) <= 0;
 });
   }
 
@@ -146,7 +146,7 @@ public 
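
The two `canDrop` hunks above rest on a simple interval argument: if the min/max ranges of the two sides cannot satisfy the comparison, the row group can be skipped. The following is a minimal, self-contained sketch of that reasoning, not the Drill implementation. `Range`, `RangePruningSketch`, `canDropEqual`, and `canDropGreaterThan` are hypothetical names introduced for illustration; only the method name `compareMaxToValue` is taken from the diff, and reading the first hunk as an equality-style predicate and the second as a strict greater-than predicate is an assumption based on the code comments.

```java
import java.util.Comparator;

// Hypothetical stand-in for per-column min/max statistics (not a Parquet class).
final class Range<T> {
  final T min;
  final T max;
  private final Comparator<T> order;

  Range(T min, T max, Comparator<T> order) {
    this.min = min;
    this.max = max;
    this.order = order;
  }

  // Compares this range's max against a value using the column's ordering.
  int compareMaxToValue(T value) {
    return order.compare(max, value);
  }
}

final class RangePruningSketch {
  // "left = right" cannot match any row when the two ranges do not overlap.
  static <T> boolean canDropEqual(Range<T> left, Range<T> right) {
    return left.compareMaxToValue(right.min) < 0
        || right.compareMaxToValue(left.min) < 0;
  }

  // "left > right" cannot match any row when left's max <= right's min.
  static <T> boolean canDropGreaterThan(Range<T> left, Range<T> right) {
    return left.compareMaxToValue(right.min) <= 0;
  }

  public static void main(String[] args) {
    Comparator<Integer> order = Comparator.naturalOrder();
    Range<Integer> column = new Range<>(10, 20, order);
    Range<Integer> literal = new Range<>(25, 25, order);
    System.out.println(canDropEqual(column, literal));       // ranges are disjoint
    System.out.println(canDropGreaterThan(column, literal)); // column max is below the literal
  }
}
```

Delegating the comparison to the statistics object, rather than calling `compareTo` on the raw max value, lets the ordering be chosen by whoever owns the statistics instead of by the value's natural order.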

[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510461#comment-16510461
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

sachouche commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r194928369
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java
 ##
 @@ -445,25 +450,29 @@ public void clear(){
    * @throws IOException An IO related condition
    */
   void resetDefinitionLevelReader(int skipCount) throws IOException {
-    if (parentColumnReader.columnDescriptor.getMaxDefinitionLevel() != 0) {
-      throw new UnsupportedOperationException("Unsupoorted Operation");
-    }
+    Preconditions.checkState(parentColumnReader.columnDescriptor.getMaxDefinitionLevel() == 1);
 
 Review comment:
   Thank you Vlad! 
   LGTM




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510347#comment-16510347
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r194912341
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java
 ##
 @@ -445,25 +450,29 @@ public void clear(){
    * @throws IOException An IO related condition
    */
   void resetDefinitionLevelReader(int skipCount) throws IOException {
-    if (parentColumnReader.columnDescriptor.getMaxDefinitionLevel() != 0) {
-      throw new UnsupportedOperationException("Unsupoorted Operation");
-    }
+    Preconditions.checkState(parentColumnReader.columnDescriptor.getMaxDefinitionLevel() == 1);
 
 Review comment:
   Yes, I included the other fix as well. Please review changes to
   `VarLenBulkPageReader.java`.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509940#comment-16509940
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

sachouche commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r194829048
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java
 ##
 @@ -445,25 +450,29 @@ public void clear(){
    * @throws IOException An IO related condition
    */
   void resetDefinitionLevelReader(int skipCount) throws IOException {
-    if (parentColumnReader.columnDescriptor.getMaxDefinitionLevel() != 0) {
-      throw new UnsupportedOperationException("Unsupoorted Operation");
-    }
+    Preconditions.checkState(parentColumnReader.columnDescriptor.getMaxDefinitionLevel() == 1);
 
 Review comment:
   @vrozov,
   
   I believe that the max definition level can be either zero or one: zero if
   all columns are null.
   
   FYI - Did you also include the fix that I made in "VarLenBulkPageReader.java"?




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509941#comment-16509941
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

sachouche commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r194829048
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java
 ##
 @@ -445,25 +450,29 @@ public void clear(){
    * @throws IOException An IO related condition
    */
   void resetDefinitionLevelReader(int skipCount) throws IOException {
-    if (parentColumnReader.columnDescriptor.getMaxDefinitionLevel() != 0) {
-      throw new UnsupportedOperationException("Unsupoorted Operation");
-    }
+    Preconditions.checkState(parentColumnReader.columnDescriptor.getMaxDefinitionLevel() == 1);
 
 Review comment:
   @vrozov,
   
   I believe that the max definition level can be either zero or one: zero if
   all values are null.
   
   FYI - Did you also include the fix that I made in "VarLenBulkPageReader.java"?
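
   As background for the check under discussion: in the Parquet format a
   column's maximum definition level is derived from the schema, counting
   the non-REQUIRED fields on the path from the root to the leaf. A flat
   REQUIRED column therefore has a maximum of 0 and a flat OPTIONAL column a
   maximum of 1, so a `== 1` check holds for the latter and fails for the
   former. The sketch below only illustrates that counting rule;
   `DefinitionLevelSketch`, its `Repetition` enum, and `maxDefinitionLevel`
   are hypothetical names, not Parquet or Drill APIs.

```java
import java.util.Arrays;
import java.util.List;

final class DefinitionLevelSketch {
  // Hypothetical mirror of the three repetition kinds in a Parquet schema.
  enum Repetition { REQUIRED, OPTIONAL, REPEATED }

  // Counts the non-REQUIRED fields on the path from the schema root to the leaf.
  static int maxDefinitionLevel(List<Repetition> path) {
    int level = 0;
    for (Repetition repetition : path) {
      if (repetition != Repetition.REQUIRED) {
        level++;
      }
    }
    return level;
  }

  public static void main(String[] args) {
    // Flat required column.
    System.out.println(maxDefinitionLevel(Arrays.asList(Repetition.REQUIRED)));
    // Flat optional column.
    System.out.println(maxDefinitionLevel(Arrays.asList(Repetition.OPTIONAL)));
  }
}
```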




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509643#comment-16509643
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r194742812
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java
 ##
 @@ -445,25 +450,29 @@ public void clear(){
    * @throws IOException An IO related condition
    */
   void resetDefinitionLevelReader(int skipCount) throws IOException {
-    if (parentColumnReader.columnDescriptor.getMaxDefinitionLevel() != 0) {
-      throw new UnsupportedOperationException("Unsupoorted Operation");
-    }
+    Preconditions.checkState(parentColumnReader.columnDescriptor.getMaxDefinitionLevel() == 1);
 
 Review comment:
   @sachouche Please review the fix for DRILL-6447.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508490#comment-16508490
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

parthchandra commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r194501388
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception {
     }
   }
 
+  @Ignore
 
 Review comment:
   I also had an offline chat with Vlad on this one. The problem is that
   Parquet has changed its behaviour and will not give us the stats for
   Decimal when we read footers.
   We have, therefore, no way of knowing whether Decimal stats are correct or
   not (even if they are correct) unless we try to hack something in Parquet.
   Hacking something in Parquet is not an option, since that is exactly what
   this PR is trying to fix!
   Also, we have never supported Decimal in Drill, so we do not have to
   consider backward compatibility. There are some users using Decimal (based
   on posts to the mailing list), but the old implementation never worked
   reliably, so this will be an overall improvement for all parties.
   
   +1. And thanks Vlad, Arina for pursuing this one to the end :)




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508407#comment-16508407
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

arina-ielchiieva commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r194481314
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception {
     }
   }
 
+  @Ignore
 
 Review comment:
   Vlad, thanks for investigating the issue. Since it's a Parquet problem, we
   can leave the tests ignored; just please add a comment in each of them to
   indicate the root cause. @parthchandra are you OK with this approach?




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507448#comment-16507448
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r194265374
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception {
     }
   }
 
+  @Ignore
 
 Review comment:
   Parquet library behavior for DECIMAL statistics was changed in
   [PARQUET-686](https://issues.apache.org/jira/browse/PARQUET-686) (see
   [Parquet PR #367](https://github.com/apache/parquet-mr/pull/367)). I filed
   [PARQUET-1322](https://issues.apache.org/jira/browse/PARQUET-1322) to track
   statistics availability for DECIMAL types.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506827#comment-16506827
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r194215694
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception {
     }
   }
 
+  @Ignore
 
 Review comment:
   And for `testDecimalPartitionPruning`, statistics for `MANAGER_ID` are not
   available either:
   ```
   {
     "encodingStats" : null,
     "dictionaryPageOffset" : 0,
     "valueCount" : 107,
     "totalSize" : 168,
     "totalUncompressedSize" : 363,
     "statistics" : {
       "max" : null,
       "min" : null,
       "maxBytes" : null,
       "minBytes" : null,
       "empty" : true,
       "numNulls" : -1,
       "numNullsSet" : false
     },
     "firstDataPageOffset" : 5550,
     "type" : "FIXED_LEN_BYTE_ARRAY",
     "path" : [ "MANAGER_ID" ],
     "codec" : "SNAPPY",
     "primitiveType" : {
       "name" : "MANAGER_ID",
       "repetition" : "OPTIONAL",
       "originalType" : "DECIMAL",
       "id" : null,
       "primitive" : true,
       "primitiveTypeName" : "FIXED_LEN_BYTE_ARRAY",
       "decimalMetadata" : {
         "precision" : 6,
         "scale" : 0
       },
       "typeLength" : 3
     },
     "encodings" : [ "PLAIN", "BIT_PACKED", "RLE" ],
     "startingPos" : 5550
   }
   ```




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506824#comment-16506824
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r194215287
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception {
     }
   }
 
+  @Ignore
 
 Review comment:
   The same applies to `testIntervalYearPartitionPruning`: statistics for
   `col_intrvl_yr` are also not available, for the same reason:
   ```
   {
     "encodingStats" : null,
     "dictionaryPageOffset" : 0,
     "valueCount" : 6,
     "totalSize" : 81,
     "totalUncompressedSize" : 91,
     "statistics" : {
       "max" : null,
       "min" : null,
       "maxBytes" : null,
       "minBytes" : null,
       "empty" : true,
       "numNulls" : -1,
       "numNullsSet" : false
     },
     "firstDataPageOffset" : 451,
     "type" : "FIXED_LEN_BYTE_ARRAY",
     "path" : [ "col_intrvl_yr" ],
     "primitiveType" : {
       "name" : "col_intrvl_yr",
       "repetition" : "OPTIONAL",
       "originalType" : "INTERVAL",
       "id" : null,
       "primitive" : true,
       "primitiveTypeName" : "FIXED_LEN_BYTE_ARRAY",
       "decimalMetadata" : null,
       "typeLength" : 12
     },
     "codec" : "SNAPPY",
     "encodings" : [ "RLE", "BIT_PACKED", "PLAIN" ],
     "startingPos" : 451
   }
   ```




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506822#comment-16506822
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r194214952
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception {
     }
   }
 
+  @Ignore
 
 Review comment:
   I attached the output of
   `org.apache.parquet.format.converter.ParquetMetadataConverter` in debug
   mode to [DRILL-6353](https://issues.apache.org/jira/browse/DRILL-6353). As
   you can see, there are no statistics available for `col_intrvl_day`:
   ```
   {
     "encodingStats" : null,
     "dictionaryPageOffset" : 0,
     "valueCount" : 6,
     "totalSize" : 92,
     "totalUncompressedSize" : 91,
     "statistics" : {
       "max" : null,
       "min" : null,
       "maxBytes" : null,
       "minBytes" : null,
       "empty" : true,
       "numNulls" : -1,
       "numNullsSet" : false
     },
     "firstDataPageOffset" : 532,
     "type" : "FIXED_LEN_BYTE_ARRAY",
     "path" : [ "col_intrvl_day" ],
     "primitiveType" : {
       "name" : "col_intrvl_day",
       "repetition" : "OPTIONAL",
       "originalType" : "INTERVAL",
       "id" : null,
       "primitive" : true,
       "primitiveTypeName" : "FIXED_LEN_BYTE_ARRAY",
       "decimalMetadata" : null,
       "typeLength" : 12
     },
     "codec" : "SNAPPY",
     "encodings" : [ "RLE", "BIT_PACKED", "PLAIN" ],
     "startingPos" : 532
   }
   ```
   This is a result of the Parquet fix for "Deprecate type-defined sort
   ordering for INTERVAL type"
   ([PARQUET-1064](https://issues.apache.org/jira/browse/PARQUET-1064)).
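
   The dump above shows what a reader has to guard against: `"empty" : true`
   with `"numNullsSet" : false` means min/max and null counts carry no
   information, so pruning must not rely on them. The following is a rough
   sketch of such a guard, assuming a hypothetical `ColumnStats` holder whose
   fields mirror three keys of the JSON dump; it is not the Parquet
   `Statistics` class, and `usableForPruning` and `allNulls` are illustrative
   names only.

```java
final class StatsGuardSketch {
  // Hypothetical holder mirroring the "empty", "numNulls" and "numNullsSet" keys.
  static final class ColumnStats {
    final boolean empty;
    final long numNulls;
    final boolean numNullsSet;

    ColumnStats(boolean empty, long numNulls, boolean numNullsSet) {
      this.empty = empty;
      this.numNulls = numNulls;
      this.numNullsSet = numNullsSet;
    }
  }

  // Min/max may drive pruning only when statistics were actually populated.
  static boolean usableForPruning(ColumnStats stats) {
    return stats != null && !stats.empty;
  }

  // A null count is meaningful only when it was explicitly set.
  static boolean allNulls(ColumnStats stats, long valueCount) {
    return stats != null && stats.numNullsSet && stats.numNulls == valueCount;
  }

  public static void main(String[] args) {
    // Values taken from the dump: empty statistics, numNulls = -1, not set.
    ColumnStats missing = new ColumnStats(true, -1, false);
    System.out.println(usableForPruning(missing)); // statistics are empty
    System.out.println(allNulls(missing, 6));      // null count was never set
  }
}
```

   With a guard like this, a column without statistics simply falls back to
   reading the row group instead of being pruned on unreliable values.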




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502508#comment-16502508
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

parthchandra closed pull request #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/contrib/storage-hive/hive-exec-shade/pom.xml 
b/contrib/storage-hive/hive-exec-shade/pom.xml
index 6f511adf71..98fd4b8150 100644
--- a/contrib/storage-hive/hive-exec-shade/pom.xml
+++ b/contrib/storage-hive/hive-exec-shade/pom.xml
@@ -31,6 +31,20 @@
   jar
   contrib/hive-storage-plugin/hive-exec-shaded
 
+  
+1.8.3
+  
+
+  
+
+  
+org.apache.parquet
+parquet-hadoop-bundle
+${hive.parquet.version}
+  
+
+  
+
   
 
   org.apache.hive
@@ -68,11 +82,6 @@
 
   
 
-
-
-  org.apache.parquet
-  parquet-column
-
   
 
   
@@ -83,7 +92,7 @@
   
 
   org.apache.hive:hive-exec
-  org.apache.parquet:parquet-column
+  org.apache.parquet:parquet-hadoop-bundle
   commons-codec:commons-codec
   com.fasterxml.jackson.core:jackson-databind
   com.fasterxml.jackson.core:jackson-annotations
@@ -117,6 +126,10 @@
   org.apache.parquet.
   hive.org.apache.parquet.
 
+
+  shaded.parquet.
+  hive.shaded.parquet.
+
 
   org.apache.avro.
   hive.org.apache.avro.
diff --git a/exec/java-exec/pom.xml b/exec/java-exec/pom.xml
index 0d03cc8515..d0c6724b37 100644
--- a/exec/java-exec/pom.xml
+++ b/exec/java-exec/pom.xml
@@ -249,92 +249,17 @@
 
   org.apache.parquet
   parquet-hadoop
-  ${parquet.version}
   
-
-  org.apache.hadoop
-  hadoop-client
-
-
-  org.apache.hadoop
-  hadoop-common
-
   
 
 
   org.apache.parquet
   parquet-format
-  2.3.0-incubating
-  
-
-  org.apache.hadoop
-  hadoop-client
-
-
-  org.apache.hadoop
-  hadoop-common
-
-  
 
 
   org.apache.parquet
   parquet-common
   ${parquet.version}
-  
-
-  org.apache.hadoop
-  hadoop-client
-
-
-  org.apache.hadoop
-  hadoop-common
-
-  
-
-
-  org.apache.parquet
-  parquet-jackson
-  ${parquet.version}
-  
-
-  org.apache.hadoop
-  hadoop-client
-
-
-  org.apache.hadoop
-  hadoop-common
-
-  
-
-
-  org.apache.parquet
-  parquet-encoding
-  ${parquet.version}
-  
-
-  org.apache.hadoop
-  hadoop-client
-
-
-  org.apache.hadoop
-  hadoop-common
-
-  
-
-
-  org.apache.parquet
-  parquet-generator
-  ${parquet.version}
-  
-
-  org.apache.hadoop
-  hadoop-client
-
-
-  org.apache.hadoop
-  hadoop-common
-
-  
 
 
   javax.inject
diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetComparisonPredicates.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetComparisonPredicates.java
index 5ba597c2a1..673d242d6d 100644
--- 
a/exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetComparisonPredicates.java
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetComparisonPredicates.java
@@ -20,274 +20,176 @@
 import org.apache.drill.common.expression.LogicalExpression;
 import org.apache.drill.common.expression.LogicalExpressionBase;
 import org.apache.drill.common.expression.visitors.ExprVisitor;
+import org.apache.drill.exec.expr.fn.FunctionGenerationHelper;
 import org.apache.parquet.column.statistics.Statistics;
 
 import java.util.ArrayList;
 import java.util.Iterator;
 import java.util.List;
+import java.util.function.BiPredicate;
+
+import static 
org.apache.drill.exec.expr.stat.ParquetPredicatesHelper.isNullOrEmpty;
+import static 
org.apache.drill.exec.expr.stat.ParquetPredicatesHelper.isAllNulls;
 
 /**
  * Comparison predicates for parquet filter pushdown.
  */
-public class ParquetComparisonPredicates {
-  public static abstract  class ParquetCompPredicate extends 
LogicalExpressionBase implements ParquetFilterPredicate {
-protected final LogicalExpression left;
-protected final 

[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502509#comment-16502509
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov opened a new pull request #1259: DRILL-6353: Upgrade Parquet MR 
dependencies
URL: https://github.com/apache/drill/pull/1259
 
 
   @parthchandra Please review




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502507#comment-16502507
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

parthchandra commented on issue #1259: DRILL-6353: Upgrade Parquet MR 
dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-394860732
 
 
   Based on Arina's analysis, I don't think it is ok to ignore this test 
failure. 




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501957#comment-16501957
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on issue #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-394756494
 
 
   @arina-ielchiieva Please see [my 
comment](https://github.com/apache/drill/pull/1259#discussion_r190642661)




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501875#comment-16501875
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

arina-ielchiieva commented on issue #1259: DRILL-6353: Upgrade Parquet MR 
dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-394734756
 
 
   @vrozov so what are we going to do with the ignored tests? :) Do you have an 
explanation for why they fail?




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501848#comment-16501848
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on issue #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-394724067
 
 
   @parthchandra @arina-ielchiieva Please review




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-06-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499236#comment-16499236
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on issue #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-394126286
 
 
   @parthchandra @arina-ielchiieva The PR is ready for the final review.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494161#comment-16494161
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on issue #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-392923265
 
 
   The fix for PARQUET-77 is included in 1.10.0, 1.9.0 and 1.8.3, as it is not 
specific to Apache Drill.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494012#comment-16494012
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

parthchandra commented on issue #1259: DRILL-6353: Upgrade Parquet MR 
dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-392885535
 
 
   The fix for the stats was part of a big commit to add support for
   ByteBuffers in Parquet (PARQUET-77; commit 6b605a4).
   See the included commit 7bc2a4d, which was to fix the overwriting of stats.
   
   
   
   On Thu, May 24, 2018 at 6:55 PM, Vlad Rozov 
   wrote:
   
   > *@vrozov* commented on this pull request.
   > --
   >
   > In exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/
   > TestParquetMetadataCache.java
   > :
   >
   > > @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws 
Exception {
   >  }
   >}
   >
   > +  @Ignore
   >
   > It will be good if you can point to JIRA with the fix that Drill uses to
   > correct statistics. Without JIRA it is not clear what particular fix is
   > used by Drill to workaround bugs in how parquet library handles statistics
   > and for what data types.
   >
   




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490939#comment-16490939
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet 
MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190941859
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   Please check what parquet-tools reports when it reads that file back.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490889#comment-16490889
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

arina-ielchiieva commented on a change in pull request #1259: DRILL-6353: 
Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190935162
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   This is a snippet from the metadata file. Drill generates it using info from 
the footer.
   
   
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java
   
   
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata_V3.java




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490885#comment-16490885
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet 
MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190933671
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   Where does the `columnTypeInfo` come from? Is it output from parquet-tools? 
If yes, please provide the command-line parameters used. If not, please check 
what parquet-tools reports.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490874#comment-16490874
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

arina-ielchiieva commented on a change in pull request #1259: DRILL-6353: 
Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190929789
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   I have investigated why these tests fail. For example, let's take 
`testIntervalDayPartitionPruning`.
   First, the test creates a partitioned table using Drill. Since the table is 
created at runtime, the new Parquet library is used. The created table contains 
4 files, one of which contains only nulls. For this all-null file, the 
statistics for every type except binary are 
`num_nulls: 3, min/max not defined`. For the binary type they are 
`no stats for this column`. For binary columns without nulls, statistics are 
written correctly. I did not check the mixed case (but I think it should be 
fine). In the previous Parquet version, statistics were written correctly when 
a binary column contained only nulls. Maybe this is a bug in Parquet, maybe in 
the Drill writer. 
   
   Another problem is with the metadata file. We do write metadata for binary 
columns into it successfully. Example:
   ```
 "columnTypeInfo" : {
   "`col_intrvl_day`" : {
 "name" : [ "col_intrvl_day" ],
 "primitiveType" : "FIXED_LEN_BYTE_ARRAY",
 "originalType" : "INTERVAL",
 "precision" : 0,
 "scale" : 0,
 "repetitionLevel" : 0,
 "definitionLevel" : 1
   },
   "name" : [ "col_intrvl_day" ],
   "minValue" : "ABoAAACQ4KEB",
   "maxValue" : "ABoAAACQ4KEB",
   "nulls" : 0
   ```
   But when reading it back from the file, we read empty strings. This one 
looks like a Drill bug.
   
   @vrozov I have also noticed that 
`ParquetFileReader.readFooter(conf, path, NO_FILTER);` is deprecated. If you 
have a chance, please replace it.
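To make the `minValue`/`maxValue` entries in the snippet above easier to inspect, here is a minimal sketch of decoding such a string by hand. It assumes the metadata file stores the raw statistic bytes as Base64 (an assumption, not confirmed in this thread) and relies on the Parquet INTERVAL layout of three little-endian unsigned 32-bit integers: months, days, milliseconds. The class and method names are hypothetical and are not part of Drill.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

public class IntervalStatDecoder {

  /** Months, days and milliseconds of one Parquet INTERVAL value. */
  public static final class Interval {
    public final long months;
    public final long days;
    public final long millis;

    Interval(long months, long days, long millis) {
      this.months = months;
      this.days = days;
      this.millis = millis;
    }

    @Override
    public String toString() {
      return months + " months, " + days + " days, " + millis + " ms";
    }
  }

  /** Decodes a Base64 string assumed to hold the 12 raw INTERVAL bytes. */
  public static Interval decode(String base64) {
    byte[] raw = Base64.getDecoder().decode(base64);
    if (raw.length != 12) {
      throw new IllegalArgumentException("INTERVAL needs 12 bytes, got " + raw.length);
    }
    // Parquet stores INTERVAL as three little-endian unsigned 32-bit integers.
    ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
    long months = Integer.toUnsignedLong(buf.getInt());
    long days = Integer.toUnsignedLong(buf.getInt());
    long millis = Integer.toUnsignedLong(buf.getInt());
    return new Interval(months, days, millis);
  }
}
```

Note that the string as quoted in this thread is 12 Base64 characters, which decode to 9 bytes rather than 12, so this sketch rejects it; the quoted value may have been shortened somewhere along the way.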




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490872#comment-16490872
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

arina-ielchiieva commented on a change in pull request #1259: DRILL-6353: 
Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190929789
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   I have investigated why these tests fail. For example, let's take 
`testIntervalDayPartitionPruning`.
   First, the test creates a partitioned table using Drill. Since the table is 
created at runtime, the new Parquet library is used. The created table contains 
4 files, one of which contains only nulls. For this all-null file, the 
statistics for every type except binary are 
`num_nulls: 3, min/max not defined`. For the binary type they are 
`no stats for this column`. For binary columns without nulls, statistics are 
written correctly. I did not check the mixed case (but I think it should be 
fine). In the previous Parquet version, statistics were written correctly. 
Maybe this is a bug in Parquet, maybe in the Drill writer. 
   
   Another problem is with the metadata file. We do write metadata for binary 
columns into it successfully. Example:
   ```
 "columnTypeInfo" : {
   "`col_intrvl_day`" : {
 "name" : [ "col_intrvl_day" ],
 "primitiveType" : "FIXED_LEN_BYTE_ARRAY",
 "originalType" : "INTERVAL",
 "precision" : 0,
 "scale" : 0,
 "repetitionLevel" : 0,
 "definitionLevel" : 1
   },
   "name" : [ "col_intrvl_day" ],
   "minValue" : "ABoAAACQ4KEB",
   "maxValue" : "ABoAAACQ4KEB",
   "nulls" : 0
   ```
   But when reading it back from the file, we read empty strings. This one 
looks like a Drill bug.
   
   @vrozov I have also noticed that 
`ParquetFileReader.readFooter(conf, path, NO_FILTER);` is deprecated. If you 
have a chance, please replace it.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490867#comment-16490867
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

arina-ielchiieva commented on a change in pull request #1259: DRILL-6353: 
Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190929789
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   I have investigated why these tests fail. For example, let's take 
`testIntervalDayPartitionPruning`.
   First, the test creates a partitioned table using Drill. Since the table is 
created at runtime, the new Parquet library is used. The created table contains 
4 files, one of which contains only nulls. For this all-null file, the 
statistics for every type except binary are 
`num_nulls: 3, min/max not defined`. For the binary type they are 
`no stats for this column`. In the previous Parquet version, statistics were 
written correctly. Maybe this is a bug in Parquet, maybe in the Drill writer. 
   
   Another problem is with the metadata file. We do write metadata for binary 
columns into it successfully. Example:
   ```
 "columnTypeInfo" : {
   "`col_intrvl_day`" : {
 "name" : [ "col_intrvl_day" ],
 "primitiveType" : "FIXED_LEN_BYTE_ARRAY",
 "originalType" : "INTERVAL",
 "precision" : 0,
 "scale" : 0,
 "repetitionLevel" : 0,
 "definitionLevel" : 1
   },
   "name" : [ "col_intrvl_day" ],
   "minValue" : "ABoAAACQ4KEB",
   "maxValue" : "ABoAAACQ4KEB",
   "nulls" : 0
   ```
   But when reading it back from the file, we read empty strings. This one 
looks like a Drill bug.
   
   @vrozov I have also noticed that 
`ParquetFileReader.readFooter(conf, path, NO_FILTER);` is deprecated. If you 
have a chance, please replace it.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490118#comment-16490118
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet 
MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190774252
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   It would be good if you could point to the JIRA with the fix that Drill uses 
to correct statistics. Without a JIRA it is not clear which particular fix 
Drill uses to work around bugs in how the Parquet library handles statistics, 
and for which data types.
   
   
   




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489988#comment-16489988
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

parthchandra commented on a change in pull request #1259: DRILL-6353: Upgrade 
Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190758787
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   Hmm, we need to take a look at this. For a period of two years, files 
written by tools using Parquet libraries were writing incorrect statistics, but 
because Drill had its own build where we had fixed the issue (we found the 
issue in the first place), files written by Drill were correct. A very large 
number of Drill users use the Parquet files produced by Drill and it was 
decided that we cannot penalize them. We provided a migration tool to users to 
tag files produced by Drill. The tool added information in the extra metadata 
in Parquet files to indicate the file was written by Drill and stats from these 
files should be allowed. 
   AFAIK, this should be in the current build of Drill Parquet as well as in 
the Parquet library v 1.8.2 and above. 
   Do you know if the stats corruption that affects these tests is something 
that was fixed in a version after 1.8.2?
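The tagging approach described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: the key `example.writer.tag` and value `drill` are hypothetical placeholders (the thread does not name the entry the migration tool writes), and the footer's extra metadata is modelled as a plain `Map<String, String>` rather than a Parquet class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class WriterTagCheck {
  // Hypothetical key and value: the thread does not say which entry the
  // migration tool actually writes into the extra metadata.
  static final String TAG_KEY = "example.writer.tag";
  static final String TAG_VALUE = "drill";

  /** Returns true when the extra metadata carries the (hypothetical) Drill tag. */
  public static boolean statsAllowed(Map<String, String> extraMetadata) {
    if (extraMetadata == null) {
      return false; // no extra metadata at all: do not trust the statistics
    }
    return TAG_VALUE.equals(extraMetadata.get(TAG_KEY));
  }

  public static void main(String[] args) {
    Map<String, String> tagged = new HashMap<>();
    tagged.put(TAG_KEY, TAG_VALUE);
    System.out.println(statsAllowed(tagged));
    System.out.println(statsAllowed(Collections.emptyMap()));
  }
}
```

The check is deliberately conservative: a file without the tag is treated as untrusted, so its statistics would simply not be used for pruning.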
   




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489703#comment-16489703
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet 
MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190715129
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   The tests fail during plan validation because the new version of the parquet 
library ignores statistics that may be wrong for the data types used by the 
queries. Even if the statistics are wrong for only a small portion of parquet 
files, and are correct for the files used by the tests, Drill can't rely on 
them because wrong statistics lead to wrong query results. Basically, the 
version of the parquet library that Drill currently uses has a bug that may 
cause wrong query results; the new version fixes this bug, which causes 2 unit 
test failures. IMO, it is better to upgrade to the new library sooner rather 
than later, even if it slows down some queries.
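
To illustrate the trade-off described in this comment, here is a minimal Java sketch of min/max based row group pruning. It is not Drill or parquet-mr code: the class `StatsPruningSketch`, the `MinMax` holder, and the method `canSkipRowGroup` are hypothetical names, and an empty `Optional` is only an assumed way to model statistics that the reader decided to ignore.

```java
import java.util.Optional;

public class StatsPruningSketch {

    /** Hypothetical min/max summary for one column in one row group. */
    static final class MinMax {
        final long min;
        final long max;

        MinMax(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Decides whether a row group can be skipped for the filter "column = value".
     * An empty Optional models statistics that were ignored because they
     * might be corrupt (an assumption made for this sketch).
     */
    static boolean canSkipRowGroup(Optional<MinMax> stats, long value) {
        if (!stats.isPresent()) {
            // No trustworthy statistics: the row group must be read.
            return false;
        }
        MinMax s = stats.get();
        // Skip only when the value lies outside the recorded [min, max] range.
        return value < s.min || value > s.max;
    }

    public static void main(String[] args) {
        Optional<MinMax> trusted = Optional.of(new MinMax(10, 20));
        System.out.println(canSkipRowGroup(trusted, 42));          // outside the range
        System.out.println(canSkipRowGroup(trusted, 15));          // inside the range
        System.out.println(canSkipRowGroup(Optional.empty(), 42)); // statistics ignored
    }
}
```

When statistics are ignored, the sketch never prunes, so the query stays correct but reads more data. That matches the slowdown mentioned above, and it is one plausible reason a test that asserts pruning in the plan would start failing.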




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489332#comment-16489332
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet 
MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190652287
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   The test needs to be fixed as part of a separate JIRA/PR (another option is 
to remove the check for the filter, but IMO it is even less desirable).




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489308#comment-16489308
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

arina-ielchiieva commented on a change in pull request #1259: DRILL-6353: 
Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190648363
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   In this case, I don't think it's a good idea to disable unit tests. You can 
consider asking for help to resolve the unit test failures, but not disabling 
the tests.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489281#comment-16489281
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet 
MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190642661
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   I do not plan to re-enable the tests as part of this PR. The tests rely on 
wrong statistics and need to be fixed/modified for the new parquet library. As 
I am not familiar with the functionality they test, I'll file a JIRA to work on 
enabling those tests.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488724#comment-16488724
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

arina-ielchiieva commented on a change in pull request #1259: DRILL-6353: 
Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190524422
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   @vvysotskyi no worries, work is still in progress.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488546#comment-16488546
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vvysotskyi commented on a change in pull request #1259: DRILL-6353: Upgrade 
Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r190483417
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java
 ##
 @@ -737,6 +738,7 @@ public void testBooleanPartitionPruning() throws Exception 
{
 }
   }
 
+  @Ignore
 
 Review comment:
   Could you please explain the reason why these tests should be ignored?




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479215#comment-16479215
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on issue #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-389909424
 
 
   @arina-ielchiieva Please review




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477139#comment-16477139
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

arina-ielchiieva commented on issue #1259: DRILL-6353: Upgrade Parquet MR 
dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-389452379
 
 
   @vrozov sounds like we are all in agreement on upgrading to the latest 
parquet version. So when all tests pass, please ping us and we'll finish the 
code review.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476467#comment-16476467
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

parthchandra commented on a change in pull request #1259: DRILL-6353: Upgrade 
Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r188435328
 
 

 ##
 File path: pom.xml
 ##
 @@ -44,7 +44,7 @@
 1.7.6
 18.0
 2
-1.8.1-drill-r0
+1.10.0
 
 Review comment:
   Fair enough. 




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475889#comment-16475889
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on issue #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-389182912
 
 
   I am still working on fixing the unit and functional tests. The PR is open 
to initiate a discussion on `parquet-mr` version upgrade.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475551#comment-16475551
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

arina-ielchiieva commented on a change in pull request #1259: DRILL-6353: 
Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r188184531
 
 

 ##
 File path: pom.xml
 ##
 @@ -44,7 +44,7 @@
 1.7.6
 18.0
 2
-1.8.1-drill-r0
+1.10.0
 
 Review comment:
   Well, I need this upgrade to implement varchar filter push down, since it 
has been fixed in 1.10.0 but not in 1.8.3. I think if all unit tests pass, 
along with the Functional & Advanced tests, we are good to go.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475381#comment-16475381
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

arina-ielchiieva commented on a change in pull request #1259: DRILL-6353: 
Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r188184531
 
 

 ##
 File path: pom.xml
 ##
 @@ -44,7 +44,7 @@
 1.7.6
 18.0
 2
-1.8.1-drill-r0
+1.10.0
 
 Review comment:
   Well, I need this upgrade to implement varchar filter push down, since it 
has been fixed in 1.10.0 but not in 1.8.3. I think if all unit tests pass, 
along with the Functional & Advanced tests, we are good to go.




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475300#comment-16475300
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

ilooner commented on issue #1259: DRILL-6353: Upgrade Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#issuecomment-389049203
 
 
   @vrozov Please fix Travis failures
   
   ```
   Tests in error: 
 TestCTASPartitionFilter.withoutDistribution » UserRemote SYSTEM ERROR: 
Illegal...
 TestCTASPartitionFilter>BaseTestQuery.closeClient:286 » Runtime Exception 
whil...
   ```




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475192#comment-16475192
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov commented on a change in pull request #1259: DRILL-6353: Upgrade Parquet 
MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r188148618
 
 

 ##
 File path: pom.xml
 ##
 @@ -44,7 +44,7 @@
 1.7.6
 18.0
 2
-1.8.1-drill-r0
+1.10.0
 
 Review comment:
   `1.8.3` as well as `1.8.1-drill-r0` are supposed to be patch releases on 
top of `1.8.0`. Unfortunately, `parquet-mr` does not properly follow semantic 
versioning and has functional and API-level changes in its patch releases. On 
top of that, both `1.9.0` and `1.10.0` are not backward compatible with `1.8.0` 
and/or `1.8.3`, so upgrading to `1.8.3` first will neither help with the later 
upgrade to `1.10.0` nor make it safer. I'd suggest paying the price once. 
Additionally, the latest parquet version may have functionality that can be 
used in filter pushdown. @arina-ielchiieva what is your take?
   
   Parquet libraries used by hive are shaded within `drill-hive-exec-shaded`, 
so hive is guarded from the parquet-mr library upgrade.
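
As a rough illustration of the versioning argument, the following Java sketch classifies the version jumps mentioned in this thread the way semantic versioning would label them. The class `VersionBumpSketch` and its methods are hypothetical helpers written for this example, and dropping the qualifier after `-` (as in `1.8.1-drill-r0`) is a simplifying assumption.

```java
public class VersionBumpSketch {

    enum Bump { MAJOR, MINOR, PATCH, NONE }

    /** Parses "1.8.1-drill-r0" into {1, 8, 1}; any qualifier after '-' is dropped. */
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch: " + version);
        }
        int[] numbers = new int[3];
        for (int i = 0; i < 3; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    /** Returns the most significant version component that differs. */
    static Bump classify(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        if (a[0] != b[0]) {
            return Bump.MAJOR;
        }
        if (a[1] != b[1]) {
            return Bump.MINOR;
        }
        if (a[2] != b[2]) {
            return Bump.PATCH;
        }
        return Bump.NONE;
    }

    public static void main(String[] args) {
        System.out.println(classify("1.8.0", "1.8.3"));           // PATCH
        System.out.println(classify("1.8.1-drill-r0", "1.10.0")); // MINOR
        System.out.println(classify("1.8.3", "1.10.0"));          // MINOR
    }
}
```

Under strict semantic versioning, neither a patch nor a minor bump should break callers. The comment above reports that `parquet-mr` does not hold to that, so these labels alone say little about how risky either upgrade path is.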




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474762#comment-16474762
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

parthchandra commented on a change in pull request #1259: DRILL-6353: Upgrade 
Parquet MR dependencies
URL: https://github.com/apache/drill/pull/1259#discussion_r188088771
 
 

 ##
 File path: pom.xml
 ##
 @@ -44,7 +44,7 @@
 1.7.6
 18.0
 2
-1.8.1-drill-r0
+1.10.0
 
 Review comment:
   1.10? Is it safer to upgrade to 1.8.3 and then test out 1.9/1.10 before 
upgrading to it?
   Also, with this change the Hive parquet version is 1.8.2. I wonder what 
impact that might have on compatibility?




[jira] [Commented] (DRILL-6353) Upgrade Parquet MR dependencies

2018-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474605#comment-16474605
 ] 

ASF GitHub Bot commented on DRILL-6353:
---

vrozov opened a new pull request #1259: DRILL-6353: Upgrade Parquet MR 
dependencies
URL: https://github.com/apache/drill/pull/1259
 
 
   @parthchandra Please review

