[jira] [Work logged] (HIVE-24551) Hive should include transitive dependencies from calcite after shading it

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24551?focusedWorklogId=525388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525388
 ]

ASF GitHub Bot logged work on HIVE-24551:
-

Author: ASF GitHub Bot
Created on: 17/Dec/20 05:16
Start Date: 17/Dec/20 05:16
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun commented on pull request #1792:
URL: https://github.com/apache/hive/pull/1792#issuecomment-747210561


   Thank you for pinging me, @sunchao .



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 525388)
Time Spent: 50m  (was: 40m)

> Hive should include transitive dependencies from calcite after shading it
> -
>
> Key: HIVE-24551
> URL: https://issues.apache.org/jira/browse/HIVE-24551
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.8
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, as part of the effort to shade Guava from Hive, we shade Calcite and
> exclude all of its artifacts from the binary distribution. However, this also
> removes all of its transitive dependencies, which are still needed at runtime.
> Without these, Hive queries will fail with errors like:
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/eigenbase/util/property/BooleanProperty
> at 
> org.apache.calcite.util.SaffronProperties.<init>(SaffronProperties.java:66)
> at 
> org.apache.calcite.util.SaffronProperties.instance(SaffronProperties.java:134)
> at org.apache.calcite.util.Util.getDefaultCharset(Util.java:769)
> at 
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.getDefaultCharset(RelDataTypeFactoryImpl.java:565)
> at 
> org.apache.calcite.sql.type.SqlTypeUtil.addCharsetAndCollation(SqlTypeUtil.java:1070)
> at 
> org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:65)
> at org.apache.calcite.rex.RexBuilder.<init>(RexBuilder.java:114)
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:991)
> at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
> at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1069)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1085)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:364)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11138)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> Caused by: java.lang.ClassNotFoundException: 
> org.eigenbase.util.property.BooleanProperty
> at 

[jira] [Work logged] (HIVE-24551) Hive should include transitive dependencies from calcite after shading it

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24551?focusedWorklogId=525366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525366
 ]

ASF GitHub Bot logged work on HIVE-24551:
-

Author: ASF GitHub Bot
Created on: 17/Dec/20 03:13
Start Date: 17/Dec/20 03:13
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #1792:
URL: https://github.com/apache/hive/pull/1792#discussion_r544779939



##
File path: ql/pom.xml
##
@@ -880,6 +880,12 @@
   <include>joda-time:joda-time</include>
   <include>org.apache.calcite:*</include>
   <include>org.apache.calcite.avatica:avatica</include>
+  
+  <include>net.hydromatic:eigenbase-properties</include>
+  <include>org.codehaus.janino:janino</include>
+  <include>org.codehaus.janino:commons-compiler</include>
+  <include>org.pentaho:pentaho-aggdesigner-algorithm</include>

Review comment:
   Yeah, that's the downside of it: whenever we upgrade Calcite we should
check its dependencies and update this list as well. I'm not sure if there is an
easy way to include an artifact and all of its transitive dependencies in the
maven-shade-plugin.
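
   One lightweight guard against that, sketched below, is a check that fails
fast when a class expected from one of the explicitly listed artifacts is
missing, e.g. after a Calcite upgrade silently changes its transitive
dependencies. This is an illustrative sketch, not part of this patch, and the
representative class names are assumptions:

   ```java
   // Illustrative guard (not part of this PR): fail if a class expected from
   // one of the bundled transitive dependencies is absent from the classpath.
   import java.util.Arrays;
   import java.util.List;

   public class ShadedDependencyCheck {
       // Representative classes, one per bundled artifact (assumed; extend as needed).
       private static final List<String> REQUIRED_CLASSES = Arrays.asList(
           "org.eigenbase.util.property.BooleanProperty",     // eigenbase-properties
           "org.codehaus.janino.ScriptEvaluator",             // janino
           "org.codehaus.commons.compiler.CompileException"); // commons-compiler

       public static void main(String[] args) {
           for (String cls : REQUIRED_CLASSES) {
               try {
                   Class.forName(cls);
               } catch (ClassNotFoundException e) {
                   throw new AssertionError("Missing shaded dependency: " + cls, e);
               }
           }
           System.out.println("All expected transitive dependency classes are present.");
       }
   }
   ```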





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 525366)
Time Spent: 40m  (was: 0.5h)

> Hive should include transitive dependencies from calcite after shading it
> -
>
> Key: HIVE-24551
> URL: https://issues.apache.org/jira/browse/HIVE-24551
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.8
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, as part of the effort to shade Guava from Hive, we shade Calcite and
> exclude all of its artifacts from the binary distribution. However, this also
> removes all of its transitive dependencies, which are still needed at runtime.
> Without these, Hive queries will fail with errors like:
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/eigenbase/util/property/BooleanProperty
> at 
> org.apache.calcite.util.SaffronProperties.<init>(SaffronProperties.java:66)
> at 
> org.apache.calcite.util.SaffronProperties.instance(SaffronProperties.java:134)
> at org.apache.calcite.util.Util.getDefaultCharset(Util.java:769)
> at 
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.getDefaultCharset(RelDataTypeFactoryImpl.java:565)
> at 
> org.apache.calcite.sql.type.SqlTypeUtil.addCharsetAndCollation(SqlTypeUtil.java:1070)
> at 
> org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:65)
> at org.apache.calcite.rex.RexBuilder.<init>(RexBuilder.java:114)
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:991)
> at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
> at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1069)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1085)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:364)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11138)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> 

[jira] [Work logged] (HIVE-24551) Hive should include transitive dependencies from calcite after shading it

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24551?focusedWorklogId=525358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525358
 ]

ASF GitHub Bot logged work on HIVE-24551:
-

Author: ASF GitHub Bot
Created on: 17/Dec/20 02:50
Start Date: 17/Dec/20 02:50
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #1792:
URL: https://github.com/apache/hive/pull/1792#discussion_r544772225



##
File path: ql/pom.xml
##
@@ -880,6 +880,12 @@
   <include>joda-time:joda-time</include>
   <include>org.apache.calcite:*</include>
   <include>org.apache.calcite.avatica:avatica</include>
+  
+  <include>net.hydromatic:eigenbase-properties</include>
+  <include>org.codehaus.janino:janino</include>
+  <include>org.codehaus.janino:commons-compiler</include>
+  <include>org.pentaho:pentaho-aggdesigner-algorithm</include>

Review comment:
   One question: will this be flaky? Once Calcite changes its transitive
dependencies, it seems we would also need to update this list.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 525358)
Time Spent: 0.5h  (was: 20m)

> Hive should include transitive dependencies from calcite after shading it
> -
>
> Key: HIVE-24551
> URL: https://issues.apache.org/jira/browse/HIVE-24551
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.8
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, as part of the effort to shade Guava from Hive, we shade Calcite and
> exclude all of its artifacts from the binary distribution. However, this also
> removes all of its transitive dependencies, which are still needed at runtime.
> Without these, Hive queries will fail with errors like:
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/eigenbase/util/property/BooleanProperty
> at 
> org.apache.calcite.util.SaffronProperties.<init>(SaffronProperties.java:66)
> at 
> org.apache.calcite.util.SaffronProperties.instance(SaffronProperties.java:134)
> at org.apache.calcite.util.Util.getDefaultCharset(Util.java:769)
> at 
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.getDefaultCharset(RelDataTypeFactoryImpl.java:565)
> at 
> org.apache.calcite.sql.type.SqlTypeUtil.addCharsetAndCollation(SqlTypeUtil.java:1070)
> at 
> org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:65)
> at org.apache.calcite.rex.RexBuilder.<init>(RexBuilder.java:114)
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:991)
> at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
> at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1069)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1085)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:364)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11138)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 

[jira] [Work logged] (HIVE-24551) Hive should include transitive dependencies from calcite after shading it

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24551?focusedWorklogId=525354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525354
 ]

ASF GitHub Bot logged work on HIVE-24551:
-

Author: ASF GitHub Bot
Created on: 17/Dec/20 02:36
Start Date: 17/Dec/20 02:36
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1792:
URL: https://github.com/apache/hive/pull/1792#issuecomment-747165626


   cc @viirya @dongjoon-hyun @szehon-ho 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 525354)
Time Spent: 20m  (was: 10m)

> Hive should include transitive dependencies from calcite after shading it
> -
>
> Key: HIVE-24551
> URL: https://issues.apache.org/jira/browse/HIVE-24551
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.8
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, as part of the effort to shade Guava from Hive, we shade Calcite and
> exclude all of its artifacts from the binary distribution. However, this also
> removes all of its transitive dependencies, which are still needed at runtime.
> Without these, Hive queries will fail with errors like:
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/eigenbase/util/property/BooleanProperty
> at 
> org.apache.calcite.util.SaffronProperties.<init>(SaffronProperties.java:66)
> at 
> org.apache.calcite.util.SaffronProperties.instance(SaffronProperties.java:134)
> at org.apache.calcite.util.Util.getDefaultCharset(Util.java:769)
> at 
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.getDefaultCharset(RelDataTypeFactoryImpl.java:565)
> at 
> org.apache.calcite.sql.type.SqlTypeUtil.addCharsetAndCollation(SqlTypeUtil.java:1070)
> at 
> org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:65)
> at org.apache.calcite.rex.RexBuilder.<init>(RexBuilder.java:114)
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:991)
> at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
> at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1069)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1085)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:364)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11138)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> Caused by: java.lang.ClassNotFoundException: 
> org.eigenbase.util.property.BooleanProperty
> at 

[jira] [Updated] (HIVE-24551) Hive should include transitive dependencies from calcite after shading it

2020-12-16 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-24551:

Description: 
Currently, as part of the effort to shade Guava from Hive, we shade Calcite and
exclude all of its artifacts from the binary distribution. However, this also
removes all of its transitive dependencies, which are still needed at runtime.
Without these, Hive queries will fail with errors like:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/eigenbase/util/property/BooleanProperty
at 
org.apache.calcite.util.SaffronProperties.<init>(SaffronProperties.java:66)
at 
org.apache.calcite.util.SaffronProperties.instance(SaffronProperties.java:134)
at org.apache.calcite.util.Util.getDefaultCharset(Util.java:769)
at 
org.apache.calcite.rel.type.RelDataTypeFactoryImpl.getDefaultCharset(RelDataTypeFactoryImpl.java:565)
at 
org.apache.calcite.sql.type.SqlTypeUtil.addCharsetAndCollation(SqlTypeUtil.java:1070)
at 
org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:65)
at org.apache.calcite.rex.RexBuilder.<init>(RexBuilder.java:114)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:991)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1069)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1085)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:364)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11138)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.ClassNotFoundException: 
org.eigenbase.util.property.BooleanProperty
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 33 more
{code}

This happens in branch-2.3, but the same thing might happen in master as well,
although master uses Calcite 1.21, which has changed a lot from the 1.10 that
branch-2.3 is using. For instance, Calcite 1.21 no longer depends on
org.eigenbase.


  was:
Currently, as part of the effort to shade Guava from Hive, we shade Calcite and
exclude all of its artifacts from the binary distribution. However, this also
removes all of its transitive dependencies, which are still needed at runtime.
Without these, Hive queries will fail with errors like:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/eigenbase/util/property/BooleanProperty
at 
org.apache.calcite.util.SaffronProperties.<init>(SaffronProperties.java:66)
at 
org.apache.calcite.util.SaffronProperties.instance(SaffronProperties.java:134)
at org.apache.calcite.util.Util.getDefaultCharset(Util.java:769)
at 
org.apache.calcite.rel.type.RelDataTypeFactoryImpl.getDefaultCharset(RelDataTypeFactoryImpl.java:565)
at 
org.apache.calcite.sql.type.SqlTypeUtil.addCharsetAndCollation(SqlTypeUtil.java:1070)
at 
org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:65)
at 

[jira] [Work logged] (HIVE-24551) Hive should include transitive dependencies from calcite after shading it

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24551?focusedWorklogId=525353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525353
 ]

ASF GitHub Bot logged work on HIVE-24551:
-

Author: ASF GitHub Bot
Created on: 17/Dec/20 02:31
Start Date: 17/Dec/20 02:31
Worklog Time Spent: 10m 
  Work Description: sunchao opened a new pull request #1792:
URL: https://github.com/apache/hive/pull/1792


   
   
   ### What changes were proposed in this pull request?
   
   
   This includes `calcite-core`'s transitive dependencies:
   - net.hydromatic:eigenbase-properties
   - org.codehaus.janino:janino
   - org.codehaus.janino:commons-compiler
   - org.pentaho:pentaho-aggdesigner-algorithm
   
in the fat `hive-exec` jar. 
   
   ### Why are the changes needed?
   
   
   Currently, as part of the effort to shade Guava from Hive, we shade Calcite
and exclude all of its artifacts from the binary distribution. However, this also
removes all of its transitive dependencies, which are still needed at runtime.
Without these, Hive queries will fail with errors like:
   
   ```
   Exception in thread "main" java.lang.NoClassDefFoundError: 
org/eigenbase/util/property/BooleanProperty
   at 
org.apache.calcite.util.SaffronProperties.<init>(SaffronProperties.java:66)
   at 
org.apache.calcite.util.SaffronProperties.instance(SaffronProperties.java:134)
   at org.apache.calcite.util.Util.getDefaultCharset(Util.java:769)
   at 
org.apache.calcite.rel.type.RelDataTypeFactoryImpl.getDefaultCharset(RelDataTypeFactoryImpl.java:565)
   at 
org.apache.calcite.sql.type.SqlTypeUtil.addCharsetAndCollation(SqlTypeUtil.java:1070)
   at 
org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:65)
   at org.apache.calcite.rex.RexBuilder.<init>(RexBuilder.java:114)
   at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:991)
   at 
org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
   at 
org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
   at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1069)
   at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1085)
   at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:364)
   at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11138)
   at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
   at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
   at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
   at 
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
   at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
   at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
   Caused by: java.lang.ClassNotFoundException: 
org.eigenbase.util.property.BooleanProperty
   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
   ... 33 more
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Verified that the classes are included in the `hive-exec` jars with the
change. I also manually tested this by launching an HS2 instance and running a
simple query against it. The query passed, whereas previously it failed due to
the above error.
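
   For reference, such a check can also be scripted; the following is a minimal
sketch (not the actual test used here), and the jar path is a hypothetical
build-output location:

   ```java
   // Minimal sketch: report whether a class coming from one of calcite-core's
   // transitive dependencies made it into the fat hive-exec jar.
   import java.io.IOException;
   import java.util.jar.JarFile;

   public class JarContentCheck {
       public static void main(String[] args) throws IOException {
           String jarPath = args.length > 0 ? args[0]
               : "ql/target/hive-exec-2.3.8.jar";  // assumed location
           String entry = "org/eigenbase/util/property/BooleanProperty.class";
           try (JarFile jar = new JarFile(jarPath)) {
               System.out.println(entry
                   + (jar.getEntry(entry) != null ? " is present" : " is MISSING"));
           }
       }
   }
   ```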



This is an automated 

[jira] [Updated] (HIVE-24551) Hive should include transitive dependencies from calcite after shading it

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24551:
--
Labels: pull-request-available  (was: )

> Hive should include transitive dependencies from calcite after shading it
> -
>
> Key: HIVE-24551
> URL: https://issues.apache.org/jira/browse/HIVE-24551
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.8
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, as part of the effort to shade Guava from Hive, we shade Calcite and
> exclude all of its artifacts from the binary distribution. However, this also
> removes all of its transitive dependencies, which are still needed at runtime.
> Without these, Hive queries will fail with errors like:
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/eigenbase/util/property/BooleanProperty
> at 
> org.apache.calcite.util.SaffronProperties.<init>(SaffronProperties.java:66)
> at 
> org.apache.calcite.util.SaffronProperties.instance(SaffronProperties.java:134)
> at org.apache.calcite.util.Util.getDefaultCharset(Util.java:769)
> at 
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.getDefaultCharset(RelDataTypeFactoryImpl.java:565)
> at 
> org.apache.calcite.sql.type.SqlTypeUtil.addCharsetAndCollation(SqlTypeUtil.java:1070)
> at 
> org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:65)
> at org.apache.calcite.rex.RexBuilder.<init>(RexBuilder.java:114)
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:991)
> at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
> at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1069)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1085)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:364)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11138)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> Caused by: java.lang.ClassNotFoundException: 
> org.eigenbase.util.property.BooleanProperty
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 33 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24551) Hive should include transitive dependencies from calcite after shading it

2020-12-16 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-24551:

Affects Version/s: 2.3.8

> Hive should include transitive dependencies from calcite after shading it
> -
>
> Key: HIVE-24551
> URL: https://issues.apache.org/jira/browse/HIVE-24551
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.8
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Blocker
>
> Currently, as part of the effort to shade Guava from Hive, we shade Calcite and
> exclude all of its artifacts from the binary distribution. However, this also
> removes all of its transitive dependencies, which are still needed at runtime.
> Without these, Hive queries will fail with errors like:
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/eigenbase/util/property/BooleanProperty
> at 
> org.apache.calcite.util.SaffronProperties.<init>(SaffronProperties.java:66)
> at 
> org.apache.calcite.util.SaffronProperties.instance(SaffronProperties.java:134)
> at org.apache.calcite.util.Util.getDefaultCharset(Util.java:769)
> at 
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.getDefaultCharset(RelDataTypeFactoryImpl.java:565)
> at 
> org.apache.calcite.sql.type.SqlTypeUtil.addCharsetAndCollation(SqlTypeUtil.java:1070)
> at 
> org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:65)
> at org.apache.calcite.rex.RexBuilder.<init>(RexBuilder.java:114)
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:991)
> at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
> at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1069)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1085)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:364)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11138)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> Caused by: java.lang.ClassNotFoundException: 
> org.eigenbase.util.property.BooleanProperty
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 33 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24551) Hive should include transitive dependencies from calcite after shading it

2020-12-16 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HIVE-24551:
---


> Hive should include transitive dependencies from calcite after shading it
> -
>
> Key: HIVE-24551
> URL: https://issues.apache.org/jira/browse/HIVE-24551
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Blocker
>
> Currently, as part of the effort to shade Guava from Hive, we shade Calcite and
> exclude all of its artifacts from the binary distribution. However, this also
> removes all of its transitive dependencies, which are still needed at runtime.
> Without these, Hive queries will fail with errors like:
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/eigenbase/util/property/BooleanProperty
> at 
> org.apache.calcite.util.SaffronProperties.<init>(SaffronProperties.java:66)
> at 
> org.apache.calcite.util.SaffronProperties.instance(SaffronProperties.java:134)
> at org.apache.calcite.util.Util.getDefaultCharset(Util.java:769)
> at 
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.getDefaultCharset(RelDataTypeFactoryImpl.java:565)
> at 
> org.apache.calcite.sql.type.SqlTypeUtil.addCharsetAndCollation(SqlTypeUtil.java:1070)
> at 
> org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:65)
> at org.apache.calcite.rex.RexBuilder.<init>(RexBuilder.java:114)
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:991)
> at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
> at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1069)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1085)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:364)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11138)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> Caused by: java.lang.ClassNotFoundException: 
> org.eigenbase.util.property.BooleanProperty
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 33 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24267) RetryingClientTimeBased should always perform first invocation

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24267?focusedWorklogId=525337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525337
 ]

ASF GitHub Bot logged work on HIVE-24267:
-

Author: ASF GitHub Bot
Created on: 17/Dec/20 00:54
Start Date: 17/Dec/20 00:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1573:
URL: https://github.com/apache/hive/pull/1573#issuecomment-747133708


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 525337)
Time Spent: 0.5h  (was: 20m)

> RetryingClientTimeBased should always perform first invocation
> --
>
> Key: HIVE-24267
> URL: https://issues.apache.org/jira/browse/HIVE-24267
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24267.01.patch, HIVE-24267.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24254) Remove setOwner call in ReplChangeManager

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24254?focusedWorklogId=525336&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525336
 ]

ASF GitHub Bot logged work on HIVE-24254:
-

Author: ASF GitHub Bot
Created on: 17/Dec/20 00:54
Start Date: 17/Dec/20 00:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1567:
URL: https://github.com/apache/hive/pull/1567


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 525336)
Time Spent: 40m  (was: 0.5h)

> Remove setOwner call in ReplChangeManager
> -
>
> Key: HIVE-24254
> URL: https://issues.apache.org/jira/browse/HIVE-24254
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24254.01.patch, HIVE-24254.02.patch, 
> HIVE-24254.03.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2020-12-16 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250704#comment-17250704
 ] 

Vihang Karajgaonkar commented on HIVE-24543:


I have created a design doc which can be reviewed here: 
https://cwiki.apache.org/confluence/display/Hive/Support+SAML+2.0+authentication+mode

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> With cloud-based deployments, having SAML 2.0-based authentication support
> in HS2 will be greatly useful in the case of federated or external identity
> providers like Okta, PingIdentity, or Azure AD.
> This authentication mechanism can initially be supported only in HTTP
> transport mode in HiveServer2, since the SAML 2.0 protocol is primarily
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24549) TxnManager should not be shared across queries

2020-12-16 Thread John Sherman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sherman reassigned HIVE-24549:
---

Assignee: (was: John Sherman)

> TxnManager should not be shared across queries
> --
>
> Key: HIVE-24549
> URL: https://issues.apache.org/jira/browse/HIVE-24549
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Priority: Major
>
> There are various sections of code that assume the DbTxnManager is not shared
> across concurrent queries in a session, such as the following (which gets
> invoked during closeOperation):
>  
> [https://github.com/apache/hive/blob/3f5e01cae5b65dde7edb3fbde8ebe70c1d02f6cf/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L868-L885]
> {code:java}
>// is usually called after close() to commit or rollback a query and end 
> the driver life cycle.
>   // do not understand why it is needed and wonder if it could be combined 
> with close.
>   @Override
>   public void destroy() {
> driverState.lock();
> try {
>   // in the cancel case where the driver state is INTERRUPTED, destroy 
> will be deferred to
>   // the query process
>   if (driverState.isDestroyed()) {
> return;
>   } else {
> driverState.descroyed();
>   }
> } finally {
>   driverState.unlock();
> }
> driverTxnHandler.destroy();
>   }
> {code}
> The problematic part is the: driverTxnHandler.destroy() which looks like:
> {code:java}
>  void destroy() {
>boolean isTxnOpen =
>  driverContext != null &&
>  driverContext.getTxnManager() != null &&
>  driverContext.getTxnManager().isTxnOpen();
>release(!hiveLocks.isEmpty() || isTxnOpen);
>  }
> {code}
> What happens is (rough sketch):
>  Q1 - starts operation, acquires txn, does operation, closes txn/cleans up 
> txn info, starts fetching data
>  Q2 - starts operation, acquires txn
>  Q1 - calls closeOperation, which in turn calls destroy, which sees Q2's
> transaction information and cleans it up.
>  Q2 - proceeds and fails in split generation when it can no longer find its
> Valid*TxnIdList information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24550) Cleanup only transaction information for the current DriverContext

2020-12-16 Thread John Sherman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sherman reassigned HIVE-24550:
---


> Cleanup only transaction information for the current DriverContext
> --
>
> Key: HIVE-24550
> URL: https://issues.apache.org/jira/browse/HIVE-24550
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>
> The long-term solution would be: https://issues.apache.org/jira/browse/HIVE-24549
> The short-term solution for the common usage pattern described in HIVE-24549 is
> to ensure the current DriverContext's queryId matches the TxnManager's queryId.
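
A minimal sketch of that queryId-matching guard, using simplified stand-in types
rather than Hive's actual classes (the names and fields here are assumptions for
illustration only):

{code:java}
// Simplified stand-ins for illustration; not Hive's actual classes.
class TxnManagerStandIn {
    String queryId;      // the query that opened the transaction
    boolean txnOpen;
}

class DriverContextStandIn {
    String queryId;
    TxnManagerStandIn txnManager;

    // Clean up the transaction only if it was opened by *this* query, so a
    // concurrent query's open transaction is left untouched.
    void destroyTxnIfOwned() {
        boolean ownedOpenTxn = txnManager != null
            && txnManager.txnOpen
            && queryId != null
            && queryId.equals(txnManager.queryId);
        if (ownedOpenTxn) {
            txnManager.txnOpen = false;  // stand-in for rollback/cleanup
        }
    }
}
{code}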



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24549) TxnManager should not be shared across queries

2020-12-16 Thread John Sherman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250693#comment-17250693
 ] 

John Sherman commented on HIVE-24549:
-

cc: [~pvargacl] did some investigation and was able to create a nice repro test

> TxnManager should not be shared across queries
> --
>
> Key: HIVE-24549
> URL: https://issues.apache.org/jira/browse/HIVE-24549
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>
> There are various sections of code that assume the DbTxnManager is not shared
> across concurrent queries in a session, such as the following (which gets
> invoked during closeOperation):
>  
> [https://github.com/apache/hive/blob/3f5e01cae5b65dde7edb3fbde8ebe70c1d02f6cf/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L868-L885]
> {code:java}
>// is usually called after close() to commit or rollback a query and end 
> the driver life cycle.
>   // do not understand why it is needed and wonder if it could be combined 
> with close.
>   @Override
>   public void destroy() {
> driverState.lock();
> try {
>   // in the cancel case where the driver state is INTERRUPTED, destroy 
> will be deferred to
>   // the query process
>   if (driverState.isDestroyed()) {
> return;
>   } else {
> driverState.descroyed();
>   }
> } finally {
>   driverState.unlock();
> }
> driverTxnHandler.destroy();
>   }
> {code}
> The problematic part is the: driverTxnHandler.destroy() which looks like:
> {code:java}
>  void destroy() {
>boolean isTxnOpen =
>  driverContext != null &&
>  driverContext.getTxnManager() != null &&
>  driverContext.getTxnManager().isTxnOpen();
>release(!hiveLocks.isEmpty() || isTxnOpen);
>  }
> {code}
> What happens is (rough sketch):
>  Q1 - starts operation, acquires txn, does operation, closes txn/cleans up 
> txn info, starts fetching data
>  Q2 - starts operation, acquires txn
>  Q1 - calls closeOperation, which in turn calls destroy, which sees Q2's
> transaction information and cleans it up.
>  Q2 - proceeds and fails in split generation when it can no longer find its
> Valid*TxnIdList information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24543:
--
Labels: pull-request-available  (was: )

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> With cloud-based deployments, having SAML 2.0-based authentication support
> in HS2 will be greatly useful in the case of federated or external identity
> providers like Okta, PingIdentity, or Azure AD.
> This authentication mechanism can initially be supported only in HTTP
> transport mode in HiveServer2, since the SAML 2.0 protocol is primarily
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?focusedWorklogId=525318&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525318
 ]

ASF GitHub Bot logged work on HIVE-24543:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 22:53
Start Date: 16/Dec/20 22:53
Worklog Time Spent: 10m 
  Work Description: vihangk1 opened a new pull request #1791:
URL: https://github.com/apache/hive/pull/1791


   ## What changes were proposed in this pull request?
   This is a WIP patch to add support for SAML 2.0-based authentication for
HiveServer2 clients. Currently this would only support local, desktop-based JDBC
clients. The JDBC driver opens a browser and points it to the SSO URL to complete
the authentication flow.
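
   As an illustration of that client-side step, here is a sketch of handing
authentication off to the system browser (not the actual driver code; the IdP
URL is a placeholder):

   ```java
   // Sketch only: open the system browser at the SSO URL, as the JDBC driver
   // is described to do. The URL below is a placeholder, not a real endpoint.
   import java.awt.Desktop;
   import java.net.URI;

   public class SsoBrowserLaunch {
       public static void main(String[] args) throws Exception {
           URI ssoUrl = URI.create("https://idp.example.com/sso/saml");
           if (Desktop.isDesktopSupported()
                   && Desktop.getDesktop().isSupported(Desktop.Action.BROWSE)) {
               Desktop.getDesktop().browse(ssoUrl);
           } else {
               System.err.println("No desktop browser available for the SSO flow.");
           }
       }
   }
   ```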
   
   
   ### Why are the changes needed?
   New feature
   
   ### Does this PR introduce _any_ user-facing change?
   It introduces new hive configurations to configure the SAML client.
   
   ### How was this patch tested?
   Added new unit tests to authenticate against a dockerized SAML 2.0 provider.
Also manually tested with an Okta SAML 2.0 application.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 525318)
Remaining Estimate: 0h
Time Spent: 10m

> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> With cloud-based deployments, having SAML 2.0-based authentication support
> in HS2 will be greatly useful in the case of federated or external identity
> providers like Okta, PingIdentity, or Azure AD.
> This authentication mechanism can initially be supported only in HTTP
> transport mode in HiveServer2, since the SAML 2.0 protocol is primarily
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24549) TxnManager should not be shared across queries

2020-12-16 Thread John Sherman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sherman reassigned HIVE-24549:
---


> TxnManager should not be shared across queries
> --
>
> Key: HIVE-24549
> URL: https://issues.apache.org/jira/browse/HIVE-24549
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>
> There are various sections of code that assume the DbTxnManager is not shared
> across concurrent queries in a session, such as the following (which gets
> invoked during closeOperation):
>  
> [https://github.com/apache/hive/blob/3f5e01cae5b65dde7edb3fbde8ebe70c1d02f6cf/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L868-L885]
> {code:java}
>// is usually called after close() to commit or rollback a query and end 
> the driver life cycle.
>   // do not understand why it is needed and wonder if it could be combined 
> with close.
>   @Override
>   public void destroy() {
> driverState.lock();
> try {
>   // in the cancel case where the driver state is INTERRUPTED, destroy 
> will be deferred to
>   // the query process
>   if (driverState.isDestroyed()) {
> return;
>   } else {
> driverState.descroyed();
>   }
> } finally {
>   driverState.unlock();
> }
> driverTxnHandler.destroy();
>   }
> {code}
> The problematic part is the: driverTxnHandler.destroy() which looks like:
> {code:java}
>  void destroy() {
>boolean isTxnOpen =
>  driverContext != null &&
>  driverContext.getTxnManager() != null &&
>  driverContext.getTxnManager().isTxnOpen();
>release(!hiveLocks.isEmpty() || isTxnOpen);
>  }
> {code}
> What happens is (rough sketch):
>  Q1 - starts operation, acquires txn, does operation, closes txn/cleans up 
> txn info, starts fetching data
>  Q2 - starts operation, acquires txn
>  Q1 - calls closeOperation, which in turn calls destroy, which sees Q2's
> transaction information and cleans it up.
>  Q2 - proceeds and fails in split generation when it can no longer find its
> Valid*TxnIdList information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24229) DirectSql fails in case of OracleDB

2020-12-16 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250643#comment-17250643
 ] 

Ayush Saxena commented on HIVE-24229:
-

Hi [~ngangam]
For me on Oracle DB, it was giving a `ClassCastException`, something like: cannot
cast oracle.sql.Clob to String.
{{extractSqlClob}} has a check in it, {{if (value instanceof Clob) {}}},
so will it still be a problem?
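
For context, a defensive extraction of the kind being discussed might look like
the sketch below. It assumes Oracle's driver returns a java.sql.Clob where other
databases return a String; this is an illustration, not necessarily the exact
Hive code:

{code:java}
import java.sql.Clob;
import java.sql.SQLException;

public class ClobExtraction {
    // Convert a JDBC result value to a String, handling drivers (e.g. Oracle)
    // that hand back a Clob instead of a String.
    static String extractSqlClob(Object value) throws SQLException {
        if (value == null) {
            return null;
        }
        if (value instanceof Clob) {
            Clob clob = (Clob) value;
            // getSubString is 1-indexed; read the entire CLOB
            return clob.getSubString(1, (int) clob.length());
        }
        return value.toString();
    }
}
{code}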

> DirectSql fails in case of OracleDB
> ---
>
> Key: HIVE-24229
> URL: https://issues.apache.org/jira/browse/HIVE-24229
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Direct SQL fails due to a different data type mapping in the case of Oracle DB



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23684) Large underestimation in NDV stats when input and join cardinality ratio is big

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23684?focusedWorklogId=525252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525252
 ]

ASF GitHub Bot logged work on HIVE-23684:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 20:38
Start Date: 16/Dec/20 20:38
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1786:
URL: https://github.com/apache/hive/pull/1786#discussion_r544603006



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -2686,6 +2686,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 "Estimate statistics in absence of statistics."),
 HIVE_STATS_NDV_ESTIMATE_PERC("hive.stats.ndv.estimate.percent", (float)20,
 "This many percentage of rows will be estimated as count distinct in 
absence of statistics."),
+HIVE_STATS_JOIN_NDV_READJUSTMENT("hive.stats.join.ndv.readjustment", false,
+"Setting this to true will make Hive use Calcite to adjust 
estimatation for ndv after join."),

Review comment:
   `Setting this to true will make Hive use Calcite`
   
   Instead of 'Calcite' logic, could you maybe mention the kind of estimation it does? (A one-liner is enough, since it is a config description and does not need to be long.)

##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
##
@@ -2546,19 +2547,25 @@ private void updateColStats(HiveConf conf, Statistics stats, long leftUnmatchedR
       for (ColStatistics cs : colStats) {
         colNameStatsAvailable.add(cs.getColumnName());
         int pos = jop.getConf().getReversedExprs().get(cs.getColumnName());
-        long oldRowCount = rowCountParents.get(pos);
-        double ratio = (double) newNumRows / (double) oldRowCount;
         long oldDV = cs.getCountDistint();
+
+        boolean useCalciteForNdvReadjustment
+            = HiveConf.getBoolVar(conf, ConfVars.HIVE_STATS_JOIN_NDV_READJUSTMENT);
         long newDV = oldDV;
+        if (useCalciteForNdvReadjustment) {
+          newDV = RelMdUtil.numDistinctVals(oldDV * 1.0, newNumRows * 1.0).longValue();

Review comment:
   Can `RelMdUtil.numDistinctVals` return null? Just making sure we do not 
need a null check.
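   For reference, a defensive variant would be cheap (sketch only; with non-null boxed doubles, as in the patch, Calcite is expected to return a non-null Double):
   ```java
   Double adjusted = RelMdUtil.numDistinctVals((double) oldDV, (double) newNumRows);
   long newDV = (adjusted != null) ? adjusted.longValue() : oldDV; // keep the old NDV as a fallback
   ```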

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -2686,6 +2686,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "Estimate statistics in absence of statistics."),
     HIVE_STATS_NDV_ESTIMATE_PERC("hive.stats.ndv.estimate.percent", (float)20,
         "This many percentage of rows will be estimated as count distinct in absence of statistics."),
+    HIVE_STATS_JOIN_NDV_READJUSTMENT("hive.stats.join.ndv.readjustment", false,
+        "Setting this to true will make Hive use Calcite to adjust estimatation for ndv after join."),

Review comment:
   typo: `estimatation`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 525252)
Time Spent: 20m  (was: 10m)

> Large underestimation in NDV stats when input and join cardinality ratio is 
> big
> ---
>
> Key: HIVE-23684
> URL: https://issues.apache.org/jira/browse/HIVE-23684
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Large underestimations of NDV values may occur after a join operation since 
> the current logic will decrease the original NDV values proportionally.
> The 
> [code|https://github.com/apache/hive/blob/1271d08a3c51c021fa710449f8748b8cdb12b70f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2558]
>  compares the number of rows of each relation before the join with the number 
> of rows after the join and extracts a ratio for each side. Based on this 
> ratio it adapts (reduces) the NDV accordingly.
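> For illustration (made-up numbers): if a join input has 1,000,000 rows with 100,000 distinct values in a column and the join output is estimated at 1,000 rows, the ratio is 1,000 / 1,000,000 = 0.001, so the NDV is scaled down to 100,000 * 0.001 = 100, even though the 1,000 surviving rows could contain up to 1,000 distinct values.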
> Consider for instance the following query:
> {code:sql}
> select inv_warehouse_sk
>  , inv_item_sk
>  , stddev_samp(inv_quantity_on_hand) stdev
>  , avg(inv_quantity_on_hand) mean
> from inventory
>, date_dim
> where inv_date_sk = d_date_sk
>   and d_year = 1999
>   and d_moy = 2
> group by inv_warehouse_sk, inv_item_sk;
> {code}
> For the sake of the discussion, I outline below 

[jira] [Work started] (HIVE-24548) CompactionHeartbeater leaks metastore connections

2020-12-16 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24548 started by Peter Varga.
--
> CompactionHeartbeater leaks metastore connections
> -
>
> Key: HIVE-24548
> URL: https://issues.apache.org/jira/browse/HIVE-24548
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Every Heartbeater thread creates a new metastore client that is never closed



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24548) CompactionHeartbeater leaks metastore connections

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24548:
--
Labels: pull-request-available  (was: )

> CompactionHeartbeater leaks metastore connections
> -
>
> Key: HIVE-24548
> URL: https://issues.apache.org/jira/browse/HIVE-24548
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Every Heartbeater thread creates a new metastore client that is never closed



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24548) CompactionHeartbeater leaks metastore connections

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24548?focusedWorklogId=525221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525221
 ]

ASF GitHub Bot logged work on HIVE-24548:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 19:29
Start Date: 16/Dec/20 19:29
Worklog Time Spent: 10m 
  Work Description: pvargacl opened a new pull request #1790:
URL: https://github.com/apache/hive/pull/1790


   
   
   ### What changes were proposed in this pull request?
   Close metastore connection when CompactionHeartbeater is destroyed
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 525221)
Remaining Estimate: 0h
Time Spent: 10m

> CompactionHeartbeater leaks metastore connections
> -
>
> Key: HIVE-24548
> URL: https://issues.apache.org/jira/browse/HIVE-24548
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Every Heartbeater thread creates a new metastore client that is never closed



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24548) CompactionHeartbeater leaks metastore connections

2020-12-16 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga reassigned HIVE-24548:
--


> CompactionHeartbeater leaks metastore connections
> -
>
> Key: HIVE-24548
> URL: https://issues.apache.org/jira/browse/HIVE-24548
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>
> Every Heartbeater thread creates a new metastore client that is never closed



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24229) DirectSql fails in case of OracleDB

2020-12-16 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250415#comment-17250415
 ] 

Naveen Gangam commented on HIVE-24229:
--

[~ayushtkn] Have a quick question about part of this fix. Looks like the change includes the following fix.

{noformat}
public void apply(Partition t, Object[] fields) {
-  t.putToParameters((String)fields[1], extractSqlClob(fields[2]));
+  t.putToParameters(extractSqlClob(fields[1]), extractSqlClob(fields[2]));
  }});
{noformat}

The PARTITION_PARAMS.PARAM_KEY is a varchar(256). Why was the change from "(String)fields[1]" to "extractSqlClob(fields[1])" required? I am concerned that this might cause issues in some other databases, where the column value might be treated as a reference to a file and not the actual value itself.

{noformat}
-- Table PARTITION_PARAMS for join relationship
CREATE TABLE PARTITION_PARAMS
(
PART_ID NUMBER NOT NULL,
PARAM_KEY VARCHAR2(256) NOT NULL,
PARAM_VALUE CLOB NULL
);

{noformat}


> DirectSql fails in case of OracleDB
> ---
>
> Key: HIVE-24229
> URL: https://issues.apache.org/jira/browse/HIVE-24229
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Direct Sql fails due to a different data type mapping in case of Oracle DB



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24547) Fix acid_vectorization_original

2020-12-16 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24547:

Description: 
the failure was hidden by the failed-to-read issue

the test most likely started failing after HIVE-24274

{code}
[ERROR] 
org.apache.hadoop.hive.cli.split0.TestMiniLlapLocalCliDriver.testCliDriver[acid_vectorization_original]
  Time elapsed: 10.931 s  <<< FAILURE!
java.lang.AssertionError
at 
org.apache.hadoop.hive.ql.parse.UnparseTranslator.addTranslation(UnparseTranslator.java:123)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:13073)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genJoinRelNode(CalcitePlanner.java:2897)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genJoinLogicalPlan(CalcitePlanner.java:3124)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5298)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1874)
{code}

  was:
the failure was hidden by the failed-to-read issue

the test is most likely failed first after HIVE-24274 


> Fix acid_vectorization_original
> ---
>
> Key: HIVE-24547
> URL: https://issues.apache.org/jira/browse/HIVE-24547
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> the failure was hidden by the failed-to-read issue
> the test most likely started failing after HIVE-24274
> {code}
> [ERROR] 
> org.apache.hadoop.hive.cli.split0.TestMiniLlapLocalCliDriver.testCliDriver[acid_vectorization_original]
>   Time elapsed: 10.931 s  <<< FAILURE!
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hive.ql.parse.UnparseTranslator.addTranslation(UnparseTranslator.java:123)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:13073)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genJoinRelNode(CalcitePlanner.java:2897)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genJoinLogicalPlan(CalcitePlanner.java:3124)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5298)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1874)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24264) Fix failed-to-read errors in precommit runs

2020-12-16 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24264.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Krisztian for reviewing the changes!

> Fix failed-to-read errors in precommit runs
> ---
>
> Key: HIVE-24264
> URL: https://issues.apache.org/jira/browse/HIVE-24264
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> the following happens:
> * this seems to be caused by tests outputting a lot of messages
> * some error happens in surefire - and the system-err is discarded
> * junit xml becomes corrupted
> * jenkins does report the failure - but doesn't take it into account when setting the build result; so the result remains green



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24264) Fix failed-to-read errors in precommit runs

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24264:
--
Labels: pull-request-available  (was: )

> Fix failed-to-read errors in precommit runs
> ---
>
> Key: HIVE-24264
> URL: https://issues.apache.org/jira/browse/HIVE-24264
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> the following happens:
> * this seems to be caused by tests outputting a lot of messages
> * some error happens in surefire - and the system-err is discarded
> * junit xml becomes corrupted
> * jenkins does report the failure - but doesn't take it into account when setting the build result; so the result remains green



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24264) Fix failed-to-read errors in precommit runs

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24264?focusedWorklogId=525045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525045
 ]

ASF GitHub Bot logged work on HIVE-24264:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 14:33
Start Date: 16/Dec/20 14:33
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1574:
URL: https://github.com/apache/hive/pull/1574


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 525045)
Remaining Estimate: 0h
Time Spent: 10m

> Fix failed-to-read errors in precommit runs
> ---
>
> Key: HIVE-24264
> URL: https://issues.apache.org/jira/browse/HIVE-24264
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> the following happens:
> * this seems to be caused by tests outputting a lot of messages
> * some error happens in surefire - and the system-err is discarded
> * junit xml becomes corrupted
> * jenkins does report the failure - but doesn't take it into account when setting the build result; so the result remains green



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24519) Optimize MV: Materialized views should not rebuild when tables are not modified

2020-12-16 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250318#comment-17250318
 ] 

Jesus Camacho Rodriguez commented on HIVE-24519:


{quote}
In this test an MV is created with rewriting.time.window=5min. After that, an insert is executed on one of its source tables, but the MV is considered to be up to date because the time window has not elapsed when the rebuild is requested. Also, the query rewritten to use the MV returns fewer records than the query with the original plan would return.
{quote}
[~kkasa], that should not be the behavior. For rebuild purposes, whether an MV 
is outdated or not should be determined using only the write id lists for the 
tables it uses.
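As a rough illustration of that idea (hypothetical names, not the actual Hive code): the materialization records the ValidWriteIdList of every source table, and a rebuild is needed only if some table has moved past that snapshot.
{code:java}
// table name -> serialized ValidWriteIdList captured when the MV was last built
boolean isOutdated(Map<String, String> snapshotWriteIds, Map<String, String> currentWriteIds) {
  // any difference means some source table received writes since the last rebuild
  return !snapshotWriteIds.equals(currentWriteIds);
}
{code}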

> Optimize MV: Materialized views should not rebuild when tables are not 
> modified
> ---
>
> Key: HIVE-24519
> URL: https://issues.apache.org/jira/browse/HIVE-24519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> e.g
> {noformat}
> create materialized view c_c_address as 
> select c_customer_sk from customer c, customer_address ca where 
> c_current_addr_sk = ca.ca_address_id;
> ALTER MATERIALIZED VIEW c_c_address REBUILD; <-- This shouldn't trigger 
> rebuild, when source tables are not modified
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24519) Optimize MV: Materialized views should not rebuild when tables are not modified

2020-12-16 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250317#comment-17250317
 ] 

Krisztian Kasa commented on HIVE-24519:
---

[~rajesh.balamohan] Should we add a force option? Like
{code:java}
ALTER MATERIALIZED VIEW cmv_mat_view_n3 REBUILD ENFORCED
or
ALTER MATERIALIZED VIEW cmv_mat_view_n3 REBUILD FORCE
{code}
If for some reason the system cannot correctly determine the state of the materialized view, the user has no way to trigger the rebuild; only dropping and re-creating the view works.

Example: materialized_view_create_rewrite_time_window.q
 In this test an MV is created with rewriting.time.window=5min. After that, an insert is executed on one of its source tables, but the MV is considered to be up to date because the time window has not elapsed when the rebuild is requested. Also, the query rewritten to use the MV returns fewer records than the query with the original plan would return.

> Optimize MV: Materialized views should not rebuild when tables are not 
> modified
> ---
>
> Key: HIVE-24519
> URL: https://issues.apache.org/jira/browse/HIVE-24519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> e.g
> {noformat}
> create materialized view c_c_address as 
> select c_customer_sk from customer c, customer_address ca where 
> c_current_addr_sk = ca.ca_address_id;
> ALTER MATERIALIZED VIEW c_c_address REBUILD; <-- This shouldn't trigger 
> rebuild, when source tables are not modified
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24519) Optimize MV: Materialized views should not rebuild when tables are not modified

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24519?focusedWorklogId=525009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525009
 ]

ASF GitHub Bot logged work on HIVE-24519:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 13:06
Start Date: 16/Dec/20 13:06
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1772:
URL: https://github.com/apache/hive/pull/1772#discussion_r544282613



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java
##
@@ -69,6 +72,23 @@ public void analyzeInternal(ASTNode root) throws SemanticException {
 
     LOG.debug("Rebuilding materialized view " + tableName.getNotEmptyDbTable());
     super.analyzeInternal(rewrittenAST);
+
+    try {
+      Table table = db.getTable(tableName.getDb(), tableName.getTable());

Review comment:
   Thanks for highlighting that we have all the necessary info before calling `super.analyzeInternal()`.
   Creating `HiveOperation.ALTER_MATERIALIZED_VIEW_REBUILD` was necessary to get this to work, but there is no need to modify the `rootTasks`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 525009)
Time Spent: 0.5h  (was: 20m)

> Optimize MV: Materialized views should not rebuild when tables are not 
> modified
> ---
>
> Key: HIVE-24519
> URL: https://issues.apache.org/jira/browse/HIVE-24519
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> e.g
> {noformat}
> create materialized view c_c_address as 
> select c_customer_sk from customer c, customer_address ca where 
> c_current_addr_sk = ca.ca_address_id;
> ALTER MATERIALIZED VIEW c_c_address REBUILD; <-- This shouldn't trigger 
> rebuild, when source tables are not modified
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=525007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525007
 ]

ASF GitHub Bot logged work on HIVE-24471:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 12:52
Start Date: 16/Dec/20 12:52
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r544256032



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -1790,6 +1790,10 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVEALIAS("hive.alias", "", ""),
     HIVEMAPSIDEAGGREGATE("hive.map.aggr", true, "Whether to use map-side aggregation in Hive Group By queries"),
     HIVEGROUPBYSKEW("hive.groupby.skewindata", false, "Whether there is skew in data to optimize group by queries"),
+
+    HIVE_ENABLE_COMBINER_FOR_GROUP_BY("hive.enable.combiner.for.groupby", true,
+        "Whether to enable tez combiner to aggregate the records after sorting is done"),

Review comment:
   Maybe clarify that it is only used for map-side aggregation? Is there any case where this would not be beneficial?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByCombiner.java
##
@@ -0,0 +1,282 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByCombiner;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.BaseWork;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.Serializer;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+
+// Combiner for the normal group by operator. In case of map-side aggregation, the partially
+// aggregated records are sorted on the group by key. If for some reason, like the hash table
+// memory exceeding the limit or the first few batches of records having few distinct values,
+// the aggregation was not done, then the aggregation can be done cheaply here, as the records
+// are sorted on the group by key.
+public class GroupByCombiner extends VectorGroupByCombiner {
+
+  private static final Logger LOG = LoggerFactory.getLogger(
+      org.apache.hadoop.hive.ql.exec.GroupByCombiner.class.getName());
+
+  private transient GenericUDAFEvaluator[] aggregationEvaluators;
+  Deserializer valueDeserializer;
+  GenericUDAFEvaluator.AggregationBuffer[] aggregationBuffers;
+  GroupByOperator groupByOperator;
+  Serializer valueSerializer;
+  ObjectInspector aggrObjectInspector;
+  DataInputBuffer valueBuffer;
+  Object[] cachedValues;
+
+  public GroupByCombiner(TaskContext taskContext) throws HiveException, IOException {
+    super(taskContext);
+    if (rw != null) {
+      try {
+

[jira] [Updated] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-16 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-24471:
--
Description: 
In map side group aggregation, partial grouped aggregation is calculated to reduce the data written to disk by the map task. In case of hash aggregation, where the input data is not sorted, a hash table is used (with sorting also being performed before flushing). If the hash table size increases beyond a configurable limit, the data is flushed to disk and a new hash table is generated. If the reduction achieved by the hash table is less than the min hash aggregation reduction calculated during compile time, the map side aggregation is converted to streaming mode. So if the first few batches of records do not result in a significant reduction, the mode is switched to streaming mode. This may impact performance if the subsequent batches of records have fewer distinct values.

To improve performance in both Hash and Streaming mode, a combiner can be added to the map task after the keys are sorted. This will make sure that the aggregation is done where possible and reduce the data written to disk.

  was:
In map side group aggregation, partial grouped aggregation is calculated to 
reduce the data written to disk by map task. In case of hash aggregation, where 
the input data is not sorted, hash table is used. If the hash table size 
increases beyond configurable limit, data is flushed to disk and new hash table 
is generated. If the reduction by hash table is less than min hash aggregation 
reduction calculated during compile time, the map side aggregation is converted 
to streaming mode. So if the first few batch of records does not result into 
significant reduction, then the mode is switched to streaming mode. This may 
have impact on performance, if the subsequent batch of records have less number 
of distinct values. 

To improve performance both in Hash and Streaming mode, a combiner can be added 
to the map task after the keys are sorted. This will make sure that the 
aggregation is done if possible and reduce the data written to disk.


> Add support for combiner in hash mode group aggregation 
> 
>
> Key: HIVE-24471
> URL: https://issues.apache.org/jira/browse/HIVE-24471
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In map side group aggregation, partial grouped aggregation is calculated to reduce the data written to disk by the map task. In case of hash aggregation, where the input data is not sorted, a hash table is used (with sorting also being performed before flushing). If the hash table size increases beyond a configurable limit, the data is flushed to disk and a new hash table is generated. If the reduction achieved by the hash table is less than the min hash aggregation reduction calculated during compile time, the map side aggregation is converted to streaming mode. So if the first few batches of records do not result in a significant reduction, the mode is switched to streaming mode. This may impact performance if the subsequent batches of records have fewer distinct values.
> To improve performance in both Hash and Streaming mode, a combiner can be added to the map task after the keys are sorted. This will make sure that the aggregation is done where possible and reduce the data written to disk.
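> As a rough illustration of the combiner idea (sketch only, with hypothetical KeyValue/merge/emit helpers; this is not the PR's GroupByCombiner): since the map output is sorted on the group-by key, equal keys are adjacent, so partial aggregates can be merged in a single streaming pass before being written out.
> {code:java}
> GroupKey prevKey = null;
> PartialAgg acc = null;
> for (KeyValue kv : sortedMapOutput) {      // records already sorted on the group-by key
>   if (prevKey != null && !kv.getKey().equals(prevKey)) {
>     emit(prevKey, acc);                    // key changed: flush the merged partial aggregate
>     acc = null;
>   }
>   acc = (acc == null) ? kv.getValue() : merge(acc, kv.getValue());
>   prevKey = kv.getKey();
> }
> if (prevKey != null) {
>   emit(prevKey, acc);                      // flush the last group
> }
> {code}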



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-16 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-24471:
--
Description: 
In map side group aggregation, partial grouped aggregation is calculated to 
reduce the data written to disk by map task. In case of hash aggregation, where 
the input data is not sorted, hash table is used. If the hash table size 
increases beyond configurable limit, data is flushed to disk and new hash table 
is generated. If the reduction by hash table is less than min hash aggregation 
reduction calculated during compile time, the map side aggregation is converted 
to streaming mode. So if the first few batch of records does not result into 
significant reduction, then the mode is switched to streaming mode. This may 
have impact on performance, if the subsequent batch of records have less number 
of distinct values. 

To improve performance both in Hash and Streaming mode, a combiner can be added 
to the map task after the keys are sorted. This will make sure that the 
aggregation is done if possible and reduce the data written to disk.

  was:In map side group aggregation, partial grouped aggregation is calculated 
to reduce the data written to disk by map task. In case of hash aggregation, 
where the input data is not sorted, hash table is used. If the hash table size 
increases beyond configurable limit, data is flushed to disk and new hash table 
is generated. If the reduction by hash table is less than min hash aggregation 
reduction calculated during compile time, the map side aggregation is converted 
to streaming mode. So if the first few batch of records does not result into 
significant reduction, then the mode is switched to streaming mode. This may 
have impact on performance, if the subsequent batch of records have less number 
of distinct values. To mitigate this situation, a combiner can be added to the 
map task after the keys are sorted. This will make sure that the aggregation is 
done if possible and reduce the data written to disk.


> Add support for combiner in hash mode group aggregation 
> 
>
> Key: HIVE-24471
> URL: https://issues.apache.org/jira/browse/HIVE-24471
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In map side group aggregation, partial grouped aggregation is calculated to 
> reduce the data written to disk by map task. In case of hash aggregation, 
> where the input data is not sorted, hash table is used. If the hash table 
> size increases beyond configurable limit, data is flushed to disk and new 
> hash table is generated. If the reduction by hash table is less than min hash 
> aggregation reduction calculated during compile time, the map side 
> aggregation is converted to streaming mode. So if the first few batch of 
> records does not result into significant reduction, then the mode is switched 
> to streaming mode. This may have impact on performance, if the subsequent 
> batch of records have less number of distinct values. 
> To improve performance both in Hash and Streaming mode, a combiner can be 
> added to the map task after the keys are sorted. This will make sure that the 
> aggregation is done if possible and reduce the data written to disk.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-12371) Adding a timeout connection parameter for JDBC

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12371?focusedWorklogId=524991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524991
 ]

ASF GitHub Bot logged work on HIVE-12371:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 11:53
Start Date: 16/Dec/20 11:53
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1611:
URL: https://github.com/apache/hive/pull/1611#issuecomment-746171619


   +1, looks good to me!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524991)
Time Spent: 50m  (was: 40m)

> Adding a timeout connection parameter for JDBC
> --
>
> Key: HIVE-12371
> URL: https://issues.apache.org/jira/browse/HIVE-12371
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: Nemon Lou
>Assignee: Xi Chen
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> There are some timeout settings from server side:
> HIVE-4766
> HIVE-6679
> Adding a timeout connection parameter for JDBC is useful in some scenarios:
> 1. beeline (which cannot set the timeout manually)
> 2. customizing the timeout for different connections (among Hive or RDBs, which cannot be done via DriverManager.setLoginTimeout())
> Just like postgresql,
> {noformat}
> jdbc:postgresql://localhost/test?user=fred&password=secret&ssl=true&loginTimeout=0
> {noformat}
> or mysql
> {noformat}
> jdbc:mysql://xxx.xx.xxx.xxx:3306/database?connectTimeout=6&socketTimeout=6
> {noformat}
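> With the patch under review, the Hive equivalent would presumably be a session variable in the connection URL (the parameter name is taken from the PR's JdbcConnectionParams.SOCKET_TIMEOUT; per the patch the value is handed to Thrift, which expects milliseconds):
> {noformat}
> jdbc:hive2://host:10000/default;socketTimeout=60000
> {noformat}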



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-12371) Adding a timeout connection parameter for JDBC

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12371?focusedWorklogId=524990&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524990
 ]

ASF GitHub Bot logged work on HIVE-12371:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 11:51
Start Date: 16/Dec/20 11:51
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1611:
URL: https://github.com/apache/hive/pull/1611#discussion_r544236706



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -1002,11 +1002,19 @@ private String getSessionValue(String varName, String varDefault) {
     return varValue;
   }
 
-  // copy loginTimeout from driver manager. Thrift timeout needs to be in millis
+  // use socketTimeout from jdbc connection url. Thrift timeout needs to be in millis
   private void setupLoginTimeout() {
-    long timeOut = TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());
+    String socketTimeoutStr = sessConfMap.getOrDefault(JdbcConnectionParams.SOCKET_TIMEOUT, "0");
+    long timeOut = 0;
+    try {
+      timeOut = Long.parseLong(socketTimeoutStr);
+    } catch (NumberFormatException e) {
+      LOG.info("Failed to parse socketTimeout of value" + socketTimeoutStr);

Review comment:
   A space may be needed here after `of value`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524990)
Time Spent: 40m  (was: 0.5h)

> Adding a timeout connection parameter for JDBC
> --
>
> Key: HIVE-12371
> URL: https://issues.apache.org/jira/browse/HIVE-12371
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: Nemon Lou
>Assignee: Xi Chen
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There are some timeout settings from server side:
> HIVE-4766
> HIVE-6679
> Adding a timeout connection parameter for JDBC is useful in some scenarios:
> 1. beeline (which cannot set the timeout manually)
> 2. customizing the timeout for different connections (among Hive or RDBs, which cannot be done via DriverManager.setLoginTimeout())
> Just like postgresql,
> {noformat}
> jdbc:postgresql://localhost/test?user=fred&password=secret&ssl=true&loginTimeout=0
> {noformat}
> or mysql
> {noformat}
> jdbc:mysql://xxx.xx.xxx.xxx:3306/database?connectTimeout=6&socketTimeout=6
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-12371) Adding a timeout connection parameter for JDBC

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12371?focusedWorklogId=524988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524988
 ]

ASF GitHub Bot logged work on HIVE-12371:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 11:50
Start Date: 16/Dec/20 11:50
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1611:
URL: https://github.com/apache/hive/pull/1611#discussion_r544235952



##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -1002,11 +1002,19 @@ private String getSessionValue(String varName, String varDefault) {
     return varValue;
   }
 
-  // copy loginTimeout from driver manager. Thrift timeout needs to be in millis
+  // use socketTimeout from jdbc connection url. Thrift timeout needs to be in millis
   private void setupLoginTimeout() {
-    long timeOut = TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());
+    String socketTimeoutStr = sessConfMap.getOrDefault(JdbcConnectionParams.SOCKET_TIMEOUT, "0");
+    long timeOut = 0;

Review comment:
   What if the default value were set to TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout())?
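   i.e., something like this (a sketch of the suggestion, not committed code):
   ```java
   long defaultMillis = TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());
   String socketTimeoutStr = sessConfMap.getOrDefault(
       JdbcConnectionParams.SOCKET_TIMEOUT, String.valueOf(defaultMillis));
   ```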





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524988)
Time Spent: 0.5h  (was: 20m)

> Adding a timeout connection parameter for JDBC
> --
>
> Key: HIVE-12371
> URL: https://issues.apache.org/jira/browse/HIVE-12371
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: Nemon Lou
>Assignee: Xi Chen
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There are some timeout settings from server side:
> HIVE-4766
> HIVE-6679
> Adding a timeout connection parameter for JDBC is useful in some scenarios:
> 1. beeline (which cannot set the timeout manually)
> 2. customizing the timeout for different connections (among Hive or RDBs, which cannot be done via DriverManager.setLoginTimeout())
> Just like postgresql,
> {noformat}
> jdbc:postgresql://localhost/test?user=fred&password=secret&ssl=true&loginTimeout=0
> {noformat}
> or mysql
> {noformat}
> jdbc:mysql://xxx.xx.xxx.xxx:3306/database?connectTimeout=6&socketTimeout=6
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24539?focusedWorklogId=524986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524986
 ]

ASF GitHub Bot logged work on HIVE-24539:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 11:48
Start Date: 16/Dec/20 11:48
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1783:
URL: https://github.com/apache/hive/pull/1783#discussion_r544233488



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##
@@ -2675,12 +2676,13 @@ public static TypeDescription convertTypeInfo(TypeInfo info) {
   public static TypeDescription getDesiredRowTypeDescr(Configuration conf,
                                                        boolean isAcidRead,
                                                        int dataColumns) {
-
     String columnNameProperty = null;
     String columnTypeProperty = null;
 
     ArrayList<String> schemaEvolutionColumnNames = null;
     ArrayList<TypeDescription> schemaEvolutionTypeDescrs = null;
+    // Make sure we split colNames using the right Delimiter
+    final String columnNameDelimiter = conf.get(serdeConstants.COLUMN_NAME_DELIMITER, String.valueOf(SerDeUtils.COMMA));

Review comment:
   Hey @abstractdog  thanks for taking a look!
   > this makes me think that in order to use commas in column names, you need 
to define another column name delimiter, otherwise those cannot be 
distinguished from each other, right?
   
   We already check this corner case when creating the **TableDesc** 
https://github.com/apache/hive/blob/95f3d6512f35839f2fad3cfd608616534e506a4b/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java#L562
   
   
https://github.com/pgaref/hive/blob/0d2b39ac180d809788f662fbe3271482cd4d909d/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L1562
   **getColumnNameDelimiter** method actually checks for commas in colNames and 
uses `\0` to split them instead.
   
   This PR is just making use of the custom delimiter that was forgotten for 
OrcInputFormat but is done for others e.g., OrcOutputFormat
   
   In your example above the DELIMITER will be `\0` to avoid colName splitting 
issues
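   For reference, a minimal illustration of the split this delimiter controls (standalone sketch; `conf` and `serdeConstants` as in the patch, `\0` is the NUL character):
   ```java
   String colNames = "x,y" + '\0' + "z";  // two columns, the first containing a comma
   String delimiter = conf.get(serdeConstants.COLUMN_NAME_DELIMITER, ",");  // "\0" for this table
   String[] cols = colNames.split(java.util.regex.Pattern.quote(delimiter));  // ["x,y", "z"]
   ```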
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524986)
Time Spent: 1h 10m  (was: 1h)

> OrcInputFormat schema generation should respect column delimiter
> 
>
> Key: HIVE-24539
> URL: https://issues.apache.org/jira/browse/HIVE-24539
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> OrcInputFormat currently generates schema using the given configuration and 
> the default delimiter – that causes inconsistencies when names contain commas.
> We should follow a similar approach to 
> [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24539?focusedWorklogId=524985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524985
 ]

ASF GitHub Bot logged work on HIVE-24539:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 11:47
Start Date: 16/Dec/20 11:47
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1783:
URL: https://github.com/apache/hive/pull/1783#discussion_r544233488



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##
@@ -2675,12 +2676,13 @@ public static TypeDescription convertTypeInfo(TypeInfo info) {
   public static TypeDescription getDesiredRowTypeDescr(Configuration conf,
                                                        boolean isAcidRead,
                                                        int dataColumns) {
-
     String columnNameProperty = null;
     String columnTypeProperty = null;
 
     ArrayList<String> schemaEvolutionColumnNames = null;
     ArrayList<TypeDescription> schemaEvolutionTypeDescrs = null;
+    // Make sure we split colNames using the right Delimiter
+    final String columnNameDelimiter = conf.get(serdeConstants.COLUMN_NAME_DELIMITER, String.valueOf(SerDeUtils.COMMA));

Review comment:
   Hey @abstractdog  thanks for taking a look!
   > this makes me think that in order to use commas in column names, you need 
to define another column name delimiter, otherwise those cannot be 
distinguished from each other, right?
   
   We already check this corner case when creating the **TableDesc** 
https://github.com/apache/hive/blob/95f3d6512f35839f2fad3cfd608616534e506a4b/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java#L562
   
   
https://github.com/pgaref/hive/blob/0d2b39ac180d809788f662fbe3271482cd4d909d/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L1562
   **getColumnNameDelimiter** method actually checks for commas in colNames and 
uses \0 to split them instead.
   
   This PR is just making use of the custom delimiter that was forgotten for 
OrcInputFormat but is done for others e.g., OrcOutputFormat
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524985)
Time Spent: 1h  (was: 50m)

> OrcInputFormat schema generation should respect column delimiter
> 
>
> Key: HIVE-24539
> URL: https://issues.apache.org/jira/browse/HIVE-24539
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> OrcInputFormat currently generates schema using the given configuration and 
> the default delimiter – that causes inconsistencies when names contain commas.
> We should follow a similar approach to 
> [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24539?focusedWorklogId=524982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524982
 ]

ASF GitHub Bot logged work on HIVE-24539:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 11:46
Start Date: 16/Dec/20 11:46
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1783:
URL: https://github.com/apache/hive/pull/1783#discussion_r544233488



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##
@@ -2675,12 +2676,13 @@ public static TypeDescription convertTypeInfo(TypeInfo info) {
   public static TypeDescription getDesiredRowTypeDescr(Configuration conf,
                                                        boolean isAcidRead,
                                                        int dataColumns) {
-
     String columnNameProperty = null;
     String columnTypeProperty = null;
 
     ArrayList<String> schemaEvolutionColumnNames = null;
     ArrayList<TypeDescription> schemaEvolutionTypeDescrs = null;
+    // Make sure we split colNames using the right Delimiter
+    final String columnNameDelimiter = conf.get(serdeConstants.COLUMN_NAME_DELIMITER, String.valueOf(SerDeUtils.COMMA));

Review comment:
   Hey @abstractdog  thanks for taking a look!
   > this makes me think that in order to use commas in column names, you need 
to define another column name delimiter, otherwise those cannot be 
distinguished from each other, right?
   We already check this corner case when creating the 
https://github.com/apache/hive/blob/95f3d6512f35839f2fad3cfd608616534e506a4b/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java#L562
   
   
https://github.com/pgaref/hive/blob/0d2b39ac180d809788f662fbe3271482cd4d909d/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L1562
   actually checks for commas in colNames and uses \0 to split them instead.
   
   This PR is just making use of the custom delimiter that was forgotten for 
OrcInputFormat but is done for others e.g., OrcOutputFormat
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524982)
Time Spent: 40m  (was: 0.5h)

> OrcInputFormat schema generation should respect column delimiter
> 
>
> Key: HIVE-24539
> URL: https://issues.apache.org/jira/browse/HIVE-24539
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> OrcInputFormat currently generates schema using the given configuration and 
> the default delimiter – that causes inconsistencies when names contain commas.
> We should follow a similar approach to 
> [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24539?focusedWorklogId=524984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524984
 ]

ASF GitHub Bot logged work on HIVE-24539:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 11:46
Start Date: 16/Dec/20 11:46
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1783:
URL: https://github.com/apache/hive/pull/1783#discussion_r544233488



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##
@@ -2675,12 +2676,13 @@ public static TypeDescription convertTypeInfo(TypeInfo info) {
   public static TypeDescription getDesiredRowTypeDescr(Configuration conf,
                                                        boolean isAcidRead,
                                                        int dataColumns) {
-
     String columnNameProperty = null;
     String columnTypeProperty = null;
 
     ArrayList<String> schemaEvolutionColumnNames = null;
     ArrayList<TypeDescription> schemaEvolutionTypeDescrs = null;
+    // Make sure we split colNames using the right Delimiter
+    final String columnNameDelimiter = conf.get(serdeConstants.COLUMN_NAME_DELIMITER, String.valueOf(SerDeUtils.COMMA));

Review comment:
   Hey @abstractdog  thanks for taking a look!
   > this makes me think that in order to use commas in column names, you need 
to define another column name delimiter, otherwise those cannot be 
distinguished from each other, right?
   
   We already check this corner case when creating the 
https://github.com/apache/hive/blob/95f3d6512f35839f2fad3cfd608616534e506a4b/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java#L562
   
   
https://github.com/pgaref/hive/blob/0d2b39ac180d809788f662fbe3271482cd4d909d/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L1562
   actually checks for commas in colNames and uses \0 to split them instead.
   
   This PR is just making use of the custom delimiter that was forgotten for 
OrcInputFormat but is done for others e.g., OrcOutputFormat
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524984)
Time Spent: 50m  (was: 40m)

> OrcInputFormat schema generation should respect column delimiter
> 
>
> Key: HIVE-24539
> URL: https://issues.apache.org/jira/browse/HIVE-24539
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> OrcInputFormat currently generates schema using the given configuration and 
> the default delimiter – that causes inconsistencies when names contain commas.
> We should follow a similar approach to 
> [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24539?focusedWorklogId=524977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524977
 ]

ASF GitHub Bot logged work on HIVE-24539:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 11:39
Start Date: 16/Dec/20 11:39
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1783:
URL: https://github.com/apache/hive/pull/1783#discussion_r544085969



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##
@@ -2675,12 +2676,13 @@ public static TypeDescription convertTypeInfo(TypeInfo info) {
   public static TypeDescription getDesiredRowTypeDescr(Configuration conf,
                                                        boolean isAcidRead,
                                                        int dataColumns) {
-
     String columnNameProperty = null;
     String columnTypeProperty = null;
 
     ArrayList<String> schemaEvolutionColumnNames = null;
     ArrayList<TypeDescription> schemaEvolutionTypeDescrs = null;
+    // Make sure we split colNames using the right Delimiter
+    final String columnNameDelimiter = conf.get(serdeConstants.COLUMN_NAME_DELIMITER, String.valueOf(SerDeUtils.COMMA));

Review comment:
   this makes me think that in order to use commas in column names, you need to define another column name delimiter, otherwise those cannot be distinguished from each other, right?
   I mean, could you please include an example where the configuration can be used for this purpose? I haven't seen that in a q test; what's the valid use case for such column names? What happens if you try to do something like:
   ```
   create table test_n4 (`x,y` int, z int);
   select `x,y`, z from test_n4 where `x,y` >= 2 and z = 0;
   ```
   other than this, the patch is simple and neat, which I like :)
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524977)
Time Spent: 0.5h  (was: 20m)

> OrcInputFormat schema generation should respect column delimiter
> 
>
> Key: HIVE-24539
> URL: https://issues.apache.org/jira/browse/HIVE-24539
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> OrcInputFormat currently generates schema using the given configuration and 
> the default delimiter – that causes inconsistencies when names contain commas.
> We should follow a similar approach to 
> [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24546) Avoid unwanted cloud storage call during dynamic partition load

2020-12-16 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-24546:

Attachment: simple_test.sql

> Avoid unwanted cloud storage call during dynamic partition load
> ---
>
> Key: HIVE-24546
> URL: https://issues.apache.org/jira/browse/HIVE-24546
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: simple_test.sql
>
>
> {code:java}
> private void createDpDirCheckSrc(final Path dpStagingPath, final Path dpFinalPath) throws IOException {
>   if (!fs.exists(dpStagingPath) && !fs.exists(dpFinalPath)) {
>     fs.mkdirs(dpStagingPath);
>     // move task will create dp final path
>     if (reporter != null) {
>       reporter.incrCounter(counterGroup, Operator.HIVE_COUNTER_CREATED_DYNAMIC_PARTITIONS, 1);
>     }
>   }
> }
> {code}
>  
>  
> {noformat}
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:370)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1960)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3164)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3031)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2899)
>   at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1723)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4157)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createDpDir(FileSinkOperator.java:948)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.updateDPCounters(FileSinkOperator.java:916)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:849)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createNewPaths(FileSinkOperator.java:1200)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:1324)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1036)
>   at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
>   at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
> {noformat}
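
A hedged, illustrative sketch (not the committed fix): the trace above shows each FileSystem.exists() turning into an S3 getFileStatus/listObjects round trip, and FileSinkOperator reaches this check once per bucket file, so caching the staging dirs already handled avoids re-probing cloud storage. The createdDpDirs field is hypothetical; fs, reporter and counterGroup are the existing FileSinkOperator fields.
{code:java}
// Hypothetical per-operator cache of staging dirs already checked.
private final java.util.Set<Path> createdDpDirs = new java.util.HashSet<>();

private void createDpDirCheckSrc(final Path dpStagingPath, final Path dpFinalPath) throws IOException {
  // Only the first call per staging dir reaches the object store; later bucket
  // files of the same partition skip both exists() probes entirely.
  if (createdDpDirs.add(dpStagingPath)) {
    if (!fs.exists(dpStagingPath) && !fs.exists(dpFinalPath)) {
      fs.mkdirs(dpStagingPath);
      // move task will create dp final path
      if (reporter != null) {
        reporter.incrCounter(counterGroup, Operator.HIVE_COUNTER_CREATED_DYNAMIC_PARTITIONS, 1);
      }
    }
  }
}
{code}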



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24545) jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24545?focusedWorklogId=524961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524961
 ]

ASF GitHub Bot logged work on HIVE-24545:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 11:00
Start Date: 16/Dec/20 11:00
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request #1789:
URL: https://github.com/apache/hive/pull/1789


   ### What changes were proposed in this pull request?
   We should use java.sql.Statement#getLargeUpdateCount() where possible. The user-facing case is beeline output.
   
   ### Why are the changes needed?
   Because this warning in the beeline output can be confusing for the user:
   ```
   20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   Yes: beeline should report row counts greater than Integer.MAX_VALUE correctly.
   
   ### How was this patch tested?
   Not yet tested.
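   
   For context, a hedged sketch of the direction (not necessarily the final patch): HiveStatement could expose the long count through the JDBC 4.2 override, reusing the internals already used by getUpdateCount (checkConnection, waitForOperationToComplete, TGetOperationStatusResp):
   ```java
   // Hedged sketch, not the merged change: surface the 64-bit row count via the
   // JDBC 4.2 API instead of downcasting it to int in getUpdateCount().
   @Override
   public long getLargeUpdateCount() throws SQLException {
     checkConnection("getLargeUpdateCount");
     // Poll until the operation completes, then read the long count directly.
     TGetOperationStatusResp resp = waitForOperationToComplete();
     return resp != null ? resp.getNumModifiedRows() : -1L;
   }
   ```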
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524961)
Remaining Estimate: 0h
Time Spent: 10m

> jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> 
>
> Key: HIVE-24545
> URL: https://issues.apache.org/jira/browse/HIVE-24545
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found this while IOW on TPCDS 10TB:
> {code}
> ----------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..........      llap     SUCCEEDED   4210       4210        0        0       0     362
> Reducer 2 ......      llap     SUCCEEDED    101        101        0        0       0       2
> Reducer 3 ......      llap     SUCCEEDED   1009       1009        0        0       0       1
> ----------------------------------------------------------------------------------------------
> VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 12613.62 s
> ----------------------------------------------------------------------------------------------
> 20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> {code}
> my scenario was:
> {code}
> set hive.exec.max.dynamic.partitions=2000;
> drop table if exists test_sales_2;
> create table test_sales_2 like tpcds_bin_partitioned_acid_orc_1.store_sales;
> insert overwrite table test_sales_2 select * from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
> {code}
> regarding affected row numbers:
> {code}
> select count(*) from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
> +--------------+
> |     _c0      |
> +--------------+
> | 12287871907  |
> +--------------+
> {code}
> I guess we should switch to long



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24545) jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24545:
--
Labels: pull-request-available  (was: )

> jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> 
>
> Key: HIVE-24545
> URL: https://issues.apache.org/jira/browse/HIVE-24545
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found this while IOW on TPCDS 10TB:
> {code}
> ----------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..........      llap     SUCCEEDED   4210       4210        0        0       0     362
> Reducer 2 ......      llap     SUCCEEDED    101        101        0        0       0       2
> Reducer 3 ......      llap     SUCCEEDED   1009       1009        0        0       0       1
> ----------------------------------------------------------------------------------------------
> VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 12613.62 s
> ----------------------------------------------------------------------------------------------
> 20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> {code}
> my scenario was:
> {code}
> set hive.exec.max.dynamic.partitions=2000;
> drop table if exists test_sales_2;
> create table test_sales_2 like tpcds_bin_partitioned_acid_orc_1.store_sales;
> insert overwrite table test_sales_2 select * from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
> {code}
> regarding affected row numbers:
> {code}
> select count(*) from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
> +--------------+
> |     _c0      |
> +--------------+
> | 12287871907  |
> +--------------+
> {code}
> I guess we should switch to long



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24545) jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE

2020-12-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250238#comment-17250238
 ] 

László Bodor edited comment on HIVE-24545 at 12/16/20, 10:40 AM:
-

this part was touched in HIVE-23117, but the problem still persists, I think; it seems we downcast to int even when the thrift response is about to return a long:
{code}
  @Override
  public int getUpdateCount() throws SQLException {
    checkConnection("getUpdateCount");
    /**
     * Poll on the operation status, till the operation is complete. We want to ensure that since a
     * client might end up using executeAsync and then call this to check if the query run is
     * finished.
     */
    long numModifiedRows = -1L;
    TGetOperationStatusResp resp = waitForOperationToComplete();
    if (resp != null) {
      numModifiedRows = resp.getNumModifiedRows();
    }
    if (numModifiedRows == -1L || numModifiedRows > Integer.MAX_VALUE) {
      LOG.warn("Invalid number of updated rows: {}", numModifiedRows);
      return -1;
    }
    return (int) numModifiedRows;
  }
{code}

seems like java.sql.Statement forces us to implement:
{code}
int getUpdateCount() throws SQLException;
{code}

I'm wondering if we can switch to https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/Statement.html#getLargeUpdateCount()
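
If we do switch, a minimal caller-side sketch (hypothetical helper, plain JDBC) of how a client such as beeline could prefer the large-count API while staying compatible with drivers that never override it:
{code:java}
// Hypothetical helper: the default java.sql.Statement implementation of
// getLargeUpdateCount() throws UnsupportedOperationException, so fall back
// to the int API (capped at Integer.MAX_VALUE by its contract).
static long updateCount(java.sql.Statement stmt) throws java.sql.SQLException {
  try {
    return stmt.getLargeUpdateCount();
  } catch (UnsupportedOperationException e) {
    return stmt.getUpdateCount();
  }
}
{code}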


was (Author: abstractdog):
this part was touched in HIVE-23117, but the problem still persists, I think; it seems we downcast to int even when the thrift response is about to return a long:
{code}
  @Override
  public int getUpdateCount() throws SQLException {
    checkConnection("getUpdateCount");
    /**
     * Poll on the operation status, till the operation is complete. We want to ensure that since a
     * client might end up using executeAsync and then call this to check if the query run is
     * finished.
     */
    long numModifiedRows = -1L;
    TGetOperationStatusResp resp = waitForOperationToComplete();
    if (resp != null) {
      numModifiedRows = resp.getNumModifiedRows();
    }
    if (numModifiedRows == -1L || numModifiedRows > Integer.MAX_VALUE) {
      LOG.warn("Invalid number of updated rows: {}", numModifiedRows);
      return -1;
    }
    return (int) numModifiedRows;
  }
{code}

seems like java.sql.Statement forces us to implement:
{code}
int getUpdateCount() throws SQLException;
{code}


> jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> 
>
> Key: HIVE-24545
> URL: https://issues.apache.org/jira/browse/HIVE-24545
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Minor
>
> I found this while IOW on TPCDS 10TB:
> {code}
> ----------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..........      llap     SUCCEEDED   4210       4210        0        0       0     362
> Reducer 2 ......      llap     SUCCEEDED    101        101        0        0       0       2
> Reducer 3 ......      llap     SUCCEEDED   1009       1009        0        0       0       1
> ----------------------------------------------------------------------------------------------
> VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 12613.62 s
> ----------------------------------------------------------------------------------------------
> 20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> {code}
> my scenario was:
> {code}
> set hive.exec.max.dynamic.partitions=2000;
> drop table if exists test_sales_2;
> create table test_sales_2 like tpcds_bin_partitioned_acid_orc_1.store_sales;
> insert overwrite table test_sales_2 select * from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
> {code}
> regarding affected row numbers:
> {code}
> select count(*) from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
> +--------------+
> |     _c0      |
> +--------------+
> | 12287871907  |
> +--------------+
> {code}
> I guess we should switch to long



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24545) jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE

2020-12-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250238#comment-17250238
 ] 

László Bodor commented on HIVE-24545:
-

this part was touched in HIVE-23117, but the problem still persists, I think; it seems we downcast to int even when the thrift response is about to return a long:
{code}
  @Override
  public int getUpdateCount() throws SQLException {
    checkConnection("getUpdateCount");
    /**
     * Poll on the operation status, till the operation is complete. We want to ensure that since a
     * client might end up using executeAsync and then call this to check if the query run is
     * finished.
     */
    long numModifiedRows = -1L;
    TGetOperationStatusResp resp = waitForOperationToComplete();
    if (resp != null) {
      numModifiedRows = resp.getNumModifiedRows();
    }
    if (numModifiedRows == -1L || numModifiedRows > Integer.MAX_VALUE) {
      LOG.warn("Invalid number of updated rows: {}", numModifiedRows);
      return -1;
    }
    return (int) numModifiedRows;
  }
{code}

seems like java.sql.Statement forces us to implement:
{code}
int getUpdateCount() throws SQLException;
{code}


> jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> 
>
> Key: HIVE-24545
> URL: https://issues.apache.org/jira/browse/HIVE-24545
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Minor
>
> I found this while IOW on TPCDS 10TB:
> {code}
> ----------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..........      llap     SUCCEEDED   4210       4210        0        0       0     362
> Reducer 2 ......      llap     SUCCEEDED    101        101        0        0       0       2
> Reducer 3 ......      llap     SUCCEEDED   1009       1009        0        0       0       1
> ----------------------------------------------------------------------------------------------
> VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 12613.62 s
> ----------------------------------------------------------------------------------------------
> 20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> {code}
> my scenario was:
> {code}
> set hive.exec.max.dynamic.partitions=2000;
> drop table if exists test_sales_2;
> create table test_sales_2 like tpcds_bin_partitioned_acid_orc_1.store_sales;
> insert overwrite table test_sales_2 select * from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
> {code}
> regarding affected row numbers:
> {code}
> select count(*) from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
> +--------------+
> |     _c0      |
> +--------------+
> | 12287871907  |
> +--------------+
> {code}
> I guess we should switch to long



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24545) jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE

2020-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24545:

Description: 
I found this while IOW on TPCDS 10TB:

{code}
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      llap     SUCCEEDED   4210       4210        0        0       0     362
Reducer 2 ......      llap     SUCCEEDED    101        101        0        0       0       2
Reducer 3 ......      llap     SUCCEEDED   1009       1009        0        0       0       1
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 12613.62 s
----------------------------------------------------------------------------------------------
20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
{code}

my scenario was:
{code}
set hive.exec.max.dynamic.partitions=2000;
drop table if exists test_sales_2;
create table test_sales_2 like tpcds_bin_partitioned_acid_orc_1.store_sales;
insert overwrite table test_sales_2 select * from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
{code}

regarding affected row numbers:
{code}
select count(*) from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
+--------------+
|     _c0      |
+--------------+
| 12287871907  |
+--------------+
{code}

I guess we should switch to long

  was:
I found this while IOW on TPCDS 10TB:

{code}
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      llap     SUCCEEDED   4210       4210        0        0       0     362
Reducer 2 ......      llap     SUCCEEDED    101        101        0        0       0       2
Reducer 3 ......      llap     SUCCEEDED   1009       1009        0        0       0       1
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 12613.62 s
----------------------------------------------------------------------------------------------
20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
{code}

my scenario was:
{code}
set hive.exec.max.dynamic.partitions=2000;
drop table if exists test_sales_2;
create table test_sales_2 like tpcds_bin_partitioned_acid_orc_1.store_sales;
insert overwrite table test_sales_2 select * from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
{code}

regarding affected row numbers:
{code}
select count(*) from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
+--------------+
|     _c0      |
+--------------+
| 12287871907  |
+--------------+
{code}

I guess we should switch to long


> jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> 
>
> Key: HIVE-24545
> URL: https://issues.apache.org/jira/browse/HIVE-24545
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Minor
>
> I found this while IOW on TPCDS 10TB:
> {code}
> ----------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..........      llap     SUCCEEDED   4210       4210        0        0       0     362
> Reducer 2 ......      llap     SUCCEEDED    101        101        0        0       0       2
> Reducer 3 ......      llap     SUCCEEDED   1009       1009        0        0       0       1
> ----------------------------------------------------------------------------------------------
> VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 12613.62 s
> ----------------------------------------------------------------------------------------------
> 20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> {code}
> my scenario was:
> {code}
> set hive.exec.max.dynamic.partitions=2000;
> drop table if exists test_sales_2;
> create table test_sales_2 like 
> 

[jira] [Updated] (HIVE-24545) jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE

2020-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24545:

Description: 
I found this while IOW on TPCDS 10TB:

{code}
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      llap     SUCCEEDED   4210       4210        0        0       0     362
Reducer 2 ......      llap     SUCCEEDED    101        101        0        0       0       2
Reducer 3 ......      llap     SUCCEEDED   1009       1009        0        0       0       1
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 12613.62 s
----------------------------------------------------------------------------------------------
20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
{code}

my scenario was:
{code}
set hive.exec.max.dynamic.partitions=2000;
drop table if exists test_sales_2;
create table test_sales_2 like tpcds_bin_partitioned_acid_orc_1.store_sales;
insert overwrite table test_sales_2 select * from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
{code}

regarding affected row numbers:
{code}
select count(*) from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
+--------------+
|     _c0      |
+--------------+
| 12287871907  |
+--------------+
{code}

I guess we should switch to long

> jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> 
>
> Key: HIVE-24545
> URL: https://issues.apache.org/jira/browse/HIVE-24545
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Minor
>
> I found this while IOW on TPCDS 10TB:
> {code}
> ----------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..........      llap     SUCCEEDED   4210       4210        0        0       0     362
> Reducer 2 ......      llap     SUCCEEDED    101        101        0        0       0       2
> Reducer 3 ......      llap     SUCCEEDED   1009       1009        0        0       0       1
> ----------------------------------------------------------------------------------------------
> VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 12613.62 s
> ----------------------------------------------------------------------------------------------
> 20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> {code}
> my scenario was:
> {code}
> set hive.exec.max.dynamic.partitions=2000;
> drop table if exists test_sales_2;
> create table test_sales_2 like tpcds_bin_partitioned_acid_orc_1.store_sales;
> insert overwrite table test_sales_2 select * from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
> {code}
> regarding affected row numbers:
> {code}
> select count(*) from tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 2451868;
> +--------------+
> |     _c0      |
> +--------------+
> | 12287871907  |
> +--------------+
> {code}
> I guess we should switch to long



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24545) jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE

2020-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-24545:
---

Assignee: László Bodor

> jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> 
>
> Key: HIVE-24545
> URL: https://issues.apache.org/jira/browse/HIVE-24545
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-24513) Advance write Id during AlterTableDropConstraint DDL

2020-12-16 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das updated HIVE-24513:
--
Comment: was deleted

(was: https://github.com/apache/hive/pull/1788)

> Advance write Id during AlterTableDropConstraint DDL
> 
>
> Key: HIVE-24513
> URL: https://issues.apache.org/jira/browse/HIVE-24513
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For AlterTableDropConstraint-related DDL tasks, although we might be advancing the write ID, it looks like it's not updated correctly during the Analyzer phase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24513) Advance write Id during AlterTableDropConstraint DDL

2020-12-16 Thread Kishen Das (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250220#comment-17250220
 ] 

Kishen Das commented on HIVE-24513:
---

https://github.com/apache/hive/pull/1788

> Advance write Id during AlterTableDropConstraint DDL
> 
>
> Key: HIVE-24513
> URL: https://issues.apache.org/jira/browse/HIVE-24513
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For AlterTableDropConstraint-related DDL tasks, although we might be advancing the write ID, it looks like it's not updated correctly during the Analyzer phase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24513) Advance write Id during AlterTableDropConstraint DDL

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24513:
--
Labels: pull-request-available  (was: )

> Advance write Id during AlterTableDropConstraint DDL
> 
>
> Key: HIVE-24513
> URL: https://issues.apache.org/jira/browse/HIVE-24513
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For AlterTableDropConstraint-related DDL tasks, although we might be advancing the write ID, it looks like it's not updated correctly during the Analyzer phase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24513) Advance write Id during AlterTableDropConstraint DDL

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24513?focusedWorklogId=524926&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524926
 ]

ASF GitHub Bot logged work on HIVE-24513:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 10:08
Start Date: 16/Dec/20 10:08
Worklog Time Spent: 10m 
  Work Description: kishendas opened a new pull request #1788:
URL: https://github.com/apache/hive/pull/1788


   ### What changes were proposed in this pull request?
   Advance the write ID during drop constraint.
   
   ### Why are the changes needed?
   To serve consistent metadata from the HS2 cache.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   ### How was this patch tested?
   Added new tests.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524926)
Remaining Estimate: 0h
Time Spent: 10m

> Advance write Id during AlterTableDropConstraint DDL
> 
>
> Key: HIVE-24513
> URL: https://issues.apache.org/jira/browse/HIVE-24513
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For AlterTableDropConstraint-related DDL tasks, although we might be advancing the write ID, it looks like it's not updated correctly during the Analyzer phase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24474) Failed compaction always logs TxnAbortedException (again)

2020-12-16 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage resolved HIVE-24474.
--
Resolution: Fixed

Committed to master branch. Thanks for the review [~kuczoram] and for the 
suggestions [~pvary]!

> Failed compaction always logs TxnAbortedException (again)
> -
>
> Key: HIVE-24474
> URL: https://issues.apache.org/jira/browse/HIVE-24474
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Re-introduced with HIVE-24096.
> If there is an error during compaction, the compaction's txn is aborted, but in the finally clause we try to commit it (commitTxnIfSet), so the Worker throws a TxnAbortedException.
> We should set compactorTxnId to TXN_ID_NOT_SET if the compaction's txn is aborted.
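
A minimal, self-contained sketch of the guard described above (the stand-in methods are hypothetical, not the actual Worker/TxnHandler signatures):
{code:java}
public class CompactionTxnSketch {
  static final long TXN_ID_NOT_SET = -1L;

  public static void main(String[] args) {
    long compactorTxnId = TXN_ID_NOT_SET;
    try {
      compactorTxnId = 42L;                              // openTxn() stand-in
      throw new RuntimeException("compaction failed");   // simulated error
    } catch (RuntimeException e) {
      System.out.println("abort txn " + compactorTxnId); // abortTxn() stand-in
      // The fix: clear the id so the finally block's commit becomes a no-op
      // instead of committing an aborted txn (the TxnAbortedException case).
      compactorTxnId = TXN_ID_NOT_SET;
    } finally {
      if (compactorTxnId != TXN_ID_NOT_SET) {            // commitTxnIfSet() stand-in
        System.out.println("commit txn " + compactorTxnId);
      }
    }
  }
}
{code}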



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL

2020-12-16 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das resolved HIVE-24482.
---
Resolution: Fixed

> Advance write Id during AlterTableAddConstraint DDL
> ---
>
> Key: HIVE-24482
> URL: https://issues.apache.org/jira/browse/HIVE-24482
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> For AlterTableAddConstraint-related DDL tasks, although we might be advancing the write ID, it looks like it's not updated correctly during the Analyzer phase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24513) Advance write Id during AlterTableDropConstraint DDL

2020-12-16 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24513 started by Kishen Das.
-
> Advance write Id during AlterTableDropConstraint DDL
> 
>
> Key: HIVE-24513
> URL: https://issues.apache.org/jira/browse/HIVE-24513
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> For AlterTableDropConstraint-related DDL tasks, although we might be advancing the write ID, it looks like it's not updated correctly during the Analyzer phase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24474) Failed compaction always logs TxnAbortedException (again)

2020-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24474?focusedWorklogId=524894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524894
 ]

ASF GitHub Bot logged work on HIVE-24474:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 08:59
Start Date: 16/Dec/20 08:59
Worklog Time Spent: 10m 
  Work Description: klcopp merged pull request #1735:
URL: https://github.com/apache/hive/pull/1735


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524894)
Time Spent: 2h 10m  (was: 2h)

> Failed compaction always logs TxnAbortedException (again)
> -
>
> Key: HIVE-24474
> URL: https://issues.apache.org/jira/browse/HIVE-24474
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Re-introduced with HIVE-24096.
> If there is an error during compaction, the compaction's txn is aborted, but in the finally clause we try to commit it (commitTxnIfSet), so the Worker throws a TxnAbortedException.
> We should set compactorTxnId to TXN_ID_NOT_SET if the compaction's txn is aborted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)