[jira] [Updated] (HIVE-15796) HoS: poor reducer parallelism when operator stats are not accurate

2017-02-07 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15796:

Attachment: (was: HIVE-15796.1.patch)

> HoS: poor reducer parallelism when operator stats are not accurate
> --
>
> Key: HIVE-15796
> URL: https://issues.apache.org/jira/browse/HIVE-15796
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15796.1.patch, HIVE-15796.wip.1.patch, 
> HIVE-15796.wip.2.patch, HIVE-15796.wip.patch
>
>
> In HoS we currently use operator stats to determine reducer parallelism. 
> However, operator stats are often inaccurate, especially when column stats 
> are unavailable. This can produce extremely poor reducer parallelism and 
> cause an HoS query to run practically forever. 
> This JIRA offers an alternative way to compute reducer parallelism, 
> similar to how MR does it. Here's the suggested approach:
> 1. when computing the parallelism for a MapWork, use stats associated with 
> the TableScan operator;
> 2. when computing the parallelism for a ReduceWork, use the *maximum* 
> parallelism from all its parents.
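The two rules above can be sketched as follows. This is an illustrative stand-in, not Hive's actual planner code: the method names and the bytes-per-reducer figure are hypothetical assumptions made for the sketch.

```java
// Illustrative sketch only -- not Hive's actual planner code. The names and
// the bytes-per-reducer value are assumptions used to show the two rules.
import java.util.Arrays;
import java.util.List;

public class ParallelismSketch {

    // Rule 1: derive MapWork parallelism from TableScan stats
    // (here: bytes scanned divided by an assumed bytes-per-reducer value).
    static int mapWorkParallelism(long tableScanBytes, long bytesPerReducer) {
        return (int) Math.max(1, tableScanBytes / bytesPerReducer);
    }

    // Rule 2: a ReduceWork takes the *maximum* parallelism of all its parents,
    // so one badly underestimated parent cannot drag the whole stage down.
    static int reduceWorkParallelism(List<Integer> parentParallelisms) {
        return parentParallelisms.stream().max(Integer::compare).orElse(1);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        int scanA = mapWorkParallelism(10240 * mb, 256 * mb); // 10 GB scan -> 40
        int scanB = mapWorkParallelism(1024 * mb, 256 * mb);  // 1 GB scan -> 4
        // A ReduceWork fed by both scans runs with 40 reducers, not 4.
        System.out.println(reduceWorkParallelism(Arrays.asList(scanA, scanB)));
    }
}
```

The point of taking the maximum is that an underestimated stats branch no longer collapses the reducer count for the whole stage.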



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15796) HoS: poor reducer parallelism when operator stats are not accurate

2017-02-07 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15796:

Attachment: HIVE-15796.1.patch



[jira] [Commented] (HIVE-14754) Track the queries execution lifecycle times

2017-02-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857583#comment-15857583
 ] 

Lefty Leverenz commented on HIVE-14754:
---

Doc note:  The new metrics need to be documented in the wiki.

* [Hive Metrics | https://cwiki.apache.org/confluence/display/Hive/Hive+Metrics]

Added a TODOC2.2 label.

> Track the queries execution lifecycle times
> ---
>
> Key: HIVE-14754
> URL: https://issues.apache.org/jira/browse/HIVE-14754
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, HiveServer2
>Affects Versions: 2.2.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14754.1.patch, HIVE-14754.2.patch, HIVE-14754.patch
>
>
> We should be able to track the number of queries being compiled/executed at 
> any given time, as well as the durations of the compilation and execution phases.
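One minimal way to track such counts and durations is sketched below. This is an illustrative example only, not the code from the attached patches; the class and method names are invented for the sketch.

```java
// Illustrative sketch only -- not the attached patches' implementation.
// Tracks in-flight query counts plus total compilation/execution time.
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class QueryLifecycleMetrics {
    private final AtomicInteger activeCompiling = new AtomicInteger();
    private final AtomicInteger activeExecuting = new AtomicInteger();
    private final AtomicLong totalCompileNanos = new AtomicLong();
    private final AtomicLong totalExecuteNanos = new AtomicLong();

    // Call at phase start; pass the returned timestamp to the matching end().
    public long compileStart() { activeCompiling.incrementAndGet(); return System.nanoTime(); }
    public void compileEnd(long start) {
        totalCompileNanos.addAndGet(System.nanoTime() - start);
        activeCompiling.decrementAndGet();
    }
    public long executeStart() { activeExecuting.incrementAndGet(); return System.nanoTime(); }
    public void executeEnd(long start) {
        totalExecuteNanos.addAndGet(System.nanoTime() - start);
        activeExecuting.decrementAndGet();
    }

    public int activeCompiling() { return activeCompiling.get(); }
    public int activeExecuting() { return activeExecuting.get(); }
    public long totalCompileNanos() { return totalCompileNanos.get(); }

    public static void main(String[] args) {
        QueryLifecycleMetrics m = new QueryLifecycleMetrics();
        long t = m.compileStart();
        System.out.println("compiling now: " + m.activeCompiling()); // 1
        m.compileEnd(t);
        System.out.println("compiling now: " + m.activeCompiling()); // 0
    }
}
```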





[jira] [Updated] (HIVE-14754) Track the queries execution lifecycle times

2017-02-07 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-14754:
--
Labels: TODOC2.2  (was: )



[jira] [Commented] (HIVE-15803) msck can hang when nested partitions are present

2017-02-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857579#comment-15857579
 ] 

Hive QA commented on HIVE-15803:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851547/HIVE-15803.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10241 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable
 (batchId=210)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3432/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3432/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3432/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851547 - PreCommit-HIVE-Build

> msck can hang when nested partitions are present
> 
>
> Key: HIVE-15803
> URL: https://issues.apache.org/jira/browse/HIVE-15803
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15803.1.patch, HIVE-15803.2.patch, HIVE-15803.patch
>
>
> Steps to reproduce. 
> {noformat}
> CREATE TABLE `repairtable`( `col` string) PARTITIONED BY (  `p1` string,  
> `p2` string)
> hive> dfs -mkdir -p /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b;
> hive> dfs -touchz 
> /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b/datafile;
> hive> set hive.mv.files.thread;
> hive.mv.files.thread=15
> hive> set hive.mv.files.thread=1;
> hive> MSCK TABLE repairtable;
> {noformat}
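For context, one generic way a fixed-size thread pool can hang on nested work is sketched below. This illustrates only the general pattern, written against plain `java.util.concurrent`; whether the msck listing code follows exactly this shape is not established by this report.

```java
// Generic illustration of a thread-pool hang -- NOT Hive's msck code. A task
// running on the pool's only worker submits a child task to the same pool and
// blocks on it; with pool size 1 the child can never be scheduled.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RecursiveSubmitHang {

    // Returns true if the parent task failed to finish in time (i.e. it hung).
    static boolean hangsWithPoolSize(int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Future<String> parent = pool.submit(() -> {
                Future<String> child = pool.submit(() -> "child done");
                return child.get(); // waits for a worker that may never be free
            });
            try {
                parent.get(500, TimeUnit.MILLISECONDS);
                return false; // finished: no hang
            } catch (TimeoutException e) {
                return true;  // timed out: the pool is stuck
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("size 1 hangs: " + hangsWithPoolSize(1));
        System.out.println("size 2 hangs: " + hangsWithPoolSize(2));
    }
}
```

With `hive.mv.files.thread=1` in the reproduction above, the pool has a single worker, which is exactly the configuration where this pattern cannot make progress.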





[jira] [Updated] (HIVE-15832) Hplsql UDF doesn't work in Hplsql

2017-02-07 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-15832:
---
Target Version/s: 2.2.0  (was: 1.2.1)
  Status: Patch Available  (was: Open)

> Hplsql UDF doesn't work in Hplsql
> -
>
> Key: HIVE-15832
> URL: https://issues.apache.org/jira/browse/HIVE-15832
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Affects Versions: 1.2.1
> Environment: HDP : 2.4.2.0-258
> Hive lib : hive-XXX-1.2.1000.2.4.2.0-258.jar
> Hplsql : hplsql-0.3.17.jar
>Reporter: Sungwoon Ma
>Assignee: Fei Hui
>  Labels: test
> Fix For: 1.2.1
>
> Attachments: HIVE-15832.patch
>
>
> ※ http://www.hplsql.org/udf
> 1) UDF Test 
> [root@node2 /apps/hplsql]#./hplsql -e "SELECT hello(name) FROM USERS;" -trace
> ...
> Ln:1 SELECT completed successfully
> Ln:1 Standalone SELECT executed: 1 columns in the result set
> Unhandled exception in HPL/SQL
> ...
> 2) Adding a try/catch for the exception in org.apache.hive.hplsql.Select, 
> around line 123:
> - before:
>   else if ((ctx.parent instanceof HplsqlParser.StmtContext)) {
> int cols = rm.getColumnCount();
> if (this.trace) {
>   trace(ctx, "Standalone SELECT executed: " + cols + " columns in the 
> result set");
> }
> while (rs.next()) {
> - after :
> try { 
>   while (rs.next()) {
> ...
> }
> catch (Exception e) {
>   e.printStackTrace();
> }
> - Error Log
> [root@node2 /apps/hplsql]#./hplsql -e "SELECT hello(1) FROM USERS;" -trace
> Configuration file: file:/apps/hplsql-0.3.17/hplsql-site.xml
> Parser tree: (program (block (stmt (select_stmt (fullselect_stmt 
> (fullselect_stmt_item (subselect_stmt SELECT (select_list (select_list_item 
> (expr (expr_func (ident hello) ( (expr_func_params (func_param (expr 
> (expr_atom (int_number 1) ))) (select_list_alias AS (ident A 
> (from_clause FROM (from_table_clause (from_table_name_clause (table_name 
> (ident USERS)) (stmt (semicolon_stmt ;
> INLCUDE CONTENT hplsqlrc (non-empty)
> Ln:1 CREATE FUNCTION hello
> Ln:1 SELECT
> >>registerUdf begin :false
> >>registerUdf end :true
> Ln:1 SELECT hplsql('hello(:1)', 1) AS A FROM USERS
> 17/02/06 20:28:13 INFO jdbc.Utils: Supplied authorities: node3:1
> 17/02/06 20:28:13 INFO jdbc.Utils: Resolved authority: node3:1
> Open connection: jdbc:hive2://node3:1 (225 ms)
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting query
> Query executed successfully (84 ms)
> Ln:1 SELECT completed successfully
> Ln:1 Standalone SELECT executed: 1 columns in the result set
> cols:1
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.util.zip.ZipException: 
> invalid code lengths set
>  at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:258)
>  at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:244)
>  at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:364)
>  at org.apache.hive.hplsql.Select.select(Select.java:116)
>  at org.apache.hive.hplsql.Exec.visitSelect_stmt(Exec.java:870)
>  at org.apache.hive.hplsql.Exec.visitSelect_stmt(Exec.java:1)
>  at 
> org.apache.hive.hplsql.HplsqlParser$Select_stmtContext.accept(HplsqlParser.java:14249)
>  at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
>  at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:865)
>  at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:1)
>  at 
> org.apache.hive.hplsql.HplsqlParser$StmtContext.accept(HplsqlParser.java:998)
>  at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
>  at 
> org.apache.hive.hplsql.HplsqlBaseVisitor.visitBlock(HplsqlBaseVisitor.java:28)
>  at 
> org.apache.hive.hplsql.HplsqlParser$BlockContext.accept(HplsqlParser.java:438)
>  at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
>  at org.apache.hive.hplsql.Exec.visitProgram(Exec.java:780)
>  at org.apache.hive.hplsql.Exec.visitProgram(Exec.java:1)
>  at 
> org.apache.hive.hplsql.HplsqlParser$ProgramContext.accept(HplsqlParser.java:381)
>  at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
>  at org.apache.hive.hplsql.Exec.run(Exec.java:652)
>  at org.apache.hive.hplsql.Exec.run(Exec.java:630)
>  at org.apache.hive.hplsql.Hplsql.main(Hplsql.java:23)
> Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.util.zip.ZipException: 
> invalid code lengths set
>  at 
> 

[jira] [Commented] (HIVE-15832) Hplsql UDF doesn't work in Hplsql

2017-02-07 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857548#comment-15857548
 ] 

Fei Hui commented on HIVE-15832:


CC [~dmtolpeko]


[jira] [Updated] (HIVE-15832) Hplsql UDF doesn't work in Hplsql

2017-02-07 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-15832:
---
Attachment: HIVE-15832.patch

Patch uploaded.

> 

[jira] [Commented] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-07 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857532#comment-15857532
 ] 

Pengcheng Xiong commented on HIVE-15388:


[~kgyrtkirk], thanks for your attention. I have tried hard to make expressions 
work but could not succeed. It does not matter whether dt is deterministic or 
not; whenever we have an expression, things become complicated. I think your 
idea is worth trying. I will submit a new patch soon. Thanks.

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high, e.g.:
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>  OR `airports`.`airport` 
> = "Arlington 

[jira] [Commented] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-07 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857510#comment-15857510
 ] 

Zoltan Haindrich commented on HIVE-15388:
-

[~pxiong] I think that dropping the interval-related UDF makes it harder to 
re-enable the feature later; is there any reason to go beyond just the parser 
changes (disabling the (dt*dt) feature)? I assume that deterministic UDF usage 
is not affected by this problem.

I have an idea which might be worth a try: making the interval keyword 
mandatory for '(dt*dt)'-like queries may simplify this problem for the parser, 
and could possibly keep columns as interval arguments alive.


[jira] [Commented] (HIVE-15832) Hplsql UDF doesn't work in Hplsql

2017-02-07 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857499#comment-15857499
 ] 

Fei Hui commented on HIVE-15832:


Here is the error log I get; it is different from Hive 1.2.1:
Caused by: java.lang.NullPointerException
at org.apache.hive.hplsql.Exec.setVariable(Exec.java:148) 
~[hive-hplsql-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hive.hplsql.Exec.setVariable(Exec.java:158) 
~[hive-hplsql-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hive.hplsql.Udf.setParameters(Udf.java:92) 
~[hive-hplsql-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hive.hplsql.Udf.evaluate(Udf.java:74) 
~[hive-hplsql-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:187)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:68)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
 ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:438) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:430) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2209) 
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:492)
 ~[hive-service-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]

> Hplsql UDF doesn't work in Hplsql
> -
>
> Key: HIVE-15832
> URL: https://issues.apache.org/jira/browse/HIVE-15832
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Affects Versions: 1.2.1
> Environment: HDP : 2.4.2.0-258
> Hive lib : hive-XXX-1.2.1000.2.4.2.0-258.jar
> Hplsql : hplsql-0.3.17.jar
>Reporter: Sungwoon Ma
>Assignee: Fei Hui
>  Labels: test
> Fix For: 1.2.1
>
>
> ※ http://www.hplsql.org/udf
> 1) UDF Test 
> [root@node2 /apps/hplsql]#./hplsql -e "SELECT hello(name) FROM USERS;" -trace
> ...
> Ln:1 SELECT completed successfully
> Ln:1 Standalone SELECT executed: 1 columns in the result set
> Unhandled exception in HPL/SQL
> ...
> 2) Add 'Exception' (org.apache.hive.hplsql.Select)
> >> 123 line 
> - before :
>   else if ((ctx.parent instanceof HplsqlParser.StmtContext)) {
> int cols = rm.getColumnCount();
> if (this.trace) {
>   trace(ctx, "Standalone SELECT executed: " + cols + " columns in the 
> result set");
> }
> while (rs.next()) {
> - after :
> try { 
>   while (rs.next()) {
> ...
> }
> catch (Exception e) {
>   e.printStackTrace();
> }
> - Error Log
> [root@node2 /apps/hplsql]#./hplsql -e "SELECT hello(1) FROM USERS;" -trace
> Configuration file: file:/apps/hplsql-0.3.17/hplsql-site.xml
> Parser tree: (program (block (stmt (select_stmt (fullselect_stmt 
> (fullselect_stmt_item (subselect_stmt SELECT (select_list (select_list_item 
> (expr (expr_func (ident hello) ( (expr_func_params (func_param (expr 
> (expr_atom (int_number 1) ))) (select_list_alias AS (ident A 
> (from_clause FROM (from_table_clause (from_table_name_clause (table_name 
> (ident USERS)) (stmt (semicolon_stmt ;
> INLCUDE CONTENT hplsqlrc (non-empty)
> Ln:1 CREATE FUNCTION hello
> Ln:1 SELECT
> >>registerUdf begin :false
> >>registerUdf end :true
> Ln:1 SELECT hplsql('hello(:1)', 1) AS A FROM USERS
> 17/02/06 20:28:13 INFO jdbc.Utils: Supplied authorities: node3:1
> 17/02/06 20:28:13 INFO jdbc.Utils: Resolved authority: node3:1
> Open connection: jdbc:hive2://node3:1 (225 ms)
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting query
> Query executed successfully (84 ms)
> Ln:1 SELECT completed successfully
> Ln:1 Standalone SELECT executed: 1 columns in the result set
> cols:1
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> 
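The reporter's before/after diff above wraps the result-set loop in a try/catch so the driver-side failure is reported instead of escaping as "Unhandled exception in HPL/SQL". A minimal, self-contained sketch of that guard; `RowSource` and `drain` are illustrative stand-ins, not the actual HPL/SQL `Select` code:

```java
public class ResultSetGuard {
    // Minimal stand-in for the slice of java.sql.ResultSet used here,
    // so the guard can be exercised without a live HiveServer2 connection.
    interface RowSource {
        boolean next() throws Exception;
    }

    // Drains rows inside a try/catch, as the reporter's patch does, so a
    // failure during iteration is printed rather than left unhandled.
    // Returns the number of rows read before any failure.
    static int drain(RowSource rs) {
        int rows = 0;
        try {
            while (rs.next()) {
                rows++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return rows;
    }

    public static void main(String[] args) {
        // A source that yields two rows and then fails, mimicking the
        // ZipException surfaced by the UDF query above.
        RowSource failing = new RowSource() {
            int n = 0;
            public boolean next() throws Exception {
                if (++n <= 2) { return true; }
                throw new Exception("invalid code lengths set");
            }
        };
        System.out.println(drain(failing));
    }
}
```

With this guard in place, the query still fails, but the root cause (here, the ZipException from the UDF jar) is visible in the trace output.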

[jira] [Assigned] (HIVE-15832) Hplsql UDF doesn't work in Hplsql

2017-02-07 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui reassigned HIVE-15832:
--

Assignee: Fei Hui

> Hplsql UDF doesn't work in Hplsql
> -
>
> Key: HIVE-15832
> URL: https://issues.apache.org/jira/browse/HIVE-15832
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Affects Versions: 1.2.1
> Environment: HDP : 2.4.2.0-258
> Hive lib : hive-XXX-1.2.1000.2.4.2.0-258.jar
> Hplsql : hplsql-0.3.17.jar
>Reporter: Sungwoon Ma
>Assignee: Fei Hui
>  Labels: test
> Fix For: 1.2.1
>
>
> ※ http://www.hplsql.org/udf
> 1) UDF Test 
> [root@node2 /apps/hplsql]#./hplsql -e "SELECT hello(name) FROM USERS;" -trace
> ...
> Ln:1 SELECT completed successfully
> Ln:1 Standalone SELECT executed: 1 columns in the result set
> Unhandled exception in HPL/SQL
> ...
> 2) Add 'Exception' (org.apache.hive.hplsql.Select)
> >> 123 line 
> - before :
>   else if ((ctx.parent instanceof HplsqlParser.StmtContext)) {
> int cols = rm.getColumnCount();
> if (this.trace) {
>   trace(ctx, "Standalone SELECT executed: " + cols + " columns in the 
> result set");
> }
> while (rs.next()) {
> - after :
> try { 
>   while (rs.next()) {
> ...
> }
> catch (Exception e) {
>   e.printStackTrace();
> }
> - Error Log
> [root@node2 /apps/hplsql]#./hplsql -e "SELECT hello(1) FROM USERS;" -trace
> Configuration file: file:/apps/hplsql-0.3.17/hplsql-site.xml
> Parser tree: (program (block (stmt (select_stmt (fullselect_stmt 
> (fullselect_stmt_item (subselect_stmt SELECT (select_list (select_list_item 
> (expr (expr_func (ident hello) ( (expr_func_params (func_param (expr 
> (expr_atom (int_number 1) ))) (select_list_alias AS (ident A 
> (from_clause FROM (from_table_clause (from_table_name_clause (table_name 
> (ident USERS)) (stmt (semicolon_stmt ;
> INLCUDE CONTENT hplsqlrc (non-empty)
> Ln:1 CREATE FUNCTION hello
> Ln:1 SELECT
> >>registerUdf begin :false
> >>registerUdf end :true
> Ln:1 SELECT hplsql('hello(:1)', 1) AS A FROM USERS
> 17/02/06 20:28:13 INFO jdbc.Utils: Supplied authorities: node3:1
> 17/02/06 20:28:13 INFO jdbc.Utils: Resolved authority: node3:1
> Open connection: jdbc:hive2://node3:1 (225 ms)
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting query
> Query executed successfully (84 ms)
> Ln:1 SELECT completed successfully
> Ln:1 Standalone SELECT executed: 1 columns in the result set
> cols:1
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.util.zip.ZipException: 
> invalid code lengths set
>  at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:258)
>  at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:244)
>  at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:364)
>  at org.apache.hive.hplsql.Select.select(Select.java:116)
>  at org.apache.hive.hplsql.Exec.visitSelect_stmt(Exec.java:870)
>  at org.apache.hive.hplsql.Exec.visitSelect_stmt(Exec.java:1)
>  at 
> org.apache.hive.hplsql.HplsqlParser$Select_stmtContext.accept(HplsqlParser.java:14249)
>  at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
>  at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:865)
>  at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:1)
>  at 
> org.apache.hive.hplsql.HplsqlParser$StmtContext.accept(HplsqlParser.java:998)
>  at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
>  at 
> org.apache.hive.hplsql.HplsqlBaseVisitor.visitBlock(HplsqlBaseVisitor.java:28)
>  at 
> org.apache.hive.hplsql.HplsqlParser$BlockContext.accept(HplsqlParser.java:438)
>  at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:70)
>  at org.apache.hive.hplsql.Exec.visitProgram(Exec.java:780)
>  at org.apache.hive.hplsql.Exec.visitProgram(Exec.java:1)
>  at 
> org.apache.hive.hplsql.HplsqlParser$ProgramContext.accept(HplsqlParser.java:381)
>  at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
>  at org.apache.hive.hplsql.Exec.run(Exec.java:652)
>  at org.apache.hive.hplsql.Exec.run(Exec.java:630)
>  at org.apache.hive.hplsql.Hplsql.main(Hplsql.java:23)
> Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.util.zip.ZipException: 
> invalid code lengths set
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:352)
>  at 
> 

[jira] [Commented] (HIVE-15832) Hplsql UDF doesn't work in Hplsql

2017-02-07 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857497#comment-15857497
 ] 

Fei Hui commented on HIVE-15832:


I have reproduced it on Hive 2.2.0 and will work on it.


[jira] [Commented] (HIVE-15222) replace org.json usage in ExplainTask/TezTask related classes with some alternative

2017-02-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857495#comment-15857495
 ] 

Hive QA commented on HIVE-15222:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851538/HIVE-15222.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10241 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3431/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3431/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3431/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851538 - PreCommit-HIVE-Build

> replace org.json usage in ExplainTask/TezTask related classes with some 
> alternative
> ---
>
> Key: HIVE-15222
> URL: https://issues.apache.org/jira/browse/HIVE-15222
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Teddy Choi
> Fix For: 2.2.0
>
> Attachments: HIVE-15222.1.patch, HIVE-15222.2.patch, 
> HIVE-15222.3.patch
>
>
> Replace org.json usage in these classes.
> It seems to me that json is probably only used to write some information - 
> but the application never reads it back.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15789) Vectorization: limit reduce vectorization to 32Mb chunks

2017-02-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857479#comment-15857479
 ] 

Gopal V commented on HIVE-15789:


LGTM - +1.

> Vectorization: limit reduce vectorization to 32Mb chunks
> 
>
> Key: HIVE-15789
> URL: https://issues.apache.org/jira/browse/HIVE-15789
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-15789.1.patch, HIVE-15789.2.patch
>
>
> Reduce vectorization accumulates 1024 rows before forwarding them into the 
> reduce processor.
> Add a safety limit for 32Mb of writables, so that shorter sequences can be 
> forwarded into the operator trees.
> {code}
> rowIdx++;
> if (rowIdx >= BATCH_SIZE) {
>   VectorizedBatchUtil.setBatchSize(batch, rowIdx);
>   reducer.process(batch, tag);
> {code}



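The quoted snippet flushes on row count alone; the proposed safety limit also flushes once the accumulated writables cross 32 MB, so shorter sequences reach the operator trees. A sketch of the combined condition (the constant and method names are illustrative, not the actual Hive code):

```java
public class BatchFlushPolicy {
    static final int BATCH_SIZE = 1024;             // rows per batch
    static final long MAX_BATCH_BYTES = 32L << 20;  // 32 MB safety limit

    // Returns true when the accumulated batch should be forwarded to the
    // reduce processor: either the row cap or the byte cap has been hit.
    static boolean shouldFlush(int rowIdx, long batchBytes) {
        return rowIdx >= BATCH_SIZE || batchBytes >= MAX_BATCH_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(shouldFlush(1024, 0));        // row cap reached
        System.out.println(shouldFlush(10, 32L << 20));  // byte cap reached
        System.out.println(shouldFlush(10, 1024));       // neither cap reached
    }
}
```

The byte cap matters when individual rows are large: 1024 wide rows can hold far more than 32 MB, so counting rows alone can balloon memory in the reducer.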


[jira] [Updated] (HIVE-15792) Hive should raise SemanticException when LPAD/RPAD pad character's length is 0

2017-02-07 Thread Nandakumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandakumar updated HIVE-15792:
--
Status: Patch Available  (was: Open)

> Hive should raise SemanticException when LPAD/RPAD pad character's length is 0
> --
>
> Key: HIVE-15792
> URL: https://issues.apache.org/jira/browse/HIVE-15792
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Chovan
>Assignee: Nandakumar
>Priority: Minor
> Attachments: HIVE-15792.000.patch
>
>
> For example SELECT LPAD('A', 2, ''); will cause an infinite loop and the 
> running query will hang without any error.
> It would be great if this could be prevented by checking the pad character's 
> length and, if it is 0, throwing a SemanticException.



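A zero-length pad never grows the buffer, so a naive padding loop spins forever, which is the hang described above. A minimal sketch of the requested guard, with a runtime exception standing in for the proposed SemanticException (illustrative code, not Hive's actual UDF implementation):

```java
public class Lpad {
    // LPAD(str, len, pad): left-pads str to len characters. An empty pad
    // is rejected up front: appending "" never lengthens the buffer, so
    // the loop below would otherwise never terminate.
    static String lpad(String str, int len, String pad) {
        if (pad.isEmpty()) {
            // Stand-in for the SemanticException proposed in the issue.
            throw new IllegalArgumentException("pad character's length is 0");
        }
        if (len <= str.length()) {
            // Truncate when the target length does not exceed the input.
            return str.substring(0, Math.max(len, 0));
        }
        StringBuilder sb = new StringBuilder();
        while (sb.length() + str.length() < len) {
            sb.append(pad);
        }
        return sb.substring(0, len - str.length()) + str;
    }

    public static void main(String[] args) {
        System.out.println(lpad("A", 2, "x"));
    }
}
```

Raising the error at compile (semantic-analysis) time, as the issue suggests, is preferable to this runtime check because the query fails fast instead of after resources are allocated.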


[jira] [Commented] (HIVE-6009) Add from_unixtime UDF that has controllable Timezone

2017-02-07 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857460#comment-15857460
 ] 

Alexander Pivovarov commented on HIVE-6009:
---

you can convert bigint to UTC timestamp and then convert UTC timestamp to GMT-5 
timestamp (EST)

{code}
select from_unixtime(129384);
2010-12-31 16:00:00   // in Greenwich

select from_utc_timestamp(from_unixtime(129384), 'GMT-5');
2010-12-31 11:00:00   // in NYC

> Add from_unixtime UDF that has controllable Timezone
> 
>
> Key: HIVE-6009
> URL: https://issues.apache.org/jira/browse/HIVE-6009
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.10.0
> Environment: CDH4.4
>Reporter: Johndee Burks
>Priority: Trivial
>
> Currently the from_unixtime UDF takes into account the timezone of the system 
> doing the transformation. I think that implementation is good, but it would 
> be nice to include or change the current UDF to have a configurable timezone. 
> It would be useful for looking at timestamp data from different regions in 
> the native region's timezone. 
> Example: 
> from_unixtime(unix_time, format, timezone)
> from_unixtime(129384, dd MMM , GMT-5)



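The two-step workaround above can be expressed directly with java.time, which is roughly what a three-argument from_unixtime(unix_time, format, timezone) overload would compute. The overload itself does not exist, so this is only a sketch of the requested behavior:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class FromUnixtimeTz {
    // Formats an epoch-seconds value in an explicit time zone,
    // independent of the JVM's (or Hive session's) default zone.
    static String fromUnixtime(long epochSeconds, String pattern, String zone) {
        return DateTimeFormatter.ofPattern(pattern)
                .withZone(ZoneId.of(zone))
                .format(Instant.ofEpochSecond(epochSeconds));
    }

    public static void main(String[] args) {
        // The same instant rendered in UTC and in GMT-5 (US Eastern standard time).
        System.out.println(fromUnixtime(0, "yyyy-MM-dd HH:mm:ss", "UTC"));
        System.out.println(fromUnixtime(0, "yyyy-MM-dd HH:mm:ss", "GMT-05:00"));
    }
}
```

Taking the zone as an argument avoids the round trip through the session-local string that the from_utc_timestamp workaround requires.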


[jira] [Comment Edited] (HIVE-6009) Add from_unixtime UDF that has controllable Timezone

2017-02-07 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857460#comment-15857460
 ] 

Alexander Pivovarov edited comment on HIVE-6009 at 2/8/17 5:57 AM:
---

you can convert bigint to UTC timestamp and then convert UTC timestamp to GMT-5 
timestamp (EST)

{code}
select from_unixtime(129384);
2010-12-31 16:00:00   // in Greenwich

select from_utc_timestamp(from_unixtime(129384), 'GMT-5');
2010-12-31 11:00:00   // in NYC
{code}


was (Author: apivovarov):
you can convert bigint to UTC timestamp and then convert UTC timestamp to GMT-5 
timestamp (EST)

{code}
select from_unixtime(129384);
2010-12-31 16:00:00   // in Greenwich

select from_utc_timestamp(from_unixtime(129384), 'GMT-5');
2010-12-31 11:00:00   // in NYC






[jira] [Commented] (HIVE-15682) Eliminate per-row based dummy iterator creation

2017-02-07 Thread Dapeng Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857457#comment-15857457
 ] 

Dapeng Sun commented on HIVE-15682:
---

Hi [~xuefuz], I will use TPCx-BB to run a 1 TB test against HIVE-15580, HIVE-15682, 
and the unpatched package, and will attach the results when I have them.

> Eliminate per-row based dummy iterator creation
> ---
>
> Key: HIVE-15682
> URL: https://issues.apache.org/jira/browse/HIVE-15682
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 2.2.0
>
> Attachments: HIVE-15682.patch
>
>
> HIVE-15580 introduced a dummy iterator per input row, which can be eliminated 
> because {{SparkReduceRecordHandler}} is able to handle single key-value pairs. 
> We can refactor this part of the code (1) to remove the need for an iterator 
> and (2) to optimize the code path for per-(key, value) processing instead of 
> (key, value iterator) processing. It would also be great to measure 
> performance after the optimizations and compare it to performance prior to 
> HIVE-15580.



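The refactoring the issue describes removes the single-element iterator wrapper around each value. A hedged sketch contrasting the two paths (the method signatures are illustrative, not the actual SparkReduceRecordHandler API):

```java
import java.util.Collections;
import java.util.Iterator;

public class ReduceDispatch {
    // Old path: each single value is boxed into a throwaway iterator so it
    // fits the (key, value-iterator) signature: one allocation per row.
    static long processViaDummyIterator(String key, long value) {
        Iterator<Long> values = Collections.singletonList(value).iterator();
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // Refactored path: the single (key, value) pair is handled directly,
    // eliminating the per-row dummy iterator that HIVE-15580 introduced.
    static long processDirect(String key, long value) {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(processViaDummyIterator("k", 5) == processDirect("k", 5));
    }
}
```

The results are identical; the win is avoiding one short-lived iterator (and list) allocation per input row on a hot path, which is why the issue asks for before/after performance measurements.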


[jira] [Resolved] (HIVE-3558) UDF LEFT(string,position) to HIVE

2017-02-07 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov resolved HIVE-3558.
---
Resolution: Won't Fix

> UDF  LEFT(string,position) to HIVE
> --
>
> Key: HIVE-3558
> URL: https://issues.apache.org/jira/browse/HIVE-3558
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 0.9.0
>Reporter: Aruna Babu
>Priority: Minor
> Attachments: HIVE-3558.1.patch.txt, udf_left.q, udf_left.q.out
>
>
> Introduction
>   A UDF (User Defined Function) to obtain the leftmost 'n' characters from 
> a string in Hive. 
> Relevance
>   Current releases of Hive lack a function that returns the leftmost len 
> characters from the string str, or NULL if any argument is NULL. 
> The function LEFT(string,length) would return the leftmost 'n' characters 
> from the string, or NULL if any argument is NULL, which would be useful in 
> HiveQL wherever strings are manipulated.
> Functionality :-
> Function Name: LEFT(string,length) 
>
> Returns the leftmost length characters from the string  or NULL if any 
> argument is NULL.  
> Example: hive>SELECT LEFT('https://www.irctc.co.in',5);
>   -> 'https'
> Usage :-
> Case 1: To query a table to find details based on an https request
> Table :-Transaction
> Request_id|date|period_id|url_name
> 0001|01/07/2012|110001|https://www.irctc.co.in
> 0002|02/07/2012|110001|https://nextstep.tcs.com
> 0003|03/07/2012|110001|https://www.hdfcbank.com
> 0005|01/07/2012|110001|http://www.lmnm.co.in
> 0006|08/07/2012|110001|http://nextstart.com
> 0007|10/07/2012|110001|https://netbanking.icicibank.com
> 0012|21/07/2012|110001|http://www.people.co.in
> 0026|08/07/2012|110001|http://nextprobs.com
> 00023|25/07/2012|110001|https://netbanking.canarabank.com
> Query : select * from transaction where LEFT(url_name,5)='https';
> Result :-
> 0001|01/07/2012|110001|https://www.irctc.co.in
> 0002|02/07/2012|110001|https://nextstep.tcs.com  
> 0003|03/07/2012|110001|https://www.hdfcbank.com
> 0007|10/07/2012|110001|https://netbanking.icicibank.com
> 00023|25/07/2012|110001|https://netbanking.canarabank.com



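The requested semantics (leftmost len characters, NULL if any argument is NULL) map directly onto substring, which is why the issue could be closed without a new built-in. A minimal Java sketch of what such a UDF would compute; the class is illustrative, and the issue was ultimately resolved as Won't Fix:

```java
public class Left {
    // LEFT(str, len): the leftmost len characters of str, or null when
    // either argument is NULL, matching the semantics requested above.
    // Out-of-range lengths are clamped rather than raising an error.
    static String left(String str, Integer len) {
        if (str == null || len == null) {
            return null;
        }
        return str.substring(0, Math.min(Math.max(len, 0), str.length()));
    }

    public static void main(String[] args) {
        System.out.println(left("https://www.irctc.co.in", 5));
    }
}
```

The filtering query in the description, `WHERE LEFT(url_name,5)='https'`, is therefore equivalent to `WHERE substr(url_name, 1, 5)='https'` in stock Hive.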


[jira] [Commented] (HIVE-3558) UDF LEFT(string,position) to HIVE

2017-02-07 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857451#comment-15857451
 ] 

Alexander Pivovarov commented on HIVE-3558:
---

You can use substr to get LEFT and RIGHT
{code}
// get characters from 1st to 5th included
SELECT substr('https://www.irctc.co.in', 1, 5);
https

// all RIGHT characters starting from 6th
SELECT substr('https://www.irctc.co.in', 6);
://www.irctc.co.in
{code}






[jira] [Commented] (HIVE-15789) Vectorization: limit reduce vectorization to 32Mb chunks

2017-02-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857448#comment-15857448
 ] 

Hive QA commented on HIVE-15789:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851534/HIVE-15789.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10237 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=230)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3430/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3430/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3430/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851534 - PreCommit-HIVE-Build






[jira] [Updated] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15388:
---
Status: Open  (was: Patch Available)

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high.
> e.g
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>  OR `airports`.`airport` 
> = "Arlington Municipal")
> OR `airports`.`airport` = 
> "Algona Municipal")
>OR `airports`.`airport` = 
> "Chandler")
>   OR `airports`.`airport` = 
> "Altus 

[jira] [Commented] (HIVE-15847) In Progress update refreshes seem slow

2017-02-07 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857447#comment-15857447
 ] 

anishek commented on HIVE-15847:


Additionally, there are more columns being printed in the task summary, as shown by these screenshots:

after patch HIVE-15473: 
https://issues.apache.org/jira/secure/attachment/12851509/summary_after_patch.png
before patch HIVE-15473: 
https://issues.apache.org/jira/secure/attachment/12851510/summary_before_patch.png

> In Progress update refreshes seem slow
> --
>
> Key: HIVE-15847
> URL: https://issues.apache.org/jira/browse/HIVE-15847
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: anishek
>
> After HIVE-15473, the refresh rates for in place progress bar seems to be 
> slow on hive cli. 
> As pointed out by [~prasanth_j] 
> {quote}
> The refresh rate is slow. Following video will show it
> before patch: https://asciinema.org/a/2fgcncxg5gjavcpxt6lfb8jg9
> after patch: https://asciinema.org/a/2tht5jf6l9b2dc3ylt5gtztqg
> {quote}





[jira] [Updated] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15388:
---
Status: Patch Available  (was: Open)

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in parsing phase when the number of expressions are 
> high.
> e.g
> {noformat}
> SELECT `iata`,
>        `airport`,
>        `city`,
>        `state`,
>        `country`,
>        `lat`,
>        `lon`
> FROM airports
> WHERE
> ((`airports`.`airport` = "Thigpen"
>   OR `airports`.`airport` = "Astoria Regional")
>   OR `airports`.`airport` = "Warsaw Municipal")
>   OR `airports`.`airport` = "John F Kennedy Memorial")
>   OR `airports`.`airport` = "Hall-Miller Municipal")
>   OR `airports`.`airport` = "Atqasuk")
>   OR `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR `airports`.`airport` = "Artesia Municipal")
>   OR `airports`.`airport` = "Outagamie County Regional")
>   OR `airports`.`airport` = "Watertown Municipal")
>   OR `airports`.`airport` = "Augusta State")
>   OR `airports`.`airport` = "Aurora Municipal")
>   OR `airports`.`airport` = "Alakanuk")
>   OR `airports`.`airport` = "Austin Municipal")
>   OR `airports`.`airport` = "Auburn Municipal")
>   OR `airports`.`airport` = "Auburn-Opelik")
>   OR `airports`.`airport` = "Austin-Bergstrom International")
>   OR `airports`.`airport` = "Wausau Municipal")
>   OR `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR `airports`.`airport` = "Alva Regional")
>   OR `airports`.`airport` = "Asheville Regional")
>   OR `airports`.`airport` = "Avon Park Municipal")
>   OR `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR `airports`.`airport` = "Marana Northwest Regional")
>   OR `airports`.`airport` = "Catalina")
>   OR `airports`.`airport` = "Washington Municipal")
>   OR `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` = "West Memphis Municipal")
>   OR `airports`.`airport` = "Arlington Municipal")
>   OR `airports`.`airport` = "Algona Municipal")
>   OR `airports`.`airport` = "Chandler")
>   OR `airports`.`airport` = "Altus 
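For illustration only (not part of the patch): a chain of equality ORs on one column, as generated above, can be rewritten as a single flat IN list, which keeps the parser's recursion depth constant instead of growing with the number of conditions. A small Python sketch of the two shapes:

```python
def nested_or(column, values):
    """Build the tool-generated shape: ((col = v1 OR col = v2) OR col = v3) ...
    Every extra condition adds one more level of parentheses."""
    expr = f'{column} = "{values[0]}"'
    for v in values[1:]:
        expr = f'({expr} OR {column} = "{v}")'
    return expr

def flat_in(column, values):
    """Equivalent predicate as a single IN list with no nesting."""
    quoted = ", ".join(f'"{v}"' for v in values)
    return f'{column} IN ({quoted})'

airports = ["Thigpen", "Astoria Regional", "Warsaw Municipal"]
print(nested_or("`airports`.`airport`", airports))
print(flat_in("`airports`.`airport`", airports))
```

The nested form is what parsers struggle with; the IN form expresses the same predicate with a flat parse tree.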

[jira] [Commented] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-07 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857445#comment-15857445
 ] 

Pengcheng Xiong commented on HIVE-15388:


[~hagleitn], I have added back some of the tests in interval_alt.q. Please take 
a look.

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high, e.g.:
> {noformat}
> SELECT `iata`,
>        `airport`,
>        `city`,
>        `state`,
>        `country`,
>        `lat`,
>        `lon`
> FROM airports
> WHERE
> ((`airports`.`airport` = "Thigpen"
>   OR `airports`.`airport` = "Astoria Regional")
>   OR `airports`.`airport` = "Warsaw Municipal")
>   OR `airports`.`airport` = "John F Kennedy Memorial")
>   OR `airports`.`airport` = "Hall-Miller Municipal")
>   OR `airports`.`airport` = "Atqasuk")
>   OR `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR `airports`.`airport` = "Artesia Municipal")
>   OR `airports`.`airport` = "Outagamie County Regional")
>   OR `airports`.`airport` = "Watertown Municipal")
>   OR `airports`.`airport` = "Augusta State")
>   OR `airports`.`airport` = "Aurora Municipal")
>   OR `airports`.`airport` = "Alakanuk")
>   OR `airports`.`airport` = "Austin Municipal")
>   OR `airports`.`airport` = "Auburn Municipal")
>   OR `airports`.`airport` = "Auburn-Opelik")
>   OR `airports`.`airport` = "Austin-Bergstrom International")
>   OR `airports`.`airport` = "Wausau Municipal")
>   OR `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR `airports`.`airport` = "Alva Regional")
>   OR `airports`.`airport` = "Asheville Regional")
>   OR `airports`.`airport` = "Avon Park Municipal")
>   OR `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR `airports`.`airport` = "Marana Northwest Regional")
>   OR `airports`.`airport` = "Catalina")
>   OR `airports`.`airport` = "Washington Municipal")
>   OR `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` = "West Memphis Municipal")
>   OR `airports`.`airport` = "Arlington Municipal")
>   OR `airports`.`airport` = "Algona Municipal")
>   OR `airports`.`airport` = "Chandler")

[jira] [Updated] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15388:
---
Attachment: HIVE-15388.06.patch

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high, e.g.:
> {noformat}
> SELECT `iata`,
>        `airport`,
>        `city`,
>        `state`,
>        `country`,
>        `lat`,
>        `lon`
> FROM airports
> WHERE
> ((`airports`.`airport` = "Thigpen"
>   OR `airports`.`airport` = "Astoria Regional")
>   OR `airports`.`airport` = "Warsaw Municipal")
>   OR `airports`.`airport` = "John F Kennedy Memorial")
>   OR `airports`.`airport` = "Hall-Miller Municipal")
>   OR `airports`.`airport` = "Atqasuk")
>   OR `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR `airports`.`airport` = "Artesia Municipal")
>   OR `airports`.`airport` = "Outagamie County Regional")
>   OR `airports`.`airport` = "Watertown Municipal")
>   OR `airports`.`airport` = "Augusta State")
>   OR `airports`.`airport` = "Aurora Municipal")
>   OR `airports`.`airport` = "Alakanuk")
>   OR `airports`.`airport` = "Austin Municipal")
>   OR `airports`.`airport` = "Auburn Municipal")
>   OR `airports`.`airport` = "Auburn-Opelik")
>   OR `airports`.`airport` = "Austin-Bergstrom International")
>   OR `airports`.`airport` = "Wausau Municipal")
>   OR `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR `airports`.`airport` = "Alva Regional")
>   OR `airports`.`airport` = "Asheville Regional")
>   OR `airports`.`airport` = "Avon Park Municipal")
>   OR `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR `airports`.`airport` = "Marana Northwest Regional")
>   OR `airports`.`airport` = "Catalina")
>   OR `airports`.`airport` = "Washington Municipal")
>   OR `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` = "West Memphis Municipal")
>   OR `airports`.`airport` = "Arlington Municipal")
>   OR `airports`.`airport` = "Algona Municipal")
>   OR `airports`.`airport` = "Chandler")
>   OR `airports`.`airport` = "Altus 

[jira] [Commented] (HIVE-15473) Progress Bar on Beeline client

2017-02-07 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857441#comment-15857441
 ] 

anishek commented on HIVE-15473:


There seem to be 3 additional columns that are printed now; maybe that is the 
problem. I will add this to the things to be checked as part of HIVE-15847.

> Progress Bar on Beeline client
> --
>
> Key: HIVE-15473
> URL: https://issues.apache.org/jira/browse/HIVE-15473
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15473.10.patch, HIVE-15473.11.patch, 
> HIVE-15473.2.patch, HIVE-15473.3.patch, HIVE-15473.4.patch, 
> HIVE-15473.5.patch, HIVE-15473.6.patch, HIVE-15473.7.patch, 
> HIVE-15473.8.patch, HIVE-15473.9.patch, io_summary_after_patch.png, 
> io_summary_before_patch.png, screen_shot_beeline.jpg, status_after_patch.png, 
> status_before_patch.png, summary_after_patch.png, summary_before_patch.png
>
>
> Hive CLI allows showing a progress bar for the tez execution engine, as shown 
> in https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> It would be great to have a similar progress bar displayed when the user is 
> connecting via the beeline command line client as well. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-15473) Progress Bar on Beeline client

2017-02-07 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857413#comment-15857413
 ] 

anishek edited comment on HIVE-15473 at 2/8/17 5:39 AM:


[~prasanth_j] the summary sections are no longer printed via the jline renderer 
that retained the color scheme. The reason is that the report goes to a log file 
for beeline, while for hive cli it is shown on stdout, hence I had to remove the 
color scheme for that report. I hope that is ok?

I have created HIVE-15847 for the slow refresh rates on hive cli and will look 
into it. There is no inherent change to the way the progress bar is printed for 
hive-cli. Thanks for your inputs!


was (Author: anishek):
[~prasanth_j] the summary sections are no longer printed via the jline rendered 
to retain the color scheme, the reason being the report goes to log file for 
beeline and for hive cli its shown on the stdout, hence had to remove the color 
scheme for same  report . I hope that should be ok ? 

I have created HIVE-15847 for the slow refresh rates on hive cli, will look 
into it. Thanks for your inputs!

> Progress Bar on Beeline client
> --
>
> Key: HIVE-15473
> URL: https://issues.apache.org/jira/browse/HIVE-15473
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15473.10.patch, HIVE-15473.11.patch, 
> HIVE-15473.2.patch, HIVE-15473.3.patch, HIVE-15473.4.patch, 
> HIVE-15473.5.patch, HIVE-15473.6.patch, HIVE-15473.7.patch, 
> HIVE-15473.8.patch, HIVE-15473.9.patch, io_summary_after_patch.png, 
> io_summary_before_patch.png, screen_shot_beeline.jpg, status_after_patch.png, 
> status_before_patch.png, summary_after_patch.png, summary_before_patch.png
>
>
> Hive CLI allows showing a progress bar for the tez execution engine, as shown 
> in https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> It would be great to have a similar progress bar displayed when the user is 
> connecting via the beeline command line client as well. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-2710) row_sequence UDF is not documented

2017-02-07 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov resolved HIVE-2710.
---
Resolution: Won't Fix

> row_sequence UDF is not documented
> --
>
> Key: HIVE-2710
> URL: https://issues.apache.org/jira/browse/HIVE-2710
> Project: Hive
>  Issue Type: Bug
>Reporter: Sho Shimauchi
>Priority: Minor
>
> row_sequence UDF was implemented in HIVE-1304; however, the function is not 
> documented on the Hive wiki.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-2710) row_sequence UDF is not documented

2017-02-07 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857440#comment-15857440
 ] 

Alexander Pivovarov commented on HIVE-2710:
---

The row_sequence UDF was moved to the contrib package.
Usually we do not describe contrib package UDFs in the LanguageManual UDF page.

> row_sequence UDF is not documented
> --
>
> Key: HIVE-2710
> URL: https://issues.apache.org/jira/browse/HIVE-2710
> Project: Hive
>  Issue Type: Bug
>Reporter: Sho Shimauchi
>Priority: Minor
>
> row_sequence UDF was implemented in HIVE-1304; however, the function is not 
> documented on the Hive wiki.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-6046) add UDF for converting date time from one presentation to another

2017-02-07 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-6046:
--
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

> add  UDF for converting date time from one presentation to another
> --
>
> Key: HIVE-6046
> URL: https://issues.apache.org/jira/browse/HIVE-6046
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 0.13.0
>Reporter: Kostiantyn Kudriavtsev
>Assignee: Kostiantyn Kudriavtsev
> Attachments: Hive-6046-Feb15.patch, Hive-6046.patch, HIVE-6046.patch
>
>
> it'd be nice to have a function for converting datetime to different formats, 
> for example:
> format_date('2013-12-12 00:00:00.0', 'yyyy-MM-dd HH:mm:ss.S', 'yyyy/MM/dd')
> There are two signatures to facilitate different uses:
> format_date(datetime, fromFormat, toFormat)
> format_date(timestamp, toFormat)
>  
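For illustration, a rough Python analogue of the proposed UDF, with strptime/strftime codes standing in for the Java SimpleDateFormat patterns (a sketch only, not the actual Hive implementation):

```python
from datetime import datetime

def format_date(date_string, from_format, to_format):
    # Parse with the source pattern, re-render with the target pattern.
    # Python's '%Y-%m-%d %H:%M:%S.%f' corresponds to the Java pattern
    # 'yyyy-MM-dd HH:mm:ss.S' used in the proposal.
    return datetime.strptime(date_string, from_format).strftime(to_format)

# Mirrors the example in the issue description:
print(format_date("2013-12-12 00:00:00.0",
                  "%Y-%m-%d %H:%M:%S.%f",
                  "%Y/%m/%d"))  # 2013/12/12
```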



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-6214) Need a UDF to convert a Date String from any standard format to another. Should be able to provide the Date String, current format and to the format into which it need to

2017-02-07 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov resolved HIVE-6214.
---
Resolution: Duplicate

> Need a UDF to convert a Date String from any standard format to another. 
> Should be able to provide the Date String, current format and to the format 
> into which it need to be converted and returned as String output of UDF
> 
>
> Key: HIVE-6214
> URL: https://issues.apache.org/jira/browse/HIVE-6214
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
> Environment: Software
>Reporter: Rony Pius Manakkal
>Priority: Minor
>  Labels: features
>
> Need a UDF to convert a Date String from any standard format to another. 
> It should be possible to provide the Date String, its current format, and the 
> format into which it needs to be converted, returning the result as the String 
> output of the UDF.
> Example: String convertDateFormat(String dateString, String 
> currentDateFormat, String requiredFormat);



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-9988) Evaluating UDF before query is run

2017-02-07 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857435#comment-15857435
 ] 

Alexander Pivovarov commented on HIVE-9988:
---

You can assign the expression to a variable before the query is evaluated and 
then use the variable in the WHERE clause:
{code}
set dt=from_unixtime(unix_timestamp(),'yyyyMMdd');

select * from A where dt=${hiveconf:dt};
{code}

> Evaluating UDF before query is run
> --
>
> Key: HIVE-9988
> URL: https://issues.apache.org/jira/browse/HIVE-9988
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ådne Brunborg
>
> When using UDFs on a partition column in Hive, all partitions are scanned 
> before the UDF is resolved. 
> If the UDF could be evaluated before the query is run, this would greatly 
> improve performance in cases like this.
> Example - the table is partitioned by datestamp (bigint): 
> The following where clause touches all 82 partitions:
> {{WHERE datestamp=cast(from_unixtime(unix_timestamp(),'yyyyMMdd') as bigint)}}
> {{15/03/16 09:21:53 INFO mapred.FileInputFormat: Total input paths to process 
> : 82}}
> …whereas the following only touches the one partition:
> {{WHERE datestamp=20150316}}
> {{15/03/16 09:23:06 INFO input.FileInputFormat: Total input paths to process 
> : 1}}
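The pruning effect described above can be modeled outside Hive. A simplified sketch (not Hive's actual pruning code): a predicate folded to a constant lets the planner select the single matching partition, while an unevaluated expression forces a scan of every partition.

```python
# Toy partition map: datestamp -> storage path (hypothetical values).
partitions = {20150314: "path/a", 20150315: "path/b", 20150316: "path/c"}

def prune(partitions, predicate_value):
    # A constant predicate lets the "planner" pick matching partitions up
    # front; a non-constant predicate (modeled as None) means scan everything.
    if predicate_value is None:
        return list(partitions.values())
    return [path for key, path in partitions.items() if key == predicate_value]

print(prune(partitions, None))      # UDF not folded: all partitions scanned
print(prune(partitions, 20150316))  # constant: only ['path/c']
```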



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15473) Progress Bar on Beeline client

2017-02-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857430#comment-15857430
 ] 

Prasanth Jayachandran commented on HIVE-15473:
--

Yeah, that should not be a problem. I'm more concerned about the task summary 
exceeding the column width.

> Progress Bar on Beeline client
> --
>
> Key: HIVE-15473
> URL: https://issues.apache.org/jira/browse/HIVE-15473
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15473.10.patch, HIVE-15473.11.patch, 
> HIVE-15473.2.patch, HIVE-15473.3.patch, HIVE-15473.4.patch, 
> HIVE-15473.5.patch, HIVE-15473.6.patch, HIVE-15473.7.patch, 
> HIVE-15473.8.patch, HIVE-15473.9.patch, io_summary_after_patch.png, 
> io_summary_before_patch.png, screen_shot_beeline.jpg, status_after_patch.png, 
> status_before_patch.png, summary_after_patch.png, summary_before_patch.png
>
>
> Hive CLI allows showing a progress bar for the tez execution engine, as shown 
> in https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> It would be great to have a similar progress bar displayed when the user is 
> connecting via the beeline command line client as well. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15473) Progress Bar on Beeline client

2017-02-07 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857413#comment-15857413
 ] 

anishek commented on HIVE-15473:


[~prasanth_j] the summary sections are no longer printed via the jline renderer 
that retained the color scheme. The reason is that the report goes to a log file 
for beeline, while for hive cli it is shown on stdout, hence I had to remove the 
color scheme for that report. I hope that is ok?

I have created HIVE-15847 for the slow refresh rates on hive cli and will look 
into it. Thanks for your inputs!

> Progress Bar on Beeline client
> --
>
> Key: HIVE-15473
> URL: https://issues.apache.org/jira/browse/HIVE-15473
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15473.10.patch, HIVE-15473.11.patch, 
> HIVE-15473.2.patch, HIVE-15473.3.patch, HIVE-15473.4.patch, 
> HIVE-15473.5.patch, HIVE-15473.6.patch, HIVE-15473.7.patch, 
> HIVE-15473.8.patch, HIVE-15473.9.patch, io_summary_after_patch.png, 
> io_summary_before_patch.png, screen_shot_beeline.jpg, status_after_patch.png, 
> status_before_patch.png, summary_after_patch.png, summary_before_patch.png
>
>
> Hive CLI allows showing a progress bar for the tez execution engine, as shown 
> in https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> It would be great to have a similar progress bar displayed when the user is 
> connecting via the beeline command line client as well. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15796) HoS: poor reducer parallelism when operator stats are not accurate

2017-02-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857394#comment-15857394
 ] 

Hive QA commented on HIVE-15796:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851528/HIVE-15796.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3429/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3429/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3429/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-02-08 04:50:23.401
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-3429/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-02-08 04:50:23.404
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at aa62dad HIVE-15840: Webhcat test TestPig_5 failing with Pig on 
Tez at check for percent complete of job (Daniel Dai, reviewed by Thejas Nair)
+ git clean -f -d
Removing ql/src/test/queries/clientpositive/view_cbo.q
Removing ql/src/test/results/clientpositive/view_cbo.q.out
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at aa62dad HIVE-15840: Webhcat test TestPig_5 failing with Pig on 
Tez at check for percent complete of job (Daniel Dai, reviewed by Thejas Nair)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-02-08 04:50:24.414
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:2886
error: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: patch does 
not apply
error: patch failed: 
ql/src/test/results/clientpositive/spark/subquery_in.q.out:6260
error: ql/src/test/results/clientpositive/spark/subquery_in.q.out: patch does 
not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851528 - PreCommit-HIVE-Build

> HoS: poor reducer parallelism when operator stats are not accurate
> --
>
> Key: HIVE-15796
> URL: https://issues.apache.org/jira/browse/HIVE-15796
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15796.1.patch, HIVE-15796.wip.1.patch, 
> HIVE-15796.wip.2.patch, HIVE-15796.wip.patch
>
>
> In HoS we currently use operator stats to determine reducer parallelism. 
> However, it is often the case that operator stats are not accurate, 
> especially if column stats are not available. This sometimes generates 
> extremely poor reducer parallelism and causes the HoS query to run forever. 
> This JIRA tries to offer an alternative way to compute reducer parallelism, 
> similar to how MR does it. Here's the approach we are suggesting:
> 1. when computing the parallelism for a MapWork, use the stats associated with 
> the TableScan operator;
> 2. when computing the parallelism for a ReduceWork, use the *maximum* 
> parallelism from all its parents.
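The two rules above can be sketched in a few lines (hypothetical names and a toy work graph; the real patch operates on Hive's SparkWork structures):

```python
def reduce_work_parallelism(reduce_work, parallelism):
    # Rule 2: a ReduceWork takes the *maximum* parallelism among its parents.
    return max(parallelism[parent] for parent in reduce_work["parents"])

# Toy graph: two MapWorks feeding one ReduceWork. Rule 1 would derive the
# MapWork values from their TableScan stats; here they are simply given.
parallelism = {"Map 1": 10, "Map 2": 40}
reducer = {"name": "Reducer 3", "parents": ["Map 1", "Map 2"]}

print(reduce_work_parallelism(reducer, parallelism))  # 40
```

Taking the maximum guards against a single parent with badly underestimated stats dragging the reducer count down.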



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15769) Support view creation in CBO

2017-02-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857391#comment-15857391
 ] 

Hive QA commented on HIVE-15769:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851531/HIVE-15769.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 240 failed/errored test(s), 10242 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=219)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask]
 (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_SortUnionTransposeRule]
 (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_input26] (batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constant_prop_3] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer13] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer15] 
(batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_join2] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_udf] (batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_topn] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynamic_rdd_cache] 
(batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explain_rearrange] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby2_limit] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets_grouping]
 (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_position] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input26] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_vc] (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[limit_pushdown2] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoin] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nested_column_pruning] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcr] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcs] (batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pointlookup2] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pointlookup3] (batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pointlookup4] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_udf_case] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_vc] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[reduce_deduplicate_extended2]
 (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[regex_col] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin_noskew] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_case_column_pruning] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_json_tuple] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_parse_url_tuple] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_cast_constant] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_2] 
(batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_mapjoin1] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_date_1] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_round] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_mapjoin] 
(batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_if_expr] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_1] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_arithmetic]
 (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mr_diff_schema_alias]
 (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join1] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join2] 
(batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join3] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join4] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_string_concat] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_varchar_mapjoin1] 
(batchId=24)

[jira] [Updated] (HIVE-15683) Measure performance impact on group by by HIVE-15580

2017-02-07 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-15683:
---
Status: Patch Available  (was: Open)

> Measure performance impact on group by by HIVE-15580
> 
>
> Key: HIVE-15683
> URL: https://issues.apache.org/jira/browse/HIVE-15683
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-15683.patch
>
>
> HIVE-15580 changed the way the data is shuffled for order by: instead of 
> using Spark's groupByKey to shuffle data, Hive on Spark now uses 
> repartitionAndSortWithinPartitions(), which generates (key, value) pairs 
> instead of the original (key, value iterator). This might have some performance 
> implications, but it's needed to get rid of the unbounded memory usage of 
> {{groupByKey}}.
> Here we'd like to compare group by performance with and without HIVE-15580. If 
> the impact is significant, we can provide a configuration that allows the user 
> to switch back to the original way of shuffling.
> This work should ideally be done after HIVE-15682, as the optimization there 
> should help the performance here as well. 
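The difference in shuffle output shape can be simulated without Spark (a toy model in plain Python, not the HoS code path): the groupByKey style materializes every value for a key into one collection, while the repartitionAndSortWithinPartitions style yields a key-sorted stream of individual (key, value) pairs that a consumer can group lazily.

```python
from itertools import groupby
from operator import itemgetter

pairs = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]

# groupByKey-style: all values for a key are held in memory at once.
group_by_key = {}
for k, v in pairs:
    group_by_key.setdefault(k, []).append(v)

# repartitionAndSortWithinPartitions-style: sort the partition by key and
# stream (key, value) pairs; groupby then groups adjacent keys lazily.
sorted_pairs = sorted(pairs, key=itemgetter(0))
streamed = {k: [v for _, v in grp]
            for k, grp in groupby(sorted_pairs, key=itemgetter(0))}

print(group_by_key)  # {'b': [2, 3], 'a': [1, 4]}
print(streamed)      # {'a': [1, 4], 'b': [2, 3]}
```

Both produce the same groups; the streamed form never needs a whole key's values in memory at once, which is the point of the HIVE-15580 change.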



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15683) Measure performance impact on group by by HIVE-15580

2017-02-07 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-15683:
---
Attachment: HIVE-15683.patch

The patch brings back the old implementation and provides a configuration to 
switch on the new implementation.

> Measure performance impact on group by by HIVE-15580
> 
>
> Key: HIVE-15683
> URL: https://issues.apache.org/jira/browse/HIVE-15683
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-15683.patch
>
>
> HIVE-15580 changed the way the data is shuffled for order by: instead of 
> using Spark's groupByKey to shuffle data, Hive on Spark now uses 
> repartitionAndSortWithinPartitions(), which generates (key, value) pairs 
> instead of the original (key, value iterator). This might have some performance 
> implications, but it's needed to get rid of the unbounded memory usage of 
> {{groupByKey}}.
> Here we'd like to compare group by performance with and without HIVE-15580. If 
> the impact is significant, we can provide a configuration that allows the user 
> to switch back to the original way of shuffling.
> This work should ideally be done after HIVE-15682, as the optimization there 
> should help the performance here as well. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15840) Webhcat test TestPig_5 failing with Pig on Tez at check for percent complete of job

2017-02-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15840:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: 2.2.0
  1.3.0
Target Version/s: 1.3.0, 2.2.0  (was: 2.2.0)
  Status: Resolved  (was: Patch Available)

UT failures are not related. Patch pushed to both master and branch-1.

> Webhcat test TestPig_5 failing with Pig on Tez at check for percent complete 
> of job
> ---
>
> Key: HIVE-15840
> URL: https://issues.apache.org/jira/browse/HIVE-15840
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 1.3.0, 2.2.0
>
> Attachments: HIVE-15840.1.patch
>
>
> TestPig_5 is failing at the percent-complete check when the job is Pig on Tez:
> check_job_percent_complete failed. got percentComplete , expected 100% 
> complete
> Test command:
> curl -d user.name=daijy -d arg=-p -d arg=INPDIR=/tmp/templeton_test_data -d 
> arg=-p -d arg=OUTDIR=/tmp/output -d file=loadstore.pig -X POST 
> http://localhost:50111/templeton/v1/pig
> curl 
> http://localhost:50111/templeton/v1/jobs/job_1486502484681_0003?user.name=daijy
> This is similar to HIVE-9351, which fixes Hive on Tez.





[jira] [Commented] (HIVE-15672) LLAP text cache: improve first query perf II

2017-02-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857329#comment-15857329
 ] 

Hive QA commented on HIVE-15672:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851520/HIVE-15672.08.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10241 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate2 (batchId=173)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3427/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3427/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3427/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851520 - PreCommit-HIVE-Build

> LLAP text cache: improve first query perf II
> 
>
> Key: HIVE-15672
> URL: https://issues.apache.org/jira/browse/HIVE-15672
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15672.01.patch, HIVE-15672.02.patch, 
> HIVE-15672.03.patch, HIVE-15672.04.patch, HIVE-15672.05.patch, 
> HIVE-15672.06.patch, HIVE-15672.07.patch, HIVE-15672.08.patch
>
>
> 4) Send VRB to the pipeline and write ORC in parallel (in background).





[jira] [Updated] (HIVE-15803) msck can hang when nested partitions are present

2017-02-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15803:

Attachment: HIVE-15803.2.patch

Modified the debug logs in the .2 version.

> msck can hang when nested partitions are present
> 
>
> Key: HIVE-15803
> URL: https://issues.apache.org/jira/browse/HIVE-15803
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15803.1.patch, HIVE-15803.2.patch, HIVE-15803.patch
>
>
> Steps to reproduce. 
> {noformat}
> CREATE TABLE `repairtable`( `col` string) PARTITIONED BY (  `p1` string,  
> `p2` string)
> hive> dfs -mkdir -p /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b;
> hive> dfs -touchz 
> /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b/datafile;
> hive> set hive.mv.files.thread;
> hive.mv.files.thread=15
> hive> set hive.mv.files.thread=1;
> hive> MSCK TABLE repairtable;
> {noformat}





[jira] [Updated] (HIVE-15803) msck can hang when nested partitions are present

2017-02-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15803:

Attachment: HIVE-15803.1.patch

Modified patch, which checks the thread pool's usage, similar to the suggestion
by [~pattipaka].

> msck can hang when nested partitions are present
> 
>
> Key: HIVE-15803
> URL: https://issues.apache.org/jira/browse/HIVE-15803
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15803.1.patch, HIVE-15803.patch
>
>
> Steps to reproduce. 
> {noformat}
> CREATE TABLE `repairtable`( `col` string) PARTITIONED BY (  `p1` string,  
> `p2` string)
> hive> dfs -mkdir -p /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b;
> hive> dfs -touchz 
> /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b/datafile;
> hive> set hive.mv.files.thread;
> hive.mv.files.thread=15
> hive> set hive.mv.files.thread=1;
> hive> MSCK TABLE repairtable;
> {noformat}





[jira] [Commented] (HIVE-15803) msck can hang when nested partitions are present

2017-02-07 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857305#comment-15857305
 ] 

Rajesh Balamohan commented on HIVE-15803:
-

Thank you for sharing the patch. A deadlock can still happen when multiple
paths are present. For instance, the following would deadlock with the patch.

{noformat}
DROP table repairtable;
CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING);
dfs -mkdir -p /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b;
dfs -mkdir -p /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b/;
dfs -mkdir -p /apps/hive/warehouse/test.db/repairtable/p1=cc/p2=aa/p3=bb/;
dfs -mkdir -p /apps/hive/warehouse/test.db/repairtable/p1=ccc/p2=aaa/p3=bbb/;
dfs -mkdir -p /apps/hive/warehouse/test.db/repairtable/p1=/p2=/p3=/;
dfs -mkdir -p 
/apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b/;
dfs -mkdir -p 
/apps/hive/warehouse/test.db/repairtable/p1=cc/p2=aa/p3=bb/;
dfs -mkdir -p 
/apps/hive/warehouse/test.db/repairtable/p1=ccc/p2=/p3=/;

dfs -touchz /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b/datafile;
dfs -touchz /apps/hive/warehouse/test.db/repairtable/p1=cc/p2=aa/p3=bb/datafile;
dfs -touchz 
/apps/hive/warehouse/test.db/repairtable/p1=ccc/p2=aaa/p3=bbb/datafile;
dfs -touchz 
/apps/hive/warehouse/test.db/repairtable/p1=/p2=/p3=/datafile;
dfs -touchz 
/apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b/datafile;
dfs -touchz 
/apps/hive/warehouse/test.db/repairtable/p1=cc/p2=aa/p3=bb/datafile;
dfs -touchz 
/apps/hive/warehouse/test.db/repairtable/p1=ccc/p2=/p3=/datafile;
set hive.mv.files.thread=1;
MSCK TABLE repairtable;
{noformat}

> msck can hang when nested partitions are present
> 
>
> Key: HIVE-15803
> URL: https://issues.apache.org/jira/browse/HIVE-15803
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15803.patch
>
>
> Steps to reproduce. 
> {noformat}
> CREATE TABLE `repairtable`( `col` string) PARTITIONED BY (  `p1` string,  
> `p2` string)
> hive> dfs -mkdir -p /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b;
> hive> dfs -touchz 
> /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b/datafile;
> hive> set hive.mv.files.thread;
> hive.mv.files.thread=15
> hive> set hive.mv.files.thread=1;
> hive> MSCK TABLE repairtable;
> {noformat}





[jira] [Updated] (HIVE-15222) replace org.json usage in ExplainTask/TezTask related classes with some alternative

2017-02-07 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-15222:
--
Attachment: HIVE-15222.3.patch

This patch replaces "o.getClass() == Map.class" with "o instanceof Map" so that
any Map implementation is accepted as a valid argument.
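The distinction is easy to miss: a HashMap argument fails the exact-class comparison because its runtime class is HashMap, never the Map interface itself. A minimal standalone illustration (not the actual ExplainTask code):

```java
import java.util.HashMap;
import java.util.Map;

public class InstanceofDemo {
    // Exact class comparison: only ever true if o's runtime class is
    // literally the Map interface -- which no instance can have.
    static boolean exactClassCheck(Object o) {
        return o != null && o.getClass() == Map.class;
    }

    // instanceof accepts any implementation of the Map interface.
    static boolean instanceofCheck(Object o) {
        return o instanceof Map;
    }

    public static void main(String[] args) {
        Object m = new HashMap<String, String>();
        System.out.println(exactClassCheck(m)); // false: runtime class is HashMap
        System.out.println(instanceofCheck(m)); // true: HashMap implements Map
    }
}
```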

> replace org.json usage in ExplainTask/TezTask related classes with some 
> alternative
> ---
>
> Key: HIVE-15222
> URL: https://issues.apache.org/jira/browse/HIVE-15222
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Teddy Choi
> Fix For: 2.2.0
>
> Attachments: HIVE-15222.1.patch, HIVE-15222.2.patch, 
> HIVE-15222.3.patch
>
>
> Replace org.json usage in these classes.
> It seems to me that json is probably only used to write some information - 
> but the application never reads it back.





[jira] [Commented] (HIVE-15791) Remove unused ant files

2017-02-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857280#comment-15857280
 ] 

Hive QA commented on HIVE-15791:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851519/HIVE-15791.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3426/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3426/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3426/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[... javac "[loading ZipFileIndexFileObject[...]]" classpath output trimmed; see console log for full details ...]

[jira] [Commented] (HIVE-15803) msck can hang when nested partitions are present

2017-02-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857271#comment-15857271
 ] 

Hive QA commented on HIVE-15803:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851512/HIVE-15803.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10241 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3425/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3425/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3425/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851512 - PreCommit-HIVE-Build

> msck can hang when nested partitions are present
> 
>
> Key: HIVE-15803
> URL: https://issues.apache.org/jira/browse/HIVE-15803
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15803.patch
>
>
> Steps to reproduce. 
> {noformat}
> CREATE TABLE `repairtable`( `col` string) PARTITIONED BY (  `p1` string,  
> `p2` string)
> hive> dfs -mkdir -p /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b;
> hive> dfs -touchz 
> /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b/datafile;
> hive> set hive.mv.files.thread;
> hive.mv.files.thread=15
> hive> set hive.mv.files.thread=1;
> hive> MSCK TABLE repairtable;
> {noformat}





[jira] [Updated] (HIVE-15789) Vectorization: limit reduce vectorization to 32Mb chunks

2017-02-07 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-15789:
--
Attachment: HIVE-15789.2.patch

> Vectorization: limit reduce vectorization to 32Mb chunks
> 
>
> Key: HIVE-15789
> URL: https://issues.apache.org/jira/browse/HIVE-15789
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-15789.1.patch, HIVE-15789.2.patch
>
>
> Reduce vectorization accumulates 1024 rows before forwarding them to the
> reduce processor.
> Add a safety limit for 32Mb of writables, so that shorter sequences can be 
> forwarded into the operator trees.
> {code}
> rowIdx++;
> if (rowIdx >= BATCH_SIZE) {
>   VectorizedBatchUtil.setBatchSize(batch, rowIdx);
>   reducer.process(batch, tag);
> {code}
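The snippet above flushes on row count only; the safety limit adds a second, byte-based trigger so a batch of wide rows is forwarded before it grows past 32Mb. A self-contained sketch of that dual condition (field and method names are illustrative; the flush stands in for VectorizedBatchUtil.setBatchSize plus reducer.process):

```java
public class BatchFlushDemo {
    static final int BATCH_SIZE = 1024;
    static final long MAX_BATCH_BYTES = 32L * 1024 * 1024; // 32Mb safety limit

    int rowIdx = 0;       // rows accumulated in the current batch
    long batchBytes = 0;  // writable bytes accumulated in the current batch
    int flushes = 0;      // how many times the batch was forwarded

    // Accumulate one row; forward the batch when either the row count
    // or the accumulated writable size crosses its limit.
    void addRow(long rowBytes) {
        rowIdx++;
        batchBytes += rowBytes;
        if (rowIdx >= BATCH_SIZE || batchBytes >= MAX_BATCH_BYTES) {
            flush();
        }
    }

    void flush() {
        // stand-in for VectorizedBatchUtil.setBatchSize(batch, rowIdx)
        // followed by reducer.process(batch, tag)
        flushes++;
        rowIdx = 0;
        batchBytes = 0;
    }

    public static void main(String[] args) {
        BatchFlushDemo d = new BatchFlushDemo();
        // 100 rows of 1Mb each: the byte limit fires every 32 rows,
        // long before the 1024-row count limit would.
        for (int i = 0; i < 100; i++) {
            d.addRow(1024 * 1024);
        }
        System.out.println(d.flushes); // 3 flushes of 32 rows; 4 rows pending
    }
}
```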





[jira] [Commented] (HIVE-15789) Vectorization: limit reduce vectorization to 32Mb chunks

2017-02-07 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857258#comment-15857258
 ] 

Teddy Choi commented on HIVE-15789:
---

This patch applies the HIVE-15745 change and sets the key length as the default
for batchBytes.

> Vectorization: limit reduce vectorization to 32Mb chunks
> 
>
> Key: HIVE-15789
> URL: https://issues.apache.org/jira/browse/HIVE-15789
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-15789.1.patch, HIVE-15789.2.patch
>
>
> Reduce vectorization accumulates 1024 rows before forwarding them to the
> reduce processor.
> Add a safety limit for 32Mb of writables, so that shorter sequences can be 
> forwarded into the operator trees.
> {code}
> rowIdx++;
> if (rowIdx >= BATCH_SIZE) {
>   VectorizedBatchUtil.setBatchSize(batch, rowIdx);
>   reducer.process(batch, tag);
> {code}





[jira] [Commented] (HIVE-15803) msck can hang when nested partitions are present

2017-02-07 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857234#comment-15857234
 ] 

Pengcheng Xiong commented on HIVE-15803:


LGTM +1.

> msck can hang when nested partitions are present
> 
>
> Key: HIVE-15803
> URL: https://issues.apache.org/jira/browse/HIVE-15803
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15803.patch
>
>
> Steps to reproduce. 
> {noformat}
> CREATE TABLE `repairtable`( `col` string) PARTITIONED BY (  `p1` string,  
> `p2` string)
> hive> dfs -mkdir -p /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b;
> hive> dfs -touchz 
> /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b/datafile;
> hive> set hive.mv.files.thread;
> hive.mv.files.thread=15
> hive> set hive.mv.files.thread=1;
> hive> MSCK TABLE repairtable;
> {noformat}





[jira] [Updated] (HIVE-15769) Support view creation in CBO

2017-02-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15769:
---
Status: Open  (was: Patch Available)

> Support view creation in CBO
> 
>
> Key: HIVE-15769
> URL: https://issues.apache.org/jira/browse/HIVE-15769
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15769.01.patch, HIVE-15769.02.patch
>
>
> Right now, set operators need to run in CBO. If a view contains a set op, it
> will throw an exception. We need to support view creation in CBO.





[jira] [Updated] (HIVE-15769) Support view creation in CBO

2017-02-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15769:
---
Status: Patch Available  (was: Open)

> Support view creation in CBO
> 
>
> Key: HIVE-15769
> URL: https://issues.apache.org/jira/browse/HIVE-15769
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15769.01.patch, HIVE-15769.02.patch
>
>
> Right now, set operators need to run in CBO. If a view contains a set op, it
> will throw an exception. We need to support view creation in CBO.





[jira] [Updated] (HIVE-15769) Support view creation in CBO

2017-02-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15769:
---
Attachment: HIVE-15769.02.patch

> Support view creation in CBO
> 
>
> Key: HIVE-15769
> URL: https://issues.apache.org/jira/browse/HIVE-15769
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15769.01.patch, HIVE-15769.02.patch
>
>
> Right now, set operators need to run in CBO. If a view contains a set op, it
> will throw an exception. We need to support view creation in CBO.





[jira] [Updated] (HIVE-15796) HoS: poor reducer parallelism when operator stats are not accurate

2017-02-07 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15796:

Attachment: HIVE-15796.1.patch

> HoS: poor reducer parallelism when operator stats are not accurate
> --
>
> Key: HIVE-15796
> URL: https://issues.apache.org/jira/browse/HIVE-15796
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15796.1.patch, HIVE-15796.wip.1.patch, 
> HIVE-15796.wip.2.patch, HIVE-15796.wip.patch
>
>
> In HoS we currently use operator stats to determine reducer parallelism.
> However, it is often the case that operator stats are not accurate,
> especially if column stats are not available. This sometimes generates
> extremely poor reducer parallelism and causes a HoS query to run forever.
> This JIRA tries to offer an alternative way to compute reducer parallelism,
> similar to how MR does it. Here's the approach we are suggesting:
> 1. when computing the parallelism for a MapWork, use the stats associated
> with the TableScan operator;
> 2. when computing the parallelism for a ReduceWork, use the *maximum*
> parallelism from all its parents.
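The two rules above can be sketched in a few lines. This is a simplified model with illustrative names, not the patch itself: mapWorkParallelism sizes a MapWork from TableScan input bytes the way MR sizes reducers from input size (bytesPerReducer plays the role of hive.exec.reducers.bytes.per.reducer), and reduceWorkParallelism takes the maximum over a ReduceWork's parents.

```java
import java.util.Arrays;
import java.util.List;

public class ParallelismSketch {
    // Rule 1: MapWork parallelism derived from TableScan input size,
    // rounded up so even a small scan gets at least one task.
    static int mapWorkParallelism(long tableScanBytes, long bytesPerReducer) {
        return (int) Math.max(1, (tableScanBytes + bytesPerReducer - 1) / bytesPerReducer);
    }

    // Rule 2: a ReduceWork takes the *maximum* parallelism among its parents,
    // so one under-estimated parent cannot drag the whole stage down.
    static int reduceWorkParallelism(List<Integer> parentParallelisms) {
        return parentParallelisms.stream().max(Integer::compare).orElse(1);
    }

    public static void main(String[] args) {
        long bytesPerReducer = 256L * 1024 * 1024;                              // 256Mb per task
        int m1 = mapWorkParallelism(10L * 1024 * 1024 * 1024, bytesPerReducer); // 10Gb scan
        int m2 = mapWorkParallelism(512L * 1024 * 1024, bytesPerReducer);       // 512Mb scan
        System.out.println(m1); // 40
        System.out.println(m2); // 2
        System.out.println(reduceWorkParallelism(Arrays.asList(m1, m2))); // 40
    }
}
```

The point of the max rule is robustness: even if one branch's operator stats collapse to a tiny estimate, the join's ReduceWork still inherits the larger parent's parallelism.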





[jira] [Commented] (HIVE-15843) disable slider YARN resource normalization for LLAP

2017-02-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857217#comment-15857217
 ] 

Hive QA commented on HIVE-15843:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851503/HIVE-15843.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10240 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapNullKey[0] 
(batchId=173)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3424/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3424/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3424/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851503 - PreCommit-HIVE-Build

> disable slider YARN resource normalization for LLAP
> ---
>
> Key: HIVE-15843
> URL: https://issues.apache.org/jira/browse/HIVE-15843
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15843.patch
>
>
> The normalization can lead to LLAP starting with an invalid configuration
> with regard to cache size, JMX, and container size. If the memory
> configuration is invalid, it should fail immediately.





[jira] [Commented] (HIVE-15843) disable slider YARN resource normalization for LLAP

2017-02-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857214#comment-15857214
 ] 

Siddharth Seth commented on HIVE-15843:
---

The patch looks good to me. Slider build is fairly straightforward from their 
"develop" branch.

> disable slider YARN resource normalization for LLAP
> ---
>
> Key: HIVE-15843
> URL: https://issues.apache.org/jira/browse/HIVE-15843
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15843.patch
>
>
> The normalization can lead to LLAP starting with an invalid configuration
> with regard to cache size, JMX, and container size. If the memory
> configuration is invalid, it should fail immediately.





[jira] [Assigned] (HIVE-15846) Relocate more dependencies (e.g. org.apache.zookeeper) for JDBC uber jar

2017-02-07 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li reassigned HIVE-15846:
-


> Relocate more dependencies (e.g. org.apache.zookeeper) for JDBC uber jar
> 
>
> Key: HIVE-15846
> URL: https://issues.apache.org/jira/browse/HIVE-15846
> Project: Hive
>  Issue Type: Bug
>Reporter: Tao Li
>Assignee: Tao Li
>






[jira] [Updated] (HIVE-15718) Fix the NullPointer problem caused by split phase

2017-02-07 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-15718:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks [~colin_mjj] for the contribution.

> Fix the NullPointer problem caused by split phase
> -
>
> Key: HIVE-15718
> URL: https://issues.apache.org/jira/browse/HIVE-15718
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-15718.001.patch, HIVE-15718.002.patch, 
> HIVE-15718.003.patch
>
>
> VectorizedParquetRecordReader.initialize() will throw a NullPointerException
> because the input split is null. This split should be ignored.





[jira] [Comment Edited] (HIVE-15682) Eliminate per-row based dummy iterator creation

2017-02-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857172#comment-15857172
 ] 

Xuefu Zhang edited comment on HIVE-15682 at 2/8/17 1:23 AM:


Hi [~Ferd], when I ran the query, I had two days' data, which is about 25m
rows. I just ran the query again with about 10 days' data; the runtime is about
600s with 130m rows. I have 32 executors, each having 4 cores. The query spends
most of the time on the second stage, where sorting via a single reducer occurs.

I don't think the scale matters much as long as the query runs for some time
(in minutes at least). Thus, you should be able to use TPC-DS (or its
alternatives) data for this exercise.


was (Author: xuefuz):
Hi [~Ferd], when I ran the query, I had two day's data which is about 25m rows. 
I just ran the query again, with about 10 day's data, the runtime is about 600s 
with 130m rows. I have 32 executors, each having 4 cores. The query spends most 
of the time on the second stage where sorting via a single reducer occurs.

I don't think the scale matters much as long as the query runs for sometime (in 
minutes at least).  Thus, you should be able to use TPC-DS data for this 
exercise.

> Eliminate per-row based dummy iterator creation
> ---
>
> Key: HIVE-15682
> URL: https://issues.apache.org/jira/browse/HIVE-15682
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 2.2.0
>
> Attachments: HIVE-15682.patch
>
>
> HIVE-15580 introduced a dummy iterator per input row, which can be eliminated
> because {{SparkReduceRecordHandler}} is able to handle single key-value
> pairs. We can refactor this part of the code (1) to remove the need for an
> iterator and (2) to optimize the code path for per-(key, value) processing
> (instead of (key, value iterator) processing). It would also be great if we
> can measure the performance after the optimizations and compare it to the
> performance prior to HIVE-15580.





[jira] [Commented] (HIVE-15682) Eliminate per-row based dummy iterator creation

2017-02-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857172#comment-15857172
 ] 

Xuefu Zhang commented on HIVE-15682:


Hi [~Ferd], when I ran the query, I had two days' data, which is about 25m
rows. I just ran the query again with about 10 days' data; the runtime is about
600s with 130m rows. I have 32 executors, each having 4 cores. The query spends
most of the time on the second stage, where sorting via a single reducer occurs.

I don't think the scale matters much as long as the query runs for some time
(in minutes at least). Thus, you should be able to use TPC-DS data for this
exercise.

> Eliminate per-row based dummy iterator creation
> ---
>
> Key: HIVE-15682
> URL: https://issues.apache.org/jira/browse/HIVE-15682
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 2.2.0
>
> Attachments: HIVE-15682.patch
>
>
> HIVE-15580 introduced a dummy iterator per input row, which can be eliminated
> because {{SparkReduceRecordHandler}} is able to handle single key-value
> pairs. We can refactor this part of the code (1) to remove the need for an
> iterator and (2) to optimize the code path for per-(key, value) processing
> (instead of (key, value iterator) processing). It would also be great if we
> can measure the performance after the optimizations and compare it to the
> performance prior to HIVE-15580.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2017-02-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668996#comment-15668996
 ] 

Sergey Shelukhin edited comment on HIVE-14990 at 2/8/17 1:12 AM:
-

Updated test list to fix/declare irrelevant before closing this. Only updated 
the CliDriver list actually; haven't made my way through it yet.
{panel}
TestCliDriver:
-stats_list_bucket-
-show_tablestatus-
-vector_udf2-
-list_bucket_dml_14-
autoColumnStats_9
stats_noscan_2
symlink_text_input_format
temp_table_precedence
offset_limit_global_optimizer
rand_partitionpruner2
materialized_view_authorization_sqlstd,materialized_*
merge_dynamic_partition, merge_dynamic_partition*
orc_vectorization_ppd
parquet_join2
repl_3_exim_metadata
sample6
sample_islocalmode_hook
smb_mapjoin_2,smb_mapjoin_3,smb_mapjoin_7
orc_createas1
exim_16_part_external,exim_17_part_managed,


TestEncryptedHDFSCliDriver:
encryption_ctas
encryption_drop_partition 
encryption_insert_values
encryption_join_unencrypted_tbl
encryption_load_data_to_encrypted_tables

MiniLlapLocal:
exchgpartition2lel
cbo_rp_lineage2
create_merge_compressed
deleteAnalyze
delete_where_no_match
delete_where_non_partitioned
dynpart_sort_optimization
escape2
insert1
lineage2
lineage3
orc_llap
schema_evol_orc_nonvec_part
schema_evol_orc_vec_part
schema_evol_text_nonvec_part
schema_evol_text_vec_part
schema_evol_text_vecrow_part
smb_mapjoin_6
tez_dml
union_fast_stats
update_all_types
update_tmp_table
update_where_no_match
update_where_non_partitioned
vector_outer_join1
vector_outer_join4

MiniLlap:
load_fs2
orc_ppd_basic
external_table_with_space_in_location_path
file_with_header_footer
import_exported_table
schemeAuthority,schemeAuthority2
table_nonprintable

Minimr:
infer_bucket_sort_map_operators
infer_bucket_sort_merge
infer_bucket_sort_reducers_power_two
root_dir_external_table
scriptfile1

TestSymlinkTextInputFormat#testCombine 
TestJdbcWithLocalClusterSpark, etc.
{panel}


was (Author: sershe):
Updated test list to fix/declare irrelevant before closing this. Only updated 
the CliDriver list actually; haven't made my way through it yet.
{panel}
TestCliDriver:
stats_list_bucket
show_tablestatus
-vector_udf2-
list_bucket_dml_14
autoColumnStats_9
stats_noscan_2
symlink_text_input_format
temp_table_precedence
offset_limit_global_optimizer
rand_partitionpruner2
materialized_view_authorization_sqlstd,materialized_*
merge_dynamic_partition, merge_dynamic_partition*
orc_vectorization_ppd
parquet_join2
repl_3_exim_metadata
sample6
sample_islocalmode_hook
smb_mapjoin_2,smb_mapjoin_3,smb_mapjoin_7
orc_createas1
exim_16_part_external,exim_17_part_managed,


TestEncryptedHDFSCliDriver:
encryption_ctas
encryption_drop_partition 
encryption_insert_values
encryption_join_unencrypted_tbl
encryption_load_data_to_encrypted_tables

MiniLlapLocal:
exchgpartition2lel
cbo_rp_lineage2
create_merge_compressed
deleteAnalyze
delete_where_no_match
delete_where_non_partitioned
dynpart_sort_optimization
escape2
insert1
lineage2
lineage3
orc_llap
schema_evol_orc_nonvec_part
schema_evol_orc_vec_part
schema_evol_text_nonvec_part
schema_evol_text_vec_part
schema_evol_text_vecrow_part
smb_mapjoin_6
tez_dml
union_fast_stats
update_all_types
update_tmp_table
update_where_no_match
update_where_non_partitioned
vector_outer_join1
vector_outer_join4

MiniLlap:
load_fs2
orc_ppd_basic
external_table_with_space_in_location_path
file_with_header_footer
import_exported_table
schemeAuthority,schemeAuthority2
table_nonprintable

Minimr:
infer_bucket_sort_map_operators
infer_bucket_sort_merge
infer_bucket_sort_reducers_power_two
root_dir_external_table
scriptfile1

TestSymlinkTextInputFormat#testCombine 
TestJdbcWithLocalClusterSpark, etc.
{panel}

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.10.patch, 
> HIVE-14990.10.patch, HIVE-14990.12.patch, HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 

[jira] [Updated] (HIVE-15844) Add WriteType to Explain Plan of ReduceSinkOperator and FileSinkOperator

2017-02-07 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15844:
--
Description: 
# Both FileSinkDesc and ReduceSinkDesc have a special code path for Update/Delete 
operations. It is not always set correctly for ReduceSink; 
ReduceSinkDeDuplication is one place where it gets lost. Even when it isn't set 
correctly, elsewhere we set ROW_ID to be the partition column of the 
ReduceSinkOperator, and UDFToInteger special-cases it to extract the bucketId 
from ROW_ID. We need to modify the Explain Plan to record the Write Type (i.e. 
insert/update/delete) to make sure we have tests that can catch errors here.
# Add some validation at the end of the plan to make sure that RSO/FSO, which 
represent the end of the pipeline and write to an ACID table, have WriteType set 
(to something other than the default).
# We don't seem to have any tests where the number of buckets is > the number of 
reducers. Add those.

  was:both FileSinkDesk and ReduceSinkDesk have special code path for 
Update/Delete operations. It is not always set correctly for ReduceSink. 
ReduceSinkDeDuplication is one place where it gets lost. Even when it isn't set 
correctly, elsewhere we set ROW_ID to be the partition column of the 
ReduceSinkOperator and UDFToInteger special cases it to extract bucketId from 
ROW_ID. We need to modify Explain Plan to record Write Type (i.e. 
insert/update/delete) to make sure we have tests that can catch errors here.


> Add WriteType to Explain Plan of ReduceSinkOperator and FileSinkOperator
> 
>
> Key: HIVE-15844
> URL: https://issues.apache.org/jira/browse/HIVE-15844
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 1.0.0
>
>
> # Both FileSinkDesc and ReduceSinkDesc have a special code path for 
> Update/Delete operations. It is not always set correctly for ReduceSink; 
> ReduceSinkDeDuplication is one place where it gets lost. Even when it isn't 
> set correctly, elsewhere we set ROW_ID to be the partition column of the 
> ReduceSinkOperator, and UDFToInteger special-cases it to extract the bucketId 
> from ROW_ID. We need to modify the Explain Plan to record the Write Type (i.e. 
> insert/update/delete) to make sure we have tests that can catch errors here.
> # Add some validation at the end of the plan to make sure that RSO/FSO, which 
> represent the end of the pipeline and write to an ACID table, have WriteType 
> set (to something other than the default).
> # We don't seem to have any tests where the number of buckets is > the number 
> of reducers. Add those.
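The validation in item 2 above could be sketched like this. WriteType, SinkOp, and validate() here are simplified stand-ins for illustration, not Hive's actual operator or plan model.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a terminal RSO/FSO that writes to an ACID table must
// carry a non-default WriteType; a default value suggests it was lost
// somewhere in the optimizer (e.g. ReduceSinkDeDuplication).
public class WriteTypeValidation {
    enum WriteType { DEFAULT, INSERT, UPDATE, DELETE }

    static class SinkOp {
        final String name;
        final boolean writesToAcidTable;
        final WriteType writeType;
        SinkOp(String name, boolean acid, WriteType wt) {
            this.name = name; this.writesToAcidTable = acid; this.writeType = wt;
        }
    }

    // Validate the terminal sinks of a plan, failing fast on a missing WriteType.
    static void validate(List<SinkOp> terminalSinks) {
        for (SinkOp op : terminalSinks) {
            if (op.writesToAcidTable && op.writeType == WriteType.DEFAULT) {
                throw new IllegalStateException("WriteType not set on " + op.name);
            }
        }
    }

    public static void main(String[] args) {
        // Passes: the ACID sink has an explicit write type.
        validate(Arrays.asList(new SinkOp("FS_1", true, WriteType.UPDATE)));
        // Fails: the ACID sink still has the default write type.
        try {
            validate(Arrays.asList(new SinkOp("RS_2", true, WriteType.DEFAULT)));
        } catch (IllegalStateException expected) {
            System.out.println(expected.getMessage()); // prints "WriteType not set on RS_2"
        }
    }
}
```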



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15672) LLAP text cache: improve first query perf II

2017-02-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15672:

Attachment: HIVE-15672.08.patch

Small update based on RB

> LLAP text cache: improve first query perf II
> 
>
> Key: HIVE-15672
> URL: https://issues.apache.org/jira/browse/HIVE-15672
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15672.01.patch, HIVE-15672.02.patch, 
> HIVE-15672.03.patch, HIVE-15672.04.patch, HIVE-15672.05.patch, 
> HIVE-15672.06.patch, HIVE-15672.07.patch, HIVE-15672.08.patch
>
>
> 4) Send VRB to the pipeline and write ORC in parallel (in background).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15791) Remove unused ant files

2017-02-07 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-15791:
--
Attachment: HIVE-15791.2.patch

> Remove unused ant files
> ---
>
> Key: HIVE-15791
> URL: https://issues.apache.org/jira/browse/HIVE-15791
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-15791.1.patch, HIVE-15791.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15791) Remove unused ant files

2017-02-07 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-15791:
--
Status: Patch Available  (was: Open)

> Remove unused ant files
> ---
>
> Key: HIVE-15791
> URL: https://issues.apache.org/jira/browse/HIVE-15791
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-15791.1.patch, HIVE-15791.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15802) Changes to expected entries for dynamic bloomfilter runtime filtering

2017-02-07 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857153#comment-15857153
 ] 

Jason Dere commented on HIVE-15802:
---

Looks like the golden file for 
TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] needs to be updated.

> Changes to expected entries for dynamic bloomfilter runtime filtering
> -
>
> Key: HIVE-15802
> URL: https://issues.apache.org/jira/browse/HIVE-15802
> Project: Hive
>  Issue Type: Improvement
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15802.1.patch, HIVE-15802.2.patch
>
>
> - Estimate bloom filter size based on distinct values from column stats if 
> available
> - Cap the bloom filter expected entries size to 
> hive.tez.max.bloom.filter.entries if the estimated size from stats exceeds 
> that amount.
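The two sizing rules above can be sketched as follows. This is a hedged illustration: the constant conceptually mirrors hive.tez.max.bloom.filter.entries, but its value here and the method names are assumptions, not Hive's actual configuration API.

```java
// Hypothetical sketch of the bloom filter sizing rule: prefer NDV column
// stats when available, fall back to row count, and cap the result.
public class BloomFilterSizing {
    // Assumed cap for illustration only (stands in for
    // hive.tez.max.bloom.filter.entries).
    static final long MAX_ENTRIES = 10_000_000L;

    static long expectedEntries(long ndvFromColumnStats, long fallbackRowCount) {
        // Prefer the distinct-value (NDV) estimate from column stats when present.
        long estimate = ndvFromColumnStats > 0 ? ndvFromColumnStats : fallbackRowCount;
        // Cap the expected entries so the bloom filter stays bounded.
        return Math.min(estimate, MAX_ENTRIES);
    }

    public static void main(String[] args) {
        System.out.println(expectedEntries(50_000L, 1_000_000L));   // stats available: 50000
        System.out.println(expectedEntries(-1L, 1_000_000L));       // no stats: falls back to 1000000
        System.out.println(expectedEntries(500_000_000L, 0L));      // capped at 10000000
    }
}
```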



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15844) Add WriteType to Explain Plan of ReduceSinkOperator and FileSinkOperator

2017-02-07 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-15844:
-


> Add WriteType to Explain Plan of ReduceSinkOperator and FileSinkOperator
> 
>
> Key: HIVE-15844
> URL: https://issues.apache.org/jira/browse/HIVE-15844
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 1.0.0
>
>
> both FileSinkDesk and ReduceSinkDesk have special code path for Update/Delete 
> operations. It is not always set correctly for ReduceSink. 
> ReduceSinkDeDuplication is one place where it gets lost. Even when it isn't 
> set correctly, elsewhere we set ROW_ID to be the partition column of the 
> ReduceSinkOperator and UDFToInteger special cases it to extract bucketId from 
> ROW_ID. We need to modify Explain Plan to record Write Type (i.e. 
> insert/update/delete) to make sure we have tests that can catch errors here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15840) Webhcat test TestPig_5 failing with Pig on Tez at check for percent complete of job

2017-02-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857143#comment-15857143
 ] 

Hive QA commented on HIVE-15840:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851494/HIVE-15840.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10236 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=230)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3423/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3423/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3423/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851494 - PreCommit-HIVE-Build

> Webhcat test TestPig_5 failing with Pig on Tez at check for percent complete 
> of job
> ---
>
> Key: HIVE-15840
> URL: https://issues.apache.org/jira/browse/HIVE-15840
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15840.1.patch
>
>
> TestPig_5 is failing at percentage check if the job is Pig on Tez:
> check_job_percent_complete failed. got percentComplete , expected 100% 
> complete
> Test command:
> curl -d user.name=daijy -d arg=-p -d arg=INPDIR=/tmp/templeton_test_data -d 
> arg=-p -d arg=OUTDIR=/tmp/output -d file=loadstore.pig -X POST 
> http://localhost:50111/templeton/v1/pig
> curl 
> http://localhost:50111/templeton/v1/jobs/job_1486502484681_0003?user.name=daijy
> This is similar to HIVE-9351, which fixes Hive on Tez.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15473) Progress Bar on Beeline client

2017-02-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857132#comment-15857132
 ] 

Thejas M Nair commented on HIVE-15473:
--

[~anishek] Can you please create a follow-up jira to address these concerns?


> Progress Bar on Beeline client
> --
>
> Key: HIVE-15473
> URL: https://issues.apache.org/jira/browse/HIVE-15473
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15473.10.patch, HIVE-15473.11.patch, 
> HIVE-15473.2.patch, HIVE-15473.3.patch, HIVE-15473.4.patch, 
> HIVE-15473.5.patch, HIVE-15473.6.patch, HIVE-15473.7.patch, 
> HIVE-15473.8.patch, HIVE-15473.9.patch, io_summary_after_patch.png, 
> io_summary_before_patch.png, screen_shot_beeline.jpg, status_after_patch.png, 
> status_before_patch.png, summary_after_patch.png, summary_before_patch.png
>
>
> Hive Cli allows showing a progress bar for the tez execution engine, as shown 
> in https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> It would be great to have a similar progress bar displayed when the user is 
> connecting via the beeline command line client as well. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15683) Measure performance impact on group by by HIVE-15580

2017-02-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857123#comment-15857123
 ] 

Xuefu Zhang commented on HIVE-15683:


The following measurements come from our prod cluster with static allocation, 
for a query designed to measure group-by performance. They offer a comparison 
between performance w/ and w/o HIVE-15580.
{code}
Query: select count(*) from (select driver_uuid, avg(base_fare_usd) from 
dwh.fact_trip where datestr > '2017-01-01' group by driver_uuid) x;
Origin: 55.1, 42.1, 39.6, 39.1, 39.1, 33.06, 61.6 AVG: 44.24
Patch: 59.1, 65.2, 58.3, 35.1, 45.1, 39.4, 47.3   AVG: 49.93 => 1.13X slower
{code}

The performance degradation seems noticeable but not significant. For this, 
we plan to add a configuration to allow users to switch between the two 
implementations.

However, our cluster is notorious for large performance variations. Thus, 
it would be great if others could also conduct some tests to confirm. FYI,  
[~Ferd]/[~dapengsun].

> Measure performance impact on group by by HIVE-15580
> 
>
> Key: HIVE-15683
> URL: https://issues.apache.org/jira/browse/HIVE-15683
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> HIVE-15580 changed the way the data is shuffled for order by: instead of 
> using Spark's groupByKey to shuffle data, Hive on Spark now uses 
> repartitionAndSortWithinPartitions(), which generates (key, value) pairs 
> instead of the original (key, value iterator). This might have some 
> performance implications, but it's needed to get rid of the unbounded memory 
> usage of {{groupByKey}}.
> Here we'd like to compare group by performance with or w/o HIVE-15580. If the 
> impact is significant, we can provide a configuration that allows users to 
> switch back to the original way of shuffling.
> This work should ideally be done after HIVE-15682, as the optimization there 
> should help the performance here as well. 
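The two shuffle shapes being compared can be simulated in plain Java (this is not Spark code; the real comparison uses Spark's groupByKey and repartitionAndSortWithinPartitions):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simulates result shapes only: groupByKey materializes all values of a key
// together (the unbounded-memory risk), while
// repartitionAndSortWithinPartitions hands over key-sorted (key, value)
// pairs one at a time.
public class ShuffleShapes {
    static Map<String, List<Integer>> groupByKeyShape(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    static List<Map.Entry<String, Integer>> sortedPairsShape(List<Map.Entry<String, Integer>> pairs) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.comparingByKey()); // sort within the "partition"
        return sorted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = Arrays.asList(
            new SimpleEntry<>("b", 2), new SimpleEntry<>("a", 1), new SimpleEntry<>("a", 3));
        System.out.println(groupByKeyShape(pairs));  // prints {a=[1, 3], b=[2]}
        System.out.println(sortedPairsShape(pairs)); // prints [a=1, a=3, b=2]
    }
}
```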



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-07 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857108#comment-15857108
 ] 

Eugene Koifman edited comment on HIVE-15691 at 2/8/17 12:36 AM:


What is the main use case this is designed for that DelimitedWriter can't handle?  
Using different delimiters in the same row?  Maybe the unit tests should be 
more elaborate to illustrate that.


Why are there so many StrictRegexWriter() c'tors?
It seems like a StrictRegexWriter w/o a regex or EndPoint or Connection, etc. is 
not useful - it will just NPE in various places. 

[~roshan_naik] do you have any comments on this patch?


was (Author: ekoifman):
What is main use case this is designed for that DelimitedWriter can't handle?  
Using different delimiters in the same row?  Maybe the unit tests should be 
more elaborate to illustrate that.


Why are there so many StrictRegexWriter() c'tors?
It seems like a StrictRegexWriter w/o a regex or EndPoint or Connection, etc is 
not useful - it will just NPE in various places. 

[~roshan_naik] do you have any comments on this?

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for the Flume Hive Sink.
> It is similar to the StrictJsonWriter available in Hive.
> There is a dependency in Flume to commit:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please verify the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15682) Eliminate per-row based dummy iterator creation

2017-02-07 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857110#comment-15857110
 ] 

Ferdinand Xu commented on HIVE-15682:
-

Hi [~xuefuz]
{noformat}
select count(*) from (select request_lat from dwh.fact_trip where datestr > 
'2017-01-27' order by request_lat) x;
Origin: 246.56, 342.78, 216.40, 216.587, 270.805, 449.232, 233.406 AVG: 282.25
Patch:  125.21, 123.22, 166.31, 168.30, 120.428, 119.21, 120.385 AVG: 134.72
{noformat}
What kind of data scales do you use to evaluate the performance? We can 
evaluate this patch using TPC-DS and TPCx-BB.

> Eliminate per-row based dummy iterator creation
> ---
>
> Key: HIVE-15682
> URL: https://issues.apache.org/jira/browse/HIVE-15682
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 2.2.0
>
> Attachments: HIVE-15682.patch
>
>
> HIVE-15580 introduced a dummy iterator per input row which can be eliminated, 
> because {{SparkReduceRecordHandler}} is able to handle single key-value 
> pairs. We can refactor this part of the code 1. to remove the need for an 
> iterator and 2. to optimize the code path for per-(key, value) (instead 
> of (key, value iterator)) processing. It would also be great if we could 
> measure the performance after the optimizations and compare it to the 
> performance prior to HIVE-15580.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-07 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857108#comment-15857108
 ] 

Eugene Koifman edited comment on HIVE-15691 at 2/8/17 12:36 AM:


What is the main use case this is designed for that DelimitedWriter can't handle?  
Using different delimiters in the same row?  Maybe the unit tests should be 
more elaborate to illustrate that.


Why are there so many StrictRegexWriter() c'tors?
It seems like a StrictRegexWriter w/o a regex or EndPoint or Connection, etc. is 
not useful - it will just NPE in various places. 

[~roshan_naik] do you have any comments on this?


was (Author: ekoifman):
What is main use case this is designed for that DelimitedWriter can't handle?  
Using different delimiters in the same row?  Maybe the unit tests should be 
more elaborate to illustrate that.


Why are there so many StrictRegexWriter() c'tors?
It seems like a StrictRegexWriter w/o a regex or EndPoint or Connection, etc is 
not useful - it will just NPE in various places. 

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for the Flume Hive Sink.
> It is similar to the StrictJsonWriter available in Hive.
> There is a dependency in Flume to commit:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please verify the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-07 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857108#comment-15857108
 ] 

Eugene Koifman commented on HIVE-15691:
---

What is the main use case this is designed for that DelimitedWriter can't handle?  
Using different delimiters in the same row?  Maybe the unit tests should be 
more elaborate to illustrate that.


Why are there so many StrictRegexWriter() c'tors?
It seems like a StrictRegexWriter w/o a regex or EndPoint or Connection, etc. is 
not useful - it will just NPE in various places. 

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for the Flume Hive Sink.
> It is similar to the StrictJsonWriter available in Hive.
> There is a dependency in Flume to commit:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please verify the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15682) Eliminate per-row based dummy iterator creation

2017-02-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857098#comment-15857098
 ] 

Xuefu Zhang commented on HIVE-15682:


Hi [~Ferd]/[~dapengsun], it would be great if you guys could also run the test 
and confirm the conclusion drawn here. Thanks.

> Eliminate per-row based dummy iterator creation
> ---
>
> Key: HIVE-15682
> URL: https://issues.apache.org/jira/browse/HIVE-15682
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 2.2.0
>
> Attachments: HIVE-15682.patch
>
>
> HIVE-15580 introduced a dummy iterator per input row which can be eliminated, 
> because {{SparkReduceRecordHandler}} is able to handle single key-value 
> pairs. We can refactor this part of the code 1. to remove the need for an 
> iterator and 2. to optimize the code path for per-(key, value) (instead 
> of (key, value iterator)) processing. It would also be great if we could 
> measure the performance after the optimizations and compare it to the 
> performance prior to HIVE-15580.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15222) replace org.json usage in ExplainTask/TezTask related classes with some alternative

2017-02-07 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857087#comment-15857087
 ] 

Gunther Hagleitner commented on HIVE-15222:
---

cc [~pxiong]. I think you have made some changes to the json explain before?

> replace org.json usage in ExplainTask/TezTask related classes with some 
> alternative
> ---
>
> Key: HIVE-15222
> URL: https://issues.apache.org/jira/browse/HIVE-15222
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Teddy Choi
> Fix For: 2.2.0
>
> Attachments: HIVE-15222.1.patch, HIVE-15222.2.patch
>
>
> Replace org.json usage in these classes.
> It seems to me that json is probably only used to write some information - 
> but the application never reads it back.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15803) msck can hang when nested partitions are present

2017-02-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15803:

Status: Patch Available  (was: Open)

> msck can hang when nested partitions are present
> 
>
> Key: HIVE-15803
> URL: https://issues.apache.org/jira/browse/HIVE-15803
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15803.patch
>
>
> Steps to reproduce. 
> {noformat}
> CREATE TABLE `repairtable`( `col` string) PARTITIONED BY (  `p1` string,  
> `p2` string)
> hive> dfs -mkdir -p /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b;
> hive> dfs -touchz 
> /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b/datafile;
> hive> set hive.mv.files.thread;
> hive.mv.files.thread=15
> hive> set hive.mv.files.thread=1;
> hive> MSCK TABLE repairtable;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15803) msck can hang when nested partitions are present

2017-02-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15803:

Attachment: HIVE-15803.patch

> msck can hang when nested partitions are present
> 
>
> Key: HIVE-15803
> URL: https://issues.apache.org/jira/browse/HIVE-15803
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15803.patch
>
>
> Steps to reproduce. 
> {noformat}
> CREATE TABLE `repairtable`( `col` string) PARTITIONED BY (  `p1` string,  
> `p2` string)
> hive> dfs -mkdir -p /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b;
> hive> dfs -touchz 
> /apps/hive/warehouse/test.db/repairtable/p1=c/p2=a/p3=b/datafile;
> hive> set hive.mv.files.thread;
> hive.mv.files.thread=15
> hive> set hive.mv.files.thread=1;
> hive> MSCK TABLE repairtable;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15843) disable slider YARN resource normalization for LLAP

2017-02-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857080#comment-15857080
 ] 

Sergey Shelukhin commented on HIVE-15843:
-

Hmm, I don't have access to a new enough version of Slider, but I do see the 
setting applied in the package; and it doesn't fail due to an unknown setting 
on Slider 0.91, which is something else I wanted to test.

> disable slider YARN resource normalization for LLAP
> ---
>
> Key: HIVE-15843
> URL: https://issues.apache.org/jira/browse/HIVE-15843
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15843.patch
>
>
> The normalization can lead to LLAP starting with invalid configuration with 
> regard to cache size, jmx and container size. If the memory configuration is 
> invalid, it should fail immediately.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15473) Progress Bar on Beeline client

2017-02-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857076#comment-15857076
 ] 

Prasanth Jayachandran commented on HIVE-15473:
--

The refresh rate is slow. The following videos show it:
before patch: https://asciinema.org/a/2fgcncxg5gjavcpxt6lfb8jg9
after patch: https://asciinema.org/a/2tht5jf6l9b2dc3ylt5gtztqg

> Progress Bar on Beeline client
> --
>
> Key: HIVE-15473
> URL: https://issues.apache.org/jira/browse/HIVE-15473
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15473.10.patch, HIVE-15473.11.patch, 
> HIVE-15473.2.patch, HIVE-15473.3.patch, HIVE-15473.4.patch, 
> HIVE-15473.5.patch, HIVE-15473.6.patch, HIVE-15473.7.patch, 
> HIVE-15473.8.patch, HIVE-15473.9.patch, io_summary_after_patch.png, 
> io_summary_before_patch.png, screen_shot_beeline.jpg, status_after_patch.png, 
> status_before_patch.png, summary_after_patch.png, summary_before_patch.png
>
>
> Hive Cli allows showing progress bar for tez execution engine as shown in 
> https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> it would be great to have similar progress bar displayed when user is 
> connecting via beeline command line client as well. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15473) Progress Bar on Beeline client

2017-02-07 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-15473:


Assignee: anishek  (was: Prasanth Jayachandran)

> Progress Bar on Beeline client
> --
>
> Key: HIVE-15473
> URL: https://issues.apache.org/jira/browse/HIVE-15473
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15473.10.patch, HIVE-15473.11.patch, 
> HIVE-15473.2.patch, HIVE-15473.3.patch, HIVE-15473.4.patch, 
> HIVE-15473.5.patch, HIVE-15473.6.patch, HIVE-15473.7.patch, 
> HIVE-15473.8.patch, HIVE-15473.9.patch, io_summary_after_patch.png, 
> io_summary_before_patch.png, screen_shot_beeline.jpg, status_after_patch.png, 
> status_before_patch.png, summary_after_patch.png, summary_before_patch.png
>
>
> Hive Cli allows showing progress bar for tez execution engine as shown in 
> https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> it would be great to have similar progress bar displayed when user is 
> connecting via beeline command line client as well. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15802) Changes to expected entries for dynamic bloomfilter runtime filtering

2017-02-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857074#comment-15857074
 ] 

Hive QA commented on HIVE-15802:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851479/HIVE-15802.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10206 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=162)

[scriptfile1.q,vector_outer_join5.q,file_with_header_footer.q,bucket4.q,input16_cc.q,bucket5.q,infer_bucket_sort_merge.q,constprog_partitioner.q,orc_merge2.q,reduce_deduplicate.q,schemeAuthority2.q,load_fs2.q,orc_merge8.q,orc_merge_incompat2.q,infer_bucket_sort_bucketed_table.q,vector_outer_join4.q,disable_merge_for_bucketing.q,vector_inner_join.q,orc_merge7.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=106)

[bucketsortoptimize_insert_4.q,multi_insert_mixed.q,vectorization_10.q,auto_join18_multi_distinct.q,join_cond_pushdown_3.q,custom_input_output_format.q,skewjoinopt5.q,vectorization_part_project.q,vector_count_distinct.q,skewjoinopt4.q,count.q,parallel.q,union33.q,union_lateralview.q,nullgroup4.q]
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] 
(batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3422/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3422/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3422/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851479 - PreCommit-HIVE-Build

> Changes to expected entries for dynamic bloomfilter runtime filtering
> -
>
> Key: HIVE-15802
> URL: https://issues.apache.org/jira/browse/HIVE-15802
> Project: Hive
>  Issue Type: Improvement
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15802.1.patch, HIVE-15802.2.patch
>
>
> - Estimate bloom filter size based on distinct values from column stats if 
> available
> - Cap the bloom filter expected entries size to 
> hive.tez.max.bloom.filter.entries if the estimated size from stats exceeds 
> that amount.
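The two bullet points above amount to a simple sizing rule: use the distinct-value (NDV) estimate from column stats when available, otherwise fall back to a default, and never exceed the configured cap. A hedged sketch, with illustrative names (the method, default value, and fallback behavior are assumptions, not Hive's actual implementation):

```java
// Illustrative sketch of the bloom filter sizing rule described above:
// prefer the NDV estimate from column stats, fall back to a default when
// stats are missing, and cap at hive.tez.max.bloom.filter.entries.
public class BloomFilterSizing {
    static final long DEFAULT_ENTRIES = 1_000_000L; // assumed fallback

    // ndvFromStats <= 0 means "no column stats available"
    static long expectedEntries(long ndvFromStats, long maxEntries) {
        long estimate = ndvFromStats > 0 ? ndvFromStats : DEFAULT_ENTRIES;
        return Math.min(estimate, maxEntries);
    }

    public static void main(String[] args) {
        // stats available and under the cap: use the stats estimate
        System.out.println(expectedEntries(50_000, 10_000_000));
        // stats estimate exceeds the cap: clamp to the cap
        System.out.println(expectedEntries(500_000_000L, 10_000_000));
        // no stats: fall back to the default
        System.out.println(expectedEntries(-1, 10_000_000));
    }
}
```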



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15473) Progress Bar on Beeline client

2017-02-07 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15473:
-
Attachment: summary_before_patch.png
summary_after_patch.png
status_before_patch.png
status_after_patch.png
io_summary_before_patch.png
io_summary_after_patch.png

Attaching files with differences before and after this patch.

> Progress Bar on Beeline client
> --
>
> Key: HIVE-15473
> URL: https://issues.apache.org/jira/browse/HIVE-15473
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15473.10.patch, HIVE-15473.11.patch, 
> HIVE-15473.2.patch, HIVE-15473.3.patch, HIVE-15473.4.patch, 
> HIVE-15473.5.patch, HIVE-15473.6.patch, HIVE-15473.7.patch, 
> HIVE-15473.8.patch, HIVE-15473.9.patch, io_summary_after_patch.png, 
> io_summary_before_patch.png, screen_shot_beeline.jpg, status_after_patch.png, 
> status_before_patch.png, summary_after_patch.png, summary_before_patch.png
>
>
> Hive Cli allows showing progress bar for tez execution engine as shown in 
> https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> it would be great to have similar progress bar displayed when user is 
> connecting via beeline command line client as well. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15473) Progress Bar on Beeline client

2017-02-07 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-15473:


Assignee: Prasanth Jayachandran  (was: anishek)

> Progress Bar on Beeline client
> --
>
> Key: HIVE-15473
> URL: https://issues.apache.org/jira/browse/HIVE-15473
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15473.10.patch, HIVE-15473.11.patch, 
> HIVE-15473.2.patch, HIVE-15473.3.patch, HIVE-15473.4.patch, 
> HIVE-15473.5.patch, HIVE-15473.6.patch, HIVE-15473.7.patch, 
> HIVE-15473.8.patch, HIVE-15473.9.patch, io_summary_after_patch.png, 
> io_summary_before_patch.png, screen_shot_beeline.jpg, status_after_patch.png, 
> status_before_patch.png, summary_after_patch.png, summary_before_patch.png
>
>
> Hive Cli allows showing progress bar for tez execution engine as shown in 
> https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> it would be great to have similar progress bar displayed when user is 
> connecting via beeline command line client as well. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15843) disable slider YARN resource normalization for LLAP

2017-02-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15843:

Description: The normalization can lead to LLAP starting with invalid 
configuration with regard to cache size, jmx and container size. If the memory 
configuration is invalid, it should fail immediately.  (was: This can lead to 
LLAP starting with an invalid config with regard to cache size, jmx and 
container size. If the memory configuration is invalid, it should fail 
immediately.)

> disable slider YARN resource normalization for LLAP
> ---
>
> Key: HIVE-15843
> URL: https://issues.apache.org/jira/browse/HIVE-15843
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15843.patch
>
>
> The normalization can lead to LLAP starting with invalid configuration with 
> regard to cache size, jmx and container size. If the memory configuration is 
> invalid, it should fail immediately.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15843) disable slider YARN resource normalization for LLAP

2017-02-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15843:

Status: Patch Available  (was: Open)

> disable slider YARN resource normalization for LLAP
> ---
>
> Key: HIVE-15843
> URL: https://issues.apache.org/jira/browse/HIVE-15843
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15843.patch
>
>
> This can lead to LLAP starting with an invalid config with regard to cache 
> size, jmx and container size. If the memory configuration is invalid, it 
> should fail immediately.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15843) disable slider YARN resource normalization for LLAP

2017-02-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15843:

Attachment: HIVE-15843.patch

The patch. Need to test in the cluster. cc [~sseth]

> disable slider YARN resource normalization for LLAP
> ---
>
> Key: HIVE-15843
> URL: https://issues.apache.org/jira/browse/HIVE-15843
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15843.patch
>
>
> This can lead to LLAP starting with an invalid config with regard to cache 
> size, jmx and container size. If the memory configuration is invalid, it 
> should fail immediately.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15843) disable slider YARN resource normalization for LLAP

2017-02-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-15843:
---


> disable slider YARN resource normalization for LLAP
> ---
>
> Key: HIVE-15843
> URL: https://issues.apache.org/jira/browse/HIVE-15843
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
>
> This can lead to LLAP starting with an invalid config with regard to cache 
> size, jmx and container size. If the memory configuration is invalid, it 
> should fail immediately.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-15842) disable slider YARN resource normalization for LLAP

2017-02-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-15842.
-
Resolution: Invalid

> disable slider YARN resource normalization for LLAP
> ---
>
> Key: HIVE-15842
> URL: https://issues.apache.org/jira/browse/HIVE-15842
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> This can lead to LLAP starting with an invalid config with regard to cache 
> size, jmx and container size. If the memory configuration is invalid, it 
> should fail immediately.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15842) disable slider YARN resource normalization for LLAP

2017-02-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-15842:
---


> disable slider YARN resource normalization for LLAP
> ---
>
> Key: HIVE-15842
> URL: https://issues.apache.org/jira/browse/HIVE-15842
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> This can lead to LLAP starting with an invalid config with regard to cache 
> size, jmx and container size. If the memory configuration is invalid, it 
> should fail immediately.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857046#comment-15857046
 ] 

ASF GitHub Bot commented on HIVE-14007:
---

Github user omalley closed the pull request at:

https://github.com/apache/hive/pull/81


> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, 
> HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, 
> HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15688) LlapServiceDriver - an option to start the cluster immediately

2017-02-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857044#comment-15857044
 ] 

Siddharth Seth commented on HIVE-15688:
---

+1 for the latest patch.

> LlapServiceDriver - an option to start the cluster immediately
> --
>
> Key: HIVE-15688
> URL: https://issues.apache.org/jira/browse/HIVE-15688
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15688.01.patch, HIVE-15688.02.patch, 
> HIVE-15688.03.patch, HIVE-15688.04.patch, HIVE-15688.patch
>
>
> run.sh is very slow because it makes 4 calls to slider, which means 4 JVMs, 4 
> connections to the RM, and other overhead: roughly 2-5 sec. per call, 
> depending on the machine/cluster.
> What we need is a mode for llapservicedriver that would not generate run.sh, 
> but would rather run the cluster immediately by calling the corresponding 4 
> slider APIs. Should probably be the default, too. For compat with scripts we 
> might generate blank run.sh for now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15840) Webhcat test TestPig_5 failing with Pig on Tez at check for percent complete of job

2017-02-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857030#comment-15857030
 ] 

Thejas M Nair commented on HIVE-15840:
--

+1

> Webhcat test TestPig_5 failing with Pig on Tez at check for percent complete 
> of job
> ---
>
> Key: HIVE-15840
> URL: https://issues.apache.org/jira/browse/HIVE-15840
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15840.1.patch
>
>
> TestPig_5 is failing at percentage check if the job is Pig on Tez:
> check_job_percent_complete failed. got percentComplete , expected 100% 
> complete
> Test command:
> curl -d user.name=daijy -d arg=-p -d arg=INPDIR=/tmp/templeton_test_data -d 
> arg=-p -d arg=OUTDIR=/tmp/output -d file=loadstore.pig -X POST 
> http://localhost:50111/templeton/v1/pig
> curl 
> http://localhost:50111/templeton/v1/jobs/job_1486502484681_0003?user.name=daijy
> This is similar to HIVE-9351, which fixes Hive on Tez.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15841) Upgrade Hive to ORC 1.3.2

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857023#comment-15857023
 ] 

ASF GitHub Bot commented on HIVE-15841:
---

GitHub user omalley opened a pull request:

https://github.com/apache/hive/pull/142

HIVE-15841. Upgrade to ORC 1.3.2.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/hive hive-15841

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/142.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #142


commit 3078b0f32fec97607b97fa31e29527da50631099
Author: Owen O'Malley 
Date:   2017-02-07T23:27:31Z

HIVE-15841. Upgrade to ORC 1.3.2.




> Upgrade Hive to ORC 1.3.2
> -
>
> Key: HIVE-15841
> URL: https://issues.apache.org/jira/browse/HIVE-15841
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>
> Hive needs ORC-141 and ORC-135, so we should upgrade to ORC 1.3.2 once it 
> releases.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-10562) Add version column to NOTIFICATION_LOG table and DbNotificationListener

2017-02-07 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856998#comment-15856998
 ] 

Daniel Dai commented on HIVE-10562:
---

andFilter is a new feature, so we should add a test for it. I also notice this 
patch piggybacks the varchar -> clob change; we should mention that in the JIRA 
title. Otherwise looks good.

> Add version column to NOTIFICATION_LOG table and DbNotificationListener
> ---
>
> Key: HIVE-10562
> URL: https://issues.apache.org/jira/browse/HIVE-10562
> Project: Hive
>  Issue Type: Sub-task
>  Components: Import/Export
>Affects Versions: 1.2.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-10562.2.patch, HIVE-10562.3.patch, 
> HIVE-10562.4.patch, HIVE-10562.patch
>
>
> Currently, we have a JSON encoded message being stored in the 
> NOTIFICATION_LOG table.
> If we want to be future proof, we need to allow for versioning of this 
> message, since we might change what gets stored in the message. A prime 
> example of what we'd want to change is as in HIVE-10393.
> MessageFactory already has stubs to allow for versioning of messages, and we 
> could expand on this further in the future. NotificationListener currently 
> encodes the message version into the header for the JMS message it sends, 
> which seems to be the right place for a message version (instead of being 
> contained in the message, for eg.).
> So, we should have a similar ability for DbEventListener as well, and the 
> place this makes the most sense is to and add a version column to the 
> NOTIFICATION_LOG table.
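The future-proofing argument above can be illustrated with a small sketch: carrying an explicit format version alongside the JSON payload lets readers dispatch on it, and unknown versions can be skipped rather than misparsed. Class and field names here are hypothetical, not Hive's actual NOTIFICATION_LOG schema or MessageFactory API.

```java
// Hedged sketch of a versioned notification envelope: an explicit version
// (the proposed NOTIFICATION_LOG column) travels with the JSON body so
// consumers can evolve the payload format without breaking old readers.
public class NotificationMessage {
    final int version;      // would map to the proposed version column
    final String jsonBody;  // the JSON-encoded event payload

    NotificationMessage(int version, String jsonBody) {
        this.version = version;
        this.jsonBody = jsonBody;
    }

    String describe() {
        switch (version) {
            case 1:  return "legacy format: " + jsonBody;
            case 2:  return "extended format: " + jsonBody;
            default: return "unknown version " + version + ", skipping";
        }
    }

    public static void main(String[] args) {
        System.out.println(
            new NotificationMessage(1, "{\"event\":\"ADD_PARTITION\"}").describe());
        System.out.println(new NotificationMessage(3, "{}").describe());
    }
}
```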



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15754) exchange partition is not generating notifications

2017-02-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856991#comment-15856991
 ] 

Sergio Peña commented on HIVE-15754:


Test failures are not related, and flaky tests are already reported on 
HIVE-15058.

+1

> exchange partition is not generating notifications
> --
>
> Key: HIVE-15754
> URL: https://issues.apache.org/jira/browse/HIVE-15754
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0
>Reporter: Nachiket Vaidya
>Assignee: Nachiket Vaidya
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-15754.0.patch, HIVE-15754.1.patch
>
>
> The exchange partition event is not generating notifications in 
> NOTIFICATION_LOG.
> There should be multiple events generated: one add_partition event and several 
> drop_partition events.
> For example:
> for example:
> {noformat}
> ALTER TABLE tab1 EXCHANGE PARTITION (part=1) WITH TABLE tab2;
> {noformat}
> There should be the following events:
> ADD_PARTITION on tab2 on partition (part=1)
> DROP_PARTITION on tab1 on partition (part=1)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15841) Upgrade Hive to ORC 1.3.2

2017-02-07 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-15841:
-
Description: Hive needs ORC-141 and ORC-135, so we should upgrade to ORC 
1.3.2 once it releases.  (was: Hive needs ORC-141 and ORC-135, so we should 
upgrade to ORC-1.3.2 once it releases.)

> Upgrade Hive to ORC 1.3.2
> -
>
> Key: HIVE-15841
> URL: https://issues.apache.org/jira/browse/HIVE-15841
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>
> Hive needs ORC-141 and ORC-135, so we should upgrade to ORC 1.3.2 once it 
> releases.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15840) Webhcat test TestPig_5 failing with Pig on Tez at check for percent complete of job

2017-02-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15840:
--
Status: Patch Available  (was: Open)

> Webhcat test TestPig_5 failing with Pig on Tez at check for percent complete 
> of job
> ---
>
> Key: HIVE-15840
> URL: https://issues.apache.org/jira/browse/HIVE-15840
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15840.1.patch
>
>
> TestPig_5 is failing at percentage check if the job is Pig on Tez:
> check_job_percent_complete failed. got percentComplete , expected 100% 
> complete
> Test command:
> curl -d user.name=daijy -d arg=-p -d arg=INPDIR=/tmp/templeton_test_data -d 
> arg=-p -d arg=OUTDIR=/tmp/output -d file=loadstore.pig -X POST 
> http://localhost:50111/templeton/v1/pig
> curl 
> http://localhost:50111/templeton/v1/jobs/job_1486502484681_0003?user.name=daijy
> This is similar to HIVE-9351, which fixes Hive on Tez.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15840) Webhcat test TestPig_5 failing with Pig on Tez at check for percent complete of job

2017-02-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15840:
--
Attachment: HIVE-15840.1.patch

> Webhcat test TestPig_5 failing with Pig on Tez at check for percent complete 
> of job
> ---
>
> Key: HIVE-15840
> URL: https://issues.apache.org/jira/browse/HIVE-15840
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15840.1.patch
>
>
> TestPig_5 is failing at percentage check if the job is Pig on Tez:
> check_job_percent_complete failed. got percentComplete , expected 100% 
> complete
> Test command:
> curl -d user.name=daijy -d arg=-p -d arg=INPDIR=/tmp/templeton_test_data -d 
> arg=-p -d arg=OUTDIR=/tmp/output -d file=loadstore.pig -X POST 
> http://localhost:50111/templeton/v1/pig
> curl 
> http://localhost:50111/templeton/v1/jobs/job_1486502484681_0003?user.name=daijy
> This is similar to HIVE-9351, which fixes Hive on Tez.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

