[jira] [Comment Edited] (HIVE-14633) #.of Files in a partition ! = #.Of buckets in a partitioned,bucketed table

2016-08-29 Thread Abhishek Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448135#comment-15448135
 ] 

Abhishek Somani edited comment on HIVE-14633 at 8/30/16 5:54 AM:
-

Isn't this expected? Insert into will just create the copy files you see, 
with the same bucket id as shown above. This is not expected to affect any 
functionality, and Hive handles those copies correctly. Others can confirm.

Do you see any functionality broken due to this?


was (Author: asomani):
I think this is expected. Insert into will just create the copy files you 
see, with the same bucket id as shown above. This is not expected to affect any 
functionality, and Hive handles those copies correctly. Others can confirm.

Do you see any functionality broken due to this?

> #.of Files in a partition ! = #.Of buckets in a partitioned,bucketed table
> --
>
> Key: HIVE-14633
> URL: https://issues.apache.org/jira/browse/HIVE-14633
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
> Environment: HDP 2.3.2
>Reporter: Hanu
>
> Ideally, the number of files should equal the number of buckets declared in 
> the table DDL. This holds after an initial insert and after every insert 
> overwrite, but insert into a Hive bucketed table creates extra files.
> ex:
> # of Buckets = 4
> No. of files after initial insert --> 4
> No. of files after 2nd insert --> 8
> No. of files after 3rd insert --> 12
> No. of files after nth insert --> n * # of Buckets.
> First insert list : 
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/00_0
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/01_0
> -rwxrwxrwx   3 hvallur hdfs308 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/02_0
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/03_0
> 2nd Insert:
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/00_0
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:47 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/00_0_copy_1
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/01_0
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:47 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/01_0_copy_1
> -rwxrwxrwx   3 hvallur hdfs308 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/02_0
> -rwxrwxrwx   3 hvallur hdfs302 2016-08-25 12:47 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/02_0_copy_1
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/03_0
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:47 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/03_0_copy_1
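
The growth pattern above (n inserts yielding n * # of Buckets files) can be sketched directly. This is an illustrative simulation only, assuming Hive's usual zero-padded bucket file naming (000000_0, 000001_0, ...) and the _copy_N suffix added on repeated INSERT INTO; it is not Hive's actual file-writing code:

```python
def bucket_files(num_buckets, num_inserts):
    """Simulate the partition directory after repeated INSERT INTO:
    every insert writes one file per bucket; files from the 2nd insert
    onward get a _copy_N suffix instead of overwriting existing ones."""
    files = []
    for insert in range(num_inserts):
        for bucket in range(num_buckets):
            base = "%06d_0" % bucket  # e.g. 000000_0 for bucket 0
            files.append(base if insert == 0
                         else "%s_copy_%d" % (base, insert))
    return sorted(files)
```

With 4 buckets and 2 inserts this yields 8 files, matching the listing above.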



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448141#comment-15448141
 ] 

Hive QA commented on HIVE-14576:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826106/HIVE-14576.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10466 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1041/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1041/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1041/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12826106 - PreCommit-HIVE-MASTER-Build

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14576.1.patch
>
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.





[jira] [Commented] (HIVE-14633) #.of Files in a partition ! = #.Of buckets in a partitioned,bucketed table

2016-08-29 Thread Abhishek Somani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448135#comment-15448135
 ] 

Abhishek Somani commented on HIVE-14633:


I think this is expected. Insert into will just create the copy files you 
see, with the same bucket id as shown above. This is not expected to affect any 
functionality, and Hive handles those copies correctly. Others can confirm.

Do you see any functionality broken due to this?

> #.of Files in a partition ! = #.Of buckets in a partitioned,bucketed table
> --
>
> Key: HIVE-14633
> URL: https://issues.apache.org/jira/browse/HIVE-14633
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
> Environment: HDP 2.3.2
>Reporter: Hanu
>
> Ideally, the number of files should equal the number of buckets declared in 
> the table DDL. This holds after an initial insert and after every insert 
> overwrite, but insert into a Hive bucketed table creates extra files.
> ex:
> # of Buckets = 4
> No. of files after initial insert --> 4
> No. of files after 2nd insert --> 8
> No. of files after 3rd insert --> 12
> No. of files after nth insert --> n * # of Buckets.
> First insert list : 
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/00_0
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/01_0
> -rwxrwxrwx   3 hvallur hdfs308 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/02_0
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/03_0
> 2nd Insert:
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/00_0
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:47 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/00_0_copy_1
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/01_0
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:47 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/01_0_copy_1
> -rwxrwxrwx   3 hvallur hdfs308 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/02_0
> -rwxrwxrwx   3 hvallur hdfs302 2016-08-25 12:47 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/02_0_copy_1
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:42 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/03_0
> -rwxrwxrwx   3 hvallur hdfs 49 2016-08-25 12:47 
> hdfs://dshdp-dev-cluster/apps/hive/warehouse/upsert_testing.db/test3/lname=vr/03_0_copy_1





[jira] [Updated] (HIVE-14540) Create batches for non qfile tests

2016-08-29 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14540:
--
Attachment: HIVE-14540.02.patch

Updated patch with the unit tests fixed.

> Create batches for non qfile tests
> --
>
> Key: HIVE-14540
> URL: https://issues.apache.org/jira/browse/HIVE-14540
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14540.01.patch, HIVE-14540.02.patch
>
>
> From run 790:
> Reported runtime by junit: 17 hours
> Reported runtime by ptest: 34 hours
> A lot of time is wasted spinning up mvn test for each individual test that 
> otherwise takes less than 1 second. Such tests can end up taking 20-30 
> seconds, and combined with HIVE-14539, 60-70 seconds.
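
The batching idea can be sketched as plain chunking: group test class names so that a single mvn invocation amortizes JVM and Maven startup cost over a whole batch. A minimal illustration (hypothetical helper, not the actual ptest code):

```python
def batch_tests(test_classes, batch_size):
    """Split a list of test class names into fixed-size batches; each
    batch would then be run by one 'mvn test' invocation instead of
    paying the 20-30s startup cost once per individual test."""
    return [test_classes[i:i + batch_size]
            for i in range(0, len(test_classes), batch_size)]
```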





[jira] [Commented] (HIVE-14538) beeline throws exceptions with parsing hive config when using !sh statement

2016-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447922#comment-15447922
 ] 

Hive QA commented on HIVE-14538:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826076/HIVE-14538.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10467 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1039/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1039/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1039/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12826076 - PreCommit-HIVE-MASTER-Build

> beeline throws exceptions with parsing hive config when using !sh statement
> ---
>
> Key: HIVE-14538
> URL: https://issues.apache.org/jira/browse/HIVE-14538
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-14538.1.patch, HIVE-14538.2.patch
>
>
> When Beeline has a connection to a server, in some environments it has the 
> following problem:
> {noformat}
> 0: jdbc:hive2://localhost> !verbose
> verbose: on
> 0: jdbc:hive2://localhost> !sh id
> java.lang.ArrayIndexOutOfBoundsException: 1
> at org.apache.hive.beeline.Commands.addConf(Commands.java:758)
> at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704)
> at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081)
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845)
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> 0: jdbc:hive2://localhost> !sh echo hello
> java.lang.ArrayIndexOutOfBoundsException: 1
> at org.apache.hive.beeline.Commands.addConf(Commands.java:758)
> at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704)
> at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081)
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845)
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
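
The ArrayIndexOutOfBoundsException: 1 raised in Commands.addConf is consistent with parsing code that splits each configuration line on '=' and unconditionally reads the second element, which fails on any line containing no '='. A minimal Python illustration of that failure mode (the parsing logic here is an assumption for illustration, not the actual Hive source):

```python
def add_conf_naive(lines):
    """Parse 'key=value' config lines into a dict. Like the suspected
    Java code, it assumes every line contains '=': a line without one
    raises ValueError (the analogue of ArrayIndexOutOfBoundsException)."""
    conf = {}
    for line in lines:
        key, value = line.split("=", 1)  # fails when '=' is absent
        conf[key.strip()] = value.strip()
    return conf
```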

[jira] [Commented] (HIVE-14652) incorrect results for not in on partition columns

2016-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447746#comment-15447746
 ] 

Hive QA commented on HIVE-14652:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826067/HIVE-14652.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10467 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcs]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pointlookup2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pointlookup3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pointlookup4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1038/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1038/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1038/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12826067 - PreCommit-HIVE-MASTER-Build

> incorrect results for not in on partition columns
> -
>
> Key: HIVE-14652
> URL: https://issues.apache.org/jira/browse/HIVE-14652
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: stephen sprague
>Assignee: Sergey Shelukhin
>Priority: Blocker
> Attachments: HIVE-14652.01.patch, HIVE-14652.patch
>
>
> {noformat}
> create table foo (i int) partitioned by (s string);
> insert overwrite table foo partition(s='foo') select cint from alltypesorc 
> limit 10;
> insert overwrite table foo partition(s='bar') select cint from alltypesorc 
> limit 10;
> select * from foo where s not in ('bar');
> {noformat}
> No results are returned. IN ... works correctly.
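
For reference, the expected behavior: with only non-NULL partition values, NOT IN should simply exclude the listed partitions. A plain illustration of the correct result (not the Hive planner):

```python
# Partitions created by the two inserts in the repro above.
partitions = ["foo", "bar"]

# "s not in ('bar')" should keep the 'foo' partition, so the query
# should return rows rather than an empty result.
kept = [s for s in partitions if s not in ("bar",)]
```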





[jira] [Commented] (HIVE-13610) Hive exec module won't compile with IBM JDK

2016-08-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447717#comment-15447717
 ] 

Sergey Shelukhin commented on HIVE-13610:
-

+1, will commit tomorrow if there are no objections

> Hive exec module won't compile with IBM JDK
> ---
>
> Key: HIVE-13610
> URL: https://issues.apache.org/jira/browse/HIVE-13610
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
> Environment: Hive 1.1.0 + IBM JDK 1.7 +ppc64 architecture
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Attachments: HIVE-13610.1.patch, HIVE-13610.2.patch, 
> HIVE-13610.3.patch, HIVE-13610.patch
>
>
> org.apache.hadoop.hive.ql.debug.Utils explicitly imports 
> com.sun.management.HotSpotDiagnosticMXBean, which is not supported by the IBM JDK.
> So we can make HotSpotDiagnosticMXBean a runtime dependency rather than a compile-time one.
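
The proposed change follows the common optional-dependency pattern: resolve the HotSpot-only class at runtime instead of referencing it at compile time, so the build succeeds on JDKs that lack it. Sketched in Python for brevity (the real fix would use Java reflection, e.g. Class.forName; names here are illustrative):

```python
import importlib


def load_optional(module_name):
    """Return the named module if it exists on this runtime, else None,
    letting callers degrade gracefully -- the Python analogue of
    Class.forName(...) wrapped in a ClassNotFoundException handler."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None
```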





[jira] [Commented] (HIVE-13610) Hive exec module won't compile with IBM JDK

2016-08-29 Thread Pan Yuxuan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447705#comment-15447705
 ] 

Pan Yuxuan commented on HIVE-13610:
---

[~sershe] [~prasanth_j] Could you please help review this patch? Thanks very 
much.

> Hive exec module won't compile with IBM JDK
> ---
>
> Key: HIVE-13610
> URL: https://issues.apache.org/jira/browse/HIVE-13610
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
> Environment: Hive 1.1.0 + IBM JDK 1.7 +ppc64 architecture
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Attachments: HIVE-13610.1.patch, HIVE-13610.2.patch, 
> HIVE-13610.3.patch, HIVE-13610.patch
>
>
> org.apache.hadoop.hive.ql.debug.Utils explicitly imports 
> com.sun.management.HotSpotDiagnosticMXBean, which is not supported by the IBM JDK.
> So we can make HotSpotDiagnosticMXBean a runtime dependency rather than a compile-time one.





[jira] [Commented] (HIVE-14665) vector_join_part_col_char.q failure

2016-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447585#comment-15447585
 ] 

Hive QA commented on HIVE-14665:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826055/HIVE-14665.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10466 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1037/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1037/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1037/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12826055 - PreCommit-HIVE-MASTER-Build

> vector_join_part_col_char.q failure
> ---
>
> Key: HIVE-14665
> URL: https://issues.apache.org/jira/browse/HIVE-14665
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14665.1.patch
>
>
> Happens 100% of the time. Looks like a missed golden file update from 
> HIVE-14502.





[jira] [Assigned] (HIVE-14372) Odd behavior with Beeline parsing server principal in Kerberized environment

2016-08-29 Thread Junjie Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junjie Chen reassigned HIVE-14372:
--

Assignee: Junjie Chen

> Odd behavior with Beeline parsing server principal in Kerberized environment
> 
>
> Key: HIVE-14372
> URL: https://issues.apache.org/jira/browse/HIVE-14372
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Junjie Chen
>
> Case 1:
> I can replace the realm with any garbage realm, and it still works.
> {code}
> [root@c62-n3 ~]# beeline
> Beeline version 0.10.0-cdh4.2.0 by Apache Hive
> beeline> !connect 
> jdbc:hive2://c62-n3.intuit.test:1/;principal=hive/c62-n3.intuit.t...@abc.xyz
>  
> scan complete in 4ms
> Connecting to 
> jdbc:hive2://c62-n3.intuit.test:1/;principal=hive/c62-n3.intuit.t...@abc.xyz
> Enter username for 
> jdbc:hive2://c62-n3.intuit.test:1/;principal=hive/c62-n3.intuit.t...@abc.xyz:
>  
> Enter password for 
> jdbc:hive2://c62-n3.intuit.test:1/;principal=hive/c62-n3.intuit.t...@abc.xyz:
>  
> Connected to: Hive (version 0.10.0)
> Driver: Hive (version 0.10.0-cdh4.2.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> 0: jdbc:hive2://c62-n3.intuit.test:1/> show tables;
> ---
> tab_name
> ---
> t1
> t2
> test
> ---
> 3 rows selected (1.749 seconds)
> 0: jdbc:hive2://c62-n3.intuit.test:1/>
> {code}
> Case 2:
> I can keep the garbage realm, but if I use a different hostname (notice I've 
> truncated it to c62-n3.intuit instead of c62-n3.intuit.test), it fails (as it 
> should) but the error message is not at all user-friendly.
> {code}
> [root@c62-n3 ~]# beeline
> Beeline version 0.10.0-cdh4.2.0 by Apache Hive
> beeline> !connect 
> jdbc:hive2://c62-n3.intuit.test:1/;principal=hive/c62-n3.intuit@ABC 
> scan complete in 4ms
> Connecting to 
> jdbc:hive2://c62-n3.intuit.test:1/;principal=hive/c62-n3.intuit@ABC
> Enter username for 
> jdbc:hive2://c62-n3.intuit.test:1/;principal=hive/c62-n3.intuit@ABC: 
> Enter password for 
> jdbc:hive2://c62-n3.intuit.test:1/;principal=hive/c62-n3.intuit@ABC: 
> 13/06/10 08:34:29 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Server not 
> found in Kerberos database (7) - UNKNOWN_SERVER)]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
> at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
> at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
> at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:156)
> at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:96)
> at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:104)
> at java.sql.DriverManager.getConnection(DriverManager.java:582)
> at java.sql.DriverManager.getConnection(DriverManager.java:185)
> at 
> org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:152)
> at 
> org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:193)
> at org.apache.hive.beeline.Commands.connect(Commands.java:965)
> at org.apache.hive.beeline.Commands.connect(Commands.java:896)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:66)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:755)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:631)
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:380)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:364)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> 
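
Both cases are consistent with how the GSSAPI client uses the principal: the service and host parts of service/host@REALM determine which server ticket is requested, while the realm is generally resolved from the client's Kerberos configuration. A bogus realm can therefore still connect, but a wrong host yields UNKNOWN_SERVER. A rough sketch of the parsing involved (illustrative, not Beeline's code):

```python
def parse_principal(principal):
    """Split a Kerberos principal 'service/host@REALM' into its parts;
    the realm may be empty when not supplied."""
    service_host, _, realm = principal.partition("@")
    service, _, host = service_host.partition("/")
    return service, host, realm
```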

[jira] [Updated] (HIVE-14663) Change ptest java language version to 1.7, other version changes and fixes

2016-08-29 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14663:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the reviews [~prasanth_j] and [~spena]

> Change ptest java language version to 1.7, other version changes and fixes
> --
>
> Key: HIVE-14663
> URL: https://issues.apache.org/jira/browse/HIVE-14663
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.2.0
>
> Attachments: HIVE-14663.01.patch
>
>






[jira] [Commented] (HIVE-14362) Support explain analyze in Hive

2016-08-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447468#comment-15447468
 ] 

Ashutosh Chauhan commented on HIVE-14362:
-

+1

> Support explain analyze in Hive
> ---
>
> Key: HIVE-14362
> URL: https://issues.apache.org/jira/browse/HIVE-14362
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14362.01.patch, HIVE-14362.02.patch, 
> HIVE-14362.03.patch, HIVE-14362.05.patch, compare_on_cluster.pdf
>
>
> Right now all the explain levels only support stats before query runs. We 
> would like to have an explain analyze similar to Postgres for real stats 
> after query runs. This will help to identify the major gap between 
> estimated/real stats and make not only query optimization better but also 
> query performance debugging easier.
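
The estimated-versus-actual gap such a feature exposes can be quantified per operator as a simple ratio (illustrative arithmetic only, not part of the patch):

```python
def stats_gap(estimated_rows, actual_rows):
    """Ratio of estimated to actual row count: 1.0 means a perfect
    estimate, while very large or very small values flag operators
    whose stats deserve investigation."""
    return estimated_rows / actual_rows if actual_rows else float("inf")
```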





[jira] [Comment Edited] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447457#comment-15447457
 ] 

Hari Sankar Sivarama Subramaniyan edited comment on HIVE-14576 at 8/29/16 11:48 PM:


Brings down the total runtime from ~1000 seconds to ~255 seconds.


was (Author: hsubramaniyan):
Brings down the runtime to ~255 seconds.

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14576.1.patch
>
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.





[jira] [Updated] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14576:
-
Attachment: HIVE-14576.1.patch

Brings down the runtime to ~255 seconds.

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14576.1.patch
>
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.





[jira] [Updated] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14576:
-
Status: Patch Available  (was: Open)

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14576.1.patch
>
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.





[jira] [Commented] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447454#comment-15447454
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14576:
--

The immediate fix to point 1 mentioned in the description would be to use 
setInitScript("q_test_init_for_minimr.sql");

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.





[jira] [Assigned] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan reassigned HIVE-14576:


Assignee: Hari Sankar Sivarama Subramaniyan  (was: Vaibhav Gumashta)

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.





[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-08-29 Thread Abdullah Yousufi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447371#comment-15447371
 ] 

Abdullah Yousufi commented on HIVE-14373:
-

How would this mkdir and rmdir on S3 happen? Is there a way to automate this 
as part of the testing rather than having the user do it manually?
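One way to automate it is a JUnit-style setUp/tearDown pair that creates and removes the test directory around each run. The sketch below is illustrative only: it uses a local temp directory via java.nio.file as a stand-in, whereas a real S3 integration test would issue the equivalent mkdirs/delete calls through Hadoop's FileSystem API against an s3a:// URI using the configured credentials.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of automating the test-scratch-directory lifecycle (the mkdir/rmdir
// the comment asks about). A local temp dir stands in for the S3 path here.
public class ScratchDirLifecycle {
    private Path scratch;

    // Analogous to a JUnit @Before method: create the directory the test needs.
    public Path setUp() throws IOException {
        scratch = Files.createTempDirectory("hive-s3-test-");
        return scratch;
    }

    // Analogous to a JUnit @After method: remove it so no manual cleanup remains.
    public void tearDown() throws IOException {
        Files.deleteIfExists(scratch);
    }

    public static void main(String[] args) throws IOException {
        ScratchDirLifecycle t = new ScratchDirLifecycle();
        Path p = t.setUp();
        System.out.println(Files.exists(p));   // true while the "test" runs
        t.tearDown();
        System.out.println(Files.exists(p));   // false after cleanup
    }
}
```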

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.patch
>
>
> With Hive doing improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be executable by HiveQA because they will need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop project 
> where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure





[jira] [Commented] (HIVE-14663) Change ptest java language version to 1.7, other version changes and fixes

2016-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447374#comment-15447374
 ] 

Hive QA commented on HIVE-14663:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826049/HIVE-14663.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10466 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1036/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1036/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1036/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12826049 - PreCommit-HIVE-MASTER-Build

> Change ptest java language version to 1.7, other version changes and fixes
> --
>
> Key: HIVE-14663
> URL: https://issues.apache.org/jira/browse/HIVE-14663
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14663.01.patch
>
>






[jira] [Commented] (HIVE-14536) Unit test code cleanup

2016-08-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447342#comment-15447342
 ] 

Ashutosh Chauhan commented on HIVE-14536:
-

Patch looks good to me. It has a lot of red diffs, and I am a fan of those 
resulting in deletion of code :) [~kgyrtkirk] & [~sseth] have dealt with test 
code a lot lately, so let's hear from them if they have any more suggestions.


> Unit test code cleanup
> --
>
> Key: HIVE-14536
> URL: https://issues.apache.org/jira/browse/HIVE-14536
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-14536.5.patch, HIVE-14536.6.patch, 
> HIVE-14536.7.patch, HIVE-14536.patch
>
>
> Clean up the itest infrastructure to create readable, easy-to-understand 
> code





[jira] [Commented] (HIVE-14564) Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException.

2016-08-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447322#comment-15447322
 ] 

Ashutosh Chauhan commented on HIVE-14564:
-

I would still like to see a query which reproduces this problem, since we 
haven't seen this in the past. Also, a test will be useful for regression 
purposes.

> Column Pruning generates out of order columns in SelectOperator which cause 
> ArrayIndexOutOfBoundsException.
> ---
>
> Key: HIVE-14564
> URL: https://issues.apache.org/jira/browse/HIVE-14564
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: HIVE-14564.000.patch, HIVE-14564.001.patch, 
> HIVE-14564.002.patch
>
>
> Column Pruning generates out of order columns in SelectOperator which cause 
> ArrayIndexOutOfBoundsException.
> {code}
> 2016-07-26 21:49:24,390 FATAL [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
>   ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at java.lang.System.arraycopy(Native Method)
>   at org.apache.hadoop.io.Text.set(Text.java:225)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:550)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:377)
>   ... 13 more
> {code}
> The exception occurs because serialization and deserialization don't match.
> The serialization by LazyBinarySerDe in the previous MapReduce job used a 
> different order of columns. When the current MapReduce job deserializes the 
> intermediate sequence file generated by the previous MapReduce job, it gets 
> corrupted data because LazyBinaryStruct deserializes using the wrong order 
> of columns. The mismatch between serialization and deserialization is caused 
> by the SelectOperator's column pruning ({{ColumnPrunerSelectProc}}).
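The failure mode described above can be shown with plain java.io streams (this is a standalone illustration, not Hive's LazyBinarySerDe): when the writer and the reader disagree on column order, the reader interprets the wrong bytes as the wrong field, so both fields come back corrupted.

```java
import java.io.*;

// Standalone illustration of a serialization/deserialization column-order
// mismatch: values are written as (id INT, name STRING) but read back
// assuming (name STRING, id INT).
public class ColumnOrderMismatch {
    // Writer serializes columns as (id INT, name STRING).
    static byte[] write(int id, String name) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(id);
        out.writeUTF(name);
        return bos.toByteArray();
    }

    // Reader wrongly assumes column order (name STRING, id INT): readUTF()
    // consumes the int's leading bytes as a string length, so both fields
    // come back corrupted.
    static Object[] readWrongOrder(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        String name = in.readUTF();
        int id = in.readInt();
        return new Object[] { name, id };
    }

    public static void main(String[] args) throws IOException {
        Object[] row = readWrongOrder(write(42, "alice"));
        // Prints corrupted values instead of "alice" and 42; with other byte
        // patterns a misread length can also throw a bounds error, much like
        // the ArrayIndexOutOfBoundsException in the stack trace above.
        System.out.println(row[0] + " / " + row[1]);
    }
}
```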





[jira] [Updated] (HIVE-14418) Hive config validation prevents unsetting the settings

2016-08-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14418:

Release Note: A "-d" option has been added to the Hive CLI "reset" command, 
allowing one to reset specific settings to their built-in defaults, overriding 
any session-specific values as well as values from configuration files in use. 
E.g. "reset -d hive.compute.splits.in.am hive.smbjoin.cache.rows".

> Hive config validation prevents unsetting the settings
> --
>
> Key: HIVE-14418
> URL: https://issues.apache.org/jira/browse/HIVE-14418
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14418.01.patch, HIVE-14418.02.patch, 
> HIVE-14418.03.patch, HIVE-14418.04.patch, HIVE-14418.patch
>
>
> {noformat}
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=;
> Query returned non-zero code: 1, cause: 'SET 
> hive.tez.task.scale.memory.reserve.fraction.max=' FAILED because 
> hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=null;
> Query returned non-zero code: 1, cause: 'SET 
> hive.tez.task.scale.memory.reserve.fraction.max=null' FAILED because 
> hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> {noformat}
> unset also doesn't work.





[jira] [Updated] (HIVE-14621) LLAP: memory.mode = none has NPE

2016-08-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14621:

   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed. Thanks for the review!


> LLAP: memory.mode = none has NPE
> 
>
> Key: HIVE-14621
> URL: https://issues.apache.org/jira/browse/HIVE-14621
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14621.01.patch, HIVE-14621.patch
>
>
> When the IO elevator is enabled but the cache and allocator are both 
> disabled, NPEs happen. It's not really a recommended mode, but it's the only 
> way to disable the cache, so we probably need to fix it. I am also going to 
> nuke the intermediate mode (allocator with no cache) in the meantime, because 
> it's pointless and just creates a zoo of configurations.
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.llap.cache.LlapDataBuffer.getByteBufferDup(LlapDataBuffer.java:59)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.StreamUtils.createDiskRangeInfo(StreamUtils.java:63)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.StreamUtils.createSettableUncompressedStream(StreamUtils.java:48)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$LongStreamReader$StreamReaderBuilder.build(EncodedTreeReaderFactory.java:514)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory.createEncodedTreeReader(EncodedTreeReaderFactory.java:1737)
> at 
> org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:162)
> at 
> org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:55)
> at 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:76)
> at 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:30)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:408)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:424)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:227)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:224)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:224)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:93)
> ... 6 more
> {noformat}





[jira] [Updated] (HIVE-14418) Hive config validation prevents unsetting the settings

2016-08-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14418:

   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed to branches. Thanks for the reviews :)

> Hive config validation prevents unsetting the settings
> --
>
> Key: HIVE-14418
> URL: https://issues.apache.org/jira/browse/HIVE-14418
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14418.01.patch, HIVE-14418.02.patch, 
> HIVE-14418.03.patch, HIVE-14418.04.patch, HIVE-14418.patch
>
>
> {noformat}
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=;
> Query returned non-zero code: 1, cause: 'SET 
> hive.tez.task.scale.memory.reserve.fraction.max=' FAILED because 
> hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=null;
> Query returned non-zero code: 1, cause: 'SET 
> hive.tez.task.scale.memory.reserve.fraction.max=null' FAILED because 
> hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> {noformat}
> unset also doesn't work.





[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447222#comment-15447222
 ] 

Eugene Koifman commented on HIVE-14233:
---

Added some more comments on RB - mostly nits but not all

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching was necessary because the ACID 
> insert/update/delete events from various delta files needed to be merged 
> together before the actual version of a given row could be determined. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in a vectorized fashion, while enabling further optimizations 
> that can be built on top of that in the future.
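The cross-referencing step described above can be sketched in a few lines. The names below are assumptions for illustration, not Hive's actual classes: a batch of row ids is read directly, and rows are dropped when their id appears in a side structure built from the deleted_delta files, instead of merging delete events row by row.

```java
import java.util.*;

// Sketch of filtering a vectorized batch against a set of deleted row ids
// (the "data structure that keeps track of deleted events" in the comment).
public class DeleteFilterSketch {
    static List<Long> filterBatch(long[] batchRowIds, Set<Long> deletedRowIds) {
        List<Long> surviving = new ArrayList<>();
        for (long rowId : batchRowIds) {
            if (!deletedRowIds.contains(rowId)) {  // O(1) cross-reference per row
                surviving.add(rowId);
            }
        }
        return surviving;
    }

    public static void main(String[] args) {
        // Rows 2 and 5 were marked deleted in a deleted_delta; they are
        // dropped from the batch without any row-by-row event merging.
        Set<Long> deleted = new HashSet<>(Arrays.asList(2L, 5L));
        System.out.println(filterBatch(new long[]{1, 2, 3, 4, 5}, deleted)); // [1, 3, 4]
    }
}
```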





[jira] [Comment Edited] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447222#comment-15447222
 ] 

Eugene Koifman edited comment on HIVE-14233 at 8/29/16 10:16 PM:
-

Added some more comments on RB for patch10 - mostly nits but not all


was (Author: ekoifman):
Added some more comments on RB - mostly nits but not all

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching was necessary because the ACID 
> insert/update/delete events from various delta files needed to be merged 
> together before the actual version of a given row could be determined. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in a vectorized fashion, while enabling further optimizations 
> that can be built on top of that in the future.





[jira] [Commented] (HIVE-14663) Change ptest java language version to 1.7, other version changes and fixes

2016-08-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447214#comment-15447214
 ] 

Sergio Peña commented on HIVE-14663:


LGTM
+1

> Change ptest java language version to 1.7, other version changes and fixes
> --
>
> Key: HIVE-14663
> URL: https://issues.apache.org/jira/browse/HIVE-14663
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14663.01.patch
>
>






[jira] [Updated] (HIVE-14540) Create batches for non qfile tests

2016-08-29 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14540:
--
Status: Patch Available  (was: Open)

> Create batches for non qfile tests
> --
>
> Key: HIVE-14540
> URL: https://issues.apache.org/jira/browse/HIVE-14540
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14540.01.patch
>
>
> From run 790:
> Reported runtime by junit: 17 hours
> Reported runtime by ptest: 34 hours
> A lot of time is wasted spinning up mvn test for each individual test: a 
> test that otherwise takes less than 1 second can end up taking 20-30 
> seconds. Combined with HIVE-14539, 60-70s.





[jira] [Updated] (HIVE-14540) Create batches for non qfile tests

2016-08-29 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14540:
--
Attachment: HIVE-14540.01.patch

The patch adds support for setting unit-test configuration at a module level. 
This includes batchSize, include, exclude, isolate and skipBatching. (TBD: 
custom test runner.)
It also includes some small changes to the source-prep and batch-exec files: 
skipping git gc, occasionally logging the date, skipping a find, and a check 
for the itests dir.

TODO going forward: make the test list generator completely pluggable.

It also addresses HIVE-14539 as part of the batch - reaching the correct dir 
to run the test.

[~prasanth_j], [~spena], [~vgumashta] - could you please take a look? It 
applies on top of HIVE-14663.
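For illustration only, a hypothetical sketch of what such module-level test configuration could look like. The property format and module name are assumptions; only the knob names (batchSize, include, exclude, isolate, skipBatching) come from the comment above.

```properties
# Hypothetical per-module unit-test configuration (format is an assumption).
unitTests.module.ql.batchSize = 10
unitTests.module.ql.exclude = TestMTQueries
unitTests.module.ql.isolate = TestHBaseMinimrCliDriver
unitTests.module.ql.skipBatching = TestCliDriver
```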

> Create batches for non qfile tests
> --
>
> Key: HIVE-14540
> URL: https://issues.apache.org/jira/browse/HIVE-14540
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14540.01.patch
>
>
> From run 790:
> Reported runtime by junit: 17 hours
> Reported runtime by ptest: 34 hours
> A lot of time is wasted spinning up mvn test for each individual test: a 
> test that otherwise takes less than 1 second can end up taking 20-30 
> seconds. Combined with HIVE-14539, 60-70s.





[jira] [Commented] (HIVE-14564) Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException.

2016-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447139#comment-15447139
 ] 

Hive QA commented on HIVE-14564:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826046/HIVE-14564.002.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 62 failed/errored test(s), 10466 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_aggregate]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_round_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_udf]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_orderby_5]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_0]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_13]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_15]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_limit]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_short_regress]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_parquet]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_parquet_types]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_ptf]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[windowing_gby2]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[limit_pushdown]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[ptf]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vector_decimal_aggregate]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vector_decimal_round_2]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vector_decimal_udf]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vector_groupby_3]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vector_groupby_reduce]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vector_interval_2]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vector_orderby_5]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vectorization_0]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vectorization_13]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vectorization_15]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vectorization_short_regress]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vectorized_parquet]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vectorized_parquet_types]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vectorized_ptf]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[windowing_gby]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vectorization_limit]
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query17]
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query72]
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query85]
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query89]
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query91]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[dynamic_rdd_cache]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby9]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_position]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[limit_pushdown]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[multi_insert_gby3]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[multi_insert_lateral_view]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[multigroupby_singlemr]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ptf]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_decimal_aggregate]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_groupby_3]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_orderby_5]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_0]

[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-08-29 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447093#comment-15447093
 ] 

Sahil Takiar edited comment on HIVE-14170 at 8/29/16 9:26 PM:
--

You can test this locally by:

* Applying the patch locally
* Building the code, un-tarring the distribution, etc.
* Run Beeline with the --incremental and --incrementalBufferRows=5 options
* Load some dummy data into a table (15 rows should be sufficient, you can just 
have a single column that is a string; ideally each row is of varying length)
* Run a {{select *}} from the table

The output that is printed should show that the width of the output table is 
re-calculated every 5 rows. Note you can really only see this if the rows are 
of varying length. You can also run this without the {{--incremental}} option 
to see what the output looks like if a global width calculation is done.


was (Author: stakiar):
You can test this locally by:

* Applying the patch locally
* Building the code, un-tarring the distribution, etc.
* Run Beeline with the {{--incremental}} and {{--incrementalBufferRows=5}} 
options
* Load some dummy data into a table (15 rows should be sufficient, you can just 
have a single column that is a string; ideally each row is of varying length)
* Run a {{select *}} from the table

The output that is printed should show that the width of the output table is 
re-calculated every 5 rows. Note you can really only see this if the rows are 
of varying length. You can also run this without the {{--incremental}} option 
to see what the output looks like if a global width calculation is done.

> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, 
> HIVE-14170.3.patch, HIVE-14170.4.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option 
> the formatting can look really off.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal width size for {{TableOutputFormat}} (it can't because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} is used and {{TableOutputFormat}} is used, the width 
> should be re-calculated every "x" rows ("x" should be configurable, with a 
> default of 1000).
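The buffering idea described above can be sketched as follows. The names are assumptions for illustration, not Beeline's actual classes: instead of sizing the table from a single row, buffer up to `bufferRows` rows, compute the widest cell seen in that buffer, and reuse that width until the next buffer fills.

```java
import java.util.*;

// Sketch of incremental width re-calculation: one width per buffer of rows.
public class IncrementalWidthSketch {
    static List<Integer> widthsPerBuffer(List<String> rows, int bufferRows) {
        List<Integer> widths = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += bufferRows) {
            int end = Math.min(start + bufferRows, rows.size());
            int width = 0;
            for (String row : rows.subList(start, end)) {
                width = Math.max(width, row.length());  // widest cell in this buffer
            }
            widths.add(width);  // width used to render this batch of rows
        }
        return widths;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "bbb", "cc", "dddd", "e", "ffffff");
        // With bufferRows = 2 the width is re-calculated every 2 rows.
        System.out.println(widthsPerBuffer(rows, 2)); // [3, 4, 6]
    }
}
```

A global calculation (as in BufferedRows) is the special case where the buffer holds the whole result set, which is why its output looks better but cannot stream.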





[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-08-29 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447093#comment-15447093
 ] 

Sahil Takiar commented on HIVE-14170:
-

You can test this locally by:

* Applying the patch locally
* Building the code, un-tarring the distribution, etc.
* Running Beeline with the {{--incremental}} and {{--incrementalBufferRows=5}} 
options
* Loading some dummy data into a table (15 rows should be sufficient; a single 
string column is enough, and ideally each row is of varying length)
* Running a {{select *}} on the table

The output that is printed should show that the width of the output table is 
re-calculated every 5 rows. Note that you can only really see this if the rows are 
of varying length. You can also run this without the {{--incremental}} option to see 
what the output looks like when a global width calculation is done.
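The buffer-and-recalculate behavior described above can be sketched as follows. This is a minimal illustration only, not the actual Beeline implementation; the class and method names are invented for this sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IncrementalWidthDemo {

    // Re-compute the display width of each column from a buffer of rows.
    static int[] recomputeWidths(List<String[]> buffer, int numCols) {
        int[] widths = new int[numCols];
        for (String[] row : buffer) {
            for (int i = 0; i < numCols; i++) {
                widths[i] = Math.max(widths[i], row[i].length());
            }
        }
        return widths;
    }

    public static void main(String[] args) {
        int bufferRows = 5;  // analogous to --incrementalBufferRows=5
        List<String[]> buffer = new ArrayList<>();
        String[][] incoming = {
            {"a", "xx"}, {"bbb", "y"}, {"cc", "zzzz"},
            {"d", "w"}, {"eeee", "v"}, {"f", "u"}
        };
        int[] widths = new int[2];
        for (String[] row : incoming) {
            buffer.add(row);
            if (buffer.size() == bufferRows) {
                // Flush: widths are optimal for the buffered rows only, not
                // globally optimal as they would be with BufferedRows.
                widths = recomputeWidths(buffer, 2);
                buffer.clear();
            }
        }
        System.out.println(Arrays.toString(widths));  // prints [4, 4]
    }
}
```

The trade-off shown here is the one the issue describes: each flush sees only its own buffer, so the widths are locally rather than globally optimal, but rows still stream out without waiting for the whole result set.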

> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, 
> HIVE-14170.3.patch, HIVE-14170.4.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option, 
> the formatting can look really off.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal width for {{TableOutputFormat}} (it can't, because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} and {{TableOutputFormat}} are both used, the width 
> should be re-calculated every "x" rows ("x" should be configurable, with a 
> default of 1000).





[jira] [Updated] (HIVE-14538) beeline throws exceptions with parsing hive config when using !sh statement

2016-08-29 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-14538:

Attachment: HIVE-14538.2.patch

> beeline throws exceptions with parsing hive config when using !sh statement
> ---
>
> Key: HIVE-14538
> URL: https://issues.apache.org/jira/browse/HIVE-14538
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-14538.1.patch, HIVE-14538.2.patch
>
>
> When beeline has a connection to a server, in some environments it has the 
> following problem:
> {noformat}
> 0: jdbc:hive2://localhost> !verbose
> verbose: on
> 0: jdbc:hive2://localhost> !sh id
> java.lang.ArrayIndexOutOfBoundsException: 1
> at org.apache.hive.beeline.Commands.addConf(Commands.java:758)
> at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704)
> at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081)
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845)
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> 0: jdbc:hive2://localhost> !sh echo hello
> java.lang.ArrayIndexOutOfBoundsException: 1
> at org.apache.hive.beeline.Commands.addConf(Commands.java:758)
> at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704)
> at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081)
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845)
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> 0: jdbc:hive2://localhost>
> {noformat}
> Also it breaks if there is no connection established:
> {noformat}
> beeline> !sh id
> java.lang.NullPointerException
> at org.apache.hive.beeline.BeeLine.createStatement(BeeLine.java:1897)
> at org.apache.hive.beeline.Commands.getConfInternal(Commands.java:724)
> at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:702)
> at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081)
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845)
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> 

[jira] [Updated] (HIVE-14627) Improvements to MiniMr tests

2016-08-29 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14627:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the reviews!

> Improvements to MiniMr tests
> 
>
> Key: HIVE-14627
> URL: https://issues.apache.org/jira/browse/HIVE-14627
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 2.2.0
>
> Attachments: HIVE-14627.1.patch, HIVE-14627.2.patch, 
> HIVE-14627.3.patch
>
>
> Currently MiniMr is extremely slow. I ran udf_using.q on MiniMr, and the 
> execution time breakdown is as follows:
> Total time - 13m59s
> JUnit reported time for test case - 50s
> Most of the time is spent in creating/loading/analyzing initial tables - ~12m
> Cleanup - ~1m
> There is a huge overhead for running MiniMr tests compared to the actual 
> test runtime. 
> Ran the same test without the init script:
> Total time - 2m17s
> JUnit reported time for test case - 52s
> I also noticed some tests that don't have to run on MiniMr (like 
> udf_using.q, which does not require MiniMr; it just reads/writes to HDFS, which 
> we can do in MiniTez/MiniLlap, which are way faster). Most tests access only 
> a few initial tables to read a few rows from them. We can fix those tests to 
> load just the tables they require instead of all initial tables. We can also 
> remove the q_init_script.sql initialization for MiniMr after rewriting and 
> moving the unwanted tests over, which should cut down the runtime a lot.





[jira] [Commented] (HIVE-14627) Improvements to MiniMr tests

2016-08-29 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447058#comment-15447058
 ] 

Siddharth Seth commented on HIVE-14627:
---

Got it.
+1

> Improvements to MiniMr tests
> 
>
> Key: HIVE-14627
> URL: https://issues.apache.org/jira/browse/HIVE-14627
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14627.1.patch, HIVE-14627.2.patch, 
> HIVE-14627.3.patch
>
>
> Currently MiniMr is extremely slow. I ran udf_using.q on MiniMr, and the 
> execution time breakdown is as follows:
> Total time - 13m59s
> JUnit reported time for test case - 50s
> Most of the time is spent in creating/loading/analyzing initial tables - ~12m
> Cleanup - ~1m
> There is a huge overhead for running MiniMr tests compared to the actual 
> test runtime. 
> Ran the same test without the init script:
> Total time - 2m17s
> JUnit reported time for test case - 52s
> I also noticed some tests that don't have to run on MiniMr (like 
> udf_using.q, which does not require MiniMr; it just reads/writes to HDFS, which 
> we can do in MiniTez/MiniLlap, which are way faster). Most tests access only 
> a few initial tables to read a few rows from them. We can fix those tests to 
> load just the tables they require instead of all initial tables. We can also 
> remove the q_init_script.sql initialization for MiniMr after rewriting and 
> moving the unwanted tests over, which should cut down the runtime a lot.





[jira] [Commented] (HIVE-14627) Improvements to MiniMr tests

2016-08-29 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447032#comment-15447032
 ] 

Prasanth Jayachandran commented on HIVE-14627:
--

If we do not add a q file to the testconfiguration.properties file, it will be 
run by TestCliDriver by default. orc_mr_pathalias.q ran successfully in 
TestCliDriver:
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1023/testReport/org.apache.hadoop.hive.cli/TestCliDriver/testCliDriver_orc_mr_pathalias_/

The optrstat_groupby.q file does not exist in the qfile directory 
https://github.com/apache/hive/tree/master/ql/src/test/queries/clientpositive
so no test suite will execute it. Any time reported for that test 
is just initialization and cleanup. 

> Improvements to MiniMr tests
> 
>
> Key: HIVE-14627
> URL: https://issues.apache.org/jira/browse/HIVE-14627
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14627.1.patch, HIVE-14627.2.patch, 
> HIVE-14627.3.patch
>
>
> Currently MiniMr is extremely slow. I ran udf_using.q on MiniMr, and the 
> execution time breakdown is as follows:
> Total time - 13m59s
> JUnit reported time for test case - 50s
> Most of the time is spent in creating/loading/analyzing initial tables - ~12m
> Cleanup - ~1m
> There is a huge overhead for running MiniMr tests compared to the actual 
> test runtime. 
> Ran the same test without the init script:
> Total time - 2m17s
> JUnit reported time for test case - 52s
> I also noticed some tests that don't have to run on MiniMr (like 
> udf_using.q, which does not require MiniMr; it just reads/writes to HDFS, which 
> we can do in MiniTez/MiniLlap, which are way faster). Most tests access only 
> a few initial tables to read a few rows from them. We can fix those tests to 
> load just the tables they require instead of all initial tables. We can also 
> remove the q_init_script.sql initialization for MiniMr after rewriting and 
> moving the unwanted tests over, which should cut down the runtime a lot.





[jira] [Updated] (HIVE-13680) HiveServer2: Provide a way to compress ResultSets

2016-08-29 Thread Kevin Liew (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Liew updated HIVE-13680:
--
Attachment: SnappyCompDe.zip

> HiveServer2: Provide a way to compress ResultSets
> -
>
> Key: HIVE-13680
> URL: https://issues.apache.org/jira/browse/HIVE-13680
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC
>Reporter: Vaibhav Gumashta
>Assignee: Kevin Liew
> Attachments: HIVE-13680.2.patch, HIVE-13680.3.patch, 
> HIVE-13680.4.patch, HIVE-13680.patch, SnappyCompDe.zip, proposal.pdf
>
>
> With HIVE-12049 in, we can provide an option to compress ResultSets before 
> writing to disk. The user can specify a compression library via a config 
> param which can be used in the tasks.
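As a rough illustration of what a pluggable result-set compressor could look like, here is a minimal sketch using the JDK's Deflate classes. The {{CompDe}} interface and all names here are invented for illustration; the actual proposal (and the attached Snappy-based implementation) may differ in shape:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompDeDemo {

    // Hypothetical plug-in interface: a config param would name an
    // implementation of something like this.
    interface CompDe {
        byte[] compress(byte[] input) throws Exception;
        byte[] decompress(byte[] input, int originalLength) throws Exception;
    }

    // A Deflate-based implementation from the JDK; a Snappy-backed one
    // would follow the same shape.
    static class DeflateCompDe implements CompDe {
        public byte[] compress(byte[] input) {
            Deflater deflater = new Deflater();
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        }

        public byte[] decompress(byte[] input, int originalLength) throws Exception {
            Inflater inflater = new Inflater();
            inflater.setInput(input);
            byte[] result = new byte[originalLength];
            inflater.inflate(result);
            return result;
        }
    }

    public static void main(String[] args) throws Exception {
        CompDe compde = new DeflateCompDe();
        byte[] rows = "row1,row1,row1,row1".getBytes(StandardCharsets.UTF_8);
        byte[] unpacked = compde.decompress(compde.compress(rows), rows.length);
        System.out.println(new String(unpacked, StandardCharsets.UTF_8));
    }
}
```

The point of the interface split is that the server only depends on the abstract compress/decompress contract, so the concrete library can be chosen at runtime via configuration.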





[jira] [Updated] (HIVE-13680) HiveServer2: Provide a way to compress ResultSets

2016-08-29 Thread Kevin Liew (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Liew updated HIVE-13680:
--
Attachment: (was: SnappyCompDe.zip)

> HiveServer2: Provide a way to compress ResultSets
> -
>
> Key: HIVE-13680
> URL: https://issues.apache.org/jira/browse/HIVE-13680
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC
>Reporter: Vaibhav Gumashta
>Assignee: Kevin Liew
> Attachments: HIVE-13680.2.patch, HIVE-13680.3.patch, 
> HIVE-13680.4.patch, HIVE-13680.patch, proposal.pdf
>
>
> With HIVE-12049 in, we can provide an option to compress ResultSets before 
> writing to disk. The user can specify a compression library via a config 
> param which can be used in the tasks.





[jira] [Commented] (HIVE-14621) LLAP: memory.mode = none has NPE

2016-08-29 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447023#comment-15447023
 ] 

Prasanth Jayachandran commented on HIVE-14621:
--

+1

> LLAP: memory.mode = none has NPE
> 
>
> Key: HIVE-14621
> URL: https://issues.apache.org/jira/browse/HIVE-14621
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14621.01.patch, HIVE-14621.patch
>
>
> When the IO elevator is enabled but the cache and allocator are both disabled, 
> NPEs occur. It's not really a recommended mode, but it's the only way to disable 
> the cache, so we probably need to fix it. I am also going to nuke the 
> intermediate mode (allocator with no cache) in the meantime, because it's 
> pointless and just creates a zoo of configurations.
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.llap.cache.LlapDataBuffer.getByteBufferDup(LlapDataBuffer.java:59)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.StreamUtils.createDiskRangeInfo(StreamUtils.java:63)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.StreamUtils.createSettableUncompressedStream(StreamUtils.java:48)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$LongStreamReader$StreamReaderBuilder.build(EncodedTreeReaderFactory.java:514)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory.createEncodedTreeReader(EncodedTreeReaderFactory.java:1737)
> at 
> org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:162)
> at 
> org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:55)
> at 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:76)
> at 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:30)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:408)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:424)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:227)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:224)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:224)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:93)
> ... 6 more
> {noformat}





[jira] [Updated] (HIVE-14614) Insert overwrite local directory fails with IllegalStateException

2016-08-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14614:
---
Priority: Major  (was: Minor)

> Insert overwrite local directory fails with IllegalStateException
> -
>
> Key: HIVE-14614
> URL: https://issues.apache.org/jira/browse/HIVE-14614
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Fix For: 2.2.0
>
> Attachments: HIVE-14614.05.patch, HIVE-14614.2.patch, 
> HIVE-14614.3.patch, HIVE-14614.4.patch
>
>
> insert overwrite local directory  select * from table; fails with 
> "java.lang.IllegalStateException: Cannot create staging directory" when the 
> path sent to the getTempDirForPath(Path path)  is a local fs path.
> This is a regression caused by the fix for HIVE-14270





[jira] [Updated] (HIVE-14652) incorrect results for not in on partition columns

2016-08-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14652:

Attachment: HIVE-14652.01.patch

Updated the patch to do the check first, and also to remove the special 
handling for the UDF. Do you know why this special handling was needed? 
What would be a good query to test the intended effect before and after? I've 
run auto_join19_inclause and filter_in_or_dup (tests added with the code), but 
they don't exercise the LHS UDF path, as far as I can see from the added logging.

> incorrect results for not in on partition columns
> -
>
> Key: HIVE-14652
> URL: https://issues.apache.org/jira/browse/HIVE-14652
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: stephen sprague
>Assignee: Sergey Shelukhin
>Priority: Blocker
> Attachments: HIVE-14652.01.patch, HIVE-14652.patch
>
>
> {noformat}
> create table foo (i int) partitioned by (s string);
> insert overwrite table foo partition(s='foo') select cint from alltypesorc 
> limit 10;
> insert overwrite table foo partition(s='bar') select cint from alltypesorc 
> limit 10;
> select * from foo where s not in ('bar');
> {noformat}
> No results. IN ... works correctly





[jira] [Updated] (HIVE-14614) Insert overwrite local directory fails with IllegalStateException

2016-08-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14614:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~vihangk1], committed to master (2.2).

> Insert overwrite local directory fails with IllegalStateException
> -
>
> Key: HIVE-14614
> URL: https://issues.apache.org/jira/browse/HIVE-14614
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14614.05.patch, HIVE-14614.2.patch, 
> HIVE-14614.3.patch, HIVE-14614.4.patch
>
>
> insert overwrite local directory  select * from table; fails with 
> "java.lang.IllegalStateException: Cannot create staging directory" when the 
> path sent to the getTempDirForPath(Path path)  is a local fs path.
> This is a regression caused by the fix for HIVE-14270





[jira] [Commented] (HIVE-14614) Insert overwrite local directory fails with IllegalStateException

2016-08-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446940#comment-15446940
 ] 

Sergio Peña commented on HIVE-14614:


Patch looks good.
+1

> Insert overwrite local directory fails with IllegalStateException
> -
>
> Key: HIVE-14614
> URL: https://issues.apache.org/jira/browse/HIVE-14614
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-14614.05.patch, HIVE-14614.2.patch, 
> HIVE-14614.3.patch, HIVE-14614.4.patch
>
>
> insert overwrite local directory  select * from table; fails with 
> "java.lang.IllegalStateException: Cannot create staging directory" when the 
> path sent to the getTempDirForPath(Path path)  is a local fs path.
> This is a regression caused by the fix for HIVE-14270





[jira] [Commented] (HIVE-14614) Insert overwrite local directory fails with IllegalStateException

2016-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446919#comment-15446919
 ] 

Hive QA commented on HIVE-14614:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826047/HIVE-14614.05.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10466 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1034/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1034/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1034/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12826047 - PreCommit-HIVE-MASTER-Build

> Insert overwrite local directory fails with IllegalStateException
> -
>
> Key: HIVE-14614
> URL: https://issues.apache.org/jira/browse/HIVE-14614
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-14614.05.patch, HIVE-14614.2.patch, 
> HIVE-14614.3.patch, HIVE-14614.4.patch
>
>
> insert overwrite local directory  select * from table; fails with 
> "java.lang.IllegalStateException: Cannot create staging directory" when the 
> path sent to the getTempDirForPath(Path path)  is a local fs path.
> This is a regression caused by the fix for HIVE-14270





[jira] [Commented] (HIVE-14538) beeline throws exceptions with parsing hive config when using !sh statement

2016-08-29 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446777#comment-15446777
 ] 

Yongzhi Chen commented on HIVE-14538:
-

Thanks [~ctang.ma] for the comments. 
1. I will change the code.
2. It needs to run without a connection, since it is just a shell call, and 
letting it succeed is what the customer wants. I will change the variable to 
dbconn. 
3. Variable substitution does not work for Beeline; Hive does not support it 
because the client should not control the server's variables. So there is no 
point in running the parameter code when the method is later just ignored.
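For context, an {{ArrayIndexOutOfBoundsException: 1}} out of a config-parsing method like {{Commands.addConf}} (see the stack traces in the quoted description below) is the classic failure mode of splitting a {{key=value}} string and indexing the second element without checking that the separator is present. A sketch of the pattern and a defensive variant follows; this is illustrative only, with invented method names, not the actual Hive code:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfParseDemo {

    // Naive parsing: throws ArrayIndexOutOfBoundsException: 1 when a line
    // contains no '=' (the failure mode shown in the stack traces).
    static void addConfNaive(Map<String, String> conf, String line) {
        String[] parts = line.split("=");
        conf.put(parts[0], parts[1]);  // parts[1] is missing if there is no '='
    }

    // Defensive variant: skip (or log) malformed entries instead of throwing.
    static void addConfSafe(Map<String, String> conf, String line) {
        int idx = line.indexOf('=');
        if (idx < 0) {
            return;  // no key=value pair on this line
        }
        conf.put(line.substring(0, idx), line.substring(idx + 1));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        addConfSafe(conf, "hive.execution.engine=mr");
        addConfSafe(conf, "malformed line without separator");
        System.out.println(conf.get("hive.execution.engine"));  // prints mr
    }
}
```

Using {{indexOf}} rather than {{split}} also preserves values that themselves contain '=' characters, which a naive split would mangle.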

> beeline throws exceptions with parsing hive config when using !sh statement
> ---
>
> Key: HIVE-14538
> URL: https://issues.apache.org/jira/browse/HIVE-14538
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-14538.1.patch
>
>
> When beeline has a connection to a server, in some environments it has the 
> following problem:
> {noformat}
> 0: jdbc:hive2://localhost> !verbose
> verbose: on
> 0: jdbc:hive2://localhost> !sh id
> java.lang.ArrayIndexOutOfBoundsException: 1
> at org.apache.hive.beeline.Commands.addConf(Commands.java:758)
> at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704)
> at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081)
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845)
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> 0: jdbc:hive2://localhost> !sh echo hello
> java.lang.ArrayIndexOutOfBoundsException: 1
> at org.apache.hive.beeline.Commands.addConf(Commands.java:758)
> at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704)
> at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081)
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845)
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> 0: jdbc:hive2://localhost>
> {noformat}
> Also it breaks if there is no connection established:
> {noformat}
> beeline> !sh id
> java.lang.NullPointerException
> at org.apache.hive.beeline.BeeLine.createStatement(BeeLine.java:1897)
> at org.apache.hive.beeline.Commands.getConfInternal(Commands.java:724)
> at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:702)
> at org.apache.hive.beeline.Commands.sh(Commands.java:1002)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
> at 

[jira] [Commented] (HIVE-14627) Improvements to MiniMr tests

2016-08-29 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446780#comment-15446780
 ] 

Siddharth Seth commented on HIVE-14627:
---

I don't think the following two tests run anywhere else after this:
optrstat_groupby.q - present in miniSparkOnYarn.query.files
orc_mr_pathalias.q - absent from the properties file


> Improvements to MiniMr tests
> 
>
> Key: HIVE-14627
> URL: https://issues.apache.org/jira/browse/HIVE-14627
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14627.1.patch, HIVE-14627.2.patch, 
> HIVE-14627.3.patch
>
>
> Currently MiniMr is extremely slow. I ran udf_using.q on MiniMr, and the 
> execution time breakdown is as follows:
> Total time - 13m59s
> JUnit reported time for test case - 50s
> Most of the time is spent in creating/loading/analyzing initial tables - ~12m
> Cleanup - ~1m
> There is a huge overhead for running MiniMr tests compared to the actual 
> test runtime. 
> Ran the same test without the init script:
> Total time - 2m17s
> JUnit reported time for test case - 52s
> I also noticed some tests that don't have to run on MiniMr (like 
> udf_using.q, which does not require MiniMr; it just reads/writes to HDFS, which 
> we can do in MiniTez/MiniLlap, which are way faster). Most tests access only 
> a few initial tables to read a few rows from them. We can fix those tests to 
> load just the tables they require instead of all initial tables. We can also 
> remove the q_init_script.sql initialization for MiniMr after rewriting and 
> moving the unwanted tests over, which should cut down the runtime a lot.





[jira] [Updated] (HIVE-14665) vector_join_part_col_char.q failure

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14665:
-
Status: Patch Available  (was: Open)

> vector_join_part_col_char.q failure
> ---
>
> Key: HIVE-14665
> URL: https://issues.apache.org/jira/browse/HIVE-14665
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14665.1.patch
>
>
> Happens 100% of the time. Looks like a missed golden file update from 
> HIVE-14502.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14665) vector_join_part_col_char.q failure

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14665:
-
Attachment: HIVE-14665.1.patch

cc [~prasanth_j]

> vector_join_part_col_char.q failure
> ---
>
> Key: HIVE-14665
> URL: https://issues.apache.org/jira/browse/HIVE-14665
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14665.1.patch
>
>
> Happens 100% of the time. Looks like a missed golden file update from 
> HIVE-14502.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14663) Change ptest java language version to 1.7, other version changes and fixes

2016-08-29 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14663:
--
Description: (was: NO_PRECOMMIT_TESTS)

> Change ptest java language version to 1.7, other version changes and fixes
> --
>
> Key: HIVE-14663
> URL: https://issues.apache.org/jira/browse/HIVE-14663
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14663.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14663) Change ptest java language version to 1.7, other version changes and fixes

2016-08-29 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446755#comment-15446755
 ] 

Siddharth Seth edited comment on HIVE-14663 at 8/29/16 7:01 PM:


That's for a future jira. I've tested this locally. Committing in a bit. Thanks 
for the review.

On second thought - I'll let jenkins kick in. At least the build and ptest 
client usage gets tested out.


was (Author: sseth):
That's for a future jira. I've tested this locally. Committing in a bit. Thanks 
for the review.

> Change ptest java language version to 1.7, other version changes and fixes
> --
>
> Key: HIVE-14663
> URL: https://issues.apache.org/jira/browse/HIVE-14663
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14663.01.patch
>
>
> NO_PRECOMMIT_TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14663) Change ptest java language version to 1.7, other version changes and fixes

2016-08-29 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446755#comment-15446755
 ] 

Siddharth Seth commented on HIVE-14663:
---

That's for a future jira. I've tested this locally. Committing in a bit. Thanks 
for the review.

> Change ptest java language version to 1.7, other version changes and fixes
> --
>
> Key: HIVE-14663
> URL: https://issues.apache.org/jira/browse/HIVE-14663
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14663.01.patch
>
>
> NO_PRECOMMIT_TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14663) Change ptest java language version to 1.7, other version changes and fixes

2016-08-29 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14663:
--
Description: NO_PRECOMMIT_TESTS

> Change ptest java language version to 1.7, other version changes and fixes
> --
>
> Key: HIVE-14663
> URL: https://issues.apache.org/jira/browse/HIVE-14663
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14663.01.patch
>
>
> NO_PRECOMMIT_TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14515) Schema evolution uses slow INSERT INTO .. VALUES

2016-08-29 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14515:
-
  Resolution: Fixed
   Fix Version/s: 2.2.0
Target Version/s: 2.2.0
  Status: Resolved  (was: Patch Available)

> Schema evolution uses slow INSERT INTO .. VALUES
> 
>
> Key: HIVE-14515
> URL: https://issues.apache.org/jira/browse/HIVE-14515
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-14515.01.patch, HIVE-14515.02.patch, 
> HIVE-14515.03.patch, HIVE-14515.04.patch
>
>
> Use LOAD DATA LOCAL INPATH and INSERT INTO TABLE ... SELECT * FROM instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14515) Schema evolution uses slow INSERT INTO .. VALUES

2016-08-29 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14515:
-
Affects Version/s: 2.2.0

> Schema evolution uses slow INSERT INTO .. VALUES
> 
>
> Key: HIVE-14515
> URL: https://issues.apache.org/jira/browse/HIVE-14515
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-14515.01.patch, HIVE-14515.02.patch, 
> HIVE-14515.03.patch, HIVE-14515.04.patch
>
>
> Use LOAD DATA LOCAL INPATH and INSERT INTO TABLE ... SELECT * FROM instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14418) Hive config validation prevents unsetting the settings

2016-08-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446732#comment-15446732
 ] 

Ashutosh Chauhan commented on HIVE-14418:
-

+1

> Hive config validation prevents unsetting the settings
> --
>
> Key: HIVE-14418
> URL: https://issues.apache.org/jira/browse/HIVE-14418
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14418.01.patch, HIVE-14418.02.patch, 
> HIVE-14418.03.patch, HIVE-14418.04.patch, HIVE-14418.patch
>
>
> {noformat}
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=;
> Query returned non-zero code: 1, cause: 'SET 
> hive.tez.task.scale.memory.reserve.fraction.max=' FAILED because 
> hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=null;
> Query returned non-zero code: 1, cause: 'SET 
> hive.tez.task.scale.memory.reserve.fraction.max=null' FAILED because 
> hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> {noformat}
> unset also doesn't work.
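A minimal sketch of the direction a fix could take, in plain Java (this is not Hive's actual HiveConf validation code; the class and map names are hypothetical): treat an empty or "null" value as a request to reset the setting to its default, instead of type-checking it as a FLOAT.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Hive's HiveConf: an empty or "null" value resets
// the setting to its default rather than failing FLOAT validation.
public class ConfUnsetSketch {
    static final Map<String, String> DEFAULTS = new HashMap<>();
    static final Map<String, String> CURRENT = new HashMap<>();
    static {
        DEFAULTS.put("hive.tez.task.scale.memory.reserve.fraction.max", "0.5");
        CURRENT.putAll(DEFAULTS);
    }

    // Returns the effective value after a SET key=value command.
    public static String set(String key, String value) {
        if (value == null || value.isEmpty() || value.equalsIgnoreCase("null")) {
            CURRENT.put(key, DEFAULTS.get(key));   // unset: fall back to default
        } else {
            Float.parseFloat(value);               // validate FLOAT-typed values
            CURRENT.put(key, value);
        }
        return CURRENT.get(key);
    }

    public static void main(String[] args) {
        String k = "hive.tez.task.scale.memory.reserve.fraction.max";
        System.out.println(set(k, "0.9"));  // 0.9
        System.out.println(set(k, ""));     // back to the default: 0.5
    }
}
```

The point of the sketch is only that "empty value" needs to become a distinct code path before the type validator runs.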



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14627) Improvements to MiniMr tests

2016-08-29 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446728#comment-15446728
 ] 

Prasanth Jayachandran commented on HIVE-14627:
--

The test run is clean; the failures are unrelated to this patch. [~sseth] Can 
you please review this patch?

> Improvements to MiniMr tests
> 
>
> Key: HIVE-14627
> URL: https://issues.apache.org/jira/browse/HIVE-14627
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14627.1.patch, HIVE-14627.2.patch, 
> HIVE-14627.3.patch
>
>
> Currently MiniMr is extremely slow. I ran udf_using.q on MiniMr, and the 
> following is the execution time breakdown:
> Total time - 13m59s
> Junit reported time for testcase - 50s
> Most of the time is spent in creating/loading/analyzing initial tables - ~12m
> Cleanup - ~1m
> There is a huge overhead for running MiniMr tests compared to the actual 
> test runtime.
> Ran the same test without the init script:
> Total time - 2m17s
> Junit reported time for testcase - 52s
> I also noticed some tests that don't have to run on MiniMr (like 
> udf_using.q, which just reads/writes to HDFS, something we can do in 
> MiniTez/MiniLlap, which are way faster). Most tests access only a few 
> initial tables and read a few rows from them. We can fix those tests to 
> load just the tables required for each test instead of all the initial 
> tables. We can also remove the q_init_script.sql initialization for MiniMr 
> after rewriting and moving over the tests that don't need it, which should 
> cut down the runtime a lot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14663) Change ptest java language version to 1.7, other version changes and fixes

2016-08-29 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446716#comment-15446716
 ] 

Prasanth Jayachandran commented on HIVE-14663:
--

+1. Although it would be easier to put all the versions into a properties tag.

> Change ptest java language version to 1.7, other version changes and fixes
> --
>
> Key: HIVE-14663
> URL: https://issues.apache.org/jira/browse/HIVE-14663
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14663.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-14635) establish a separate path for FSOP to write into final path

2016-08-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-14635.
-
Resolution: Fixed

Committed to the feature branch.

> establish a separate path for FSOP to write into final path
> ---
>
> Key: HIVE-14635
> URL: https://issues.apache.org/jira/browse/HIVE-14635
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: hive-14535
>
> Attachments: HIVE-14635.branch.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13930) upgrade Hive to Hadoop 2.7.2

2016-08-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13930:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review and test fixing effort!

> upgrade Hive to Hadoop 2.7.2
> 
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch, 
> HIVE-13930.07.patch, HIVE-13930.08.patch, HIVE-13930.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446687#comment-15446687
 ] 

Hive QA commented on HIVE-14233:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826032/HIVE-14233.10.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10496 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1033/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1033/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1033/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12826032 - PreCommit-HIVE-MASTER-Build

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time before the batch is passed up the operator pipeline. This 
> row-by-row stitching was required because the ACID insert/update/delete 
> events from the various delta files needed to be merged before the actual 
> version of a given row could be determined. HIVE-14035 has enabled us to 
> break away from that limitation by splitting ACID update events into a 
> combination of delete+insert; in fact, it has now enabled us to create 
> splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to remove this earlier 
> bottleneck in the vectorized code path for ACID by directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows are found by cross-referencing them against a data 
> structure that keeps track of delete events (found in the deleted_delta 
> files). This will lead to a large performance gain when reading ACID files 
> in vectorized fashion, while enabling further optimizations that can be 
> built on top of it.
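As a rough illustration of the delete-event cross-referencing described above (the names here are hypothetical and this is not Hive's VectorizedRowBatch/ORC reader API): a row batch can be filtered by probing each row's id against a set built from the deleted_delta events, producing a selected-positions vector of the surviving rows.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only. Rows are identified here by a single long id;
// in Hive an ACID row id is really (originalTransactionId, bucket, rowId).
public class DeleteFilterSketch {
    // Keeps only rows whose id is absent from the delete-event set; writes the
    // surviving positions into 'selected' and returns the new batch size.
    public static int filterDeleted(long[] rowIds, int size, Set<Long> deleted,
                                    int[] selected) {
        int newSize = 0;
        for (int i = 0; i < size; i++) {
            if (!deleted.contains(rowIds[i])) {
                selected[newSize++] = i;  // surviving row position
            }
        }
        return newSize;
    }

    public static void main(String[] args) {
        long[] batch = {10L, 11L, 12L, 13L};
        Set<Long> deleted = new HashSet<>();
        deleted.add(11L);
        deleted.add(13L);
        int[] selected = new int[batch.length];
        int n = filterDeleted(batch, batch.length, deleted, selected);
        System.out.println(n);  // 2 surviving rows, at positions 0 and 2
    }
}
```

The whole batch stays columnar; only a small per-batch probe replaces the old per-row merge.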



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13930) upgrade Hive to Hadoop 2.7.2

2016-08-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13930:

Summary: upgrade Hive to Hadoop 2.7.2  (was: upgrade Hive to latest Hadoop 
version)

> upgrade Hive to Hadoop 2.7.2
> 
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch, 
> HIVE-13930.07.patch, HIVE-13930.08.patch, HIVE-13930.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Saket Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446668#comment-15446668
 ] 

Saket Saurabh commented on HIVE-14233:
--

Oops, forgot to do that. Sure, Eugene, done now.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time before the batch is passed up the operator pipeline. This 
> row-by-row stitching was required because the ACID insert/update/delete 
> events from the various delta files needed to be merged before the actual 
> version of a given row could be determined. HIVE-14035 has enabled us to 
> break away from that limitation by splitting ACID update events into a 
> combination of delete+insert; in fact, it has now enabled us to create 
> splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to remove this earlier 
> bottleneck in the vectorized code path for ACID by directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows are found by cross-referencing them against a data 
> structure that keeps track of delete events (found in the deleted_delta 
> files). This will lead to a large performance gain when reading ACID files 
> in vectorized fashion, while enabling further optimizations that can be 
> built on top of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14659) OutputStream won't close if caught exception in function unparseExprForValuesClause in SemanticAnalyzer.java

2016-08-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446654#comment-15446654
 ] 

Sergey Shelukhin commented on HIVE-14659:
-

+1

> OutputStream won't close if caught exception in function 
> unparseExprForValuesClause in SemanticAnalyzer.java
> ---
>
> Key: HIVE-14659
> URL: https://issues.apache.org/jira/browse/HIVE-14659
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Fan Yunbo
>Assignee: Fan Yunbo
> Fix For: 2.2.0
>
> Attachments: HIVE-14659.1.patch
>
>
> I have met a problem where the Hive process cannot create new threads 
> because of many unclosed OutputStreams.
> Here is part of the jstack info:
> "Thread-35783" daemon prio=10 tid=0x7f8f58f02800 nid=0x18cc in 
> Object.wait() [0x7f8e632c]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:577)
> - locked <0x00061af52d50> (a java.util.LinkedList)
> and the related error log:
> org.apache.hadoop.hive.ql.parse.SemanticException: Unable to create temp file 
> for insert values Expression of type TOK_TABLE_OR_COL not supported in 
> insert/values
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genValuesTempTable(SemanticAnalyzer.java:812)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1207)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1410)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10136)
> Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Expression of 
> type TOK_TABLE_OR_COL not supported in insert/values
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.unparseExprForValuesClause(SemanticAnalyzer.java:858)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genValuesTempTable(SemanticAnalyzer.java:785)
> ... 15 more
> It shows that the output stream won't be closed if an exception is caught in 
> function unparseExprForValuesClause in SemanticAnalyzer.java.
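The general fix pattern for this class of leak is try-with-resources, which closes the stream even when writing throws. A generic, self-contained illustration follows (this is not the actual SemanticAnalyzer patch; the method and error strings are stand-ins):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: the stream and writer are declared as resources, so they
// are closed whether writeValues() returns normally or throws mid-write.
public class CloseOnErrorSketch {
    public static void writeValues(Path tmp, String expr) throws IOException {
        try (OutputStream out = Files.newOutputStream(tmp);
             Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            if (expr.startsWith("TOK_")) {
                // Both resources are still closed when this propagates.
                throw new IOException("Expression of type " + expr
                        + " not supported in insert/values");
            }
            w.write(expr);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("values", ".txt");
        writeValues(tmp, "literal");
        try {
            writeValues(tmp, "TOK_TABLE_OR_COL");
        } catch (IOException expected) {
            System.out.println("caught, stream closed");
        }
    }
}
```

With a plain try/catch the `close()` call sits after the throwing code and never runs, which matches the thread buildup seen in the jstack output.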



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446640#comment-15446640
 ] 

Eugene Koifman commented on HIVE-14233:
---

could you upload the latest patch to RB?

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time before the batch is passed up the operator pipeline. This 
> row-by-row stitching was required because the ACID insert/update/delete 
> events from the various delta files needed to be merged before the actual 
> version of a given row could be determined. HIVE-14035 has enabled us to 
> break away from that limitation by splitting ACID update events into a 
> combination of delete+insert; in fact, it has now enabled us to create 
> splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to remove this earlier 
> bottleneck in the vectorized code path for ACID by directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows are found by cross-referencing them against a data 
> structure that keeps track of delete events (found in the deleted_delta 
> files). This will lead to a large performance gain when reading ACID files 
> in vectorized fashion, while enabling further optimizations that can be 
> built on top of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14536) Unit test code cleanup

2016-08-29 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446635#comment-15446635
 ] 

Peter Vary commented on HIVE-14536:
---

Hi,

In HIVE-1 we discussed that the goal of [~kgyrtkirk]'s patch is to remove 
the Ant and Velocity dependencies while keeping the overall structure of the 
tests, but that in the long run the qfile test code should be cleaned up. I 
proposed a solution which greatly reduces the number of classes used and helps 
future developers find the relevant code with less Hive-specific knowledge. 
[~ashutoshc], [~kgyrtkirk], and I agreed that this would be the scope of 
another patch, and that I would do it myself.
I have submitted the patch and pushed it to the review board, where I answered 
the questions and comments, and asked for clarification where I did not 
understand the original ones. I tried to make sure that the patch contains no 
changes other than those needed for the refactor, so it would be easier to 
review. [~kgyrtkirk], [~sseth], please help with further review of this 
cleaned-up version of the patch; if you do not have the time or capacity to 
do so, please tell me, so I can start looking for other reviewers (I would be 
more comfortable with you, since I think you have the best understanding of 
this part of the code). It would be good to finish this patch so it does not 
affect other ongoing test improvement activities and I can move on to 
cleaning up other parts of the testing framework, like the QTestUtil classes, 
which I think we all agree badly need refactoring.

Thanks,
Peter

> Unit test code cleanup
> --
>
> Key: HIVE-14536
> URL: https://issues.apache.org/jira/browse/HIVE-14536
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-14536.5.patch, HIVE-14536.6.patch, 
> HIVE-14536.7.patch, HIVE-14536.patch
>
>
> Clean up the itest infrastructure, to create a readable, easy to understand 
> code



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14663) Change ptest java language version to 1.7, other version changes and fixes

2016-08-29 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14663:
--
Status: Patch Available  (was: Open)

> Change ptest java language version to 1.7, other version changes and fixes
> --
>
> Key: HIVE-14663
> URL: https://issues.apache.org/jira/browse/HIVE-14663
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14663.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14663) Change ptest java language version to 1.7, other version changes and fixes

2016-08-29 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14663:
--
Attachment: HIVE-14663.01.patch

- Update java language version to 1.7
- Dependencies to match those in Hive (in some cases they were already higher)
- Remove a log4j jar that was being included, which occasionally caused logs 
to be skipped
- Fix the 2 failing test cases.

[~prasanth_j] - could you please take a look?

> Change ptest java language version to 1.7, other version changes and fixes
> --
>
> Key: HIVE-14663
> URL: https://issues.apache.org/jira/browse/HIVE-14663
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14663.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14614) Insert overwrite local directory fails with IllegalStateException

2016-08-29 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-14614:
---
Status: Patch Available  (was: Open)

> Insert overwrite local directory fails with IllegalStateException
> -
>
> Key: HIVE-14614
> URL: https://issues.apache.org/jira/browse/HIVE-14614
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-14614.05.patch, HIVE-14614.2.patch, 
> HIVE-14614.3.patch, HIVE-14614.4.patch
>
>
> insert overwrite local directory  select * from table; fails with 
> "java.lang.IllegalStateException: Cannot create staging directory" when the 
> path passed to getTempDirForPath(Path path) is a local fs path.
> This is a regression caused by the fix for HIVE-14270.
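A hedged sketch of the kind of branching such a fix likely needs (hypothetical names; this is not Hive's actual `getTempDirForPath` implementation): choose the staging location based on the target path's URI scheme, so local-fs targets never get an HDFS staging directory.

```java
import java.net.URI;

// Illustrative sketch only: scheme-based routing of the staging directory.
public class StagingDirSketch {
    // A path with no scheme or a "file" scheme is on the local file system.
    public static boolean isLocal(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null || scheme.equals("file");
    }

    // Local targets get a scratch dir on the local fs; others stay on HDFS.
    // Both scratch locations here are made-up examples.
    public static String tempDirFor(String path) {
        return isLocal(path) ? "file:/tmp/hive-local-scratch"
                             : "hdfs:/tmp/hive-scratch";
    }

    public static void main(String[] args) {
        System.out.println(tempDirFor("file:/home/user/out"));
        System.out.println(tempDirFor("hdfs://nn:8020/warehouse/t"));
    }
}
```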



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14614) Insert overwrite local directory fails with IllegalStateException

2016-08-29 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-14614:
---
Status: Open  (was: Patch Available)

> Insert overwrite local directory fails with IllegalStateException
> -
>
> Key: HIVE-14614
> URL: https://issues.apache.org/jira/browse/HIVE-14614
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-14614.05.patch, HIVE-14614.2.patch, 
> HIVE-14614.3.patch, HIVE-14614.4.patch
>
>
> insert overwrite local directory  select * from table; fails with 
> "java.lang.IllegalStateException: Cannot create staging directory" when the 
> path passed to getTempDirForPath(Path path) is a local fs path.
> This is a regression caused by the fix for HIVE-14270.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14564) Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException.

2016-08-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-14564:
-
Attachment: HIVE-14564.002.patch

> Column Pruning generates out of order columns in SelectOperator which cause 
> ArrayIndexOutOfBoundsException.
> ---
>
> Key: HIVE-14564
> URL: https://issues.apache.org/jira/browse/HIVE-14564
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: HIVE-14564.000.patch, HIVE-14564.001.patch, 
> HIVE-14564.002.patch
>
>
> Column Pruning generates out of order columns in SelectOperator which cause 
> ArrayIndexOutOfBoundsException.
> {code}
> 2016-07-26 21:49:24,390 FATAL [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
>   ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at java.lang.System.arraycopy(Native Method)
>   at org.apache.hadoop.io.Text.set(Text.java:225)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:550)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:377)
>   ... 13 more
> {code}
> The exception occurs because serialization and deserialization do not match.
> The serialization by LazyBinarySerDe in the previous MapReduce job used a 
> different column order. When the current MapReduce job deserializes the 
> intermediate sequence file generated by the previous job, LazyBinaryStruct 
> reads the columns in the wrong order and produces corrupted data. The column 
> mismatch between serialization and deserialization is caused by the 
> SelectOperator's column pruning ({{ColumnPrunerSelectProc}}).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14660) ArrayIndexOutOfBoundsException on delete

2016-08-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446510#comment-15446510
 ] 

Eugene Koifman commented on HIVE-14660:
---

[~bbonnet]
bq. we have a work-around setting mapred.reduce.tasks to number of buckets.
That is the correct solution, but it should happen automatically - it's a 
serious bug if it does not.  Can you describe more precisely how you end up 
in this situation?  (Your config settings, the query to reproduce this, and 
the relevant DDL.)

> ArrayIndexOutOfBoundsException on delete
> 
>
> Key: HIVE-14660
> URL: https://issues.apache.org/jira/browse/HIVE-14660
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, Transactions
>Affects Versions: 1.2.1
>Reporter: Benjamin BONNET
>Assignee: Benjamin BONNET
> Attachments: HIVE-14660.1-banch-1.2.patch
>
>
> Hi,
> DELETE on an ACID table may fail on an ArrayIndexOutOfBoundsException.
> That bug occurs at the Reduce phase when there are fewer reducers than the 
> number of table buckets.
> In order to reproduce, create a simple ACID table :
> {code:sql}
> CREATE TABLE test (`cle` bigint,`valeur` string)
>  PARTITIONED BY (`annee` string)
>  CLUSTERED BY (cle) INTO 5 BUCKETS
>  TBLPROPERTIES ('transactional'='true');
> {code}
> Populate it with lines distributed among all buckets, with random values and 
> a few partitions.
> Force the number of reducers to be lower than the number of buckets:
> {code:sql}
> set mapred.reduce.tasks=1;
> {code}
> Then execute a delete that will remove many lines from all the buckets.
> {code:sql}
> DELETE FROM test WHERE valeur<'some_value';
> {code}
> Then you will get an ArrayIndexOutOfBoundsException :
> {code}
> 2016-08-22 21:21:02,500 [FATAL] [TezChild] |tez.ReduceRecordSource|: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) 
> {"key":{"reducesinkkey0":{"transactionid":119,"bucketid":0,"rowid":0}},"value":{"_col0":"4"}}
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
> ... 17 more
> {code}
> Adding logs to FileSinkOperator shows the operator dealing with buckets 
> 0, 1, 2, 3, 4, then 0 again, and failing at line 769: each time the bucket 
> switches, it moves forward in an array of 5 elements (the number of 
> buckets). So when bucket 0 comes around a second time, the index runs off 
> the end of the array...
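The off-by-bucket indexing described above can be sketched as follows (hypothetical Python standing in for FileSinkOperator's per-bucket writer array; `writer_slots` is an invented name):

```python
NUM_BUCKETS = 5

def writer_slots(bucket_sequence):
    # Buggy indexing sketch: advance a cursor on every bucket *switch*
    # instead of indexing by bucket id.  Returns the slot used per row.
    slots, idx, last = [], -1, None
    for bucket in bucket_sequence:
        if bucket != last:
            idx += 1
            last = bucket
        slots.append(idx)
    return slots

slots = writer_slots([0, 1, 2, 3, 4, 0])
# The sixth row (bucket 0 again) lands in slot 5 -- one past the end of a
# 5-element per-bucket array, hence ArrayIndexOutOfBoundsException: 5.
assert slots == [0, 1, 2, 3, 4, 5]
assert max(slots) >= NUM_BUCKETS
```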





[jira] [Commented] (HIVE-14591) HS2 is shut down unexpectedly during the startup time

2016-08-29 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446464#comment-15446464
 ] 

Tao Li commented on HIVE-14591:
---

[~vgumashta] Can you please take a look at the patch? Thanks.

> HS2 is shut down unexpectedly during the startup time
> -
>
> Key: HIVE-14591
> URL: https://issues.apache.org/jira/browse/HIVE-14591
> Project: Hive
>  Issue Type: Bug
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: BUG-64741.1.patch
>
>
> If there is an issue with ZooKeeper (e.g. connection problems), it takes HS2 
> some time to connect. During this window, Ambari may issue health checks 
> against HS2, and the resulting CloseSession call triggers an unexpected 
> shutdown of HS2. The shutdown should be triggered only when HS2 has been 
> deregistered from ZooKeeper, not during startup, when HS2 is not yet 
> registered with ZK.
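A minimal sketch of the guard the description implies (illustrative Python; the class and flag names are assumptions, not HS2's actual code):

```python
class HiveServer2Sketch:
    """Toy model: a CloseSession call should shut the server down only
    after it has been deregistered from ZooKeeper, never during startup
    (before registration), e.g. from an Ambari health check."""

    def __init__(self):
        self.registered = False    # currently registered with ZK
        self.deregistered = False  # explicitly removed after registering

    def register(self):
        self.registered = True

    def deregister(self):
        if self.registered:
            self.registered = False
            self.deregistered = True

    def close_session_triggers_shutdown(self):
        # Shutdown only after an explicit deregistration.
        return self.deregistered

hs2 = HiveServer2Sketch()
assert not hs2.close_session_triggers_shutdown()  # startup health check: keep running
hs2.register()
hs2.deregister()
assert hs2.close_session_triggers_shutdown()      # deregistered: shutdown allowed
```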





[jira] [Commented] (HIVE-14618) beeline fetch logging delays before query completion

2016-08-29 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446461#comment-15446461
 ] 

Tao Li commented on HIVE-14618:
---

[~gopalv] Can you please commit this patch?

> beeline fetch logging delays before query completion
> 
>
> Key: HIVE-14618
> URL: https://issues.apache.org/jira/browse/HIVE-14618
> Project: Hive
>  Issue Type: Bug
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-14618.1.patch, HIVE-14618.2.patch, 
> HIVE-14618.3.patch
>
>
> Beeline has a thread that fetches logs from HS2. However, it uses the same 
> HiveStatement object to also wait for query completion using a long poll 
> (with a default interval of 5 seconds).
> The JDBC client holds a lock around the Thrift API calls, so the getLogs 
> call blocks on the query-completion check, i.e. the logs get shown only 
> every 5 seconds by default.
> cc [~vgumashta] [~gopalv] [~thejas]
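The lock contention can be sketched like this (illustrative Python; the real client uses a Java monitor on HiveStatement, not this toy lock):

```python
import threading
import time

stmt_lock = threading.Lock()  # one lock guarding all "Thrift calls", as described

def wait_for_completion(poll_interval):
    # Long poll: holds the shared statement lock for the whole interval.
    with stmt_lock:
        time.sleep(poll_interval)

def get_logs():
    # The log fetcher must wait for the long poll to release the lock.
    start = time.monotonic()
    with stmt_lock:
        pass  # the actual fetch-logs call would happen here
    return time.monotonic() - start

poller = threading.Thread(target=wait_for_completion, args=(0.3,))
poller.start()
time.sleep(0.05)     # let the completion check grab the lock first
waited = get_logs()  # stalls for roughly the rest of the poll interval
poller.join()
assert waited > 0.2  # log output is delayed by the in-flight long poll
```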





[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Saket Saurabh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Open  (was: Patch Available)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch 
> one row at a time before the batch is passed up the operator pipeline. This 
> row-by-row stitching was necessary because the ACID insert/update/delete 
> events from various delta files had to be merged before the current version 
> of a given row could be determined. HIVE-14035 has enabled us to break away 
> from that limitation by splitting ACID update events into a combination of 
> delete+insert. In fact, it has now enabled us to create splits on delta 
> files.
> Building on top of HIVE-14035, this JIRA proposes to remove this bottleneck 
> in the vectorized code path for ACID by reading row batches directly from 
> the underlying ORC files and avoiding any stitching altogether. Once a row 
> batch is read from the split (which may be on a base/delta file), deleted 
> rows are found by cross-referencing them against a data structure that 
> tracks only delete events (found in the deleted_delta files). This leads to 
> a large performance gain when reading ACID files in vectorized fashion, 
> while enabling further optimizations to be built on top of it.
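The proposed cross-referencing step might look roughly like this (hypothetical Python sketch; the `(txn, bucket, rowid)` key mirrors the ROW__ID fields, but `filter_deleted` is an invented helper, not Hive's API):

```python
def filter_deleted(batch, deleted_keys):
    # Keep only rows whose (transaction, bucket, rowid) identity is absent
    # from the delete-event set built from the deleted_delta files.
    return [row for row in batch
            if (row["txn"], row["bucket"], row["rowid"]) not in deleted_keys]

deleted = {(119, 0, 0), (119, 0, 2)}   # delete events, keyed like a ROW__ID
batch = [
    {"txn": 119, "bucket": 0, "rowid": 0, "val": "a"},
    {"txn": 119, "bucket": 0, "rowid": 1, "val": "b"},
    {"txn": 119, "bucket": 0, "rowid": 2, "val": "c"},
]
assert [r["val"] for r in filter_deleted(batch, deleted)] == ["b"]
```

The point of the design is that the batch is read vectorized straight from ORC and only the (typically small) delete set is consulted per row, instead of merging every event stream row by row.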





[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Saket Saurabh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Patch Available  (was: Open)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>





[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Saket Saurabh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Patch Available  (was: Open)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>





[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Saket Saurabh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.10.patch

Addressed comments at RB & added TestTxnCommands2 subclass that runs e2e ACID 
tests with split-update and vectorization enabled.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>





[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Saket Saurabh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Open  (was: Patch Available)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch
>
>





[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-08-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446386#comment-15446386
 ] 

Sergio Peña commented on HIVE-14170:


The patch looks good.
+1

I'd like to run a few tests on my machine. 
Do you have steps to do it?

> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, 
> HIVE-14170.3.patch, HIVE-14170.4.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option 
> the formatting can look really off.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal width size for {{TableOutputFormat}} (it can't because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width 
> should be re-calculated every "x" rows ("x" can be configurable and by 
> default it can be 1000).
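The proposed chunked re-calculation could be sketched as (illustrative Python; `render_incremental` and its parameters are assumptions, not Beeline's API):

```python
def render_incremental(rows, recalc_every=1000):
    # Buffer up to `recalc_every` rows, recompute column widths for that
    # chunk, then emit it -- a middle ground between fully incremental
    # output (bad alignment) and fully buffered output (delayed rows).
    out = []
    for start in range(0, len(rows), recalc_every):
        chunk = rows[start:start + recalc_every]
        widths = [max(len(str(r[i])) for r in chunk)
                  for i in range(len(chunk[0]))]
        for r in chunk:
            out.append("  ".join(str(v).ljust(w) for v, w in zip(r, widths)))
    return out

rows = [("id", "name"), (1, "a"), (2, "longer-name")]
lines = render_incremental(rows, recalc_every=2)
# The first chunk sizes its columns from its own two rows only; the next
# chunk recalculates, so width drift is bounded by the chunk size.
assert lines[0] == "id  name"
assert lines[2].startswith("2")
```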





[jira] [Comment Edited] (HIVE-14660) ArrayIndexOutOfBoundsException on delete

2016-08-29 Thread Benjamin BONNET (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445617#comment-15445617
 ] 

Benjamin BONNET edited comment on HIVE-14660 at 8/29/16 3:23 PM:
-

[~ekoifman]: actually, we encountered that bug without setting 
mapred.reduce.tasks=1. But I could only reproduce it on a sandbox by forcing 
the number of reducers to 1, so that the FileSinkOperator has to deal with 
more than one bucket.

On the platform where we encountered the issue (with default mapred settings), 
we have a work-around: setting mapred.reduce.tasks to the number of buckets.


was (Author: bbonnet):
[~ekoifman] : actually, we encountered that bug without setting 
mapred.reduce.tasks=1. But I managed to reproduce it on a sandbox only by 
forcing the FileSinkOperator to deal with more than one bucket, forcing the 
number of reducers to 1.

> ArrayIndexOutOfBoundsException on delete
> 
>
> Key: HIVE-14660
> URL: https://issues.apache.org/jira/browse/HIVE-14660
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, Transactions
>Affects Versions: 1.2.1
>Reporter: Benjamin BONNET
>Assignee: Benjamin BONNET
> Attachments: HIVE-14660.1-banch-1.2.patch
>
>

[jira] [Commented] (HIVE-14530) Union All query returns incorrect results.

2016-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446166#comment-15446166
 ] 

Hive QA commented on HIVE-14530:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12825997/HIVE-14530.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10467 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer8]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join34]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join35]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join34]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join35]
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1032/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1032/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1032/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12825997 - PreCommit-HIVE-MASTER-Build

> Union All query returns incorrect results.
> --
>
> Key: HIVE-14530
> URL: https://issues.apache.org/jira/browse/HIVE-14530
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
> Environment: Hadoop 2.6
> Hive 2.1
>Reporter: wenhe li
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14530.01.patch
>
>
> create table dw_tmp.l_test1 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;
> create table dw_tmp.l_test2 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;  
> select * from dw_tmp.l_test1;
> 1   table_1  2016-08-11
> select * from dw_tmp.l_test2;
> 2   table_2  2016-08-11
> -- right like this
> select 
> id,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   table_1 2016-08-11
> 2   table_2 2016-08-11
> -- incorrect
> select 
> id,
> 999,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11
> 2   999 table_1 2016-08-11 <-- here is wrong
> -- incorrect
> select 
> id,
> 999,
> 666,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> 666,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 666 table_1 2016-08-11
> 2   999 666 table_1 2016-08-11 <-- here is wrong
> -- right
> select 
> id,
> 999,
> 'table_1' ,
> trans_date,
> '2016-11-11'
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11  2016-11-11
> 2   999 table_2 2016-08-11  2016-08-11





[jira] [Updated] (HIVE-14440) Fix default value of USE_DEPRECATED_CLI in cli.cmd

2016-08-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14440:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

> Fix default value of USE_DEPRECATED_CLI in cli.cmd
> --
>
> Key: HIVE-14440
> URL: https://issues.apache.org/jira/browse/HIVE-14440
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14440.01.patch, HIVE-14440.02.patch, 
> HIVE-14440.03.patch, HIVE-14440.04.patch, HIVE-14440.05.patch, 
> HIVE-14440.06.patch
>
>
> The cli.cmd script sets the default value of USE_DEPRECATED_CLI to false 
> when it is not set, which differs from cli.sh, which sets it to true.





[jira] [Commented] (HIVE-14440) Fix default value of USE_DEPRECATED_CLI in cli.cmd

2016-08-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446129#comment-15446129
 ] 

Sergio Peña commented on HIVE-14440:


I don't know where the smart-apply-patch.sh is downloaded from; it is still 
using an old one.
Anyway, the patch is harmless, so I think it is good.
+1

I will commit it.

> Fix default value of USE_DEPRECATED_CLI in cli.cmd
> --
>
> Key: HIVE-14440
> URL: https://issues.apache.org/jira/browse/HIVE-14440
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-14440.01.patch, HIVE-14440.02.patch, 
> HIVE-14440.03.patch, HIVE-14440.04.patch, HIVE-14440.05.patch, 
> HIVE-14440.06.patch
>
>





[jira] [Assigned] (HIVE-12540) Create function failed, but show functions display it

2016-08-29 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-12540:


Assignee: Naveen Gangam

> Create function failed, but show functions display it
> -
>
> Key: HIVE-12540
> URL: https://issues.apache.org/jira/browse/HIVE-12540
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Weizhong
>Assignee: Naveen Gangam
>Priority: Minor
>
> {noformat}
> 0: jdbc:hive2://vm119:1> create function udfTest as 
> 'hive.udf.UDFArrayNotE';
> ERROR : Failed to register default.udftest using class hive.udf.UDFArrayNotE
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
> 0: jdbc:hive2://vm119:1> show functions;
> +-+--+
> |tab_name |
> +-+--+
> | ... |
> | default.udftest |
> | ... |
> +-+--+
> {noformat}





[jira] [Commented] (HIVE-12540) Create function failed, but show functions display it

2016-08-29 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446074#comment-15446074
 ] 

Naveen Gangam commented on HIVE-12540:
--

Thanks for the info. I will take a look.

> Create function failed, but show functions display it
> -
>
> Key: HIVE-12540
> URL: https://issues.apache.org/jira/browse/HIVE-12540
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Weizhong
>Assignee: Naveen Gangam
>Priority: Minor
>
> {noformat}
> 0: jdbc:hive2://vm119:1> create function udfTest as 
> 'hive.udf.UDFArrayNotE';
> ERROR : Failed to register default.udftest using class hive.udf.UDFArrayNotE
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
> 0: jdbc:hive2://vm119:1> show functions;
> +-+--+
> |tab_name |
> +-+--+
> | ... |
> | default.udftest |
> | ... |
> +-+--+
> {noformat}





[jira] [Commented] (HIVE-14530) Union All query returns incorrect results.

2016-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445926#comment-15445926
 ] 

Hive QA commented on HIVE-14530:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12825995/HIVE-14530.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10467 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer8]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join34]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join35]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[schemeAuthority]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join34]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join35]
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testDelegationTokenSharedStore
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1031/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1031/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1031/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12825995 - PreCommit-HIVE-MASTER-Build

> Union All query returns incorrect results.
> --
>
> Key: HIVE-14530
> URL: https://issues.apache.org/jira/browse/HIVE-14530
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
> Environment: Hadoop 2.6
> Hive 2.1
>Reporter: wenhe li
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14530.01.patch
>
>
> create table dw_tmp.l_test1 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;
> create table dw_tmp.l_test2 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;  
> select * from dw_tmp.l_test1;
> 1   table_1  2016-08-11
> select * from dw_tmp.l_test2;
> 2   table_2  2016-08-11
> -- right like this
> select 
> id,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   table_1 2016-08-11
> 2   table_2 2016-08-11
> -- incorrect
> select 
> id,
> 999,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11
> 2   999 table_1 2016-08-11 <-- here is wrong
> -- incorrect
> select 
> id,
> 999,
> 666,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> 666,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 666 table_1 2016-08-11
> 2   999 666 table_1 2016-08-11 <-- here is wrong
> -- right
> select 
> id,
> 999,
> 'table_1' ,
> trans_date,
> '2016-11-11'
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11  2016-11-11
> 2   999 table_2 2016-08-11  2016-08-11





[jira] [Updated] (HIVE-14530) Union All query returns incorrect results.

2016-08-29 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14530:
---
Attachment: HIVE-14530.01.patch

> Union All query returns incorrect results.
> --
>
> Key: HIVE-14530
> URL: https://issues.apache.org/jira/browse/HIVE-14530
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
> Environment: Hadoop 2.6
> Hive 2.1
>Reporter: wenhe li
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14530.01.patch
>
>
> create table dw_tmp.l_test1 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;
> create table dw_tmp.l_test2 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;  
> select * from dw_tmp.l_test1;
> 1   table_1  2016-08-11
> select * from dw_tmp.l_test2;
> 2   table_2  2016-08-11
> -- right like this
> select 
> id,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   table_1 2016-08-11
> 2   table_2 2016-08-11
> -- incorrect
> select 
> id,
> 999,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11
> 2   999 table_1 2016-08-11 <-- here is wrong
> -- incorrect
> select 
> id,
> 999,
> 666,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> 666,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 666 table_1 2016-08-11
> 2   999 666 table_1 2016-08-11 <-- here is wrong
> -- right
> select 
> id,
> 999,
> 'table_1' ,
> trans_date,
> '2016-11-11'
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11  2016-11-11
> 2   999 table_2 2016-08-11  2016-08-11





[jira] [Updated] (HIVE-14530) Union All query returns incorrect results.

2016-08-29 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14530:
---
Attachment: (was: HIVE-14530.patch)

> Union All query returns incorrect results.
> --
>
> Key: HIVE-14530
> URL: https://issues.apache.org/jira/browse/HIVE-14530
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
> Environment: Hadoop 2.6
> Hive 2.1
>Reporter: wenhe li
>Assignee: Jesus Camacho Rodriguez
>
> create table dw_tmp.l_test1 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;
> create table dw_tmp.l_test2 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;  
> select * from dw_tmp.l_test1;
> 1   table_1  2016-08-11
> select * from dw_tmp.l_test2;
> 2   table_2  2016-08-11
> -- right like this
> select 
> id,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   table_1 2016-08-11
> 2   table_2 2016-08-11
> -- incorrect
> select 
> id,
> 999,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11
> 2   999 table_1 2016-08-11 <-- here is wrong
> -- incorrect
> select 
> id,
> 999,
> 666,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> 666,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 666 table_1 2016-08-11
> 2   999 666 table_1 2016-08-11 <-- here is wrong
> -- right
> select 
> id,
> 999,
> 'table_1' ,
> trans_date,
> '2016-11-11'
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11  2016-11-11
> 2   999 table_2 2016-08-11  2016-08-11





[jira] [Comment Edited] (HIVE-14530) Union All query returns incorrect results.

2016-08-29 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445691#comment-15445691
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-14530 at 8/29/16 12:16 PM:
--

[~liwenhe], [~Spring], thanks for pointing this out. There was indeed a bug in 
the logic introduced in HIVE-13639 to infer constant values that can be pulled 
up from the Union.

I have just uploaded the fix, and I added the queries in the description to the 
testsuite.


was (Author: jcamachorodriguez):
[~liwenhe], [~Spring], thanks for pointing this out. There was indeed a bug in 
the logic introduced in HIVE-13639 to infers constant values that can be pull 
up from the Union.

I have just uploaded the fix, and I added the queries in the description to the 
testsuite.

> Union All query returns incorrect results.
> --
>
> Key: HIVE-14530
> URL: https://issues.apache.org/jira/browse/HIVE-14530
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
> Environment: Hadoop 2.6
> Hive 2.1
>Reporter: wenhe li
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14530.patch
>
>
> create table dw_tmp.l_test1 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;
> create table dw_tmp.l_test2 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;  
> select * from dw_tmp.l_test1;
> 1   table_1  2016-08-11
> select * from dw_tmp.l_test2;
> 2   table_2  2016-08-11
> -- right like this
> select 
> id,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   table_1 2016-08-11
> 2   table_2 2016-08-11
> -- incorrect
> select 
> id,
> 999,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11
> 2   999 table_1 2016-08-11 <-- here is wrong
> -- incorrect
> select 
> id,
> 999,
> 666,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> 666,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 666 table_1 2016-08-11
> 2   999 666 table_1 2016-08-11 <-- here is wrong
> -- right
> select 
> id,
> 999,
> 'table_1' ,
> trans_date,
> '2016-11-11'
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11  2016-11-11
> 2   999 table_2 2016-08-11  2016-08-11





[jira] [Commented] (HIVE-14530) Union All query returns incorrect results.

2016-08-29 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445691#comment-15445691
 ] 

Jesus Camacho Rodriguez commented on HIVE-14530:


[~liwenhe], [~Spring], thanks for pointing this out. There was indeed a bug in 
the logic introduced in HIVE-13639 to infer constant values that can be pulled 
up from the Union.

I have just uploaded the fix, and I added the queries in the description to the 
testsuite.
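The invariant behind this fix can be illustrated with a small Python sketch (an illustrative model, not Hive's actual Java rule; `pullable_constants` and the tuple encoding are invented for this example): a literal may be pulled above a UNION ALL only when every branch produces the same literal in that position.

```python
# Illustrative model of constant pull-up over UNION ALL: a position is safe to
# hoist only if *all* branches emit an identical constant there. Treating a
# constant from just one branch as union-wide is exactly the reported bug.

def pullable_constants(branches):
    """branches: one list of expressions per UNION ALL branch, where an
    expression is ('const', value) or ('col', name).
    Returns {position: value} for positions safe to pull above the union."""
    first = branches[0]
    pullable = {}
    for pos, expr in enumerate(first):
        if expr[0] != 'const':
            continue
        # Safe only when every other branch has the identical constant here.
        if all(b[pos] == expr for b in branches[1:]):
            pullable[pos] = expr[1]
    return pullable

# Shaped like the queries in the description: position 1 is 999 in both
# branches (safe to pull), but position 2 is the literal 'table_1' in one
# branch and the column val in the other (must NOT be pulled).
branches = [
    [('col', 'id'), ('const', 999), ('const', 'table_1'), ('col', 'trans_date')],
    [('col', 'id'), ('const', 999), ('col', 'val'),       ('col', 'trans_date')],
]
print(pullable_constants(branches))  # {1: 999}
```

Pulling `'table_1'` above the union despite the mismatch is what made both output rows show `table_1` in the reported queries.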

> Union All query returns incorrect results.
> --
>
> Key: HIVE-14530
> URL: https://issues.apache.org/jira/browse/HIVE-14530
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
> Environment: Hadoop 2.6
> Hive 2.1
>Reporter: wenhe li
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14530.patch
>
>
> create table dw_tmp.l_test1 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;
> create table dw_tmp.l_test2 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;  
> select * from dw_tmp.l_test1;
> 1   table_1  2016-08-11
> select * from dw_tmp.l_test2;
> 2   table_2  2016-08-11
> -- right like this
> select 
> id,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   table_1 2016-08-11
> 2   table_2 2016-08-11
> -- incorrect
> select 
> id,
> 999,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11
> 2   999 table_1 2016-08-11 <-- here is wrong
> -- incorrect
> select 
> id,
> 999,
> 666,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> 666,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 666 table_1 2016-08-11
> 2   999 666 table_1 2016-08-11 <-- here is wrong
> -- right
> select 
> id,
> 999,
> 'table_1' ,
> trans_date,
> '2016-11-11'
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11  2016-11-11
> 2   999 table_2 2016-08-11  2016-08-11





[jira] [Updated] (HIVE-14530) Union All query returns incorrect results.

2016-08-29 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14530:
---
Attachment: HIVE-14530.patch

> Union All query returns incorrect results.
> --
>
> Key: HIVE-14530
> URL: https://issues.apache.org/jira/browse/HIVE-14530
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
> Environment: Hadoop 2.6
> Hive 2.1
>Reporter: wenhe li
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14530.patch
>
>
> create table dw_tmp.l_test1 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;
> create table dw_tmp.l_test2 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;  
> select * from dw_tmp.l_test1;
> 1   table_1  2016-08-11
> select * from dw_tmp.l_test2;
> 2   table_2  2016-08-11
> -- right like this
> select 
> id,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   table_1 2016-08-11
> 2   table_2 2016-08-11
> -- incorrect
> select 
> id,
> 999,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11
> 2   999 table_1 2016-08-11 <-- here is wrong
> -- incorrect
> select 
> id,
> 999,
> 666,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> 666,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 666 table_1 2016-08-11
> 2   999 666 table_1 2016-08-11 <-- here is wrong
> -- right
> select 
> id,
> 999,
> 'table_1' ,
> trans_date,
> '2016-11-11'
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11  2016-11-11
> 2   999 table_2 2016-08-11  2016-08-11





[jira] [Updated] (HIVE-14530) Union All query returns incorrect results.

2016-08-29 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14530:
---
Status: Patch Available  (was: In Progress)

> Union All query returns incorrect results.
> --
>
> Key: HIVE-14530
> URL: https://issues.apache.org/jira/browse/HIVE-14530
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
> Environment: Hadoop 2.6
> Hive 2.1
>Reporter: wenhe li
>Assignee: Jesus Camacho Rodriguez
>
> create table dw_tmp.l_test1 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;
> create table dw_tmp.l_test2 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;  
> select * from dw_tmp.l_test1;
> 1   table_1  2016-08-11
> select * from dw_tmp.l_test2;
> 2   table_2  2016-08-11
> -- right like this
> select 
> id,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   table_1 2016-08-11
> 2   table_2 2016-08-11
> -- incorrect
> select 
> id,
> 999,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11
> 2   999 table_1 2016-08-11 <-- here is wrong
> -- incorrect
> select 
> id,
> 999,
> 666,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> 666,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 666 table_1 2016-08-11
> 2   999 666 table_1 2016-08-11 <-- here is wrong
> -- right
> select 
> id,
> 999,
> 'table_1' ,
> trans_date,
> '2016-11-11'
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11  2016-11-11
> 2   999 table_2 2016-08-11  2016-08-11





[jira] [Work started] (HIVE-14530) Union All query returns incorrect results.

2016-08-29 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-14530 started by Jesus Camacho Rodriguez.
--
> Union All query returns incorrect results.
> --
>
> Key: HIVE-14530
> URL: https://issues.apache.org/jira/browse/HIVE-14530
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
> Environment: Hadoop 2.6
> Hive 2.1
>Reporter: wenhe li
>Assignee: Jesus Camacho Rodriguez
>
> create table dw_tmp.l_test1 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;
> create table dw_tmp.l_test2 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;  
> select * from dw_tmp.l_test1;
> 1   table_1  2016-08-11
> select * from dw_tmp.l_test2;
> 2   table_2  2016-08-11
> -- right like this
> select 
> id,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   table_1 2016-08-11
> 2   table_2 2016-08-11
> -- incorrect
> select 
> id,
> 999,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11
> 2   999 table_1 2016-08-11 <-- here is wrong
> -- incorrect
> select 
> id,
> 999,
> 666,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> 666,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 666 table_1 2016-08-11
> 2   999 666 table_1 2016-08-11 <-- here is wrong
> -- right
> select 
> id,
> 999,
> 'table_1' ,
> trans_date,
> '2016-11-11'
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11  2016-11-11
> 2   999 table_2 2016-08-11  2016-08-11





[jira] [Commented] (HIVE-14660) ArrayIndexOutOfBoundsException on delete

2016-08-29 Thread Benjamin BONNET (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445617#comment-15445617
 ] 

Benjamin BONNET commented on HIVE-14660:


[~ekoifman] : actually, we encountered that bug without setting 
mapred.reduce.tasks=1. But on a sandbox I managed to reproduce it only by 
forcing the FileSinkOperator to deal with more than one bucket, i.e. by forcing 
the number of reducers to 1.

> ArrayIndexOutOfBoundsException on delete
> 
>
> Key: HIVE-14660
> URL: https://issues.apache.org/jira/browse/HIVE-14660
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, Transactions
>Affects Versions: 1.2.1
>Reporter: Benjamin BONNET
>Assignee: Benjamin BONNET
> Attachments: HIVE-14660.1-banch-1.2.patch
>
>
> Hi,
> DELETE on an ACID table may fail on an ArrayIndexOutOfBoundsException.
> That bug occurs in the Reduce phase when there are fewer reducers than the 
> number of table buckets.
> In order to reproduce, create a simple ACID table :
> {code:sql}
> CREATE TABLE test (`cle` bigint,`valeur` string)
>  PARTITIONED BY (`annee` string)
>  CLUSTERED BY (cle) INTO 5 BUCKETS
>  TBLPROPERTIES ('transactional'='true');
> {code}
> Populate it with lines distributed among all buckets, with random values and 
> a few partitions.
> Force the Reducers to be less than the buckets :
> {code:sql}
> set mapred.reduce.tasks=1;
> {code}
> Then execute a delete that will remove many lines from all the buckets.
> {code:sql}
> DELETE FROM test WHERE valeur<'some_value';
> {code}
> Then you will get an ArrayIndexOutOfBoundsException :
> {code}
> 2016-08-22 21:21:02,500 [FATAL] [TezChild] |tez.ReduceRecordSource|: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) 
> {"key":{"reducesinkkey0":{"transactionid":119,"bucketid":0,"rowid":0}},"value":{"_col0":"4"}}
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
> ... 17 more
> {code}
> Adding logs to FileSinkOperator, one sees the operator deal with buckets 
> 0, 1, 2, 3, 4, then 0 again, and it fails at line 769: each time the bucket 
> changes, you move forward in a 5-element array (one element per bucket). 
> So when bucket 0 arrives a second time, the index runs past the end of the array...
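The failure mode described in the report can be modeled in a few lines of Python (a deliberately simplified sketch under the reporter's description, not the actual FileSinkOperator Java code; `process_buggy`/`process_fixed` are invented names): the writer index advances on every bucket switch but is never bounded, so revisiting a bucket overruns the array.

```python
# Minimal model of the bug: idx advances on each bucket *switch* with no
# bound or reset, so the 6th switch indexes past a 5-element writer array.

NUM_BUCKETS = 5

def process_buggy(bucket_ids):
    writers = [None] * NUM_BUCKETS
    idx = -1
    last_bucket = None
    for b in bucket_ids:
        if b != last_bucket:      # bucket switch => move to the next writer slot
            idx += 1              # BUG: idx is never bounded by NUM_BUCKETS
            last_bucket = b
        writers[idx] = b          # raises IndexError once idx reaches NUM_BUCKETS

def process_fixed(bucket_ids):
    writers = [None] * NUM_BUCKETS
    for b in bucket_ids:
        writers[b] = b            # index by bucket id, so repeats are harmless

rows = [0, 1, 2, 3, 4, 0]         # buckets 0..4, then bucket 0 a second time
try:
    process_buggy(rows)
except IndexError as e:
    print("buggy:", e)            # fails on the second visit to bucket 0
process_fixed(rows)               # handles repeated buckets without error
```

With fewer reducers than buckets, one reducer sees every bucket id and can see the same id again after others, which is exactly the sequence that trips the unbounded index.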





[jira] [Assigned] (HIVE-14530) Union All query returns incorrect results.

2016-08-29 Thread Sergey Zadoroshnyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Zadoroshnyak reassigned HIVE-14530:
--

Assignee: Jesus Camacho Rodriguez

https://issues.apache.org/jira/browse/HIVE-13639

> Union All query returns incorrect results.
> --
>
> Key: HIVE-14530
> URL: https://issues.apache.org/jira/browse/HIVE-14530
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
> Environment: Hadoop 2.6
> Hive 2.1
>Reporter: wenhe li
>Assignee: Jesus Camacho Rodriguez
>
> create table dw_tmp.l_test1 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;
> create table dw_tmp.l_test2 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;  
> select * from dw_tmp.l_test1;
> 1   table_1  2016-08-11
> select * from dw_tmp.l_test2;
> 2   table_2  2016-08-11
> -- right like this
> select 
> id,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   table_1 2016-08-11
> 2   table_2 2016-08-11
> -- incorrect
> select 
> id,
> 999,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11
> 2   999 table_1 2016-08-11 <-- here is wrong
> -- incorrect
> select 
> id,
> 999,
> 666,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> 666,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 666 table_1 2016-08-11
> 2   999 666 table_1 2016-08-11 <-- here is wrong
> -- right
> select 
> id,
> 999,
> 'table_1' ,
> trans_date,
> '2016-11-11'
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11  2016-11-11
> 2   999 table_2 2016-08-11  2016-08-11





[jira] [Issue Comment Deleted] (HIVE-14530) Union All query returns incorrect results.

2016-08-29 Thread Sergey Zadoroshnyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Zadoroshnyak updated HIVE-14530:
---
Comment: was deleted

(was: https://issues.apache.org/jira/browse/HIVE-13639)

> Union All query returns incorrect results.
> --
>
> Key: HIVE-14530
> URL: https://issues.apache.org/jira/browse/HIVE-14530
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
> Environment: Hadoop 2.6
> Hive 2.1
>Reporter: wenhe li
>Assignee: Jesus Camacho Rodriguez
>
> create table dw_tmp.l_test1 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;
> create table dw_tmp.l_test2 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;  
> select * from dw_tmp.l_test1;
> 1   table_1  2016-08-11
> select * from dw_tmp.l_test2;
> 2   table_2  2016-08-11
> -- right like this
> select 
> id,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   table_1 2016-08-11
> 2   table_2 2016-08-11
> -- incorrect
> select 
> id,
> 999,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11
> 2   999 table_1 2016-08-11 <-- here is wrong
> -- incorrect
> select 
> id,
> 999,
> 666,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> 666,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 666 table_1 2016-08-11
> 2   999 666 table_1 2016-08-11 <-- here is wrong
> -- right
> select 
> id,
> 999,
> 'table_1' ,
> trans_date,
> '2016-11-11'
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11  2016-11-11
> 2   999 table_2 2016-08-11  2016-08-11



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14530) Union All query returns incorrect results.

2016-08-29 Thread Sergey Zadoroshnyak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445362#comment-15445362
 ] 

Sergey Zadoroshnyak commented on HIVE-14530:


[~liwenhe] [~jcamachorodriguez]

In my opinion, this issue was introduced by 
https://issues.apache.org/jira/browse/HIVE-13639 

By default, cost-based optimization in Hive, which uses the Calcite framework, 
is enabled (set hive.cbo.enable=true). 

 [~jcamachorodriguez] introduced a new rule, HiveUnionPullUpConstantsRule, which 
was added to relOptRules by CalcitePlanner.

Please take a look at review request: 
https://reviews.apache.org/r/46974/diff/1#2

If we set hive.cbo.enable=false, the issue is not reproduced. 

[~liwenhe] Please update component -> CBO.

[~jcamachorodriguez] Could you please take a look? 





> Union All query returns incorrect results.
> --
>
> Key: HIVE-14530
> URL: https://issues.apache.org/jira/browse/HIVE-14530
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
> Environment: Hadoop 2.6
> Hive 2.1
>Reporter: wenhe li
>Assignee: Jesus Camacho Rodriguez
>
> create table dw_tmp.l_test1 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;
> create table dw_tmp.l_test2 (id bigint,val string,trans_date string) row 
> format delimited fields terminated by ' ' ;  
> select * from dw_tmp.l_test1;
> 1   table_1  2016-08-11
> select * from dw_tmp.l_test2;
> 2   table_2  2016-08-11
> -- right like this
> select 
> id,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   table_1 2016-08-11
> 2   table_2 2016-08-11
> -- incorrect
> select 
> id,
> 999,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11
> 2   999 table_1 2016-08-11 <-- here is wrong
> -- incorrect
> select 
> id,
> 999,
> 666,
> 'table_1' ,
> trans_date
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> 666,
> val,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 666 table_1 2016-08-11
> 2   999 666 table_1 2016-08-11 <-- here is wrong
> -- right
> select 
> id,
> 999,
> 'table_1' ,
> trans_date,
> '2016-11-11'
> from dw_tmp.l_test1
> union all
> select 
> id,
> 999,
> val,
> trans_date,
> trans_date
> from dw_tmp.l_test2 ;
> 1   999 table_1 2016-08-11  2016-11-11
> 2   999 table_2 2016-08-11  2016-08-11



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14661) Hive should extract deterministic conditions from where clause and use them for partition pruning

2016-08-29 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445096#comment-15445096
 ] 

Chao Sun commented on HIVE-14661:
-

I think this is the same issue as described in HIVE-14630. Marked as duplicate.
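Until the underlying issue is fixed, a manual rewrite sketch (untested; assumes 
the repro tables from the description below) is to isolate the deterministic 
partition predicate in a subquery so the pruner can see it:

{code:sql}
-- Push the deterministic predicate p = 1 into a subquery so partition
-- pruning applies, and keep the non-deterministic rand() filter outside.
select t.id, t.content, p2.another_content
from (select id, content from part1 where p = 1) t
join part2 p2 on t.id = p2.id
where rand() < 0.5;
{code}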

> Hive should extract deterministic conditions from where clause and use them 
> for partition pruning
> -
>
> Key: HIVE-14661
> URL: https://issues.apache.org/jira/browse/HIVE-14661
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>
> Currently, if a non-deterministic function is used in the where clause, 
> partition pruning doesn't work. This can be reproduced as below:
> {code:sql}
> create table part1 (id int, content string) partitioned by (p int);
> alter table part1 add partition(p=1);
> alter table part1 add partition(p=2);
> create table part2 (id int, another_content string);
> set hive.mapred.mode=strict;
> set hive.cbo.enable=false;
> explain select p1.id, p1.content, p2.another_content from part1 p1 join part2 
> p2 on p1.id=p2.id where p1.p=1 and rand() < 0.5;
> {code}
> The last query would fail with below error:
> {noformat}
> 16/08/23 23:55:52 ERROR ql.Driver: [main]: FAILED: SemanticException [Error 
> 10041]: No partition predicate found for Alias "p1" Table "part1"
> org.apache.hadoop.hive.ql.parse.SemanticException: No partition predicate 
> found for Alias "p1" Table "part1"
> {noformat}





[jira] [Commented] (HIVE-14631) HiveServer2 regularly fails to connect to metastore

2016-08-29 Thread Alexandre Linte (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445078#comment-15445078
 ] 

Alexandre Linte commented on HIVE-14631:


It seems to be a Hive issue. When it happens, all jobs fail whether you use 
MapReduce or Tez. 
{noformat}
0: jdbc:hive2://hiveserver2.bigdata.fr> SET hive.execution.engine=tez;
No rows affected (0.073 seconds)
0: jdbc:hive2://hiveserver2.bigdata.fr> INSERT INTO TABLE shfs3453.camille_test 
VALUES ('coucou');
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)
0: jdbc:hive2://hiveserver2.bigdata.fr> SET hive.execution.engine=mr;
No rows affected (0.004 seconds)
0: jdbc:hive2://hiveserver2.bigdata.fr> INSERT INTO TABLE shfs3453.camille_test 
VALUES ('coucou');
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. 
org.apache.hadoop.security.authentication.client.AuthenticationException: 
Authentication failed, status: 403, message: Forbidden (state=08S01,code=1)
{noformat}
Moreover, this doesn't affect only Hue / Beeswax; all JDBC connections are 
impacted more generally (e.g. Beeline).

> HiveServer2 regularly fails to connect to metastore
> ---
>
> Key: HIVE-14631
> URL: https://issues.apache.org/jira/browse/HIVE-14631
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
> Environment: Hive 2.1.0, Hue 3.10.0, Hadoop 2.7.2, Tez 0.8.3
>Reporter: Alexandre Linte
>
> I have a cluster secured with Kerberos and Hive is configured to work with 
> Tez by default. Everything works well through hive-cli and beeline; however, 
> I'm facing a strange behavior through Hue.
> I can have a lot of client connections (these can reach 600), and after a 
> day some client connections start to fail, though not all connection 
> attempts are affected.
> When it fails, I have the following logs on the HiveServer2:
> {noformat}
> Aug  3 09:28:04 hiveserver2.bigdata.fr Executing 
> command(queryId=hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112):
>  INSERT INTO TABLE shfs3453.camille_test VALUES ('coucou')
> Aug  3 09:28:04 hiveserver2.bigdata.fr Query ID = 
> hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112
> Aug  3 09:28:04 hiveserver2.bigdata.fr Total jobs = 1
> Aug  3 09:28:04 hiveserver2.bigdata.fr Launching Job 1 out of 1
> Aug  3 09:28:04 hiveserver2.bigdata.fr Starting task [Stage-1:MAPRED] in 
> parallel
> Aug  3 09:28:04 hiveserver2.bigdata.fr Trying to connect to metastore with 
> URI thrift://metastore01.bigdata.fr:9083
> Aug  3 09:28:04 hiveserver2.bigdata.fr Failed to connect to the MetaStore 
> Server...
> Aug  3 09:28:04 hiveserver2.bigdata.fr Waiting 1 seconds before next 
> connection attempt.
> Aug  3 09:28:05 hiveserver2.bigdata.fr Trying to connect to metastore with 
> URI thrift://metastore01.bigdata.fr:9083
> Aug  3 09:28:05 hiveserver2.bigdata.fr Failed to connect to the MetaStore 
> Server...
> Aug  3 09:28:05 hiveserver2.bigdata.fr Waiting 1 seconds before next 
> connection attempt.
> Aug  3 09:28:06 hiveserver2.bigdata.fr Trying to connect to metastore with 
> URI thrift://metastore01.bigdata.fr:9083
> Aug  3 09:28:06 hiveserver2.bigdata.fr Failed to connect to the MetaStore 
> Server...
> Aug  3 09:28:06 hiveserver2.bigdata.fr Waiting 1 seconds before next 
> connection attempt.
> Aug  3 09:28:08 hiveserver2.bigdata.fr FAILED: Execution Error, return code 
> -1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
> Aug  3 09:28:08 hiveserver2.bigdata.fr Completed executing 
> command(queryId=hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112);
>  Time taken: 4.002 seconds
> {noformat}
> At the same time, I have the following logs on the Metastore:
> {noformat}
> Aug  3 09:28:03 metastore01.bigdata.fr 180: get_table : db=shfs3453 
> tbl=camille_test
> Aug  3 09:28:03 metastore01.bigdata.fr 
> ugi=shfs3453#011ip=10.77.64.228#011cmd=get_table : db=shfs3453 
> tbl=camille_test#011
> Aug  3 09:28:04 metastore01.bigdata.fr 180: get_table : db=shfs3453 
> tbl=camille_test
> Aug  3 09:28:04 metastore01.bigdata.fr 
> ugi=shfs3453#011ip=10.77.64.228#011cmd=get_table : db=shfs3453 
> tbl=camille_test#011
> Aug  3 09:28:04 metastore01.bigdata.fr 180: get_table : db=shfs3453 
> tbl=camille_test
> Aug  3 09:28:04 metastore01.bigdata.fr 
> ugi=shfs3453#011ip=10.77.64.228#011cmd=get_table : db=shfs3453 
> tbl=camille_test#011
> Aug  3 09:28:04 metastore01.bigdata.fr SASL negotiation failure
> Aug  3 09:28:04 metastore01.bigdata.fr Error occurred during processing of 
> message.
> Aug  3 09:28:05 metastore01.bigdata.fr SASL negotiation failure
> Aug  3 09:28:05 

[jira] [Commented] (HIVE-14564) Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException.

2016-08-29 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445006#comment-15445006
 ] 

zhihai xu commented on HIVE-14564:
--

Thanks for the review, [~ashutoshc]! A lot of test cases were updated to adapt 
to this patch. It looks like all of these cases can verify the patch.

> Column Pruning generates out of order columns in SelectOperator which cause 
> ArrayIndexOutOfBoundsException.
> ---
>
> Key: HIVE-14564
> URL: https://issues.apache.org/jira/browse/HIVE-14564
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: HIVE-14564.000.patch, HIVE-14564.001.patch
>
>
> Column Pruning generates out of order columns in SelectOperator which cause 
> ArrayIndexOutOfBoundsException.
> {code}
> 2016-07-26 21:49:24,390 FATAL [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
>   ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at java.lang.System.arraycopy(Native Method)
>   at org.apache.hadoop.io.Text.set(Text.java:225)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:550)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:377)
>   ... 13 more
> {code}
> The exception occurs because serialization and deserialization don't match.
> The LazyBinarySerDe serialization in the previous MapReduce job used a 
> different column order. When the current MapReduce job deserializes the 
> intermediate sequence file generated by the previous job, LazyBinaryStruct 
> reads the fields in the wrong order and returns corrupted data. The column 
> mismatch between serialization and deserialization is caused by 
> SelectOperator's column pruning ({{ColumnPrunerSelectProc}}).




