[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-27 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716281#comment-14716281
 ] 

Jesus Camacho Rodriguez commented on HIVE-11604:


[~ychena], thanks for your reply. I thought the problem was not reproducible in 
master because of the Affects Versions field.

I checked the patch again; LGTM, +1.

 HIVE return wrong results in some queries with PTF function
 ---

 Key: HIVE-11604
 URL: https://issues.apache.org/jira/browse/HIVE-11604
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.2.0, 1.1.0, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-11604.1.patch, HIVE-11604.2.patch


 The following query returns an empty result, which is incorrect:
 {noformat}
 select ddd.id, ddd.fkey, aaa.name
 from (
 select id, fkey, 
 row_number() over (partition by id, fkey) as rnum
 from tlb1 group by id, fkey
  ) ddd 
 inner join tlb2 aaa on aaa.fid = ddd.fkey;
 {noformat}
 After removing row_number() over (partition by id, fkey) as rnum from the query, 
 the correct result is returned.
 Reproduce:
 {noformat}
 create table tlb1 (id int, fkey int, val string);
 create table tlb2 (fid int, name string);
 insert into table tlb1 values(100,1,'abc');
 insert into table tlb1 values(200,1,'efg');
 insert into table tlb2 values(1, 'key1');
 select ddd.id, ddd.fkey, aaa.name
 from (
 select id, fkey, 
 row_number() over (partition by id, fkey) as rnum
 from tlb1 group by id, fkey
  ) ddd 
 inner join tlb2 aaa on aaa.fid = ddd.fkey;
 
 INFO  : Ended Job = job_local1070163923_0017
 +---------+-----------+-----------+--+
 | ddd.id  | ddd.fkey  | aaa.name  |
 +---------+-----------+-----------+--+
 +---------+-----------+-----------+--+
 No rows selected (14.248 seconds)
 0: jdbc:hive2://localhost:1 select ddd.id, ddd.fkey, aaa.name
 0: jdbc:hive2://localhost:1 from (
 0: jdbc:hive2://localhost:1 select id, fkey 
 0: jdbc:hive2://localhost:1 from tlb1 group by id, fkey
 0: jdbc:hive2://localhost:1  ) ddd 
 0: jdbc:hive2://localhost:1 inner join tlb2 aaa on aaa.fid = ddd.fkey;
 INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
 ...
 INFO  : Ended Job = job_local672340505_0019
 +---------+-----------+-----------+--+
 | ddd.id  | ddd.fkey  | aaa.name  |
 +---------+-----------+-----------+--+
 | 100     | 1         | key1      |
 | 200     | 1         | key1      |
 +---------+-----------+-----------+--+
 2 rows selected (14.383 seconds)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-26 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715810#comment-14715810
 ] 

Szehon Ho commented on HIVE-11604:
--

+1 from me unless there's a better alternative. The IdentityProjectRemover has 
caused a lot of issues and has had to be worked around in cases other than this one.



[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-26 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715795#comment-14715795
 ] 

Yongzhi Chen commented on HIVE-11604:
-

[~csun], [~xuefuz], [~szehon], could you review the patch? Thanks.



[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706710#comment-14706710
 ] 

Hive QA commented on HIVE-11604:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751643/HIVE-11604.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9372 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5031/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5031/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5031/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751643 - PreCommit-HIVE-TRUNK-Build

 HIVE return wrong results in some queries with PTF function
 ---

 Key: HIVE-11604
 URL: https://issues.apache.org/jira/browse/HIVE-11604
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.2.0, 1.1.0, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-11604.1.patch




[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706832#comment-14706832
 ] 

Yongzhi Chen commented on HIVE-11604:
-

My fix changes the query plan slightly in some cases (especially the cases 
that return wrong results without the fix). 
The test failure is caused by an extra select operator in the query plan. It 
should be harmless: it corrects RS_5's output columns to the right order, 
although that might not be important in this scenario. 

I attached a second patch that updates the test output and adds more test 
cases, which succeed in master even without my fix, to catch possible 
regressions in the future. 

In current master, this query returns wrong results:
{noformat}
select ddd.id, ddd.fkey, aaa.name
from (
select id, fkey, 
row_number() over (partition by id, fkey) as rnum
from tlb1 group by id, fkey
 ) ddd 
inner join tlb2 aaa on aaa.fid = ddd.fkey;
{noformat}

while the following returns the right result (the only difference is the extra ddd.rnum):
{noformat}
select ddd.id, ddd.fkey, aaa.name, ddd.rnum
from (
select id, fkey, 
row_number() over (partition by id, fkey) as rnum
from tlb1 group by id, fkey
 ) ddd 
inner join tlb2 aaa on aaa.fid = ddd.fkey;
{noformat}




[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707580#comment-14707580
 ] 

Yongzhi Chen commented on HIVE-11604:
-

None of the 15 Spark failures are related. They all have the same cause: 
"Timed out waiting for Spark cluster to init".
The source code in patch 2 is the same as in patch 1, and the first patch had 
no Spark failures.



[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707503#comment-14707503
 ] 

Hive QA commented on HIVE-11604:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751736/HIVE-11604.2.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 9376 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_avro_compression_enabled_native
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_avro_joins
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_escape_clusterby1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby5_map_skew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_list_bucket_dml_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_reduce_deduplicate_exclude_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_timestamp_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5036/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5036/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5036/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751736 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706577#comment-14706577
 ] 

Yongzhi Chen commented on HIVE-11604:
-

[~jcamachorodriguez], the problem is reproducible in master (my master is 
about one week old, though). The fix follows a pattern similar to other cases 
in ProjectRemover, for example:
{noformat}
  Operator<? extends OperatorDesc> parent = parents.get(0);
  if (parent instanceof ReduceSinkOperator
      && Iterators.any(sel.getChildOperators().iterator(),
          Predicates.instanceOf(ReduceSinkOperator.class))) {
    // For the RS-SEL-RS case: the reducer operator in a reducer task cannot be
    // null in the task compiler.
    return null;
  }
{noformat}
In the PTF case, a select operator needs to follow the PTF operator. We can add 
other cases before the return null if they fall into the same category. 
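
To make the shape of such a guard concrete, here is a minimal, self-contained sketch of the idea. It is a toy model under stated assumptions, not the code in the attached patch: Op, ReduceSinkOp, PtfOp, JoinOp, SelectOp and canRemove are hypothetical stand-ins for Hive's operator classes, and the exact PTF condition used by the patch is assumed rather than quoted.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical stand-ins,
// not Hive's real operator classes.
class IdentityProjectGuardSketch {

    static class Op {
        final List<Op> parents = new ArrayList<>();
        final List<Op> children = new ArrayList<>();

        // Links child below this operator and returns the child for chaining.
        <T extends Op> T then(T child) {
            children.add(child);
            child.parents.add(this);
            return child;
        }
    }

    static class ReduceSinkOp extends Op {}
    static class PtfOp extends Op {}
    static class JoinOp extends Op {}

    static class SelectOp extends Op {
        // True if the select forwards its input columns unchanged.
        final boolean identity;

        SelectOp(boolean identity) {
            this.identity = identity;
        }
    }

    // Returns true if this sketch would drop the select from the plan.
    static boolean canRemove(SelectOp sel) {
        if (!sel.identity || sel.parents.size() != 1) {
            return false;
        }
        Op parent = sel.parents.get(0);
        boolean feedsReduceSink =
            sel.children.stream().anyMatch(c -> c instanceof ReduceSinkOp);
        if (parent instanceof ReduceSinkOp && feedsReduceSink) {
            return false; // RS-SEL-RS case, as in the snippet quoted above
        }
        if (parent instanceof PtfOp) {
            return false; // assumed PTF guard: keep the select that follows a PTF
        }
        return true;
    }

    public static void main(String[] args) {
        // PTF -> SEL -> RS: kept because of the assumed PTF guard.
        SelectOp afterPtf = new PtfOp().then(new SelectOp(true));
        afterPtf.then(new ReduceSinkOp());
        System.out.println("PTF-SEL-RS removable: " + canRemove(afterPtf));

        // JOIN -> SEL -> RS: an ordinary identity select, still removable.
        SelectOp afterJoin = new JoinOp().then(new SelectOp(true));
        afterJoin.then(new ReduceSinkOp());
        System.out.println("JOIN-SEL-RS removable: " + canRemove(afterJoin));
    }
}
```

In this sketch each special case is an early exit placed before the removal, so further operator types could be added in the same spot if they turn out to need the select as well.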

 HIVE return wrong results in some queries with PTF function
 ---

 Key: HIVE-11604
 URL: https://issues.apache.org/jira/browse/HIVE-11604
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 1.2.0, 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-11604.1.patch




[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706519#comment-14706519
 ] 

Jesus Camacho Rodriguez commented on HIVE-11604:


[~ychena], the patch seems like a workaround to me.

From the JIRA case, I assume that the problem cannot be reproduced in master? 
How was it fixed?
If you compare master with any of the branches where the problem appears:
1) Does the RowSchema coming out of the PTF differ between master and the branch?
2) If it doesn't, does the {{isIdentityProject}} method in SelectOperator return a 
different result in master than in the branch?

Thanks



[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706219#comment-14706219
 ] 

Ashutosh Chauhan commented on HIVE-11604:
-

I am not sure special-casing PTF is a good idea. [~jcamachorodriguez], what 
do you think?
