[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function
[ https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716281#comment-14716281 ]

Jesus Camacho Rodriguez commented on HIVE-11604:

[~ychena], thanks for your reply. I thought the problem was not reproducible in master because of the Affects Versions field. I checked the patch again; LGTM, +1.

HIVE return wrong results in some queries with PTF function
---
Key: HIVE-11604
URL: https://issues.apache.org/jira/browse/HIVE-11604
Project: Hive
Issue Type: Bug
Components: Logical Optimizer
Affects Versions: 1.2.0, 1.1.0, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Attachments: HIVE-11604.1.patch, HIVE-11604.2.patch

The following query returns an empty result, which is not right:
{noformat}
select ddd.id, ddd.fkey, aaa.name
from (
  select id, fkey, row_number() over (partition by id, fkey) as rnum
  from tlb1 group by id, fkey
) ddd
inner join tlb2 aaa on aaa.fid = ddd.fkey;
{noformat}
After removing row_number() over (partition by id, fkey) as rnum from the query, the correct result is returned.

Reproduce:
{noformat}
create table tlb1 (id int, fkey int, val string);
create table tlb2 (fid int, name string);
insert into table tlb1 values(100,1,'abc');
insert into table tlb1 values(200,1,'efg');
insert into table tlb2 values(1, 'key1');

select ddd.id, ddd.fkey, aaa.name
from (
  select id, fkey, row_number() over (partition by id, fkey) as rnum
  from tlb1 group by id, fkey
) ddd
inner join tlb2 aaa on aaa.fid = ddd.fkey;

INFO : Ended Job = job_local1070163923_0017
+--------+----------+----------+--+
No rows selected (14.248 seconds)
| ddd.id | ddd.fkey | aaa.name |
+--------+----------+----------+--+
+--------+----------+----------+--+
0: jdbc:hive2://localhost:1 select ddd.id, ddd.fkey, aaa.name from ( select id, fkey from tlb1 group by id, fkey ) ddd inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name
0: jdbc:hive2://localhost:1 from (
0: jdbc:hive2://localhost:1 select id, fkey
0: jdbc:hive2://localhost:1 from tlb1 group by id, fkey
0: jdbc:hive2://localhost:1 ) ddd
0: jdbc:hive2://localhost:1 inner join tlb2 aaa on aaa.fid = ddd.fkey;
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
...
INFO : Ended Job = job_local672340505_0019
+--------+----------+----------+--+
2 rows selected (14.383 seconds)
| ddd.id | ddd.fkey | aaa.name |
+--------+----------+----------+--+
| 100    | 1        | key1     |
| 200    | 1        | key1     |
+--------+----------+----------+--+
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
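For reference, the rows that the reproduction should return can be worked out by hand. The sketch below is not Hive code: it is a plain-Java evaluation of the same query over the inserted rows, and the names ExpectedResultSketch, evaluate, sampleTlb1 and sampleTlb2 are made up for illustration. It assumes standard SQL semantics, under which grouping by (id, fkey) leaves one row per (id, fkey) partition, so row_number() is 1 for every row and cannot remove anything from the join input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hand evaluation of the failing query in plain Java. Not Hive code; every
// name in this class is hypothetical and exists only for illustration.
class ExpectedResultSketch {

    // One row of tlb2 (fid int, name string).
    static final class Tlb2Row {
        final int fid;
        final String name;

        Tlb2Row(int fid, String name) {
            this.fid = fid;
            this.name = name;
        }
    }

    // tlb1 rows as {id, fkey}; the val column is not referenced by the query.
    static int[][] sampleTlb1() {
        return new int[][] { {100, 1}, {200, 1} };
    }

    static List<Tlb2Row> sampleTlb2() {
        return Arrays.asList(new Tlb2Row(1, "key1"));
    }

    // Returns the joined rows as "id,fkey,name" strings.
    static List<String> evaluate(int[][] tlb1, List<Tlb2Row> tlb2) {
        // group by id, fkey: keep one entry per distinct (id, fkey) pair.
        Set<List<Integer>> groups = new LinkedHashSet<>();
        for (int[] row : tlb1) {
            groups.add(Arrays.asList(row[0], row[1]));
        }

        List<String> result = new ArrayList<>();
        for (List<Integer> group : groups) {
            // row_number() over (partition by id, fkey) would be 1 here, because
            // each partition holds exactly one grouped row. The outer query never
            // selects rnum, so it does not appear in the output.
            // inner join tlb2 aaa on aaa.fid = ddd.fkey
            for (Tlb2Row aaa : tlb2) {
                if (aaa.fid == group.get(1)) {
                    result.add(group.get(0) + "," + group.get(1) + "," + aaa.name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String row : evaluate(sampleTlb1(), sampleTlb2())) {
            System.out.println(row);
        }
    }
}
```

Under those assumptions the join yields the same two rows that the query without row_number() returns in the log above, which is why the empty result is reported as a bug.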
[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function
[ https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715810#comment-14715810 ]

Szehon Ho commented on HIVE-11604:

+1 from me unless there's a better alternative. The IdentityProjectRemover has caused a lot of issues and has had to be worked around in cases other than this one.
[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function
[ https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715795#comment-14715795 ]

Yongzhi Chen commented on HIVE-11604:

[~csun], [~xuefuz], [~szehon], could you review the patch? Thanks
[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function
[ https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706710#comment-14706710 ]

Hive QA commented on HIVE-11604:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751643/HIVE-11604.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9372 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5031/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5031/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5031/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751643 - PreCommit-HIVE-TRUNK-Build

HIVE return wrong results in some queries with PTF function
---
Key: HIVE-11604
URL: https://issues.apache.org/jira/browse/HIVE-11604
Project: Hive
Issue Type: Bug
Components: Logical Optimizer
Affects Versions: 1.2.0, 1.1.0, 2.0.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Attachments: HIVE-11604.1.patch
[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function
[ https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706832#comment-14706832 ]

Yongzhi Chen commented on HIVE-11604:

My fix changes the query plan slightly in some cases (especially the cases that return wrong results without the fix). The test failure is caused by an extra select operator in the query plan. It should do no harm: it corrects RS_5's output columns to the right sequence, although that might not be so important in this scenario.

I am attaching a second patch that updates the test output and adds more test cases. The new cases succeed in master even without my fix; they are there to catch possible regressions in the future.

In current master, this query returns wrong results:
{noformat}
select ddd.id, ddd.fkey, aaa.name
from (
  select id, fkey, row_number() over (partition by id, fkey) as rnum
  from tlb1 group by id, fkey
) ddd
inner join tlb2 aaa on aaa.fid = ddd.fkey;
{noformat}
while the following returns the right values (the only difference is the extra ddd.rnum):
{noformat}
select ddd.id, ddd.fkey, aaa.name, ddd.rnum
from (
  select id, fkey, row_number() over (partition by id, fkey) as rnum
  from tlb1 group by id, fkey
) ddd
inner join tlb2 aaa on aaa.fid = ddd.fkey;
{noformat}
[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function
[ https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707580#comment-14707580 ]

Yongzhi Chen commented on HIVE-11604:

None of the 15 Spark failures is related to the patch. They all share the same error:
Timed out waiting for Spark cluster to init
Also, patch 2's source code is the same as patch 1's, and there was no Spark failure with the first patch.
[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function
[ https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707503#comment-14707503 ]

Hive QA commented on HIVE-11604:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751736/HIVE-11604.2.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 9376 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_avro_compression_enabled_native
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_avro_joins
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_escape_clusterby1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby5_map_skew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_list_bucket_dml_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_reduce_deduplicate_exclude_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_timestamp_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_2
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5036/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5036/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5036/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751736 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function
[ https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706577#comment-14706577 ]

Yongzhi Chen commented on HIVE-11604:

[~jcamachorodriguez], the problem is reproducible in master (my master is around one week old, though).
The fix follows a pattern already used for other cases in ProjectRemover, for example:
{noformat}
Operator<? extends OperatorDesc> parent = parents.get(0);
if (parent instanceof ReduceSinkOperator
    && Iterators.any(sel.getChildOperators().iterator(),
        Predicates.instanceOf(ReduceSinkOperator.class))) {
  // For RS-SEL-RS case. reducer operator in reducer task cannot be null in task compiler
  return null;
}
{noformat}
In the PTF case, a select operator needs to follow the PTF operator. We can add other cases before the return null if they fall into the same category.

HIVE return wrong results in some queries with PTF function
---
Key: HIVE-11604
URL: https://issues.apache.org/jira/browse/HIVE-11604
Project: Hive
Issue Type: Bug
Components: Logical Optimizer
Affects Versions: 1.2.0, 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Attachments: HIVE-11604.1.patch
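The guard pattern described in the comment above can be sketched on a toy operator chain. This is a minimal model, not Hive's optimizer: Kind, Op, canRemove and removeIdentitySelects are hypothetical names, and the chain is linear, whereas real plans are graphs. The RS-SEL-RS check mirrors the quoted excerpt. The PTF check is only an assumption about the shape of the fix, inferred from the sentence saying that a select operator needs to follow the PTF operator; the attached patches are the authority on what was actually changed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a linear operator chain. These are NOT Hive's operator classes;
// Kind, Op, canRemove and removeIdentitySelects are hypothetical names.
class IdentitySelectGuardSketch {

    enum Kind { TABLE_SCAN, GROUP_BY, REDUCE_SINK, PTF, SELECT, JOIN }

    static final class Op {
        final Kind kind;
        final boolean identity; // only meaningful when kind == SELECT

        private Op(Kind kind, boolean identity) {
            this.kind = kind;
            this.identity = identity;
        }

        static Op of(Kind kind) {
            return new Op(kind, false);
        }

        static Op identitySelect() {
            return new Op(Kind.SELECT, true);
        }
    }

    // Decides whether the operator at index i is an identity select that may be dropped.
    static boolean canRemove(List<Op> chain, int i) {
        Op sel = chain.get(i);
        if (sel.kind != Kind.SELECT || !sel.identity || i == 0) {
            return false;
        }
        Kind parent = chain.get(i - 1).kind;
        Kind child = i + 1 < chain.size() ? chain.get(i + 1).kind : null;

        // Guard from the quoted excerpt: keep the select in an RS-SEL-RS sequence.
        if (parent == Kind.REDUCE_SINK && child == Kind.REDUCE_SINK) {
            return false;
        }
        // Assumed shape of the new guard: keep the select that follows a PTF operator.
        if (parent == Kind.PTF) {
            return false;
        }
        return true;
    }

    // Returns the kinds that survive; each decision looks at the original neighbours.
    static List<Kind> removeIdentitySelects(List<Op> chain) {
        List<Kind> kept = new ArrayList<>();
        for (int i = 0; i < chain.size(); i++) {
            if (!canRemove(chain, i)) {
                kept.add(chain.get(i).kind);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Op> chain = Arrays.asList(
                Op.of(Kind.REDUCE_SINK), Op.of(Kind.PTF), Op.identitySelect(),
                Op.of(Kind.REDUCE_SINK), Op.of(Kind.JOIN));
        System.out.println(removeIdentitySelects(chain));
    }
}
```

Putting each exception in its own early return keeps the general removal path untouched, which matches the remark that further cases can be added before the return null.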
[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function
[ https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706519#comment-14706519 ]

Jesus Camacho Rodriguez commented on HIVE-11604:

[~ychena], the patch seems like a workaround to me.
From the JIRA case, I assume that the problem cannot be reproduced in master? How was it fixed? If you compare master with any of the branches where the problem appears:
1) Does the RowSchema coming out of the PTF differ in master vs. the branch?
2) If it doesn't, does the {{isIdentityProject}} method in SelectOperator return a different result in master vs. the branch?
Thanks
[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function
[ https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706219#comment-14706219 ]

Ashutosh Chauhan commented on HIVE-11604:

I am not sure special-casing PTF is a good idea. [~jcamachorodriguez], what do you think?