[jira] [Comment Edited] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313657#comment-16313657 ] Paul Jackson edited comment on HIVE-18375 at 1/5/18 6:49 PM: - [~pxiong] I'm referring to the four queries in this comment: https://issues.apache.org/jira/browse/HIVE-18375?focusedCommentId=16312051=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16312051 was (Author: pauljackson123): [~pxiong] I'm referring to the for queries in this comment: https://issues.apache.org/jira/browse/HIVE-18375?focusedCommentId=16312051=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16312051 > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: > {code:SQL} > CREATE TABLE employees ( > emp_no INT, > first_name VARCHAR(14), > last_name VARCHAR(16) > ); > insert into employees values > (1, 'Gottlob', 'Frege'), > (2, 'Bertrand', 'Russell'), > (3, 'Ludwig', 'Wittgenstein'); > CREATE TABLE salaries ( > emp_no INT, > salary INT, > from_date DATE, > to_date DATE > ); > insert into salaries values > (1, 10, '1900-01-01', '1900-01-31'), > (1, 18, '1900-09-01', '1900-09-30'), > (2, 15, '1940-03-01', '1950-01-01'), > (3, 20, '1920-01-01', '1950-01-01'); > {code} > This query returns the names of the employees ordered by their peak salary: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, this should still work even if the max_salary is not part of the > projection: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, that fails with this error: > {code} > Error while compiling statement: FAILED: SemanticException [Error 10004]: > line 9:9 Invalid table alias or column reference 't1': (possible column names > are: last_name, first_name) > {code} > FWIW, this also fails: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > But this succeeds: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `max_sal` DESC; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313657#comment-16313657 ] Paul Jackson commented on HIVE-18375: - [~pxiong] I'm referring to the for queries in this comment: https://issues.apache.org/jira/browse/HIVE-18375?focusedCommentId=16312051=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16312051 > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: > {code:SQL} > CREATE TABLE employees ( > emp_no INT, > first_name VARCHAR(14), > last_name VARCHAR(16) > ); > insert into employees values > (1, 'Gottlob', 'Frege'), > (2, 'Bertrand', 'Russell'), > (3, 'Ludwig', 'Wittgenstein'); > CREATE TABLE salaries ( > emp_no INT, > salary INT, > from_date DATE, > to_date DATE > ); > insert into salaries values > (1, 10, '1900-01-01', '1900-01-31'), > (1, 18, '1900-09-01', '1900-09-30'), > (2, 15, '1940-03-01', '1950-01-01'), > (3, 20, '1920-01-01', '1950-01-01'); > {code} > This query returns the names of the employees ordered by their peak salary: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, this should still work even if the max_salary is not part of the > projection: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, that fails with this error: > {code} > Error while compiling statement: FAILED: SemanticException [Error 10004]: > line 9:9 Invalid table alias or column reference 't1': (possible column names > are: last_name, first_name) > {code} > FWIW, this also fails: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > But this succeeds: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `max_sal` DESC; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312359#comment-16312359 ] Paul Jackson commented on HIVE-18375: - There is no doubt these are the same issue. What do you think about the simpler issue in my comment that does not involve ORDER BY? > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: > {code:SQL} > CREATE TABLE employees ( > emp_no INT, > first_name VARCHAR(14), > last_name VARCHAR(16) > ); > insert into employees values > (1, 'Gottlob', 'Frege'), > (2, 'Bertrand', 'Russell'), > (3, 'Ludwig', 'Wittgenstein'); > CREATE TABLE salaries ( > emp_no INT, > salary INT, > from_date DATE, > to_date DATE > ); > insert into salaries values > (1, 10, '1900-01-01', '1900-01-31'), > (1, 18, '1900-09-01', '1900-09-30'), > (2, 15, '1940-03-01', '1950-01-01'), > (3, 20, '1920-01-01', '1950-01-01'); > {code} > This query returns the names of the employees ordered by their peak salary: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, this should still work even if the max_salary is not part of the > projection: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, that fails with this error: > {code} > Error while compiling statement: FAILED: SemanticException [Error 10004]: > line 9:9 Invalid table alias or column reference 't1': (possible column names > are: last_name, first_name) > {code} > FWIW, this also fails: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > But this succeeds: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `max_sal` DESC; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312051#comment-16312051 ] Paul Jackson commented on HIVE-18375: - This seems related. Posting as a comment, but perhaps it should be in its own bug report. Order By cannot see fields if they are projected with an alias. The first two queries fail with: {code}SemanticException [Error 10004]: line 7:9 Invalid table alias or column reference 'emp_no': (possible column names are: f_4, f_3, f_5){code} The last two succeed. {code:SQL} SELECT `first_name` `F_4`, `last_name` `F_5` FROM `default`.`employees` ORDER BY `emp_no` DESC; SELECT `first_name` `F_4`, `emp_no` `F_3`, `last_name` `F_5` FROM `default`.`employees` ORDER BY `emp_no` DESC; SELECT `first_name` `F_4`, `emp_no`, `last_name` `F_5` FROM `default`.`employees` ORDER BY `emp_no` DESC; SELECT `first_name` `F_4`, `emp_no` `F_3`, `last_name` `F_5` FROM `default`.`employees` ORDER BY `F_3` DESC; {code} > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: > {code:SQL} > CREATE TABLE employees ( > emp_no INT, > first_name VARCHAR(14), > last_name VARCHAR(16) > ); > insert into employees values > (1, 'Gottlob', 'Frege'), > (2, 'Bertrand', 'Russell'), > (3, 'Ludwig', 'Wittgenstein'); > CREATE TABLE salaries ( > emp_no INT, > salary INT, > from_date DATE, > to_date DATE > ); > insert into salaries values > (1, 10, '1900-01-01', '1900-01-31'), > (1, 18, '1900-09-01', '1900-09-30'), > (2, 15, '1940-03-01', '1950-01-01'), > (3, 20, '1920-01-01', '1950-01-01'); > {code} > This query returns the names of the employees ordered by their peak salary: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, this should still work even if the max_salary is not part of the > projection: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, that fails with this error: > {code} > Error while compiling statement: FAILED: SemanticException [Error 10004]: > line 9:9 Invalid table alias or column reference 't1': (possible column names > are: last_name, first_name) > {code} > FWIW, this also fails: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > But this succeeds: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `max_sal` DESC; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18334) Cannot JOIN ON result of COALESCE
[ https://issues.apache.org/jira/browse/HIVE-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Jackson updated HIVE-18334: Component/s: Query Processor > Cannot JOIN ON result of COALESCE > -- > > Key: HIVE-18334 > URL: https://issues.apache.org/jira/browse/HIVE-18334 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > A join is returning no results when the ON clause is equating the results of > two COALESCE functions. To reproduce: > {code:SQL} > CREATE TABLE t5 ( > dno INTEGER, > dname VARCHAR(30), > eno INTEGER, > ename VARCHAR(30)); > CREATE TABLE t6 ( > dno INTEGER, > dname VARCHAR(30), > eno INTEGER, > ename VARCHAR(30)); > INSERT INTO t5 VALUES > (10, 'FOO', NULL, NULL), > (20, 'BAR', NULL, NULL), > (NULL, NULL, 7300, 'LARRY'), > (NULL, NULL, 7400, 'MOE'), > (NULL, NULL, 7500, 'CURLY'); > INSERT INTO t6 VALUES > (10, 'LENNON', NULL, NULL), > (20, 'MCCARTNEY', NULL, NULL), > (NULL, NULL, 7300, 'READY'), > (NULL, NULL, 7400, 'WILLING'), > (NULL, NULL, 7500, 'ABLE'); > -- Fails with 0 results > SELECT * > FROM t5 > INNER JOIN t6 > ON COALESCE(`t5`.`eno`, `t5`.`dno`) = COALESCE(`t6`.`eno`, `t6`.`dno`) > -- Full cross with where clause works (in nonstrict mode), returning 5 results > SELECT * > FROM t5 > JOIN t6 > WHERE `t5`.`eno` = `t6`.`eno` OR `t5`.`dno` = `t6`.`dno` > -- Strange that coalescing the same field returns 2 results... > SELECT * > FROM t5 > INNER JOIN t6 > ON COALESCE(`t5`.`dno`, `t5`.`dno`) = COALESCE(`t6`.`dno`, `t6`.`dno`) > -- ...and coalescing the other field returns 3 results > SELECT * > FROM t5 > INNER JOIN t6 > ON COALESCE(`t5`.`eno`, `t5`.`eno`) = COALESCE(`t6`.`eno`, `t6`.`eno`) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Jackson updated HIVE-18375: Description: Give these tables: {code:SQL} CREATE TABLE employees ( emp_no INT, first_name VARCHAR(14), last_name VARCHAR(16) ); insert into employees values (1, 'Gottlob', 'Frege'), (2, 'Bertrand', 'Russell'), (3, 'Ludwig', 'Wittgenstein'); CREATE TABLE salaries ( emp_no INT, salary INT, from_date DATE, to_date DATE ); insert into salaries values (1, 10, '1900-01-01', '1900-01-31'), (1, 18, '1900-09-01', '1900-09-30'), (2, 15, '1940-03-01', '1950-01-01'), (3, 20, '1920-01-01', '1950-01-01'); {code} This query returns the names of the employees ordered by their peak salary: {code:SQL} SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` FROM `default`.`employees` INNER JOIN (SELECT `emp_no`, MAX(`salary`) `max_salary` FROM `default`.`salaries` WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL GROUP BY `emp_no`) AS `t1` ON `employees`.`emp_no` = `t1`.`emp_no` ORDER BY `t1`.`max_salary` DESC; {code} However, this should still work even if the max_salary is not part of the projection: {code:SQL} SELECT `employees`.`last_name`, `employees`.`first_name` FROM `default`.`employees` INNER JOIN (SELECT `emp_no`, MAX(`salary`) `max_salary` FROM `default`.`salaries` WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL GROUP BY `emp_no`) AS `t1` ON `employees`.`emp_no` = `t1`.`emp_no` ORDER BY `t1`.`max_salary` DESC; {code} However, that fails with this error: {code} Error while compiling statement: FAILED: SemanticException [Error 10004]: line 9:9 Invalid table alias or column reference 't1': (possible column names are: last_name, first_name) {code} FWIW, this also fails: {code:SQL} SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` AS `max_sal` FROM `default`.`employees` INNER JOIN (SELECT `emp_no`, MAX(`salary`) `max_salary` FROM `default`.`salaries` WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL GROUP BY `emp_no`) AS `t1` ON `employees`.`emp_no` = `t1`.`emp_no` ORDER BY `t1`.`max_salary` DESC; {code} But this succeeds: {code:SQL} SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` AS `max_sal` FROM `default`.`employees` INNER JOIN (SELECT `emp_no`, MAX(`salary`) `max_salary` FROM `default`.`salaries` WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL GROUP BY `emp_no`) AS `t1` ON `employees`.`emp_no` = `t1`.`emp_no` ORDER BY `max_sal` DESC; {code} was: Give these tables: {code:SQL} CREATE TABLE employees ( emp_no INT, first_name VARCHAR(14), last_name VARCHAR(16) ); insert into employees values (1, 'Gottlob', 'Frege'), (2, 'Bertrand', 'Russell'), (3, 'Ludwig', 'Wittgenstein'); CREATE TABLE salaries ( emp_no INT, salary INT, from_date DATE, to_date DATE ); insert into salaries values (1, 10, '1900-01-01', '1900-01-31'), (1, 18, '1900-09-01', '1900-09-30'), (2, 15, '1940-03-01', '1950-01-01'), (3, 20, '1920-01-01', '1950-01-01'); {code} This query returns the names of the employees ordered by their peak salary: {code:SQL} SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` FROM `default`.`employees` INNER JOIN (SELECT `emp_no`, MAX(`salary`) `max_salary` FROM `default`.`salaries` WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL GROUP BY `emp_no`) AS `t1` ON `employees`.`emp_no` = `t1`.`emp_no` ORDER BY `t1`.`max_salary` DESC; {code} However, this should still work even if the max_salary is not part of the projection: {code:SQL} SELECT `employees`.`last_name`, `employees`.`first_name` FROM `default`.`employees` INNER JOIN (SELECT `emp_no`, MAX(`salary`) `max_salary` FROM `default`.`salaries` WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL GROUP BY `emp_no`) AS `t1` ON `employees`.`emp_no` = `t1`.`emp_no` ORDER BY `t1`.`max_salary` DESC; {code} However, that fails with this error: {code} Error while compiling statement: FAILED: SemanticException [Error 10004]: line 9:9 Invalid table alias or column reference 't1': (possible column names are: last_name, first_name) {code} > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: >
[jira] [Updated] (HIVE-18334) Cannot JOIN ON result of COALESCE
[ https://issues.apache.org/jira/browse/HIVE-18334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Jackson updated HIVE-18334: Description: A join is returning no results when the ON clause is equating the results of two COALESCE functions. To reproduce: {code:SQL} CREATE TABLE t5 ( dno INTEGER, dname VARCHAR(30), eno INTEGER, ename VARCHAR(30)); CREATE TABLE t6 ( dno INTEGER, dname VARCHAR(30), eno INTEGER, ename VARCHAR(30)); INSERT INTO t5 VALUES (10, 'FOO', NULL, NULL), (20, 'BAR', NULL, NULL), (NULL, NULL, 7300, 'LARRY'), (NULL, NULL, 7400, 'MOE'), (NULL, NULL, 7500, 'CURLY'); INSERT INTO t6 VALUES (10, 'LENNON', NULL, NULL), (20, 'MCCARTNEY', NULL, NULL), (NULL, NULL, 7300, 'READY'), (NULL, NULL, 7400, 'WILLING'), (NULL, NULL, 7500, 'ABLE'); -- Fails with 0 results SELECT * FROM t5 INNER JOIN t6 ON COALESCE(`t5`.`eno`, `t5`.`dno`) = COALESCE(`t6`.`eno`, `t6`.`dno`) -- Full cross with where clause works (in nonstrict mode), returning 5 results SELECT * FROM t5 JOIN t6 WHERE `t5`.`eno` = `t6`.`eno` OR `t5`.`dno` = `t6`.`dno` -- Strange that coalescing the same field returns 2 results... SELECT * FROM t5 INNER JOIN t6 ON COALESCE(`t5`.`dno`, `t5`.`dno`) = COALESCE(`t6`.`dno`, `t6`.`dno`) -- ...and coalescing the other field returns 3 results SELECT * FROM t5 INNER JOIN t6 ON COALESCE(`t5`.`eno`, `t5`.`eno`) = COALESCE(`t6`.`eno`, `t6`.`eno`) {code} was: A join is returning no results when the ON clause is equating the results of two COALESCE functions. To reproduce: {code:SQL} CREATE TABLE t5 ( dno INTEGER, dname VARCHAR(30), eno INTEGER, ename VARCHAR(30)); CREATE TABLE t6 ( dno INTEGER, dname VARCHAR(30), eno INTEGER, ename VARCHAR(30)); INSERT INTO t5 VALUES (10, 'FOO', NULL, NULL), (20, 'BAR', NULL, NULL), (NULL, NULL, 7300, 'LARRY'), (NULL, NULL, 7400, 'MOE'), (NULL, NULL, 7500, 'CURLY'); INSERT INTO t6 VALUES (10, 'LENNON', NULL, NULL), (20, 'MCCARTNEY', NULL, NULL), (NULL, NULL, 7300, 'READY'), (NULL, NULL, 7400, 'WILLING'), (NULL, NULL, 7500, 'ABLE'); -- Fails with 0 results SELECT * FROM t5 INNER JOIN t6 ON COALESCE(`t5`.`eno`, `t5`.`dno`) = COALESCE(`t6`.`eno`, `t6`.`dno`) -- Full cross with where clause works, returning 5 results SELECT * FROM t5 JOIN t6 WHERE `t5`.`eno` = `t6`.`eno` OR `t5`.`dno` = `t6`.`dno` -- Strange that coalescing the same field returns 2 results... SELECT * FROM t5 INNER JOIN t6 ON COALESCE(`t5`.`dno`, `t5`.`dno`) = COALESCE(`t6`.`dno`, `t6`.`dno`) -- ...and coalescing the other field returns 3 results SELECT * FROM t5 INNER JOIN t6 ON COALESCE(`t5`.`eno`, `t5`.`eno`) = COALESCE(`t6`.`eno`, `t6`.`eno`) {code} > Cannot JOIN ON result of COALESCE > -- > > Key: HIVE-18334 > URL: https://issues.apache.org/jira/browse/HIVE-18334 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > A join is returning no results when the ON clause is equating the results of > two COALESCE functions. To reproduce: > {code:SQL} > CREATE TABLE t5 ( > dno INTEGER, > dname VARCHAR(30), > eno INTEGER, > ename VARCHAR(30)); > CREATE TABLE t6 ( > dno INTEGER, > dname VARCHAR(30), > eno INTEGER, > ename VARCHAR(30)); > INSERT INTO t5 VALUES > (10, 'FOO', NULL, NULL), > (20, 'BAR', NULL, NULL), > (NULL, NULL, 7300, 'LARRY'), > (NULL, NULL, 7400, 'MOE'), > (NULL, NULL, 7500, 'CURLY'); > INSERT INTO t6 VALUES > (10, 'LENNON', NULL, NULL), > (20, 'MCCARTNEY', NULL, NULL), > (NULL, NULL, 7300, 'READY'), > (NULL, NULL, 7400, 'WILLING'), > (NULL, NULL, 7500, 'ABLE'); > -- Fails with 0 results > SELECT * > FROM t5 > INNER JOIN t6 > ON COALESCE(`t5`.`eno`, `t5`.`dno`) = COALESCE(`t6`.`eno`, `t6`.`dno`) > -- Full cross with where clause works (in nonstrict mode), returning 5 results > SELECT * > FROM t5 > JOIN t6 > WHERE `t5`.`eno` = `t6`.`eno` OR `t5`.`dno` = `t6`.`dno` > -- Strange that coalescing the same field returns 2 results... > SELECT * > FROM t5 > INNER JOIN t6 > ON COALESCE(`t5`.`dno`, `t5`.`dno`) = COALESCE(`t6`.`dno`, `t6`.`dno`) > -- ...and coalescing the other field returns 3 results > SELECT * > FROM t5 > INNER JOIN t6 > ON COALESCE(`t5`.`eno`, `t5`.`eno`) = COALESCE(`t6`.`eno`, `t6`.`eno`) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)