[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313674#comment-16313674 ] Pengcheng Xiong commented on HIVE-18375: [~pauljackson123], i see. But all of the 4 queries involve ORDER BY if "involve" means we have ORDER BY in the query text. For those queries, they should be runnable on current Hive master as it contains HIVE-15160, which enables "order by non-selected column". The reason why you can not run that on your cluster is because HIVE-15160 is not in any release (including what you are using) yet. I think you may need to wait until the next release that include this patch. Thanks. > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: > {code:SQL} > CREATE TABLE employees ( > emp_no INT, > first_name VARCHAR(14), > last_name VARCHAR(16) > ); > insert into employees values > (1, 'Gottlob', 'Frege'), > (2, 'Bertrand', 'Russell'), > (3, 'Ludwig', 'Wittgenstein'); > CREATE TABLE salaries ( > emp_no INT, > salary INT, > from_date DATE, > to_date DATE > ); > insert into salaries values > (1, 10, '1900-01-01', '1900-01-31'), > (1, 18, '1900-09-01', '1900-09-30'), > (2, 15, '1940-03-01', '1950-01-01'), > (3, 20, '1920-01-01', '1950-01-01'); > {code} > This query returns the names of the employees ordered by their peak salary: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, this should still work even if the max_salary is not part of the > projection: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, that fails with this error: > {code} > Error while compiling statement: FAILED: SemanticException [Error 10004]: > line 9:9 Invalid table alias or column reference 't1': (possible column names > are: last_name, first_name) > {code} > FWIW, this also fails: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > But this succeeds: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `max_sal` DESC; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313657#comment-16313657 ] Paul Jackson commented on HIVE-18375: - [~pxiong] I'm referring to the for queries in this comment: https://issues.apache.org/jira/browse/HIVE-18375?focusedCommentId=16312051=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16312051 > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: > {code:SQL} > CREATE TABLE employees ( > emp_no INT, > first_name VARCHAR(14), > last_name VARCHAR(16) > ); > insert into employees values > (1, 'Gottlob', 'Frege'), > (2, 'Bertrand', 'Russell'), > (3, 'Ludwig', 'Wittgenstein'); > CREATE TABLE salaries ( > emp_no INT, > salary INT, > from_date DATE, > to_date DATE > ); > insert into salaries values > (1, 10, '1900-01-01', '1900-01-31'), > (1, 18, '1900-09-01', '1900-09-30'), > (2, 15, '1940-03-01', '1950-01-01'), > (3, 20, '1920-01-01', '1950-01-01'); > {code} > This query returns the names of the employees ordered by their peak salary: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, this should still work even if the max_salary is not part of the > projection: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, that fails with this error: > {code} > Error while compiling statement: FAILED: SemanticException [Error 10004]: > line 9:9 Invalid table alias or column reference 't1': (possible column names > are: last_name, first_name) > {code} > FWIW, this also fails: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > But this succeeds: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `max_sal` DESC; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312508#comment-16312508 ] Gopal V commented on HIVE-18375: [~pauljackson123]:the first two queries run on HDP3, there's probably a fix that went in for this which isn't in hive-2 branch. {code} 0: jdbc:hive2://localhost:10007/tpcds_bin_par> EXPLAIN SELECT `first_name` `F_4`, `last_name` `F_5` 0: jdbc:hive2://localhost:10007/tpcds_bin_par> FROM `employees` 0: jdbc:hive2://localhost:10007/tpcds_bin_par> ORDER BY `emp_no` DESC; Plan optimized by CBO. Vertex dependency in root stage Reducer 2 <- Map 1 (SIMPLE_EDGE) Stage-0 Fetch Operator limit:-1 Stage-1 Reducer 2 vectorized, llap File Output Operator [FS_9] Select Operator [SEL_8] (rows=6 width=202) Output:["_col0","_col1"] <-Map 1 [SIMPLE_EDGE] vectorized, llap SHUFFLE [RS_7] Select Operator [SEL_6] (rows=6 width=202) Output:["_col0","_col1","_col2"] TableScan [TS_0] (rows=6 width=202) testing@employees,employees,Tbl:COMPLETE,Col:NONE,Output:["first_name","last_name","emp_no"] {code} {code} 0: jdbc:hive2://localhost:10007/tpcds_bin_par> 0: jdbc:hive2://localhost:10007/tpcds_bin_par> EXPLAIN SELECT `first_name` `F_4`, `emp_no` `F_3`, `last_name` `F_5` 0: jdbc:hive2://localhost:10007/tpcds_bin_par> FROM `employees` 0: jdbc:hive2://localhost:10007/tpcds_bin_par> ORDER BY `emp_no` DESC; Plan optimized by CBO. Vertex dependency in root stage Reducer 2 <- Map 1 (SIMPLE_EDGE) Stage-0 Fetch Operator limit:-1 Stage-1 Reducer 2 vectorized, llap File Output Operator [FS_8] Select Operator [SEL_7] (rows=6 width=202) Output:["_col0","_col1","_col2"] <-Map 1 [SIMPLE_EDGE] vectorized, llap SHUFFLE [RS_6] Select Operator [SEL_5] (rows=6 width=202) Output:["_col0","_col1","_col2"] TableScan [TS_0] (rows=6 width=202) testing@employees,employees,Tbl:COMPLETE,Col:NONE,Output:["first_name","emp_no","last_name"] {code} > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: > {code:SQL} > CREATE TABLE employees ( > emp_no INT, > first_name VARCHAR(14), > last_name VARCHAR(16) > ); > insert into employees values > (1, 'Gottlob', 'Frege'), > (2, 'Bertrand', 'Russell'), > (3, 'Ludwig', 'Wittgenstein'); > CREATE TABLE salaries ( > emp_no INT, > salary INT, > from_date DATE, > to_date DATE > ); > insert into salaries values > (1, 10, '1900-01-01', '1900-01-31'), > (1, 18, '1900-09-01', '1900-09-30'), > (2, 15, '1940-03-01', '1950-01-01'), > (3, 20, '1920-01-01', '1950-01-01'); > {code} > This query returns the names of the employees ordered by their peak salary: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, this should still work even if the max_salary is not part of the > projection: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, that fails with this error: > {code} > Error while compiling statement: FAILED: SemanticException [Error 10004]: > line 9:9 Invalid table alias or column reference 't1': (possible column names > are: last_name, first_name) > {code} > FWIW, this also fails: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER
[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312371#comment-16312371 ] Pengcheng Xiong commented on HIVE-18375: [~pauljackson123], i am sorry but i saw that all of your above cases involve ORDER BY. Which simpler issue do you mean? > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: > {code:SQL} > CREATE TABLE employees ( > emp_no INT, > first_name VARCHAR(14), > last_name VARCHAR(16) > ); > insert into employees values > (1, 'Gottlob', 'Frege'), > (2, 'Bertrand', 'Russell'), > (3, 'Ludwig', 'Wittgenstein'); > CREATE TABLE salaries ( > emp_no INT, > salary INT, > from_date DATE, > to_date DATE > ); > insert into salaries values > (1, 10, '1900-01-01', '1900-01-31'), > (1, 18, '1900-09-01', '1900-09-30'), > (2, 15, '1940-03-01', '1950-01-01'), > (3, 20, '1920-01-01', '1950-01-01'); > {code} > This query returns the names of the employees ordered by their peak salary: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, this should still work even if the max_salary is not part of the > projection: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, that fails with this error: > {code} > Error while compiling statement: FAILED: SemanticException [Error 10004]: > line 9:9 Invalid table alias or column reference 't1': (possible column names > are: last_name, first_name) > {code} > FWIW, this also fails: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > But this succeeds: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `max_sal` DESC; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312359#comment-16312359 ] Paul Jackson commented on HIVE-18375: - There is no doubt these are the same issue. What do you think about the simpler issue in my comment that does not involve ORDER BY? > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: > {code:SQL} > CREATE TABLE employees ( > emp_no INT, > first_name VARCHAR(14), > last_name VARCHAR(16) > ); > insert into employees values > (1, 'Gottlob', 'Frege'), > (2, 'Bertrand', 'Russell'), > (3, 'Ludwig', 'Wittgenstein'); > CREATE TABLE salaries ( > emp_no INT, > salary INT, > from_date DATE, > to_date DATE > ); > insert into salaries values > (1, 10, '1900-01-01', '1900-01-31'), > (1, 18, '1900-09-01', '1900-09-30'), > (2, 15, '1940-03-01', '1950-01-01'), > (3, 20, '1920-01-01', '1950-01-01'); > {code} > This query returns the names of the employees ordered by their peak salary: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, this should still work even if the max_salary is not part of the > projection: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, that fails with this error: > {code} > Error while compiling statement: FAILED: SemanticException [Error 10004]: > line 9:9 Invalid table alias or column reference 't1': (possible column names > are: last_name, first_name) > {code} > FWIW, this also fails: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > But this succeeds: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `max_sal` DESC; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312344#comment-16312344 ] Pengcheng Xiong commented on HIVE-18375: [~pauljackson123], if possible, could u try Hive master? As this is a new feature in HIVE-15160 targeting version 3.0, I doubt it is available in any published version yet. > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: > {code:SQL} > CREATE TABLE employees ( > emp_no INT, > first_name VARCHAR(14), > last_name VARCHAR(16) > ); > insert into employees values > (1, 'Gottlob', 'Frege'), > (2, 'Bertrand', 'Russell'), > (3, 'Ludwig', 'Wittgenstein'); > CREATE TABLE salaries ( > emp_no INT, > salary INT, > from_date DATE, > to_date DATE > ); > insert into salaries values > (1, 10, '1900-01-01', '1900-01-31'), > (1, 18, '1900-09-01', '1900-09-30'), > (2, 15, '1940-03-01', '1950-01-01'), > (3, 20, '1920-01-01', '1950-01-01'); > {code} > This query returns the names of the employees ordered by their peak salary: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, this should still work even if the max_salary is not part of the > projection: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, that fails with this error: > {code} > Error while compiling statement: FAILED: SemanticException [Error 10004]: > line 9:9 Invalid table alias or column reference 't1': (possible column names > are: last_name, first_name) > {code} > FWIW, this also fails: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > But this succeeds: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `max_sal` DESC; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312255#comment-16312255 ] Pengcheng Xiong commented on HIVE-18375: May be related to HIVE-15160. > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: > {code:SQL} > CREATE TABLE employees ( > emp_no INT, > first_name VARCHAR(14), > last_name VARCHAR(16) > ); > insert into employees values > (1, 'Gottlob', 'Frege'), > (2, 'Bertrand', 'Russell'), > (3, 'Ludwig', 'Wittgenstein'); > CREATE TABLE salaries ( > emp_no INT, > salary INT, > from_date DATE, > to_date DATE > ); > insert into salaries values > (1, 10, '1900-01-01', '1900-01-31'), > (1, 18, '1900-09-01', '1900-09-30'), > (2, 15, '1940-03-01', '1950-01-01'), > (3, 20, '1920-01-01', '1950-01-01'); > {code} > This query returns the names of the employees ordered by their peak salary: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, this should still work even if the max_salary is not part of the > projection: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, that fails with this error: > {code} > Error while compiling statement: FAILED: SemanticException [Error 10004]: > line 9:9 Invalid table alias or column reference 't1': (possible column names > are: last_name, first_name) > {code} > FWIW, this also fails: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > But this succeeds: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `max_sal` DESC; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312051#comment-16312051 ] Paul Jackson commented on HIVE-18375: - This seems related. Posting as a comment, but perhaps it should be in its own bug report. Order By cannot see fields if they are projected with an alias. The first two queries fail with: {code}SemanticException [Error 10004]: line 7:9 Invalid table alias or column reference 'emp_no': (possible column names are: f_4, f_3, f_5){code} The last two succeed. {code:SQL} SELECT `first_name` `F_4`, `last_name` `F_5` FROM `default`.`employees` ORDER BY `emp_no` DESC; SELECT `first_name` `F_4`, `emp_no` `F_3`, `last_name` `F_5` FROM `default`.`employees` ORDER BY `emp_no` DESC; SELECT `first_name` `F_4`, `emp_no`, `last_name` `F_5` FROM `default`.`employees` ORDER BY `emp_no` DESC; SELECT `first_name` `F_4`, `emp_no` `F_3`, `last_name` `F_5` FROM `default`.`employees` ORDER BY `F_3` DESC; {code} > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: > {code:SQL} > CREATE TABLE employees ( > emp_no INT, > first_name VARCHAR(14), > last_name VARCHAR(16) > ); > insert into employees values > (1, 'Gottlob', 'Frege'), > (2, 'Bertrand', 'Russell'), > (3, 'Ludwig', 'Wittgenstein'); > CREATE TABLE salaries ( > emp_no INT, > salary INT, > from_date DATE, > to_date DATE > ); > insert into salaries values > (1, 10, '1900-01-01', '1900-01-31'), > (1, 18, '1900-09-01', '1900-09-30'), > (2, 15, '1940-03-01', '1950-01-01'), > (3, 20, '1920-01-01', '1950-01-01'); > {code} > This query returns the names of the employees ordered by their peak salary: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, this should still work even if the max_salary is not part of the > projection: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, that fails with this error: > {code} > Error while compiling statement: FAILED: SemanticException [Error 10004]: > line 9:9 Invalid table alias or column reference 't1': (possible column names > are: last_name, first_name) > {code} > FWIW, this also fails: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > But this succeeds: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > AS `max_sal` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `max_sal` DESC; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected
[ https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311995#comment-16311995 ] Andrew Sherman commented on HIVE-18375: --- Great bug report. [~minions] do you want to take a look at this? > Cannot ORDER by subquery fields unless they are selected > > > Key: HIVE-18375 > URL: https://issues.apache.org/jira/browse/HIVE-18375 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.2 > Environment: Amazon AWS > Release label:emr-5.11.0 > Hadoop distribution:Amazon 2.7.3 > Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1 > classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict] >Reporter: Paul Jackson >Priority: Minor > > Give these tables: > {code:SQL} > CREATE TABLE employees ( > emp_no INT, > first_name VARCHAR(14), > last_name VARCHAR(16) > ); > insert into employees values > (1, 'Gottlob', 'Frege'), > (2, 'Bertrand', 'Russell'), > (3, 'Ludwig', 'Wittgenstein'); > CREATE TABLE salaries ( > emp_no INT, > salary INT, > from_date DATE, > to_date DATE > ); > insert into salaries values > (1, 10, '1900-01-01', '1900-01-31'), > (1, 18, '1900-09-01', '1900-09-30'), > (2, 15, '1940-03-01', '1950-01-01'), > (3, 20, '1920-01-01', '1950-01-01'); > {code} > This query returns the names of the employees ordered by their peak salary: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, this should still work even if the max_salary is not part of the > projection: > {code:SQL} > SELECT `employees`.`last_name`, `employees`.`first_name` > FROM `default`.`employees` > INNER JOIN > (SELECT `emp_no`, MAX(`salary`) `max_salary` > FROM `default`.`salaries` > WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL > GROUP BY `emp_no`) AS `t1` > ON `employees`.`emp_no` = `t1`.`emp_no` > ORDER BY `t1`.`max_salary` DESC; > {code} > However, that fails with this error: > {code} > Error while compiling statement: FAILED: SemanticException [Error 10004]: > line 9:9 Invalid table alias or column reference 't1': (possible column names > are: last_name, first_name) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)