[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-05 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313674#comment-16313674
 ] 

Pengcheng Xiong commented on HIVE-18375:


[~pauljackson123], i see. But all of the 4 queries involve ORDER BY if 
"involve" means we have ORDER BY in the query text. For those queries, they 
should be runnable on current Hive master as it contains HIVE-15160, which 
enables "order by non-selected column". The reason why you can not run that on 
your cluster is because  HIVE-15160 is not in any release (including what you 
are using) yet. I think you may need to wait until the next release that 
include this patch. Thanks.

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Give these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
> FWIW, this also fails:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> But this succeeds:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `max_sal` DESC;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-05 Thread Paul Jackson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313657#comment-16313657
 ] 

Paul Jackson commented on HIVE-18375:
-

[~pxiong] I'm referring to the for queries in this comment: 
https://issues.apache.org/jira/browse/HIVE-18375?focusedCommentId=16312051=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16312051

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Give these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
> FWIW, this also fails:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> But this succeeds:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `max_sal` DESC;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312508#comment-16312508
 ] 

Gopal V commented on HIVE-18375:


[~pauljackson123]:the first two queries run on HDP3, there's probably a fix 
that went in for this which isn't in hive-2 branch.

{code}
0: jdbc:hive2://localhost:10007/tpcds_bin_par> EXPLAIN SELECT `first_name` 
`F_4`, `last_name` `F_5`
0: jdbc:hive2://localhost:10007/tpcds_bin_par> FROM `employees`
0: jdbc:hive2://localhost:10007/tpcds_bin_par> ORDER BY `emp_no` DESC;

Plan optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Reducer 2 vectorized, llap
  File Output Operator [FS_9]
Select Operator [SEL_8] (rows=6 width=202)
  Output:["_col0","_col1"]
<-Map 1 [SIMPLE_EDGE] vectorized, llap
  SHUFFLE [RS_7]
Select Operator [SEL_6] (rows=6 width=202)
  Output:["_col0","_col1","_col2"]
  TableScan [TS_0] (rows=6 width=202)

testing@employees,employees,Tbl:COMPLETE,Col:NONE,Output:["first_name","last_name","emp_no"]
{code}

{code}
0: jdbc:hive2://localhost:10007/tpcds_bin_par> 
0: jdbc:hive2://localhost:10007/tpcds_bin_par> EXPLAIN SELECT `first_name` 
`F_4`, `emp_no` `F_3`, `last_name` `F_5`
0: jdbc:hive2://localhost:10007/tpcds_bin_par> FROM `employees`
0: jdbc:hive2://localhost:10007/tpcds_bin_par> ORDER BY `emp_no` DESC;

Plan optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Reducer 2 vectorized, llap
  File Output Operator [FS_8]
Select Operator [SEL_7] (rows=6 width=202)
  Output:["_col0","_col1","_col2"]
<-Map 1 [SIMPLE_EDGE] vectorized, llap
  SHUFFLE [RS_6]
Select Operator [SEL_5] (rows=6 width=202)
  Output:["_col0","_col1","_col2"]
  TableScan [TS_0] (rows=6 width=202)

testing@employees,employees,Tbl:COMPLETE,Col:NONE,Output:["first_name","emp_no","last_name"]
{code}

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Give these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
> FWIW, this also fails:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER 

[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312371#comment-16312371
 ] 

Pengcheng Xiong commented on HIVE-18375:


[~pauljackson123], i am sorry but i saw that all of your above cases involve 
ORDER BY. Which simpler issue do you mean?

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Give these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
> FWIW, this also fails:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> But this succeeds:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `max_sal` DESC;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Paul Jackson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312359#comment-16312359
 ] 

Paul Jackson commented on HIVE-18375:
-

There is no doubt these are the same issue. What do you think about the simpler 
issue in my comment that does not involve ORDER BY?

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Give these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
> FWIW, this also fails:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> But this succeeds:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `max_sal` DESC;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312344#comment-16312344
 ] 

Pengcheng Xiong commented on HIVE-18375:


[~pauljackson123], if possible, could u try Hive master? As this is a new 
feature in HIVE-15160 targeting version 3.0, I doubt it is available in any 
published version yet.

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Give these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
> FWIW, this also fails:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> But this succeeds:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `max_sal` DESC;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312255#comment-16312255
 ] 

Pengcheng Xiong commented on HIVE-18375:


May be related to HIVE-15160.

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Give these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
> FWIW, this also fails:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> But this succeeds:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `max_sal` DESC;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Paul Jackson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312051#comment-16312051
 ] 

Paul Jackson commented on HIVE-18375:
-

This seems related. Posting as a comment, but perhaps it should be in its own 
bug report.

Order By cannot see fields if they are projected with an alias.

The first two queries fail with:
{code}SemanticException [Error 10004]: line 7:9 Invalid table alias or column 
reference 'emp_no': (possible column names are: f_4, f_3, f_5){code}
The last two succeed.
{code:SQL}
SELECT `first_name` `F_4`, `last_name` `F_5`
FROM `default`.`employees`
ORDER BY `emp_no` DESC;

SELECT `first_name` `F_4`, `emp_no` `F_3`, `last_name` `F_5`
FROM `default`.`employees`
ORDER BY `emp_no` DESC;

SELECT `first_name` `F_4`, `emp_no`, `last_name` `F_5`
FROM `default`.`employees`
ORDER BY `emp_no` DESC;

SELECT `first_name` `F_4`, `emp_no` `F_3`, `last_name` `F_5`
FROM `default`.`employees`
ORDER BY `F_3` DESC;
{code}

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Give these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
> FWIW, this also fails:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> But this succeeds:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `max_sal` DESC;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Andrew Sherman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311995#comment-16311995
 ] 

Andrew Sherman commented on HIVE-18375:
---

Great bug report. 
[~minions] do you want to take a look at this?

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Give these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)