[jira] [Commented] (HIVE-16042) special characters in the comment of sql file cause ParseException

2018-05-08 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468144#comment-16468144
 ] 

Pengcheng Xiong commented on HIVE-16042:


Hi [~jameszhouyi], as I said in the previous thread, if you want to use a 
comment, you should put "--" at the beginning of a line rather than in the 
middle of a line.
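
A sketch of the distinction, using a hypothetical table `t` (not one of the attached TPCx-BB queries):

```sql
-- Safe: the comment starts at the beginning of its own line.
SELECT id, name
FROM t;

SELECT id, name  -- risky: a mid-line comment like this is where the ParseException arose
FROM t;
```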

> special characters in the comment of sql file cause ParseException
> --
>
> Key: HIVE-16042
> URL: https://issues.apache.org/jira/browse/HIVE-16042
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
> Environment: Hive2.2 (commit: 2768361)
> TPCx-BB v1.2
>Reporter: KaiXu
>Priority: Major
> Attachments: q04.sql, q17.sql, q18.sql, q23.sql
>
>
> Current Hive upstream (commit: 2768361) fails to parse some 
> queries (q04, q17, q18, q23) in TPCx-BB v1.2, while they parse fine with 
> Hive (commit: ac68aed).
> Q04: FAILED: ParseException line 24:0 missing EOF at ';' near 
> 'abandonedShoppingCartsPageCountsPerSession'
> Q17:
> NoViableAltException(350@[])
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.limitClause(HiveParser.java:38898)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:37002)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36404)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35722)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:35610)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2279)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1328)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
> at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:75)
> at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:68)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 39:0 cannot recognize input near 'LIMIT' '100' 
> ';' in limit clause
> Q18:
> NoViableAltException(350@[()* loopback of 424:20: ( ( LSQUARE ^ expression 
> RSQUARE !) | ( DOT ^ identifier ) )*])
> at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
> at org.antlr.runtime.DFA.predict(DFA.java:116)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6665)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:6992)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnarySuffixExpression(HiveParser_IdentifiersParser.java:7048)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceBitwiseXorExpression(HiveParser_IdentifiersParser.java:7210)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceStarExpression(HiveParser_IdentifiersParser.java:7353)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedencePlusExpression(HiveParser_IdentifiersParser.jav

[jira] [Commented] (HIVE-19059) Support DEFAULT keyword with INSERT and UPDATE

2018-03-26 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414858#comment-16414858
 ] 

Pengcheng Xiong commented on HIVE-19059:


"
|-> \{$expr.tree.getText() == "default"}?|

"

Maybe you should use .equals() for string comparison?
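
For reference, a minimal sketch of why `==` is the wrong comparison here; the `"default"` literal and the runtime string are stand-ins for the grammar snippet above:

```java
public class StringCompareDemo {
    public static void main(String[] args) {
        String literal = "default";
        // Simulates token text produced at runtime (e.g. by expr.tree.getText()),
        // which is generally a distinct object from the interned literal.
        String fromParser = new String("default");

        System.out.println(literal == fromParser);      // false: compares object identity
        System.out.println(literal.equals(fromParser)); // true: compares character content
        // Case-insensitive variant, often what grammar predicates actually want:
        System.out.println(literal.equalsIgnoreCase("DEFAULT")); // true
    }
}
```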

> Support DEFAULT keyword with INSERT and UPDATE
> --
>
> Key: HIVE-19059
> URL: https://issues.apache.org/jira/browse/HIVE-19059
> Project: Hive
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-19059.1.patch
>
>
> Support DEFAULT keyword in INSERT e.g.
> {code:sql}
> INSERT INTO TABLE t values (DEFAULT, DEFAULT)
> {code}
> or with UPDATE
> {code:sql}
> UPDATE TABLE t SET col1=DEFAULT WHERE col2 > 4
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312255#comment-16312255
 ] 

Pengcheng Xiong commented on HIVE-18375:


May be related to HIVE-15160.

> Cannot ORDER by subquery fields unless they are selected
> 
>
> Key: HIVE-18375
> URL: https://issues.apache.org/jira/browse/HIVE-18375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.2
> Environment: Amazon AWS
> Release label:emr-5.11.0
> Hadoop distribution:Amazon 2.7.3
> Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.0.1
> classification=hive-site,properties=[hive.strict.checks.cartesian.product=false,hive.mapred.mode=nonstrict]
>Reporter: Paul Jackson
>Priority: Minor
>
> Given these tables:
> {code:SQL}
> CREATE TABLE employees (
> emp_no  INT,
> first_name  VARCHAR(14),
> last_name   VARCHAR(16)
> );
> insert into employees values
> (1, 'Gottlob', 'Frege'),
> (2, 'Bertrand', 'Russell'),
> (3, 'Ludwig', 'Wittgenstein');
> CREATE TABLE salaries (
> emp_no  INT,
> salary  INT,
> from_date   DATE,
> to_date DATE
> );
> insert into salaries values
> (1, 10, '1900-01-01', '1900-01-31'),
> (1, 18, '1900-09-01', '1900-09-30'),
> (2, 15, '1940-03-01', '1950-01-01'),
> (3, 20, '1920-01-01', '1950-01-01');
> {code}
> This query returns the names of the employees ordered by their peak salary:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, this should still work even if the max_salary is not part of the 
> projection:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> However, that fails with this error:
> {code}
> Error while compiling statement: FAILED: SemanticException [Error 10004]: 
> line 9:9 Invalid table alias or column reference 't1': (possible column names 
> are: last_name, first_name)
> {code}
> FWIW, this also fails:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `t1`.`max_salary` DESC;
> {code}
> But this succeeds:
> {code:SQL}
> SELECT `employees`.`last_name`, `employees`.`first_name`, `t1`.`max_salary` 
> AS `max_sal`
> FROM `default`.`employees`
> INNER JOIN
>  (SELECT `emp_no`, MAX(`salary`) `max_salary`
>   FROM `default`.`salaries`
>   WHERE `emp_no` IS NOT NULL AND `salary` IS NOT NULL
>   GROUP BY `emp_no`) AS `t1`
> ON `employees`.`emp_no` = `t1`.`emp_no`
> ORDER BY `max_sal` DESC;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312344#comment-16312344
 ] 

Pengcheng Xiong commented on HIVE-18375:


[~pauljackson123], if possible, could you try Hive master? As this is a new 
feature from HIVE-15160 targeting version 3.0, I doubt it is available in any 
published release yet.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-04 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312371#comment-16312371
 ] 

Pengcheng Xiong commented on HIVE-18375:


[~pauljackson123], I am sorry, but I see that all of your above cases involve 
ORDER BY. Which simpler issue do you mean?




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18359) Extend grouping set limits from int to long

2018-01-04 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312461#comment-16312461
 ] 

Pengcheng Xiong commented on HIVE-18359:


LGTM +1 pending tests.  :)

> Extend grouping set limits from int to long
> ---
>
> Key: HIVE-18359
> URL: https://issues.apache.org/jira/browse/HIVE-18359
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18359.1.patch, HIVE-18359.2.patch
>
>
> Grouping sets are broken for >32 columns because an int is used for the 
> bitmap (and for the GROUPING__ID virtual column). This assumption breaks 
> grouping sets/rollups/cubes when the number of participating aggregation 
> columns is >32. The easier fix is to extend it to long for now. The correct 
> fix would be to use BitSets everywhere, but that would require the 
> GROUPING__ID column type to be binary, which would make predicates on 
> GROUPING__ID difficult to deal with. 
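
A minimal sketch of the overflow the description refers to: with an int bitmap, bit positions at or above 32 silently wrap around (Java masks the shift distance to the low 5 bits for int), while a long holds 64 positions. The column indices here are hypothetical.

```java
public class GroupingSetBitmap {
    // Sets the bit for one grouping column in an int bitmap; wrong for col >= 32.
    static int setInt(int bitmap, int col) {
        return bitmap | (1 << col); // shift distance is taken mod 32 for int
    }

    // The widened version is correct for up to 64 grouping columns.
    static long setLong(long bitmap, int col) {
        return bitmap | (1L << col); // shift distance is taken mod 64 for long
    }

    public static void main(String[] args) {
        // Column 33: the int version collides with column 1 (33 mod 32 == 1).
        System.out.println(setInt(0, 33));   // 2, same as setInt(0, 1) -- wrong bit
        System.out.println(setLong(0L, 33)); // 8589934592, i.e. bit 33 -- correct
    }
}
```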



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18375) Cannot ORDER by subquery fields unless they are selected

2018-01-05 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313674#comment-16313674
 ] 

Pengcheng Xiong commented on HIVE-18375:


[~pauljackson123], I see. But all four queries involve ORDER BY, if "involve" 
means the query text contains an ORDER BY clause. Those queries should be 
runnable on current Hive master, as it contains HIVE-15160, which enables 
"order by non-selected column". The reason you cannot run them on your cluster 
is that HIVE-15160 is not in any release (including the one you are using) 
yet. I think you may need to wait until the next release that includes this 
patch. Thanks.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-20867) Rewrite INTERSECT into LEFT SEMI JOIN instead of UNION + Group by

2018-11-05 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675550#comment-16675550
 ] 

Pengcheng Xiong commented on HIVE-20867:


I have some questions about this jira. Could you share your design document on 
this? I assume we compared several candidates when the original decision was 
made, and left semi join was one of them. We chose the union-based approach 
because a) a similar approach can be applied to EXCEPT (ALL) as well, giving 
better code reuse, and b) when there are more than 2 branches as inputs to 
INTERSECT, we assume those branches can be executed in parallel in the future. 
With the left-semi-join approach, we need to do the joins one by one.
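
For context, a rough sketch of the two rewrites being compared, over hypothetical tables `t1` and `t2`; this is illustrative, not the exact plan Hive generates:

```sql
-- Original:
SELECT a FROM t1 INTERSECT SELECT a FROM t2;

-- Union + group-by rewrite (current approach, roughly): deduplicate each
-- branch, union them, and keep keys that appear in every branch.
SELECT a FROM (
  SELECT a FROM t1 GROUP BY a
  UNION ALL
  SELECT a FROM t2 GROUP BY a
) u
GROUP BY a
HAVING count(*) = 2;   -- 2 = number of INTERSECT branches

-- Left-semi-join rewrite (the proposal):
SELECT DISTINCT t1.a
FROM t1 LEFT SEMI JOIN t2 ON t1.a = t2.a;
```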

> Rewrite INTERSECT into LEFT SEMI JOIN instead of UNION + Group by
> -
>
> Key: HIVE-20867
> URL: https://issues.apache.org/jira/browse/HIVE-20867
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Affects Versions: 4.0.0
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20867) Rewrite INTERSECT into LEFT SEMI JOIN instead of UNION + Group by

2018-11-05 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675765#comment-16675765
 ] 

Pengcheng Xiong commented on HIVE-20867:


Thanks Gopal for the explanation. I can see the potential benefit of using left 
semi join over the existing implementation in some scenarios. If it is decided 
case by case, it may be better to add some cost-based metrics or a Hive 
configuration on which the decision can be made. That is only my suggestion; 
you guys can decide what to do, after all. :)

> Rewrite INTERSECT into LEFT SEMI JOIN instead of UNION + Group by
> -
>
> Key: HIVE-20867
> URL: https://issues.apache.org/jira/browse/HIVE-20867
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Affects Versions: 4.0.0
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20000) woooohoo20000ooooooo

2018-06-26 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524223#comment-16524223
 ] 

Pengcheng Xiong commented on HIVE-20000:


congrats! :)

> woooohoo20000ooooooo
> 
>
> Key: HIVE-20000
> URL: https://issues.apache.org/jira/browse/HIVE-20000
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: All Versions
>Reporter: Prasanth Jayachandran
>Assignee: Hive QA
>Priority: Blocker
> Fix For: All Versions
>
>
> {code:java}
> (ASCII art rendering of "20000")
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)

2015-05-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8769:
--
Attachment: HIVE-8769.01.patch

> Physical optimizer : Incorrect CE results in a shuffle join instead of a Map 
> join (PK/FK pattern not detected)
> --
>
> Key: HIVE-8769
> URL: https://issues.apache.org/jira/browse/HIVE-8769
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Attachments: HIVE-8769.01.patch
>
>
> TPC-DS Q82 is running slower than Hive 13 because the join type is not 
> correct.
> The estimate for item x inventory x date_dim is 227 million rows while the 
> actual is 3K rows.
> Hive 13 finishes in 753 seconds.
> Hive 14 finishes in 1,267 seconds.
> Hive 14 + force map join finishes in 431 seconds.
> Query
> {code}
> select  i_item_id
>,i_item_desc
>,i_current_price
>  from item, inventory, date_dim, store_sales
>  where i_current_price between 30 and 30+30
>  and inv_item_sk = i_item_sk
>  and d_date_sk=inv_date_sk
>  and d_date between '2002-05-30' and '2002-07-30'
>  and i_manufact_id in (437,129,727,663)
>  and inv_quantity_on_hand between 100 and 500
>  and ss_item_sk = i_item_sk
>  group by i_item_id,i_item_desc,i_current_price
>  order by i_item_id
>  limit 100
> {code}
> Plan 
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 7 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
> Reducer 4 <- Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE)
> Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
>   DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: item
>   filterExpr: ((i_current_price BETWEEN 30 AND 60 and 
> (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
> boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((i_current_price BETWEEN 30 AND 60 and 
> (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
> boolean)
> Statistics: Num rows: 115500 Data size: 34185680 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: i_item_sk (type: int), i_item_id (type: 
> string), i_item_desc (type: string), i_current_price (type: float)
>   outputColumnNames: _col0, _col1, _col2, _col3
>   Statistics: Num rows: 115500 Data size: 33724832 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 115500 Data size: 33724832 
> Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string), _col2 (type: 
> string), _col3 (type: float)
> Execution mode: vectorized
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
> and d_date_sk is not null) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 81741831 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
> and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 36524 Data size: 3579352 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: d_date_sk (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _co

[jira] [Assigned] (HIVE-10698) query on view results fails with table not found error if view is created with subquery alias (CTE).

2015-05-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-10698:
--

Assignee: Pengcheng Xiong

> query on view results fails with table not found error if view is created 
> with subquery alias (CTE).
> 
>
> Key: HIVE-10698
> URL: https://issues.apache.org/jira/browse/HIVE-10698
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> To reproduce it, 
> {code}
> use bugtest;
> create table basetb(id int, name string);
> create view testv1 as
> with subtb as (select id, name from bugtest.basetb)
> select id from subtb;
> use castest;
> explain select * from bugtest.testv1;
> hive> explain select * from bugtest.testv1;
> FAILED: SemanticException Line 2:15 Table not found 'subtb' in definition of 
> VIEW testv1 [
> with subtb as (select id, name from bugtest.basetb)
> select id from `bugtest`.`subtb`
> ] used as testv1 at Line 1:22
> Note that there is a database prefix `bugtest`.`subtb`
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)

2015-05-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8769:
--
Attachment: HIVE-8769.02.patch

> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: int)
> ou

[jira] [Commented] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)

2015-05-14 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543989#comment-14543989
 ] 

Pengcheng Xiong commented on HIVE-8769:
---

The TestMultiSessionsHS2WithLocalClusterSpark failure is mentioned in 
HIVE-9990. TestMinimrCliDriver.testCliDriver_schemeAuthority passed on my 
laptop. Both are unrelated to my patch. [~ashutoshc] and [~jpullokkaran], 
could you please take a look? Thanks.

> Physical optimizer : Incorrect CE results in a shuffle join instead of a Map 
> join (PK/FK pattern not detected)
> --
>
> Key: HIVE-8769
> URL: https://issues.apache.org/jira/browse/HIVE-8769
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Attachments: HIVE-8769.01.patch, HIVE-8769.02.patch
>
>

[jira] [Commented] (HIVE-10698) query on view results fails with table not found error if view is created with subquery alias (CTE).

2015-05-14 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544177#comment-14544177
 ] 

Pengcheng Xiong commented on HIVE-10698:


According to [~jpullokkaran]'s comments, the following things remain to do:

1. When a fully qualified identifier (db.tablename) is specified in the FROM 
clause, we seem to resolve it against CTE aliases. This is wrong; if the 
table doesn't exist in the catalog, we should fail.
2. If a fully qualified name is not used in the FROM clause, then 
a) we should first resolve the identifier against CTE aliases; 
b) if the identifier is not found in the CTE list, try to resolve it against 
the catalog.
3. Views: in UnparseTranslator we treat the CTE name as a catalog table; this 
is a bug.
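The resolution order described above can be sketched as follows. This is an illustrative sketch, not Hive's actual implementation; the class and method names are hypothetical, and the current database is assumed to be "default":

```java
import java.util.Set;

public class CteResolution {
    // Illustrative resolver following the three rules above; names are hypothetical.
    // Returns "CTE" or "CATALOG"; throws if the reference cannot be resolved.
    public static String resolve(String db, String table,
                                 Set<String> cteAliases, Set<String> catalogTables) {
        if (db != null) {
            // Rule 1: a fully qualified db.table name is never matched against CTE aliases.
            if (catalogTables.contains(db + "." + table)) {
                return "CATALOG";
            }
            throw new IllegalArgumentException("Table not found: " + db + "." + table);
        }
        // Rule 2a: an unqualified name is resolved against CTE aliases first...
        if (cteAliases.contains(table)) {
            return "CTE";
        }
        // Rule 2b: ...and only then against the catalog.
        if (catalogTables.contains("default." + table)) {
            return "CATALOG";
        }
        throw new IllegalArgumentException("Table not found: " + table);
    }

    public static void main(String[] args) {
        Set<String> ctes = Set.of("subtb");
        Set<String> catalog = Set.of("default.subtb", "bugtest.basetb");
        System.out.println(resolve(null, "subtb", ctes, catalog));       // CTE shadows the catalog table
        System.out.println(resolve("bugtest", "basetb", ctes, catalog)); // qualified -> catalog only
    }
}
```

Preferring the CTE for unqualified names keeps view expansion deterministic: the alias defined in the WITH clause always wins unless the reference is explicitly qualified.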

> query on view results fails with table not found error if view is created 
> with subquery alias (CTE).
> 
>
> Key: HIVE-10698
> URL: https://issues.apache.org/jira/browse/HIVE-10698
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> To reproduce it, 
> {code}
> use bugtest;
> create table basetb(id int, name string);
> create view testv1 as
> with subtb as (select id, name from bugtest.basetb)
> select id from subtb;
> use castest;
> explain select * from bugtest.testv1;
> hive> explain select * from bugtest.testv1;
> FAILED: SemanticException Line 2:15 Table not found 'subtb' in definition of 
> VIEW testv1 [
> with subtb as (select id, name from bugtest.basetb)
> select id from `bugtest`.`subtb`
> ] used as testv1 at Line 1:22
> Note that there is a database prefix `bugtest`.`subtb`
> {code}





[jira] [Commented] (HIVE-10698) query on view results fails with table not found error if view is created with subquery alias (CTE).

2015-05-14 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544679#comment-14544679
 ] 

Pengcheng Xiong commented on HIVE-10698:


We will address this in this patch: when an alias exists in both the CTE list 
and the catalog, we prefer the CTE.

> query on view results fails with table not found error if view is created 
> with subquery alias (CTE).
> 
>
> Key: HIVE-10698
> URL: https://issues.apache.org/jira/browse/HIVE-10698
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>





[jira] [Commented] (HIVE-9897) Issue a warning when using an existing table/view name as an alias in a with statement.

2015-05-14 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544680#comment-14544680
 ] 

Pengcheng Xiong commented on HIVE-9897:
---

I agree with [~hagleitn] and [~jpullokkaran] that this is a bug. We will 
address this together in HIVE-10698.

> Issue a warning when using an existing table/view name as an alias in a with 
> statement. 
> 
>
> Key: HIVE-9897
> URL: https://issues.apache.org/jira/browse/HIVE-9897
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 0.13.1
> Environment: cdh5.3.0
>Reporter: Mario Konschake
>Priority: Minor
>
> Consider the following query:
> {code:sql}
> WITH
> table_a AS (
> SELECT
> 'johndoe' AS name
> FROM
> my_table
> )
> SELECT
> DISTINCT name
> FROM
> table_a;
> {code}
> Observation: 
> If a table or a view named `table_a` exists, it is used instead of the one 
> defined in the WITH statement.
> Expectation:
> Since the intent is ambiguous (use the alias from the WITH statement vs. the 
> existing table), issuing a warning when an existing name is used in a 
> WITH statement is recommended.
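A hypothetical sketch of the requested warning (the helper and its names are illustrative, not Hive's actual code): before registering a WITH alias, check it against the known table and view names and emit a warning on a collision:

```java
import java.util.Set;

public class CteShadowCheck {
    // Hypothetical helper: returns a warning string if the WITH alias shadows an
    // existing table or view, or null when there is no collision.
    public static String checkShadow(String cteAlias, Set<String> existingTables) {
        if (existingTables.contains(cteAlias)) {
            return "WARNING: WITH alias '" + cteAlias
                + "' shadows an existing table or view with the same name";
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> tables = Set.of("table_a", "my_table");
        System.out.println(checkShadow("table_a", tables)); // collision -> warning text
        System.out.println(checkShadow("table_b", tables)); // no collision -> null
    }
}
```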





[jira] [Updated] (HIVE-10698) query on view results fails with table not found error if view is created with subquery alias (CTE).

2015-05-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10698:
---
Attachment: HIVE-10698.01.patch

First patch; the q files may need to be updated.

> query on view results fails with table not found error if view is created 
> with subquery alias (CTE).
> 
>
> Key: HIVE-10698
> URL: https://issues.apache.org/jira/browse/HIVE-10698
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10698.01.patch
>
>





[jira] [Commented] (HIVE-10698) query on view results fails with table not found error if view is created with subquery alias (CTE).

2015-05-15 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545781#comment-14545781
 ] 

Pengcheng Xiong commented on HIVE-10698:


[~jpullokkaran], the failed tests are unrelated. It seems that the patch passes 
all the CTE tests in cte_1.q, cte_2.q, cbo_views.q and also two negative tests. 
Could you recommend some more tests? Thanks.

> query on view results fails with table not found error if view is created 
> with subquery alias (CTE).
> 
>
> Key: HIVE-10698
> URL: https://issues.apache.org/jira/browse/HIVE-10698
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10698.01.patch
>
>





[jira] [Commented] (HIVE-10731) NullPointerException in HiveParser.g

2015-05-15 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546282#comment-14546282
 ] 

Pengcheng Xiong commented on HIVE-10731:


Could you post the environment and the query that you ran? I assume this 
follows the standard practice for passing the conf. Another example is 
quotedId in HiveLexer.g. cc'ing [~jpullokkaran]
{code}
public void setHiveConf(Configuration hiveConf) {
  this.hiveConf = hiveConf;
}

protected boolean allowQuotedId() {
  String supportedQIds = HiveConf.getVar(hiveConf,
      HiveConf.ConfVars.HIVE_QUOTEDID_SUPPORT);
  return !"none".equals(supportedQIds);
}
{code}

> NullPointerException in HiveParser.g
> 
>
> Key: HIVE-10731
> URL: https://issues.apache.org/jira/browse/HIVE-10731
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.0
>Reporter: Xiu
>Priority: Minor
>
> In HiveParser.g:
> {code:Java}
> protected boolean useSQL11ReservedKeywordsForIdentifier() {
>   return !HiveConf.getBoolVar(hiveConf,
>       HiveConf.ConfVars.HIVE_SUPPORT_SQL11_RESERVED_KEYWORDS);
> }
> {code}
> NullPointerException is thrown when hiveConf is not set.
> Stack trace:
> {code:Java}
> java.lang.NullPointerException
> at org.apache.hadoop.hive.conf.HiveConf.getBoolVar(HiveConf.java:2583)
> at org.apache.hadoop.hive.ql.parse.HiveParser.useSQL11ReservedKeywordsForIdentifier(HiveParser.java:1000)
> at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.useSQL11ReservedKeywordsForIdentifier(HiveParser_IdentifiersParser.java:726)
> at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:10922)
> at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:45808)
> at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser.java:38008)
> at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeList(HiveParser.java:36167)
> at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:5214)
> at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2640)
> at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1650)
> at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1109)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:161)
> {code}
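One defensive fix, sketched here as a standalone illustration rather than the actual HIVE-10731 patch, is to fall back to the configuration default (assumed here to be true, i.e. SQL:2011 reserved keywords stay reserved) when no conf has been set, instead of dereferencing a null hiveConf. The Boolean parameter stands in for the HiveConf lookup:

```java
public class KeywordConfig {
    // Hypothetical null-safe variant of useSQL11ReservedKeywordsForIdentifier().
    // null models "hiveConf not set"; the default is assumed to be true.
    public static boolean useSQL11ReservedKeywordsForIdentifier(Boolean supportSql11Reserved) {
        // Fall back to the assumed default instead of throwing a NullPointerException.
        boolean supportReserved = (supportSql11Reserved != null) ? supportSql11Reserved : true;
        return !supportReserved;
    }

    public static void main(String[] args) {
        // With no conf set, we get the default behavior rather than an NPE.
        System.out.println(useSQL11ReservedKeywordsForIdentifier(null)); // prints "false"
    }
}
```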





[jira] [Updated] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)

2015-05-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8769:
--
Attachment: HIVE-8769.03.patch

Rebased on master.

> Physical optimizer : Incorrect CE results in a shuffle join instead of a Map 
> join (PK/FK pattern not detected)
> --
>
> Key: HIVE-8769
> URL: https://issues.apache.org/jira/browse/HIVE-8769
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Attachments: HIVE-8769.01.patch, HIVE-8769.02.patch, 
> HIVE-8769.03.patch
>
>

[jira] [Assigned] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-6867:
-

Assignee: Pengcheng Xiong  (was: Laljo John Pullokkaran)

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
>
> Bucketized Table feature fails in some cases: if the source and destination 
> are bucketed on the same key, and the actual data in the source is not 
> bucketed (because the data was loaded using LOAD DATA LOCAL INPATH), then the 
> data won't be bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression; this has never worked.
> It was only discovered due to Hadoop2 changes.
> In Hadoop1, in local mode, the number of reducers is always 1, regardless of 
> what the application requests. Hadoop2 honors the requested number of 
> reducers in local mode (by spawning threads).
> The long-term solution seems to be to prevent LOAD DATA for bucketed tables.
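For context, an INSERT into a bucketed table routes each row to a bucket by hashing the clustering key modulo the bucket count, whereas LOAD DATA merely moves files into place without any hashing, so the bucketing property silently fails to hold. A rough, simplified sketch of the routing (not Hive's exact hash function):

```java
public class BucketAssignment {
    // Simplified sketch: a row's bucket is hash(key) mod numBuckets.
    // Hive's real string hash differs in detail; this only illustrates the routing idea.
    public static int bucketFor(String key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Rows with equal keys always land in the same bucket file during INSERT;
        // LOAD DATA performs no such routing, so downstream bucket-based
        // optimizations would read the wrong files.
        System.out.println(bucketFor("alice", 2) == bucketFor("alice", 2)); // true
    }
}
```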





[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-6867:
--
Attachment: HIVE-6867.01.patch

Temporary patch; the q files still need to be updated.

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch
>
>





[jira] [Assigned] (HIVE-10731) NullPointerException in HiveParser.g

2015-05-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-10731:
--

Assignee: Pengcheng Xiong

> NullPointerException in HiveParser.g
> 
>
> Key: HIVE-10731
> URL: https://issues.apache.org/jira/browse/HIVE-10731
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.0
>Reporter: Xiu
>Assignee: Pengcheng Xiong
>Priority: Minor
>





[jira] [Commented] (HIVE-10731) NullPointerException in HiveParser.g

2015-05-18 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549302#comment-14549302
 ] 

Pengcheng Xiong commented on HIVE-10731:


I see that you are leveraging HiveParser only, so I attached a patch for that. 
If you leverage HiveLexer in the future, you will hit a similar issue.

> NullPointerException in HiveParser.g
> 
>
> Key: HIVE-10731
> URL: https://issues.apache.org/jira/browse/HIVE-10731
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.0
>Reporter: Xiu
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-10731.01.patch
>
>





[jira] [Updated] (HIVE-10731) NullPointerException in HiveParser.g

2015-05-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10731:
---
Attachment: HIVE-10731.01.patch

> NullPointerException in HiveParser.g
> 
>
> Key: HIVE-10731
> URL: https://issues.apache.org/jira/browse/HIVE-10731
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.0
>Reporter: Xiu
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-10731.01.patch
>
>





[jira] [Commented] (HIVE-10731) NullPointerException in HiveParser.g

2015-05-18 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549718#comment-14549718
 ] 

Pengcheng Xiong commented on HIVE-10731:


[~jpullokkaran], could you please take a look? The test failure is unrelated. 
Thanks.

> NullPointerException in HiveParser.g
> 
>
> Key: HIVE-10731
> URL: https://issues.apache.org/jira/browse/HIVE-10731
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.0
>Reporter: Xiu
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-10731.01.patch
>
>
> In HiveParser.g:
> {code:Java}
> protected boolean useSQL11ReservedKeywordsForIdentifier() {
> return !HiveConf.getBoolVar(hiveConf, 
> HiveConf.ConfVars.HIVE_SUPPORT_SQL11_RESERVED_KEYWORDS);
> }
> {code}
> NullPointerException is thrown when hiveConf is not set.
> Stack trace:
> {code:Java}
> java.lang.NullPointerException
> at org.apache.hadoop.hive.conf.HiveConf.getBoolVar(HiveConf.java:2583)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.useSQL11ReservedKeywordsForIdentifier(HiveParser.java:1000)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.useSQL11ReservedKeywordsForIdentifier(HiveParser_IdentifiersParser.java:726)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:10922)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:45808)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser.java:38008)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeList(HiveParser.java:36167)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:5214)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2640)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1650)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1109)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:161)
> {code}
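A null-safe variant of the method quoted above can be sketched as follows. This is a minimal illustration, not the committed HIVE-10731.01.patch: the `hiveConf` Boolean field stands in for the real `HiveConf` lookup, and the assumed default is `HIVE_SUPPORT_SQL11_RESERVED_KEYWORDS=true` (so the method returns false when the conf is unset).

```java
// Sketch of a null-safe useSQL11ReservedKeywordsForIdentifier().
// "hiveConf" is a stand-in for the parser's HiveConf field; the
// real fix lives in HIVE-10731.01.patch.
public class KeywordGuardDemo {
    // null models a parser built without a HiveConf.
    static Boolean hiveConf;

    static boolean useSQL11ReservedKeywordsForIdentifier() {
        // Fall back to the configuration default instead of throwing
        // NullPointerException (assumed default:
        // HIVE_SUPPORT_SQL11_RESERVED_KEYWORDS = true).
        if (hiveConf == null) {
            return false;
        }
        return !hiveConf;
    }

    public static void main(String[] args) {
        // With no conf set, the guard returns the default behavior.
        System.out.println(useSQL11ReservedKeywordsForIdentifier()); // prints "false"
    }
}
```

The guard only changes the unset-conf path; when a conf is present, the result is identical to the original one-liner.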





[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-6867:
--
Attachment: HIVE-6867.02.patch

with q files updated

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch
>
>
> Bucketized Table feature fails in some cases. If the source and destination 
> are bucketed on the same key, and the actual data in the source is not 
> bucketed (because it was loaded using LOAD DATA LOCAL INPATH), then the data 
> won't be bucketed while being written to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression; it has never worked.
> It was only discovered because of Hadoop2 changes: in Hadoop1, in local mode, 
> the number of reducers is always 1, regardless of what the app requests, 
> whereas Hadoop2 honors the requested number of reducers in local mode (by 
> spawning threads).
> The long-term solution seems to be to prevent LOAD DATA into bucketed tables.





[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-6867:
--
Attachment: HIVE-6867.02.patch

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch
>
>
> Bucketized Table feature fails in some cases. If the source and destination 
> are bucketed on the same key, and the actual data in the source is not 
> bucketed (because it was loaded using LOAD DATA LOCAL INPATH), then the data 
> won't be bucketed while being written to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression; it has never worked.
> It was only discovered because of Hadoop2 changes: in Hadoop1, in local mode, 
> the number of reducers is always 1, regardless of what the app requests, 
> whereas Hadoop2 honors the requested number of reducers in local mode (by 
> spawning threads).
> The long-term solution seems to be to prevent LOAD DATA into bucketed tables.





[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-6867:
--
Attachment: (was: HIVE-6867.02.patch)

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch
>
>
> Bucketized Table feature fails in some cases. If the source and destination 
> are bucketed on the same key, and the actual data in the source is not 
> bucketed (because it was loaded using LOAD DATA LOCAL INPATH), then the data 
> won't be bucketed while being written to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression; it has never worked.
> It was only discovered because of Hadoop2 changes: in Hadoop1, in local mode, 
> the number of reducers is always 1, regardless of what the app requests, 
> whereas Hadoop2 honors the requested number of reducers in local mode (by 
> spawning threads).
> The long-term solution seems to be to prevent LOAD DATA into bucketed tables.





[jira] [Assigned] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-10677:
--

Assignee: Pengcheng Xiong

> hive.exec.parallel=true has problem when it is used for analyze table column 
> stats
> --
>
> Key: HIVE-10677
> URL: https://issues.apache.org/jira/browse/HIVE-10677
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> To reproduce it in q tests:
> {code}
> hive> set hive.exec.parallel;
> hive.exec.parallel=true
> hive> analyze table src compute statistics for columns;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask
> java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
> Caused by: java.io.IOException: java.lang.InterruptedException
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
>   at org.apache.hadoop.util.Shell.run(Shell.java:455)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
>   ... 7 more
> hive> Job Submission failed with exception 'java.lang.RuntimeException(Error 
> caching map.xml: java.io.IOException: java.lang.InterruptedException)'
> {code}





[jira] [Updated] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10677:
---
Attachment: HIVE-10677.01.patch

> hive.exec.parallel=true has problem when it is used for analyze table column 
> stats
> --
>
> Key: HIVE-10677
> URL: https://issues.apache.org/jira/browse/HIVE-10677
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10677.01.patch
>
>
> To reproduce it in q tests:
> {code}
> hive> set hive.exec.parallel;
> hive.exec.parallel=true
> hive> analyze table src compute statistics for columns;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask
> java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
> Caused by: java.io.IOException: java.lang.InterruptedException
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
>   at org.apache.hadoop.util.Shell.run(Shell.java:455)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
>   ... 7 more
> hive> Job Submission failed with exception 'java.lang.RuntimeException(Error 
> caching map.xml: java.io.IOException: java.lang.InterruptedException)'
> {code}





[jira] [Commented] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-19 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551310#comment-14551310
 ] 

Pengcheng Xiong commented on HIVE-10677:


[~ashutoshc] and [~jpullokkaran], could you please review the patch? Thanks.

> hive.exec.parallel=true has problem when it is used for analyze table column 
> stats
> --
>
> Key: HIVE-10677
> URL: https://issues.apache.org/jira/browse/HIVE-10677
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10677.01.patch
>
>
> To reproduce it in q tests:
> {code}
> hive> set hive.exec.parallel;
> hive.exec.parallel=true
> hive> analyze table src compute statistics for columns;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask
> java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
> Caused by: java.io.IOException: java.lang.InterruptedException
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
>   at org.apache.hadoop.util.Shell.run(Shell.java:455)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
>   ... 7 more
> hive> Job Submission failed with exception 'java.lang.RuntimeException(Error 
> caching map.xml: java.io.IOException: java.lang.InterruptedException)'
> {code}





[jira] [Updated] (HIVE-10404) hive.exec.parallel=true causes "out of sequence response" and SocketTimeoutException: Read timed out

2015-05-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10404:
---
Attachment: HIVE-10404.01.patch

After discussing with [~ashutoshc], we would like to estimate the effort 
needed to set hive.exec.parallel=true as the default.

> hive.exec.parallel=true causes "out of sequence response" and 
> SocketTimeoutException: Read timed out
> 
>
> Key: HIVE-10404
> URL: https://issues.apache.org/jira/browse/HIVE-10404
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Eugene Koifman
> Attachments: HIVE-10404.01.patch
>
>
> With hive.exec.parallel=true, Driver.launchTask() calls Task.initialize() from 
> one thread on several Tasks.  It then starts new threads to run those tasks.
> Task.initialize() gets an instance of Hive and holds on to it.  Hive.java 
> internally uses ThreadLocal to hand out instances, but since 
> Task.initialize() is called by a single thread from the Driver, multiple tasks 
> share an instance of Hive.
> Each Hive instance has a single instance of MetaStoreClient; the latter is 
> not thread safe.
> With hive.exec.parallel=true, different threads actually execute the tasks, so 
> different threads end up sharing the same MetaStoreClient.
> If you make 2 concurrent calls, for example Hive.getTable(String), the Thrift 
> responses may return to the wrong caller.
> Thus the first caller gets "out of sequence response", drops this message and 
> reconnects.  If the timing is right, it will consume the other's response, 
> but the other caller will block for hive.metastore.client.socket.timeout 
> since its response message has now been lost.
> This is just one concrete example.
> One possible fix is to make Task.db use ThreadLocal.
> This could be related to HIVE-6893





[jira] [Updated] (HIVE-10404) hive.exec.parallel=true causes "out of sequence response" and SocketTimeoutException: Read timed out

2015-05-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10404:
---
Attachment: (was: HIVE-10404.01.patch)

> hive.exec.parallel=true causes "out of sequence response" and 
> SocketTimeoutException: Read timed out
> 
>
> Key: HIVE-10404
> URL: https://issues.apache.org/jira/browse/HIVE-10404
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Eugene Koifman
>
> With hive.exec.parallel=true, Driver.launchTask() calls Task.initialize() from 
> one thread on several Tasks.  It then starts new threads to run those tasks.
> Task.initialize() gets an instance of Hive and holds on to it.  Hive.java 
> internally uses ThreadLocal to hand out instances, but since 
> Task.initialize() is called by a single thread from the Driver, multiple tasks 
> share an instance of Hive.
> Each Hive instance has a single instance of MetaStoreClient; the latter is 
> not thread safe.
> With hive.exec.parallel=true, different threads actually execute the tasks, so 
> different threads end up sharing the same MetaStoreClient.
> If you make 2 concurrent calls, for example Hive.getTable(String), the Thrift 
> responses may return to the wrong caller.
> Thus the first caller gets "out of sequence response", drops this message and 
> reconnects.  If the timing is right, it will consume the other's response, 
> but the other caller will block for hive.metastore.client.socket.timeout 
> since its response message has now been lost.
> This is just one concrete example.
> One possible fix is to make Task.db use ThreadLocal.
> This could be related to HIVE-6893
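The proposed fix, making the per-task client reference a ThreadLocal, can be sketched outside Hive like this. The class names below are stand-ins for illustration, not Hive's actual Task/Hive/MetaStoreClient code: each thread lazily gets its own client instance, so two threads can never interleave requests on one Thrift connection.

```java
// Minimal ThreadLocal sketch of the fix described above.
// "Client" is a stand-in for the non-thread-safe MetaStoreClient.
public class ThreadLocalClientDemo {
    static class Client {
        // Record which thread created this instance.
        final long owner = Thread.currentThread().getId();
    }

    // One Client per thread, created lazily on first use; this replaces
    // a single field assigned once in Task.initialize().
    private static final ThreadLocal<Client> CLIENT =
        ThreadLocal.withInitial(Client::new);

    static Client get() { return CLIENT.get(); }

    public static void main(String[] args) throws InterruptedException {
        Client driverClient = get();
        final Client[] taskClient = new Client[1];
        Thread task = new Thread(() -> taskClient[0] = get());
        task.start();
        task.join();
        // The task thread received a distinct instance, so concurrent
        // callers no longer share one connection.
        System.out.println(taskClient[0] != driverClient); // prints "true"
    }
}
```

Within a single thread, repeated `get()` calls still return the same instance, preserving the existing one-instance-per-caller behavior that Hive.java's own ThreadLocal was meant to provide.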





[jira] [Updated] (HIVE-1033) change default value of hive.exec.parallel to true

2015-05-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-1033:
--
Attachment: HIVE-1033.4.patch

After discussing with [~ashutoshc], we would like to estimate the effort 
needed to set hive.exec.parallel=true as the default.

> change default value of hive.exec.parallel to true
> --
>
> Key: HIVE-1033
> URL: https://issues.apache.org/jira/browse/HIVE-1033
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
> Attachments: HIVE-1033.2.patch, HIVE-1033.3.patch, HIVE-1033.4.patch, 
> hive.1033.1.patch
>
>
> There is no harm in changing it to true. 
> Inside facebook, we have been testing it and it seems to be stable.





[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-19 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551864#comment-14551864
 ] 

Pengcheng Xiong commented on HIVE-6867:
---

[~jpullokkaran], could you please take a look? The failed test is not related.

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch
>
>
> Bucketized Table feature fails in some cases. If the source and destination 
> are bucketed on the same key, and the actual data in the source is not 
> bucketed (because it was loaded using LOAD DATA LOCAL INPATH), then the data 
> won't be bucketed while being written to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression; it has never worked.
> It was only discovered because of Hadoop2 changes: in Hadoop1, in local mode, 
> the number of reducers is always 1, regardless of what the app requests, 
> whereas Hadoop2 honors the requested number of reducers in local mode (by 
> spawning threads).
> The long-term solution seems to be to prevent LOAD DATA into bucketed tables.





[jira] [Updated] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)

2015-05-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8769:
--
Attachment: HIVE-8769.04.patch

> Physical optimizer : Incorrect CE results in a shuffle join instead of a Map 
> join (PK/FK pattern not detected)
> --
>
> Key: HIVE-8769
> URL: https://issues.apache.org/jira/browse/HIVE-8769
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Attachments: HIVE-8769.01.patch, HIVE-8769.02.patch, 
> HIVE-8769.03.patch, HIVE-8769.04.patch
>
>
> TPC-DS Q82 is running slower than hive 13 because the join type is not 
> correct.
> The estimate for item x inventory x date_dim is 227 Million rows while the 
> actual is  3K rows.
> Hive 13 finishes in  753  seconds.
> Hive 14 finishes in  1,267  seconds.
> Hive 14 + force map join finished in 431 seconds.
> Query
> {code}
> select  i_item_id
>,i_item_desc
>,i_current_price
>  from item, inventory, date_dim, store_sales
>  where i_current_price between 30 and 30+30
>  and inv_item_sk = i_item_sk
>  and d_date_sk=inv_date_sk
>  and d_date between '2002-05-30' and '2002-07-30'
>  and i_manufact_id in (437,129,727,663)
>  and inv_quantity_on_hand between 100 and 500
>  and ss_item_sk = i_item_sk
>  group by i_item_id,i_item_desc,i_current_price
>  order by i_item_id
>  limit 100
> {code}
> Plan 
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 7 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
> Reducer 4 <- Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE)
> Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
>   DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: item
>   filterExpr: ((i_current_price BETWEEN 30 AND 60 and 
> (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
> boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((i_current_price BETWEEN 30 AND 60 and 
> (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
> boolean)
> Statistics: Num rows: 115500 Data size: 34185680 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: i_item_sk (type: int), i_item_id (type: 
> string), i_item_desc (type: string), i_current_price (type: float)
>   outputColumnNames: _col0, _col1, _col2, _col3
>   Statistics: Num rows: 115500 Data size: 33724832 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 115500 Data size: 33724832 
> Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string), _col2 (type: 
> string), _col3 (type: float)
> Execution mode: vectorized
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
> and d_date_sk is not null) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 81741831 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
> and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 36524 Data size: 3579352 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: d_date_sk (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Select Operator
> expressions: _co

[jira] [Updated] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10677:
---
Attachment: HIVE-10677.02.patch

> hive.exec.parallel=true has problem when it is used for analyze table column 
> stats
> --
>
> Key: HIVE-10677
> URL: https://issues.apache.org/jira/browse/HIVE-10677
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10677.01.patch, HIVE-10677.02.patch
>
>
> To reproduce it in q tests:
> {code}
> hive> set hive.exec.parallel;
> hive.exec.parallel=true
> hive> analyze table src compute statistics for columns;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask
> java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
> Caused by: java.io.IOException: java.lang.InterruptedException
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
>   at org.apache.hadoop.util.Shell.run(Shell.java:455)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
>   ... 7 more
> hive> Job Submission failed with exception 'java.lang.RuntimeException(Error 
> caching map.xml: java.io.IOException: java.lang.InterruptedException)'
> {code}





[jira] [Updated] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)

2015-05-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8769:
--
Attachment: (was: HIVE-8769.04.patch)

> Physical optimizer : Incorrect CE results in a shuffle join instead of a Map 
> join (PK/FK pattern not detected)
> --
>
> Key: HIVE-8769
> URL: https://issues.apache.org/jira/browse/HIVE-8769
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Attachments: HIVE-8769.01.patch, HIVE-8769.02.patch, 
> HIVE-8769.03.patch
>
>
> TPC-DS Q82 is running slower than hive 13 because the join type is not 
> correct.
> The estimate for item x inventory x date_dim is 227 Million rows while the 
> actual is  3K rows.
> Hive 13 finishes in  753  seconds.
> Hive 14 finishes in  1,267  seconds.
> Hive 14 + force map join finished in 431 seconds.
> Query
> {code}
> select  i_item_id
>,i_item_desc
>,i_current_price
>  from item, inventory, date_dim, store_sales
>  where i_current_price between 30 and 30+30
>  and inv_item_sk = i_item_sk
>  and d_date_sk=inv_date_sk
>  and d_date between '2002-05-30' and '2002-07-30'
>  and i_manufact_id in (437,129,727,663)
>  and inv_quantity_on_hand between 100 and 500
>  and ss_item_sk = i_item_sk
>  group by i_item_id,i_item_desc,i_current_price
>  order by i_item_id
>  limit 100
> {code}
> Plan 
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 7 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
> Reducer 4 <- Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE)
> Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
>   DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: item
>   filterExpr: ((i_current_price BETWEEN 30 AND 60 and 
> (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
> boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((i_current_price BETWEEN 30 AND 60 and 
> (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
> boolean)
> Statistics: Num rows: 115500 Data size: 34185680 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: i_item_sk (type: int), i_item_id (type: 
> string), i_item_desc (type: string), i_current_price (type: float)
>   outputColumnNames: _col0, _col1, _col2, _col3
>   Statistics: Num rows: 115500 Data size: 33724832 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 115500 Data size: 33724832 
> Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string), _col2 (type: 
> string), _col3 (type: float)
> Execution mode: vectorized
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
> and d_date_sk is not null) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 81741831 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
> and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 36524 Data size: 3579352 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: d_date_sk (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type:

[jira] [Updated] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10677:
---
Attachment: (was: HIVE-10677.02.patch)

> hive.exec.parallel=true has problem when it is used for analyze table column 
> stats
> --
>
> Key: HIVE-10677
> URL: https://issues.apache.org/jira/browse/HIVE-10677
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10677.01.patch
>
>
> To reproduce it in q tests:
> {code}
> hive> set hive.exec.parallel;
> hive.exec.parallel=true
> hive> analyze table src compute statistics for columns;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask
> java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
> Caused by: java.io.IOException: java.lang.InterruptedException
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
>   at org.apache.hadoop.util.Shell.run(Shell.java:455)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
>   ... 7 more
> hive> Job Submission failed with exception 'java.lang.RuntimeException(Error 
> caching map.xml: java.io.IOException: java.lang.InterruptedException)'
> {code}
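The stack trace points at parallel TaskRunner threads racing while serializing the plan (map.xml) into the scratch directory. A hypothetical illustration of the guarded-write idea in Python (the lock and the function name are invented for illustration; this is not Hive's actual fix):

```python
import os
import threading

_plan_lock = threading.Lock()  # hypothetical guard around plan serialization

def cache_plan(plan_bytes: bytes, scratch_dir: str, name: str = "map.xml") -> str:
    """Write a serialized work plan into the scratch dir. Holding a lock
    means two parallel tasks cannot interleave writes to the same file,
    which is the kind of race the quoted InterruptedException hints at."""
    with _plan_lock:
        path = os.path.join(scratch_dir, name)
        with open(path, "wb") as f:
            f.write(plan_bytes)
        return path
```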



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10677:
---
Attachment: HIVE-10677.02.patch

> hive.exec.parallel=true has problem when it is used for analyze table column 
> stats
> --
>
> Key: HIVE-10677
> URL: https://issues.apache.org/jira/browse/HIVE-10677
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10677.01.patch, HIVE-10677.02.patch
>
>
> To reproduce it in q tests:
> {code}
> hive> set hive.exec.parallel;
> hive.exec.parallel=true
> hive> analyze table src compute statistics for columns;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask
> java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
> Caused by: java.io.IOException: java.lang.InterruptedException
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
>   at org.apache.hadoop.util.Shell.run(Shell.java:455)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
>   ... 7 more
> hive> Job Submission failed with exception 'java.lang.RuntimeException(Error 
> caching map.xml: java.io.IOException: java.lang.InterruptedException)'
> {code}





[jira] [Updated] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)

2015-05-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8769:
--
Attachment: HIVE-8769.04.patch

Address review comments.

> Physical optimizer : Incorrect CE results in a shuffle join instead of a Map 
> join (PK/FK pattern not detected)
> --
>
> Key: HIVE-8769
> URL: https://issues.apache.org/jira/browse/HIVE-8769
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Attachments: HIVE-8769.01.patch, HIVE-8769.02.patch, 
> HIVE-8769.03.patch, HIVE-8769.04.patch
>
>
> TPC-DS Q82 is running slower than hive 13 because the join type is not 
> correct.
> The estimate for item x inventory x date_dim is 227 Million rows while the 
> actual is  3K rows.
> Hive 13 finishes in  753  seconds.
> Hive 14 finishes in  1,267  seconds.
> Hive 14 + force map join finished in 431 seconds.
> Query
> {code}
> select  i_item_id
>,i_item_desc
>,i_current_price
>  from item, inventory, date_dim, store_sales
>  where i_current_price between 30 and 30+30
>  and inv_item_sk = i_item_sk
>  and d_date_sk=inv_date_sk
>  and d_date between '2002-05-30' and '2002-07-30'
>  and i_manufact_id in (437,129,727,663)
>  and inv_quantity_on_hand between 100 and 500
>  and ss_item_sk = i_item_sk
>  group by i_item_id,i_item_desc,i_current_price
>  order by i_item_id
>  limit 100
> {code}
> Plan 
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 7 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
> Reducer 4 <- Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE)
> Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
>   DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: item
>   filterExpr: ((i_current_price BETWEEN 30 AND 60 and 
> (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
> boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((i_current_price BETWEEN 30 AND 60 and 
> (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
> boolean)
> Statistics: Num rows: 115500 Data size: 34185680 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: i_item_sk (type: int), i_item_id (type: 
> string), i_item_desc (type: string), i_current_price (type: float)
>   outputColumnNames: _col0, _col1, _col2, _col3
>   Statistics: Num rows: 115500 Data size: 33724832 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 115500 Data size: 33724832 
> Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string), _col2 (type: 
> string), _col3 (type: float)
> Execution mode: vectorized
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
> and d_date_sk is not null) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 81741831 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
> and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 36524 Data size: 3579352 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: d_date_sk (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Select Operator
>   

[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-6867:
--
Attachment: HIVE-6867.03.patch

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch
>
>
> Bucketized Table feature fails in some cases: if src & destination are 
> bucketed on the same key, and the actual data in src is not bucketed (because 
> the data was loaded using LOAD DATA LOCAL INPATH), then the data won't be 
> bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression; it has never worked, and was only discovered due to 
> Hadoop2 changes: in Hadoop1, in local mode, the number of reducers is always 
> 1 regardless of what the app requests, while Hadoop2 honors the requested 
> number of reducers in local mode (by spawning threads).
> A long-term solution seems to be to prevent LOAD DATA for bucketed tables.
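The failure mode follows from how Hive assigns rows to buckets: bucket id = hash(clustering column) mod number of buckets, something LOAD DATA never computes because it copies files verbatim. A rough Python sketch, using a Java-style string hash as a stand-in for Hive's real hash function:

```python
def bucket_of(key: str, num_buckets: int = 2) -> int:
    # Java-style 31*h + c string hash, masked to 32 bits, standing in for
    # Hive's hashing; a row lands in bucket hash(key) mod num_buckets.
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h % num_buckets

# An INSERT ... SELECT routes every row through this assignment via the
# reducers, so the two output files really are buckets; LOAD DATA LOCAL
# INPATH just copies the input file, leaving the keys unpartitioned.
rows = ["k1", "k2", "k3", "k4"]
buckets = {b: [k for k in rows if bucket_of(k) == b] for b in range(2)}
```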





[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-21 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555283#comment-14555283
 ] 

Pengcheng Xiong commented on HIVE-6867:
---

Address [~jpullokkaran]'s comments.

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch
>
>
> Bucketized Table feature fails in some cases: if src & destination are 
> bucketed on the same key, and the actual data in src is not bucketed (because 
> the data was loaded using LOAD DATA LOCAL INPATH), then the data won't be 
> bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression; it has never worked, and was only discovered due to 
> Hadoop2 changes: in Hadoop1, in local mode, the number of reducers is always 
> 1 regardless of what the app requests, while Hadoop2 honors the requested 
> number of reducers in local mode (by spawning threads).
> A long-term solution seems to be to prevent LOAD DATA for bucketed tables.





[jira] [Commented] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-21 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1425#comment-1425
 ] 

Pengcheng Xiong commented on HIVE-10677:


[~ashutoshc] and [~jpullokkaran], the test failure is unrelated and I think the 
patch is ready to go. Thanks.

> hive.exec.parallel=true has problem when it is used for analyze table column 
> stats
> --
>
> Key: HIVE-10677
> URL: https://issues.apache.org/jira/browse/HIVE-10677
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10677.01.patch, HIVE-10677.02.patch
>
>
> To reproduce it in q tests:
> {code}
> hive> set hive.exec.parallel;
> hive.exec.parallel=true
> hive> analyze table src compute statistics for columns;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask
> java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
> Caused by: java.io.IOException: java.lang.InterruptedException
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
>   at org.apache.hadoop.util.Shell.run(Shell.java:455)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
>   ... 7 more
> hive> Job Submission failed with exception 'java.lang.RuntimeException(Error 
> caching map.xml: java.io.IOException: java.lang.InterruptedException)'
> {code}





[jira] [Commented] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)

2015-05-22 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556408#comment-14556408
 ] 

Pengcheng Xiong commented on HIVE-8769:
---

[~ashutoshc] and [~jpullokkaran], of the failed tests, udf_sha2 is newly added 
and encryption_insert_partition_static is unrelated. I will update them in the 
next patch, so I think the patch is ready to go. Thanks.

> Physical optimizer : Incorrect CE results in a shuffle join instead of a Map 
> join (PK/FK pattern not detected)
> --
>
> Key: HIVE-8769
> URL: https://issues.apache.org/jira/browse/HIVE-8769
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Attachments: HIVE-8769.01.patch, HIVE-8769.02.patch, 
> HIVE-8769.03.patch, HIVE-8769.04.patch
>
>
> TPC-DS Q82 is running slower than hive 13 because the join type is not 
> correct.
> The estimate for item x inventory x date_dim is 227 Million rows while the 
> actual is  3K rows.
> Hive 13 finishes in  753  seconds.
> Hive 14 finishes in  1,267  seconds.
> Hive 14 + force map join finished in 431 seconds.
> Query
> {code}
> select  i_item_id
>,i_item_desc
>,i_current_price
>  from item, inventory, date_dim, store_sales
>  where i_current_price between 30 and 30+30
>  and inv_item_sk = i_item_sk
>  and d_date_sk=inv_date_sk
>  and d_date between '2002-05-30' and '2002-07-30'
>  and i_manufact_id in (437,129,727,663)
>  and inv_quantity_on_hand between 100 and 500
>  and ss_item_sk = i_item_sk
>  group by i_item_id,i_item_desc,i_current_price
>  order by i_item_id
>  limit 100
> {code}
> Plan 
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 7 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
> Reducer 4 <- Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE)
> Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
>   DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: item
>   filterExpr: ((i_current_price BETWEEN 30 AND 60 and 
> (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
> boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((i_current_price BETWEEN 30 AND 60 and 
> (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
> boolean)
> Statistics: Num rows: 115500 Data size: 34185680 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: i_item_sk (type: int), i_item_id (type: 
> string), i_item_desc (type: string), i_current_price (type: float)
>   outputColumnNames: _col0, _col1, _col2, _col3
>   Statistics: Num rows: 115500 Data size: 33724832 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 115500 Data size: 33724832 
> Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string), _col2 (type: 
> string), _col3 (type: float)
> Execution mode: vectorized
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
> and d_date_sk is not null) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 81741831 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
> and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 36524 Data size: 3579352 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: d_date_sk (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce

[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-22 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-6867:
--
Attachment: HIVE-6867.04.patch

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch, HIVE-6867.04.patch
>
>
> Bucketized Table feature fails in some cases: if src & destination are 
> bucketed on the same key, and the actual data in src is not bucketed (because 
> the data was loaded using LOAD DATA LOCAL INPATH), then the data won't be 
> bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression; it has never worked, and was only discovered due to 
> Hadoop2 changes: in Hadoop1, in local mode, the number of reducers is always 
> 1 regardless of what the app requests, while Hadoop2 honors the requested 
> number of reducers in local mode (by spawning threads).
> A long-term solution seems to be to prevent LOAD DATA for bucketed tables.





[jira] [Updated] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)

2015-05-22 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8769:
--
Attachment: HIVE-8769.05.patch

[~jpullokkaran], according to your comments, I have rebased the patch.

> Physical optimizer : Incorrect CE results in a shuffle join instead of a Map 
> join (PK/FK pattern not detected)
> --
>
> Key: HIVE-8769
> URL: https://issues.apache.org/jira/browse/HIVE-8769
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Attachments: HIVE-8769.01.patch, HIVE-8769.02.patch, 
> HIVE-8769.03.patch, HIVE-8769.04.patch, HIVE-8769.05.patch
>
>
> TPC-DS Q82 is running slower than hive 13 because the join type is not 
> correct.
> The estimate for item x inventory x date_dim is 227 Million rows while the 
> actual is  3K rows.
> Hive 13 finishes in  753  seconds.
> Hive 14 finishes in  1,267  seconds.
> Hive 14 + force map join finished in 431 seconds.
> Query
> {code}
> select  i_item_id
>,i_item_desc
>,i_current_price
>  from item, inventory, date_dim, store_sales
>  where i_current_price between 30 and 30+30
>  and inv_item_sk = i_item_sk
>  and d_date_sk=inv_date_sk
>  and d_date between '2002-05-30' and '2002-07-30'
>  and i_manufact_id in (437,129,727,663)
>  and inv_quantity_on_hand between 100 and 500
>  and ss_item_sk = i_item_sk
>  group by i_item_id,i_item_desc,i_current_price
>  order by i_item_id
>  limit 100
> {code}
> Plan 
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 7 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
> Reducer 4 <- Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE)
> Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
>   DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: item
>   filterExpr: ((i_current_price BETWEEN 30 AND 60 and 
> (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
> boolean)
>   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((i_current_price BETWEEN 30 AND 60 and 
> (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
> boolean)
> Statistics: Num rows: 115500 Data size: 34185680 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: i_item_sk (type: int), i_item_id (type: 
> string), i_item_desc (type: string), i_current_price (type: float)
>   outputColumnNames: _col0, _col1, _col2, _col3
>   Statistics: Num rows: 115500 Data size: 33724832 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 115500 Data size: 33724832 
> Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string), _col2 (type: 
> string), _col3 (type: float)
> Execution mode: vectorized
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
> and d_date_sk is not null) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 81741831 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
> and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 36524 Data size: 3579352 Basic 
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: d_date_sk (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats:

[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-23 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557445#comment-14557445
 ] 

Pengcheng Xiong commented on HIVE-6867:
---

[~jpullokkaran], the test failure is unrelated and I think the patch is ready 
to go. Thanks.

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch, HIVE-6867.04.patch
>
>
> Bucketized Table feature fails in some cases: if src & destination are 
> bucketed on the same key, and the actual data in src is not bucketed (because 
> the data was loaded using LOAD DATA LOCAL INPATH), then the data won't be 
> bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression; it has never worked, and was only discovered due to 
> Hadoop2 changes: in Hadoop1, in local mode, the number of reducers is always 
> 1 regardless of what the app requests, while Hadoop2 honors the requested 
> number of reducers in local mode (by spawning threads).
> A long-term solution seems to be to prevent LOAD DATA for bucketed tables.





[jira] [Resolved] (HIVE-10107) Union All : Vertex missing stats resulting in OOM and in-efficient plans

2015-05-23 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong resolved HIVE-10107.

Resolution: Fixed

Resolved following HIVE-8769.

> Union All : Vertex missing stats resulting in OOM and in-efficient plans
> 
>
> Key: HIVE-10107
> URL: https://issues.apache.org/jira/browse/HIVE-10107
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
>
> Reducer vertices sending data to a UNION ALL edge are missing statistics; as 
> a result, we either use very few reducers in the UNION ALL edge or decide to 
> broadcast the results of the UNION ALL.
> Query
> {code}
> select 
> count(*) rowcount
> from
> (select 
> ss_item_sk, ss_ticket_number, ss_store_sk
> from
> store_sales a, store_returns b
> where
> a.ss_item_sk = b.sr_item_sk
> and a.ss_ticket_number = b.sr_ticket_number union all select 
> ss_item_sk, ss_ticket_number, ss_store_sk
> from
> store_sales c, store_returns d
> where
> c.ss_item_sk = d.sr_item_sk
> and c.ss_ticket_number = d.sr_ticket_number) t
> group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number
> having rowcount > 1;
> {code}
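The missing-stats problem comes down to how a UNION ALL edge should derive its statistics from its inputs; the natural merge rule is to sum the branch row counts and data sizes so the consumer vertex can size its reducers. A simplified sketch of that rule (not Hive's implementation):

```python
def union_all_stats(branches):
    """Merge per-branch vertex statistics for a UNION ALL edge by summing
    row counts and data sizes; with stats present, the consumer vertex can
    pick a sensible reducer count instead of under-parallelizing or
    choosing to broadcast a large intermediate."""
    return {
        "rows": sum(b["rows"] for b in branches),
        "data_size": sum(b["data_size"] for b in branches),
    }

# Branch figures below are made up for illustration.
merged = union_all_stats([
    {"rows": 1_000_000, "data_size": 8_000_000},
    {"rows": 1_200_000, "data_size": 9_600_000},
])
```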
> Plan snippet 
> {code}
>  Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 
> (CONTAINS)
> Reducer 4 <- Union 3 (SIMPLE_EDGE)
> Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 
> (CONTAINS)
>   Reducer 4
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 
> (type: int)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: COMPLETE
> Filter Operator
>   predicate: (_col3 > 1) (type: boolean)
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col3 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: COMPLETE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: COMPLETE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Reducer 7
> Reduce Operator Tree:
>   Merge Join Operator
> condition map:
>  Inner Join 0 to 1
> keys:
>   0 ss_item_sk (type: int), ss_ticket_number (type: int)
>   1 sr_item_sk (type: int), sr_ticket_number (type: int)
> outputColumnNames: _col1, _col6, _col8, _col27, _col34
> Filter Operator
>   predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: 
> boolean)
>   Select Operator
> expressions: _col1 (type: int), _col8 (type: int), _col6 
> (type: int)
> outputColumnNames: _col0, _col1, _col2
> Group By Operator
>   aggregations: count()
>   keys: _col2 (type: int), _col0 (type: int), _col1 
> (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1, _col2, _col3
>   Reduce Output Operator
> key expressions: _col0 (type: int), _col1 (type: 
> int), _col2 (type: int)
> sort order: +++
> Map-reduce partition columns: _col0 (type: int), 
> _col1 (type: int), _col2 (type: int)
> value expressions: _col3 (type: bigint)
> {code}
> The full explain plan 
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 
> (CONTAINS)
> Reducer 4 <- Union 3 (SIMPLE_EDGE)
> Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 
> (CONTAINS)
>   DagName: mmokhtar_20150214132

[jira] [Commented] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation

2015-05-23 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557530#comment-14557530
 ] 

Pengcheng Xiong commented on HIVE-10812:


[~jpullokkaran], [~ashutoshc] and [~mmokhtar], this patch will address the PK/FK 
selectivity scaling problem, as well as [~ashutoshc]'s previous comments 
regarding the SerDe.

> Scaling PK/FK's selectivity for stats annotation
> 
>
> Key: HIVE-10812
> URL: https://issues.apache.org/jira/browse/HIVE-10812
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> Right now, the computation of the selectivity of the FK side based on the PK 
> side does not take the ranges of the FK and the PK into consideration.
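The kind of range-aware scaling the issue asks for can be sketched as follows (an editor's illustration only; the function name and exact formula are assumptions, not Hive's implementation):

```python
def scaled_fk_selectivity(pk_min, pk_max, fk_min, fk_max, base_selectivity):
    """Scale the FK side's join selectivity by how much of the FK key
    range actually overlaps the PK key range (illustration only)."""
    overlap = min(pk_max, fk_max) - max(pk_min, fk_min)
    fk_range = fk_max - fk_min
    if fk_range <= 0 or overlap <= 0:
        return 0.0
    # Only the overlapping fraction of FK values can find a matching PK.
    return base_selectivity * min(1.0, overlap / fk_range)

# FK keys span 0..100 but the PK only covers 0..50:
# at most half of the FK rows can find a matching PK.
print(scaled_fk_selectivity(0, 50, 0, 100, 1.0))  # 0.5
```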



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation

2015-05-23 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10812:
---
Issue Type: Improvement  (was: Bug)

> Scaling PK/FK's selectivity for stats annotation
> 
>
> Key: HIVE-10812
> URL: https://issues.apache.org/jira/browse/HIVE-10812
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10812.01.patch
>
>
> Right now, the computation of the selectivity of the FK side based on the PK 
> side does not take the ranges of the FK and the PK into consideration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation

2015-05-23 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10812:
---
Attachment: HIVE-10812.01.patch

> Scaling PK/FK's selectivity for stats annotation
> 
>
> Key: HIVE-10812
> URL: https://issues.apache.org/jira/browse/HIVE-10812
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10812.01.patch
>
>
> Right now, the computation of the selectivity of the FK side based on the PK 
> side does not take the ranges of the FK and the PK into consideration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation

2015-05-24 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10812:
---
Attachment: HIVE-10812.02.patch

> Scaling PK/FK's selectivity for stats annotation
> 
>
> Key: HIVE-10812
> URL: https://issues.apache.org/jira/browse/HIVE-10812
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10812.01.patch, HIVE-10812.02.patch
>
>
> Right now, the computation of the selectivity of the FK side based on the PK 
> side does not take the ranges of the FK and the PK into consideration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation

2015-05-25 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558448#comment-14558448
 ] 

Pengcheng Xiong commented on HIVE-10812:


The test failures are unrelated.

> Scaling PK/FK's selectivity for stats annotation
> 
>
> Key: HIVE-10812
> URL: https://issues.apache.org/jira/browse/HIVE-10812
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10812.01.patch, HIVE-10812.02.patch
>
>
> Right now, the computation of the selectivity of the FK side based on the PK 
> side does not take the ranges of the FK and the PK into consideration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation

2015-05-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10812:
---
Attachment: HIVE-10812.03.patch

> Scaling PK/FK's selectivity for stats annotation
> 
>
> Key: HIVE-10812
> URL: https://issues.apache.org/jira/browse/HIVE-10812
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10812.01.patch, HIVE-10812.02.patch, 
> HIVE-10812.03.patch
>
>
> Right now, the computation of the selectivity of the FK side based on the PK 
> side does not take the ranges of the FK and the PK into consideration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9105) Hive-0.13 select constant in union all followed by group by gives wrong result

2015-05-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong resolved HIVE-9105.
---
Resolution: Fixed

> Hive-0.13 select constant in union all followed by group by gives wrong result
> --
>
> Key: HIVE-9105
> URL: https://issues.apache.org/jira/browse/HIVE-9105
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> select key from (
> select '1' as key from srcpart where ds="2008-04-09"
> UNION all
> SELECT key from srcpart where ds="2008-04-09" and hr="11"
> ) tab group by key 
> will generate wrong results



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10804) CBO: Calcite Operator To Hive Operator (Calcite Return Path): optimizer for limit 0 does not work

2015-05-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10804:
---
Attachment: HIVE-10804.01.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): optimizer for 
> limit 0 does not work
> -
>
> Key: HIVE-10804
> URL: https://issues.apache.org/jira/browse/HIVE-10804
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10804.01.patch
>
>
> {code}
> explain
> select key,value from src order by key limit 0
> POSTHOOK: type: QUERY
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: src
> Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: key (type: string), value (type: string)
>   outputColumnNames: key, value
>   Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Output Operator
> key expressions: key (type: string)
> sort order: +
> Statistics: Num rows: 500 Data size: 5312 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: value (type: string)
>   Reduce Operator Tree:
> Select Operator
>   expressions: KEY.reducesinkkey0 (type: string), VALUE.value (type: 
> string)
>   outputColumnNames: key, value
>   Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
>   Limit
> Number of rows: 0
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>   table:
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
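The intent of the missing optimization is that a top-level `LIMIT 0` should short-circuit the whole scan-and-sort pipeline instead of producing the full plan above. A minimal Python sketch of the idea (an editor's illustration; the names are not Hive internals):

```python
def run_order_by_limit(rows, sort_key, limit):
    """Naive executor for `SELECT ... ORDER BY key LIMIT n`.
    The point of the optimizer: when the limit is 0, the scan and
    sort can be skipped entirely and an empty result returned."""
    if limit == 0:
        return []  # short-circuit: no scan, no sort, no shuffle
    ordered = sorted(rows, key=sort_key)
    return ordered if limit is None else ordered[:limit]

rows = [("b", 2), ("a", 1), ("c", 3)]
print(run_order_by_limit(rows, sort_key=lambda r: r[0], limit=0))  # []
print(run_order_by_limit(rows, sort_key=lambda r: r[0], limit=2))  # [('a', 1), ('b', 2)]
```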



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10731) NullPointerException in HiveParser.g

2015-05-26 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560177#comment-14560177
 ] 

Pengcheng Xiong commented on HIVE-10731:


[~jpullokkaran], this patch also needs your review. Thanks.

> NullPointerException in HiveParser.g
> 
>
> Key: HIVE-10731
> URL: https://issues.apache.org/jira/browse/HIVE-10731
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.0
>Reporter: Xiu
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-10731.01.patch
>
>
> In HiveParser.g:
> {code:Java}
> protected boolean useSQL11ReservedKeywordsForIdentifier() {
> return !HiveConf.getBoolVar(hiveConf, 
> HiveConf.ConfVars.HIVE_SUPPORT_SQL11_RESERVED_KEYWORDS);
> }
> {code}
> NullPointerException is thrown when hiveConf is not set.
> Stack trace:
> {code:Java}
> java.lang.NullPointerException
> at org.apache.hadoop.hive.conf.HiveConf.getBoolVar(HiveConf.java:2583)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.useSQL11ReservedKeywordsForIdentifier(HiveParser.java:1000)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.useSQL11ReservedKeywordsForIdentifier(HiveParser_IdentifiersParser.java:726)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:10922)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:45808)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser.java:38008)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeList(HiveParser.java:36167)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:5214)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2640)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1650)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1109)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:161)
> {code}
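The fix pattern here is a defensive fallback when no configuration handle was supplied to the parser. A Python sketch of the guard (an editor's illustration of the idea, not the actual patch; the property name mirrors the Hive conf key):

```python
# Default of hive.support.sql11.reserved.keywords (assumed true here).
DEFAULT_SUPPORT_SQL11_RESERVED_KEYWORDS = True

def use_sql11_reserved_keywords_for_identifier(hive_conf):
    """Mirror of the parser check, with a defensive fallback: when no
    configuration object is available, use the option's default value
    instead of dereferencing a null conf and crashing."""
    if hive_conf is None:
        return not DEFAULT_SUPPORT_SQL11_RESERVED_KEYWORDS
    return not hive_conf.get("hive.support.sql11.reserved.keywords",
                             DEFAULT_SUPPORT_SQL11_RESERVED_KEYWORDS)

print(use_sql11_reserved_keywords_for_identifier(None))  # False (no crash)
```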



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10829) ATS hook fails for explainTask

2015-05-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10829:
---
Attachment: HIVE-10829.01.patch

> ATS hook fails for explainTask
> --
>
> Key: HIVE-10829
> URL: https://issues.apache.org/jira/browse/HIVE-10829
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-10829.01.patch
>
>
> Commands:
> create table idtable(id string);
> create table ctastable as select * from idtable;
> With ATS hook:
> 2015-05-22 18:54:47,092 INFO  [ATS Logger 0]: hooks.ATSHook 
> (ATSHook.java:run(136)) - Failed to submit plan to ATS: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:589)
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:576)
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:821)
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.outputStagePlans(ExplainTask.java:965)
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.getJSONPlan(ExplainTask.java:219)
> at org.apache.hadoop.hive.ql.hooks.ATSHook$2.run(ATSHook.java:120)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10804) CBO: Calcite Operator To Hive Operator (Calcite Return Path): optimizer for limit 0 does not work

2015-05-27 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561257#comment-14561257
 ] 

Pengcheng Xiong commented on HIVE-10804:


[~jpullokkaran] and [~ashutoshc], the failed tests are unrelated. I assume that 
someone has broken the trunk. Thanks.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): optimizer for 
> limit 0 does not work
> -
>
> Key: HIVE-10804
> URL: https://issues.apache.org/jira/browse/HIVE-10804
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10804.01.patch
>
>
> {code}
> explain
> select key,value from src order by key limit 0
> POSTHOOK: type: QUERY
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: src
> Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: key (type: string), value (type: string)
>   outputColumnNames: key, value
>   Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Output Operator
> key expressions: key (type: string)
> sort order: +
> Statistics: Num rows: 500 Data size: 5312 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: value (type: string)
>   Reduce Operator Tree:
> Select Operator
>   expressions: KEY.reducesinkkey0 (type: string), VALUE.value (type: 
> string)
>   outputColumnNames: key, value
>   Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
>   Limit
> Number of rows: 0
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>   table:
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10829) ATS hook fails for explainTask

2015-05-27 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561259#comment-14561259
 ] 

Pengcheng Xiong commented on HIVE-10829:


The test failures are unrelated. QA reports that it works. [~jpullokkaran] and 
[~ashutoshc], could you please take a look? Thanks.

> ATS hook fails for explainTask
> --
>
> Key: HIVE-10829
> URL: https://issues.apache.org/jira/browse/HIVE-10829
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-10829.01.patch
>
>
> Commands:
> create table idtable(id string);
> create table ctastable as select * from idtable;
> With ATS hook:
> 2015-05-22 18:54:47,092 INFO  [ATS Logger 0]: hooks.ATSHook 
> (ATSHook.java:run(136)) - Failed to submit plan to ATS: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:589)
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:576)
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:821)
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.outputStagePlans(ExplainTask.java:965)
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.getJSONPlan(ExplainTask.java:219)
> at org.apache.hadoop.hive.ql.hooks.ATSHook$2.run(ATSHook.java:120)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10833) RowResolver looks mangled with CBO

2015-05-27 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561755#comment-14561755
 ] 

Pengcheng Xiong commented on HIVE-10833:


[~ekoifman], could you try again on the latest trunk? I was not able to 
reproduce it.
With CBO enabled, the first time I ran the insert, I put a break point at 
_selectStar_ (currently line 3865) and examined _out_rwsch.rslvMap_:
{code}
values__tmp__table__1{(tmp_values_col1,_col0: string)(tmp_values_col2,_col1: 
string)} 
{code}
The query failed, but I did see "values__tmp__table__1" rather than null:
{code}
FAILED: NullPointerException null
{code}
If I run the same insert again, I see
{code}
values__tmp__table__2{(tmp_values_col1,_col0: string)(tmp_values_col2,_col1: 
string)} 
{code}
I still cannot see "null".

> RowResolver looks mangled with CBO 
> ---
>
> Key: HIVE-10833
> URL: https://issues.apache.org/jira/browse/HIVE-10833
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Pengcheng Xiong
>
> While working on HIVE-10828 I noticed that internal state of RowResolver 
> looks odd when CBO is enabled.
> Consider the script below.
> {noformat}
> set hive.enforce.bucketing=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.cbo.enable=false;
> drop table if exists acid_partitioned;
> create table acid_partitioned (a int, c string)
>   partitioned by (p int)
>   clustered by (a) into 1 buckets;
>   
> insert into acid_partitioned partition (p) (a,p) values(1,1);
> {noformat}
> (this test is part of 
> ql/src/test/results/clientpositive/insert_into_with_schema2.q)
> With CBO on,
> if you put a break point in {noformat}SemanticAnalyzer.genSelectPlan(String 
> dest, ASTNode selExprList, QB qb, Operator input,
>   Operator inputForSelectStar, boolean outerLV){noformat} at line 
> _selectStar = selectStar && exprList.getChildCount() == posn + 1;_
> (currently 3865) and examine _out_rwsch.rslvMap_ variable looks like 
> {noformat}{null={values__tmp__table__1.tmp_values_col1=_col0: string, 
> values__tmp__table__1.tmp_values_col2=_col1: string}}{noformat}
> with CBO disabled, the same _out_rwsch.rslvMap_ looks like
> {noformat}{values__tmp__table__1={tmp_values_col1=_col0: string, 
> tmp_values_col2=_col1: string}}{noformat}
> The _out_rwsch.invRslvMap_ also differs in the same way.
> It seems that the version you get with CBO off is the correct one since
> _insert into acid_partitioned partition (p) (a,p) values(1,1)_ is rewritten to
> _insert into acid_partitioned partition (p) (a,p) select * from 
> values__tmp__table__1_
> CC [~ashutoshc]
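The two shapes of _rslvMap_ described above can be contrasted directly: the expected structure is keyed by table alias, then column, while the mangled one folds the alias into the column key under a null outer key. A toy illustration (editor's sketch of the data shapes only):

```python
# Correct shape (CBO off): outer key is the table alias.
correct = {"values__tmp__table__1": {"tmp_values_col1": "_col0: string",
                                     "tmp_values_col2": "_col1: string"}}

# Mangled shape (CBO on): the alias is folded into the column key and
# the outer key degenerates to null.
mangled = {None: {"values__tmp__table__1.tmp_values_col1": "_col0: string",
                  "values__tmp__table__1.tmp_values_col2": "_col1: string"}}

def lookup(rslv_map, alias, col):
    """Alias-then-column lookup only succeeds against the correct shape."""
    return rslv_map.get(alias, {}).get(col)

print(lookup(correct, "values__tmp__table__1", "tmp_values_col1"))  # _col0: string
print(lookup(mangled, "values__tmp__table__1", "tmp_values_col1"))  # None
```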



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10833) RowResolver looks mangled with CBO

2015-05-27 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561812#comment-14561812
 ] 

Pengcheng Xiong commented on HIVE-10833:


[~ekoifman], I assume we are focusing on the RowResolver rather than the NPE, 
so I should be able to reproduce it without the HIVE-10828 patch? And we are 
talking about the null that you found in
{code}
{null={values__tmp__table__1.tmp_values_col1=_col0: string, 
values__tmp__table__1.tmp_values_col2=_col1: string}}
{code}?
If that is the case, I have tried many times but cannot reproduce it...

> RowResolver looks mangled with CBO 
> ---
>
> Key: HIVE-10833
> URL: https://issues.apache.org/jira/browse/HIVE-10833
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Pengcheng Xiong
>
> While working on HIVE-10828 I noticed that internal state of RowResolver 
> looks odd when CBO is enabled.
> Consider the script below.
> {noformat}
> set hive.enforce.bucketing=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.cbo.enable=false;
> drop table if exists acid_partitioned;
> create table acid_partitioned (a int, c string)
>   partitioned by (p int)
>   clustered by (a) into 1 buckets;
>   
> insert into acid_partitioned partition (p) (a,p) values(1,1);
> {noformat}
> (this test is part of 
> ql/src/test/results/clientpositive/insert_into_with_schema2.q)
> With CBO on,
> if you put a break point in {noformat}SemanticAnalyzer.genSelectPlan(String 
> dest, ASTNode selExprList, QB qb, Operator input,
>   Operator inputForSelectStar, boolean outerLV){noformat} at line 
> _selectStar = selectStar && exprList.getChildCount() == posn + 1;_
> (currently 3865) and examine _out_rwsch.rslvMap_ variable looks like 
> {noformat}{null={values__tmp__table__1.tmp_values_col1=_col0: string, 
> values__tmp__table__1.tmp_values_col2=_col1: string}}{noformat}
> with CBO disabled, the same _out_rwsch.rslvMap_ looks like
> {noformat}{values__tmp__table__1={tmp_values_col1=_col0: string, 
> tmp_values_col2=_col1: string}}{noformat}
> The _out_rwsch.invRslvMap_ also differs in the same way.
> It seems that the version you get with CBO off is the correct one since
> _insert into acid_partitioned partition (p) (a,p) values(1,1)_ is rewritten to
> _insert into acid_partitioned partition (p) (a,p) select * from 
> values__tmp__table__1_
> CC [~ashutoshc]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10833) RowResolver looks mangled with CBO

2015-05-27 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561835#comment-14561835
 ] 

Pengcheng Xiong commented on HIVE-10833:


Yes, I set the properties as you mentioned and followed the instructions step 
by step. Thanks.

> RowResolver looks mangled with CBO 
> ---
>
> Key: HIVE-10833
> URL: https://issues.apache.org/jira/browse/HIVE-10833
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Pengcheng Xiong
>
> While working on HIVE-10828 I noticed that internal state of RowResolver 
> looks odd when CBO is enabled.
> Consider the script below.
> {noformat}
> set hive.enforce.bucketing=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.cbo.enable=false;
> drop table if exists acid_partitioned;
> create table acid_partitioned (a int, c string)
>   partitioned by (p int)
>   clustered by (a) into 1 buckets;
>   
> insert into acid_partitioned partition (p) (a,p) values(1,1);
> {noformat}
> (this test is part of 
> ql/src/test/results/clientpositive/insert_into_with_schema2.q)
> With CBO on,
> if you put a break point in {noformat}SemanticAnalyzer.genSelectPlan(String 
> dest, ASTNode selExprList, QB qb, Operator input,
>   Operator inputForSelectStar, boolean outerLV){noformat} at line 
> _selectStar = selectStar && exprList.getChildCount() == posn + 1;_
> (currently 3865) and examine _out_rwsch.rslvMap_ variable looks like 
> {noformat}{null={values__tmp__table__1.tmp_values_col1=_col0: string, 
> values__tmp__table__1.tmp_values_col2=_col1: string}}{noformat}
> with CBO disabled, the same _out_rwsch.rslvMap_ looks like
> {noformat}{values__tmp__table__1={tmp_values_col1=_col0: string, 
> tmp_values_col2=_col1: string}}{noformat}
> The _out_rwsch.invRslvMap_ also differs in the same way.
> It seems that the version you get with CBO off is the correct one since
> _insert into acid_partitioned partition (p) (a,p) values(1,1)_ is rewritten to
> _insert into acid_partitioned partition (p) (a,p) select * from 
> values__tmp__table__1_
> CC [~ashutoshc]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10698) query on view results fails with table not found error if view is created with subquery alias (CTE).

2015-05-28 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563390#comment-14563390
 ] 

Pengcheng Xiong commented on HIVE-10698:


[~ashutoshc], could you please review it? Thanks.

> query on view results fails with table not found error if view is created 
> with subquery alias (CTE).
> 
>
> Key: HIVE-10698
> URL: https://issues.apache.org/jira/browse/HIVE-10698
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10698.01.patch
>
>
> To reproduce it, 
> {code}
> use bugtest;
> create table basetb(id int, name string);
> create view testv1 as
> with subtb as (select id, name from bugtest.basetb)
> select id from subtb;
> use castest;
> explain select * from bugtest.testv1;
> hive> explain select * from bugtest.testv1;
> FAILED: SemanticException Line 2:15 Table not found 'subtb' in definition of 
> VIEW testv1 [
> with subtb as (select id, name from bugtest.basetb)
> select id from `bugtest`.`subtb`
> ] used as testv1 at Line 1:22
> Note that there is a database prefix `bugtest`.`subtb`
> {code}
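The failure above comes down to name resolution order during view expansion: a name defined as a CTE inside the view must be matched before the current database is prefixed onto it. A Python sketch of that resolution rule (an editor's illustration; the function is hypothetical, not Hive's resolver):

```python
def qualify_table(name, current_db, cte_names):
    """Resolve a table reference during view expansion. Names matching a
    CTE defined in the view must stay unqualified; everything else gets
    the current database prefix. (The bug: `subtb` was expanded to
    `bugtest`.`subtb`, which then failed to resolve as a table.)"""
    if name in cte_names:
        return name
    return name if "." in name else f"{current_db}.{name}"

print(qualify_table("subtb", "bugtest", {"subtb"}))   # subtb
print(qualify_table("basetb", "bugtest", set()))      # bugtest.basetb
```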



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10853) Create ExplainTask in ATS hook through ExplainWork

2015-05-28 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10853:
---
Attachment: HIVE-10853.01.patch

> Create ExplainTask in ATS hook through ExplainWork
> --
>
> Key: HIVE-10853
> URL: https://issues.apache.org/jira/browse/HIVE-10853
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10853.01.patch
>
>
> Right now ExplainTask is created directly. That's fragile and can lead to 
> stuff like: HIVE-10829



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-6867:
--
Attachment: HIVE-6867.05.patch

address [~jpullokkaran], [~xuefuz]'s comments.

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch
>
>
> Bucketized Table feature fails in some cases. If the source and destination are 
> bucketed on the same key, and the actual data in the source is not bucketed 
> (because the data was loaded using LOAD DATA LOCAL INPATH), then the data won't 
> be bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression. This has never worked.
> It was only discovered due to Hadoop2 changes.
> In Hadoop1, in local mode, the number of reducers is always 1, regardless of 
> what the app requests. Hadoop2 now honors the number-of-reducers setting in 
> local mode (by spawning threads).
> The long-term solution seems to be to prevent LOAD DATA into bucketed tables.
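The issue described above can be worked around by staging the raw file and letting an INSERT ... SELECT produce the bucket files. This is an illustrative sketch only: `hive.enforce.bucketing` exists in the affected Hive versions, and the staging-table name is made up.

{code:sql}
-- illustrative workaround: load into an unbucketed staging table,
-- then insert so reducers enforce the bucketing of P1
set hive.enforce.bucketing=true;
CREATE TABLE P1_staging(key STRING, val STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt'
INTO TABLE P1_staging;
INSERT OVERWRITE TABLE P1 SELECT key, val FROM P1_staging;
{code}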





[jira] [Updated] (HIVE-10853) Create ExplainTask in ATS hook through ExplainWork

2015-05-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10853:
---
Attachment: HIVE-10853.02.patch

address [~hagleitn]'s comment

> Create ExplainTask in ATS hook through ExplainWork
> --
>
> Key: HIVE-10853
> URL: https://issues.apache.org/jira/browse/HIVE-10853
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10853.01.patch, HIVE-10853.02.patch
>
>
> Right now ExplainTask is created directly. That's fragile and can lead to 
> stuff like: HIVE-10829





[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565340#comment-14565340
 ] 

Pengcheng Xiong commented on HIVE-6867:
---

[~xuefuz], thanks for your comments. If you read the comments from 
[~jpullokkaran] in the review board, you will find out that this patch is 
targeting "load into" a bucketed table rather than "insert into" a bucketed 
table.

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch
>
>
> Bucketized Table feature fails in some cases. If the source and destination are 
> bucketed on the same key, and the actual data in the source is not bucketed 
> (because the data was loaded using LOAD DATA LOCAL INPATH), then the data won't 
> be bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression. This has never worked.
> It was only discovered due to Hadoop2 changes.
> In Hadoop1, in local mode, the number of reducers is always 1, regardless of 
> what the app requests. Hadoop2 now honors the number-of-reducers setting in 
> local mode (by spawning threads).
> The long-term solution seems to be to prevent LOAD DATA into bucketed tables.





[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565495#comment-14565495
 ] 

Pengcheng Xiong commented on HIVE-6867:
---

[~xuefuz], Yes, the problem still remains. If you read my comment on RB this 
morning, you will find that "And after we discussed with Hive JDBC guy, we 
found that current infrastructure does not support warning msg to be passed 
through JDBC. We acknowledge that this is something that we need to improve in 
the future."

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch
>
>
> Bucketized Table feature fails in some cases. If the source and destination are 
> bucketed on the same key, and the actual data in the source is not bucketed 
> (because the data was loaded using LOAD DATA LOCAL INPATH), then the data won't 
> be bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression. This has never worked.
> It was only discovered due to Hadoop2 changes.
> In Hadoop1, in local mode, the number of reducers is always 1, regardless of 
> what the app requests. Hadoop2 now honors the number-of-reducers setting in 
> local mode (by spawning threads).
> The long-term solution seems to be to prevent LOAD DATA into bucketed tables.





[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565554#comment-14565554
 ] 

Pengcheng Xiong commented on HIVE-6867:
---

[~xuefuz], we agree with you but that will be a major change and should be done 
only as part of a major release.

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch
>
>
> Bucketized Table feature fails in some cases. If the source and destination are 
> bucketed on the same key, and the actual data in the source is not bucketed 
> (because the data was loaded using LOAD DATA LOCAL INPATH), then the data won't 
> be bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression. This has never worked.
> It was only discovered due to Hadoop2 changes.
> In Hadoop1, in local mode, the number of reducers is always 1, regardless of 
> what the app requests. Hadoop2 now honors the number-of-reducers setting in 
> local mode (by spawning threads).
> The long-term solution seems to be to prevent LOAD DATA into bucketed tables.





[jira] [Updated] (HIVE-10479) Empty tabAlias in columnInfo which triggers PPD

2015-06-01 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10479:
---
Attachment: HIVE-10479.01.patch

> Empty tabAlias in columnInfo which triggers PPD
> ---
>
> Key: HIVE-10479
> URL: https://issues.apache.org/jira/browse/HIVE-10479
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-10479.01.patch, HIVE-10479.patch
>
>
> In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, 
> when aliases contains the empty string "" and key is also the empty string "", 
> it assumes that aliases contains key. This triggers incorrect PPD. To 
> reproduce it, apply HIVE-10455 and run cbo_subq_notin.q.





[jira] [Commented] (HIVE-10479) Empty tabAlias in columnInfo which triggers PPD

2015-06-01 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567589#comment-14567589
 ] 

Pengcheng Xiong commented on HIVE-10479:


cbo_subq_notin.q will pass with this patch.

> Empty tabAlias in columnInfo which triggers PPD
> ---
>
> Key: HIVE-10479
> URL: https://issues.apache.org/jira/browse/HIVE-10479
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-10479.01.patch, HIVE-10479.patch
>
>
> In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, 
> when aliases contains the empty string "" and key is also the empty string "", 
> it assumes that aliases contains key. This triggers incorrect PPD. To 
> reproduce it, apply HIVE-10455 and run cbo_subq_notin.q.





[jira] [Assigned] (HIVE-10479) Empty tabAlias in columnInfo which triggers PPD

2015-06-01 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-10479:
--

Assignee: Pengcheng Xiong  (was: Laljo John Pullokkaran)

> Empty tabAlias in columnInfo which triggers PPD
> ---
>
> Key: HIVE-10479
> URL: https://issues.apache.org/jira/browse/HIVE-10479
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10479.01.patch, HIVE-10479.patch
>
>
> In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, 
> when aliases contains the empty string "" and key is also the empty string "", 
> it assumes that aliases contains key. This triggers incorrect PPD. To 
> reproduce it, apply HIVE-10455 and run cbo_subq_notin.q.





[jira] [Updated] (HIVE-10479) CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD

2015-06-01 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10479:
---
Summary: CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty 
tabAlias in columnInfo which triggers PPD  (was: Empty tabAlias in columnInfo 
which triggers PPD)

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias 
> in columnInfo which triggers PPD
> 
>
> Key: HIVE-10479
> URL: https://issues.apache.org/jira/browse/HIVE-10479
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10479.01.patch, HIVE-10479.patch
>
>
> In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, 
> when aliases contains the empty string "" and key is also the empty string "", 
> it assumes that aliases contains key. This triggers incorrect PPD. To 
> reproduce it, apply HIVE-10455 and run cbo_subq_notin.q.





[jira] [Assigned] (HIVE-9897) Issue a warning when using an existing table/view name as an alias in a with statement.

2015-06-01 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-9897:
-

Assignee: Pengcheng Xiong

> Issue a warning when using an existing table/view name as an alias in a with 
> statement. 
> 
>
> Key: HIVE-9897
> URL: https://issues.apache.org/jira/browse/HIVE-9897
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 0.13.1
> Environment: cdh5.3.0
>Reporter: Mario Konschake
>Assignee: Pengcheng Xiong
>Priority: Minor
>
> Consider the following query:
> {code:sql}
> WITH
> table_a AS (
> SELECT
> 'johndoe' AS name
> FROM
> my_table
> )
> SELECT
> DISTINCT name
> FROM
> table_a;
> {code}
> Observation: 
> If a table or a view named `table_a` exists, it is used instead of the one 
> defined in the WITH statement.
> Expectation:
> Since the expectation is ambiguous (the alias in the WITH statement vs. 
> the existing table), issuing a warning when an existing name is used in a 
> WITH statement is recommended.
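Until such a warning exists, the ambiguity can be sidestepped by qualifying the base table and picking a CTE alias that cannot collide with an existing table. This is a sketch under the assumption that the base table lives in a database named `my_db`; the names are illustrative:

{code:sql}
-- the prefixed alias cannot shadow (or be shadowed by) a permanent table
WITH tmp_table_a AS (
SELECT 'johndoe' AS name FROM my_db.my_table
)
SELECT DISTINCT name FROM tmp_table_a;
{code}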





[jira] [Resolved] (HIVE-9897) Issue a warning when using an existing table/view name as an alias in a with statement.

2015-06-01 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong resolved HIVE-9897.
---
Resolution: Fixed

Resolved due to the recent check-in of HIVE-10698.

> Issue a warning when using an existing table/view name as an alias in a with 
> statement. 
> 
>
> Key: HIVE-9897
> URL: https://issues.apache.org/jira/browse/HIVE-9897
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 0.13.1
> Environment: cdh5.3.0
>Reporter: Mario Konschake
>Assignee: Pengcheng Xiong
>Priority: Minor
>
> Consider the following query:
> {code:sql}
> WITH
> table_a AS (
> SELECT
> 'johndoe' AS name
> FROM
> my_table
> )
> SELECT
> DISTINCT name
> FROM
> table_a;
> {code}
> Observation: 
> If a table or a view named `table_a` exists, it is used instead of the one 
> defined in the WITH statement.
> Expectation:
> Since the expectation is ambiguous (the alias in the WITH statement vs. 
> the existing table), issuing a warning when an existing name is used in a 
> WITH statement is recommended.





[jira] [Commented] (HIVE-10853) Create ExplainTask in ATS hook through ExplainWork

2015-06-01 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567798#comment-14567798
 ] 

Pengcheng Xiong commented on HIVE-10853:


[~hagleitn], could you please take a look? The failed tests are unrelated. 
Thanks.

> Create ExplainTask in ATS hook through ExplainWork
> --
>
> Key: HIVE-10853
> URL: https://issues.apache.org/jira/browse/HIVE-10853
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10853.01.patch, HIVE-10853.02.patch
>
>
> Right now ExplainTask is created directly. That's fragile and can lead to 
> stuff like: HIVE-10829





[jira] [Commented] (HIVE-10479) CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD

2015-06-01 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568285#comment-14568285
 ] 

Pengcheng Xiong commented on HIVE-10479:


The test failures are unrelated. [~ashutoshc], could you please take a look? 
Thanks.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias 
> in columnInfo which triggers PPD
> 
>
> Key: HIVE-10479
> URL: https://issues.apache.org/jira/browse/HIVE-10479
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10479.01.patch, HIVE-10479.patch
>
>
> In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, 
> when aliases contains the empty string "" and key is also the empty string "", 
> it assumes that aliases contains key. This triggers incorrect PPD. To 
> reproduce it, apply HIVE-10455 and run cbo_subq_notin.q.





[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases

2015-06-02 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-6867:
--
Assignee: Hari Sankar Sivarama Subramaniyan  (was: Pengcheng Xiong)

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch
>
>
> Bucketized Table feature fails in some cases. If the source and destination are 
> bucketed on the same key, and the actual data in the source is not bucketed 
> (because the data was loaded using LOAD DATA LOCAL INPATH), then the data won't 
> be bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression. This has never worked.
> It was only discovered due to Hadoop2 changes.
> In Hadoop1, in local mode, the number of reducers is always 1, regardless of 
> what the app requests. Hadoop2 now honors the number-of-reducers setting in 
> local mode (by spawning threads).
> The long-term solution seems to be to prevent LOAD DATA into bucketed tables.





[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2015-06-02 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569981#comment-14569981
 ] 

Pengcheng Xiong commented on HIVE-6867:
---

Reassigning to [~hsubramaniyan]. Here is what we are expecting in a new patch 
after discussion: (1) Remove the enforce-bucketing and enforce-sorting flags. 
(2) For insert into a bucketized table, read the table property that says 
whether it is bucketized (sorted by some key) and do the correct insert. (3) For 
load into a bucketized table, always throw an exception. (4) Update all the 
related q files. (5) Add a backward-compatibility flag that allows reverting to 
the previous behavior.


> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch
>
>
> Bucketized Table feature fails in some cases. If the source and destination are 
> bucketed on the same key, and the actual data in the source is not bucketed 
> (because the data was loaded using LOAD DATA LOCAL INPATH), then the data won't 
> be bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression. This has never worked.
> It was only discovered due to Hadoop2 changes.
> In Hadoop1, in local mode, the number of reducers is always 1, regardless of 
> what the app requests. Hadoop2 now honors the number-of-reducers setting in 
> local mode (by spawning threads).
> The long-term solution seems to be to prevent LOAD DATA into bucketed tables.





[jira] [Commented] (HIVE-10923) encryption_join_with_different_encryption_keys.q fails on CentOS 6

2015-06-05 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574790#comment-14574790
 ] 

Pengcheng Xiong commented on HIVE-10923:


[~spena], thanks a lot for your reply. It is now resolved. Could you please 
also take a quick look at https://issues.apache.org/jira/browse/HIVE-10938 
if you have time? Any suggestions/comments are welcome. Thanks again.

> encryption_join_with_different_encryption_keys.q fails on CentOS 6
> --
>
> Key: HIVE-10923
> URL: https://issues.apache.org/jira/browse/HIVE-10923
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>
> Here is the stack trace
> {code}
> Task with the most failures(4):
> -
> Task ID:
>   task_1433377676690_0015_m_00
> URL:
>   
> http://ip-10-0-0-249.ec2.internal:44717/taskdetails.jsp?jobid=job_1433377676690_0015&tipid=task_1433377676690_0015_m_00
> -
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"key":"238","value":"val_238"}
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"key":"238","value":"val_238"}
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
>   ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
> java.security.InvalidKeyException: Illegal key size
>   at 
> org.apache.hadoop.crypto.JceAesCtrCryptoCodec$JceAesCtrCipher.init(JceAesCtrCryptoCodec.java:116)
>   at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension$DefaultCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:264)
>   at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:371)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.generateEncryptedDataEncryptionKey(FSNamesystem.java:2489)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2620)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2519)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:566)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:394)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> Caused by: java.security.InvalidKeyException: Illegal key size
>   at javax.crypto.Cipher.checkCryptoPerm(Cipher.java:1024)
>   at javax.crypto.Cipher.implInit(Cipher.java:790)
>   at javax.crypto.Cipher.chooseProvider(Cipher.java:849)
>   at javax.crypto.Cipher.init(Cipher.java:1348)
>   at javax.crypto.Cipher.init(Cipher.java:1282)
>   at 
> org.apache.hadoop.crypto.JceAesCtrCryptoCodec$JceAesCtrCipher.init(JceAesCtrCryptoCodec.java:113)
>   ... 16 more
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:577)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:675)
>   at org.ap

[jira] [Resolved] (HIVE-10923) encryption_join_with_different_encryption_keys.q fails on CentOS 6

2015-06-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong resolved HIVE-10923.

Resolution: Fixed

> encryption_join_with_different_encryption_keys.q fails on CentOS 6
> --
>
> Key: HIVE-10923
> URL: https://issues.apache.org/jira/browse/HIVE-10923
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>
> Here is the stack trace
> {code}
> Task with the most failures(4):
> -
> Task ID:
>   task_1433377676690_0015_m_00
> URL:
>   
> http://ip-10-0-0-249.ec2.internal:44717/taskdetails.jsp?jobid=job_1433377676690_0015&tipid=task_1433377676690_0015_m_00
> -
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"key":"238","value":"val_238"}
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"key":"238","value":"val_238"}
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
>   ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
> java.security.InvalidKeyException: Illegal key size
>   at 
> org.apache.hadoop.crypto.JceAesCtrCryptoCodec$JceAesCtrCipher.init(JceAesCtrCryptoCodec.java:116)
>   at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension$DefaultCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:264)
>   at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:371)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.generateEncryptedDataEncryptionKey(FSNamesystem.java:2489)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2620)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2519)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:566)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:394)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> Caused by: java.security.InvalidKeyException: Illegal key size
>   at javax.crypto.Cipher.checkCryptoPerm(Cipher.java:1024)
>   at javax.crypto.Cipher.implInit(Cipher.java:790)
>   at javax.crypto.Cipher.chooseProvider(Cipher.java:849)
>   at javax.crypto.Cipher.init(Cipher.java:1348)
>   at javax.crypto.Cipher.init(Cipher.java:1282)
>   at 
> org.apache.hadoop.crypto.JceAesCtrCryptoCodec$JceAesCtrCipher.init(JceAesCtrCryptoCodec.java:113)
>   ... 16 more
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:577)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:675)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>   at 
> org.apache.hadoop.

[jira] [Updated] (HIVE-10900) Fix the indeterministic stats for some hive queries

2015-06-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10900:
---
Attachment: HIVE-10900.01.patch

Temporary fix for Accumulo stats. [~ashutoshc], could you please take a look? 
Also cc'ing [~jpullokkaran].

> Fix the indeterministic stats for some hive queries 
> 
>
> Key: HIVE-10900
> URL: https://issues.apache.org/jira/browse/HIVE-10900
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-10900.01.patch
>
>
> If we do not run compute stats for a table and then do some operation on 
> that table, we get different stats numbers when we run explain. The main 
> reason is the different OS/FS configurations that Hive stats depend 
> on when there are no table stats. A simple fix is to run compute stats for 
> those indeterministic stats.
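The fix described amounts to running compute stats up front in the affected q 
files, along the following lines (illustrative only; the table name is a 
placeholder):

{code:sql}
-- collecting table stats up front makes the explain output deterministic,
-- instead of depending on OS/FS-derived estimates
ANALYZE TABLE my_table COMPUTE STATISTICS;
ANALYZE TABLE my_table COMPUTE STATISTICS FOR COLUMNS;
{code}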





[jira] [Updated] (HIVE-10958) Centos: TestMiniTezCliDriver.testCliDriver_mergejoin fails

2015-06-06 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10958:
---
Attachment: HIVE-10958.01.patch

> Centos: TestMiniTezCliDriver.testCliDriver_mergejoin fails
> --
>
> Key: HIVE-10958
> URL: https://issues.apache.org/jira/browse/HIVE-10958
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10958.01.patch
>
>
> Centos: TestMiniTezCliDriver.testCliDriver_mergejoin fails due to the 
> statement "set mapred.reduce.tasks = 18;"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10853) Create ExplainTask in ATS hook through ExplainWork

2015-06-08 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577564#comment-14577564
 ] 

Pengcheng Xiong commented on HIVE-10853:


[~alangates], I just talked with [~ashutoshc]. As this is an improvement and we 
assume that branch-1 is a maintenance branch, we think it is OK not to push it 
to branch-1. Thanks.

> Create ExplainTask in ATS hook through ExplainWork
> --
>
> Key: HIVE-10853
> URL: https://issues.apache.org/jira/browse/HIVE-10853
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0
>
> Attachments: HIVE-10853.01.patch, HIVE-10853.02.patch
>
>
> Right now ExplainTask is created directly. That's fragile and can lead to 
> stuff like: HIVE-10829



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10938) All the analyze table statements are failing on encryption testing framework

2015-06-08 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577609#comment-14577609
 ] 

Pengcheng Xiong commented on HIVE-10938:


[~spena], yes, I can still reproduce that on the current master. You can just 
save the commands in a q file. If you run it with TestCliDriver, you will get
{code}
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: unencryptedtable
  Statistics: Num rows: 2 Data size: 22 Basic stats: COMPLETE Column stats: NONE
  Select Operator
expressions: key (type: string), value (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 2 Data size: 22 Basic stats: COMPLETE Column stats: NONE
ListSink
{code}
This is correct.
However, if you run it with TestEncryptedHDFSCliDriver, you will get
{code}
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: unencryptedtable
  Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
  Select Operator
expressions: key (type: string), value (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
ListSink
{code}
This is not correct. If you look into more detail, you will find that the 
analyze table statement never works. Thanks.

> All the analyze table statements are failing on encryption testing framework
> 
>
> Key: HIVE-10938
> URL: https://issues.apache.org/jira/browse/HIVE-10938
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>
> To reproduce, in recent q test environment, create a q file
> {code}
> drop table IF EXISTS unencryptedTable;
> create table unencryptedTable(key string, value string);
> insert into table unencryptedTable values
> ('501', 'val_501'),
> ('502', 'val_502');
> analyze table unencryptedTable compute statistics;
> explain select * from unencryptedTable;
> {code}
> Then run with TestEncryptedHDFSCliDriver.
> analyze table will generate a MapRed task and a StatsTask. The MapRed task 
> will fail silently without generating the stats, e.g., numRows for the table, 
> and the following StatsTask cannot read any results. This fails not only for 
> encrypted tables but also for non-encrypted ones, as shown above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10938) All the analyze table statements are failing on encryption testing framework

2015-06-08 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577613#comment-14577613
 ] 

Pengcheng Xiong commented on HIVE-10938:


[~spena], more information: I actually deployed a real encrypted environment 
myself over the weekend. In the real environment, the analyze table statement 
works. Thus, the problem is related to the TestEncryptedHDFSCliDriver 
framework. Thanks.

> All the analyze table statements are failing on encryption testing framework
> 
>
> Key: HIVE-10938
> URL: https://issues.apache.org/jira/browse/HIVE-10938
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>
> To reproduce, in recent q test environment, create a q file
> {code}
> drop table IF EXISTS unencryptedTable;
> create table unencryptedTable(key string, value string);
> insert into table unencryptedTable values
> ('501', 'val_501'),
> ('502', 'val_502');
> analyze table unencryptedTable compute statistics;
> explain select * from unencryptedTable;
> {code}
> Then run with TestEncryptedHDFSCliDriver.
> analyze table will generate a MapRed task and a StatsTask. The MapRed task 
> will fail silently without generating the stats, e.g., numRows for the table, 
> and the following StatsTask cannot read any results. This fails not only for 
> encrypted tables but also for non-encrypted ones, as shown above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10965) direct SQL for stats fails in 0-column case

2015-06-08 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577789#comment-14577789
 ] 

Pengcheng Xiong commented on HIVE-10965:


LGTM. Thanks!

> direct SQL for stats fails in 0-column case
> ---
>
> Key: HIVE-10965
> URL: https://issues.apache.org/jira/browse/HIVE-10965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 1.2.1, 2.0.0
>
> Attachments: HIVE-10965.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10965) direct SQL for stats fails in 0-column case

2015-06-08 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577808#comment-14577808
 ] 

Pengcheng Xiong commented on HIVE-10965:


I have no idea, but I would start from the query that requests stats with 0 
columns. If I remember correctly, [~ashutoshc] committed a patch to deal with a 
similar issue (empty partitions or empty columns) several weeks ago. He may 
have a better answer. Thanks.

> direct SQL for stats fails in 0-column case
> ---
>
> Key: HIVE-10965
> URL: https://issues.apache.org/jira/browse/HIVE-10965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 1.2.1, 2.0.0
>
> Attachments: HIVE-10965.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10882) CBO: Calcite Operator To Hive Operator (Calcite Return Path) empty filterMap of join operator causes NPE exception

2015-06-09 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579242#comment-14579242
 ] 

Pengcheng Xiong commented on HIVE-10882:


[~jcamachorodriguez], I have started, but I have not figured out a solution 
yet. Please go ahead and take it, as I am busy with UT failures these days. 
Also ccing [~jpullokkaran]. Thanks.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) empty filterMap 
> of join operator causes NPE exception
> --
>
> Key: HIVE-10882
> URL: https://issues.apache.org/jira/browse/HIVE-10882
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> CBO return path creates the join operator with empty filters. However, 
> vectorization checks the filters of the big table in the join, which causes 
> an NPE. To reproduce, run vector_outer_join2.q with the return path turned on.
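The failure mode reduces to a few lines: the vectorization check indexes the join's filter map by the big-table position, which blows up when the return path left the map empty. A sketch under illustrative names (these are not Hive's actual classes; Python raises IndexError where the Java code raises NPE):

```python
def can_vectorize_join(filter_map, big_table_pos):
    # Mirrors the unguarded access filter_map[big_table_pos]:
    # fails when the CBO return path produced an empty filter map.
    return len(filter_map[big_table_pos]) == 0

def can_vectorize_join_fixed(filter_map, big_table_pos):
    # Guarded version: treat a missing or empty map as "no filters",
    # which is what an empty filterMap actually means.
    if filter_map is None or big_table_pos >= len(filter_map):
        return True
    return len(filter_map[big_table_pos]) == 0

try:
    can_vectorize_join([], 0)
except IndexError:
    print("unguarded access fails on an empty filter map")

print(can_vectorize_join_fixed([], 0))  # True
```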



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10882) CBO: Calcite Operator To Hive Operator (Calcite Return Path) empty filterMap of join operator causes NPE exception

2015-06-09 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10882:
---
Assignee: Jesus Camacho Rodriguez  (was: Pengcheng Xiong)

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) empty filterMap 
> of join operator causes NPE exception
> --
>
> Key: HIVE-10882
> URL: https://issues.apache.org/jira/browse/HIVE-10882
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Jesus Camacho Rodriguez
>
> CBO return path creates the join operator with empty filters. However, 
> vectorization checks the filters of the big table in the join, which causes 
> an NPE. To reproduce, run vector_outer_join2.q with the return path turned on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10958) Centos: TestMiniTezCliDriver.testCliDriver_mergejoin fails

2015-06-09 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579898#comment-14579898
 ] 

Pengcheng Xiong commented on HIVE-10958:


I am not sure if it is also committed to branch-1. [~ashutoshc], could you 
please take a look at [~thejas]'s question? Thanks.

> Centos: TestMiniTezCliDriver.testCliDriver_mergejoin fails
> --
>
> Key: HIVE-10958
> URL: https://issues.apache.org/jira/browse/HIVE-10958
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 1.2.1, 2.0.0
>
> Attachments: HIVE-10958.01.patch
>
>
> Centos: TestMiniTezCliDriver.testCliDriver_mergejoin fails due to the 
> statement "set mapred.reduce.tasks = 18;"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2015-06-11 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582330#comment-14582330
 ] 

Pengcheng Xiong commented on HIVE-6867:
---

[~ychena], yes, it would be best if INSERT INTO were also supported. That 
depends on how far [~hsubramaniyan] would like to go. Thanks.

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch
>
>
> Bucketized Table feature fails in some cases: if the source and destination 
> are bucketed on the same key, and the actual data in the source is not 
> bucketed (because the data was loaded using LOAD DATA LOCAL INPATH), then the 
> data will not be bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression; it has never worked. It was only discovered due to 
> Hadoop2 changes. In Hadoop1, in local mode, the number of reducers is always 
> 1, regardless of what the app requests. Hadoop2 now honors the 
> number-of-reducers setting in local mode (by spawning threads).
> The long-term solution seems to be to prevent LOAD DATA for bucketed tables.
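The underlying invariant: bucketing assumes each row lives in the file whose index matches the hash of its clustering key, and LOAD DATA copies files verbatim without enforcing this. A rough sketch of the invariant (the modular-hash scheme reflects Hive's general bucketing approach; the helper names are made up, and integer keys are used so `hash()` is deterministic):

```python
def bucket_for(key, num_buckets=2):
    # Hive-style bucket assignment: hash of the clustering key, mod bucket
    # count. (Real Hive hashes via ObjectInspectors; hash() stands in here.)
    return hash(key) % num_buckets

def is_properly_bucketed(bucket_files, num_buckets=2):
    # Every row must live in the bucket file its key hashes to.
    return all(
        bucket_for(key, num_buckets) == idx
        for idx, rows in enumerate(bucket_files)
        for key in rows
    )

keys = [0, 1, 2, 3]
# INSERT ... SELECT routes each row through the reducer for its bucket:
inserted = [[k for k in keys if bucket_for(k) == i] for i in range(2)]
# LOAD DATA just drops the file in place; rows stay wherever they were:
loaded = [keys, []]

print(is_properly_bucketed(inserted))  # True
print(is_properly_bucketed(loaded))    # False: key 1 sits in file 0 but hashes to bucket 1
```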



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10479) CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD

2015-06-12 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583776#comment-14583776
 ] 

Pengcheng Xiong commented on HIVE-10479:


[~jcamachorodriguez], as per [~jpullokkaran]'s request, could you please review 
the patch? Thanks.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias 
> in columnInfo which triggers PPD
> 
>
> Key: HIVE-10479
> URL: https://issues.apache.org/jira/browse/HIVE-10479
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10479.01.patch, HIVE-10479.patch
>
>
> In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, 
> when aliases contains the empty string "" and the key is the empty string "" 
> too, the code assumes that aliases contains the key. This triggers incorrect 
> PPD. To reproduce, apply HIVE-10455 and run cbo_subq_notin.q.
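The faulty check boils down to a plain set-membership test (names here are illustrative, not the actual code in OpProcFactory.java): once an empty tabAlias leaks into the alias set, a predicate keyed by the empty string is treated as belonging to the table and gets pushed down.

```python
def should_push_down(aliases, key):
    # Faithful to the buggy check: bare membership, so "" matches "".
    return key in aliases

def should_push_down_fixed(aliases, key):
    # Guard against the placeholder empty alias coming from a
    # columnInfo that carries no table alias.
    return key != "" and key in aliases

aliases = {"t1", ""}  # "" leaked in from a columnInfo with an empty tabAlias
print(should_push_down(aliases, ""))         # True: predicate wrongly pushed
print(should_push_down_fixed(aliases, ""))   # False
print(should_push_down_fixed(aliases, "t1")) # True: real aliases still match
```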



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11005) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : Regression on the latest master

2015-06-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11005:
---
Assignee: Jesus Camacho Rodriguez

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : Regression on 
> the latest master
> --
>
> Key: HIVE-11005
> URL: https://issues.apache.org/jira/browse/HIVE-11005
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Jesus Camacho Rodriguez
>
> Tests cbo_join.q and cbo_views.q fail on the return path. Part of the stack 
> trace is
> {code}
> 2015-06-15 09:51:53,377 ERROR [main]: parse.CalcitePlanner (CalcitePlanner.java:genOPTree(282)) - CBO failed, skipping CBO.
> java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
> at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305)
> at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284)
> at com.google.common.collect.EmptyImmutableList.get(EmptyImmutableList.java:80)
> at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveInsertExchange4JoinRule.onMatch(HiveInsertExchange4JoinRule.java:101)
> at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:326)
> at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:515)
> at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:392)
> at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:255)
> at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:125)
> at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:207)
> at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:194)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:888)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:771)
> at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
> at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:876)
> at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL

2015-06-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11007:
---
Attachment: HIVE-11007.01.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's 
> mapInputToDP should depends on the last SEL
> -
>
> Key: HIVE-11007
> URL: https://issues.apache.org/jira/browse/HIVE-11007
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11007.01.patch
>
>
> In the dynamic partitioning case, for example, we have TS0-SEL1-SEL2-FS3. The 
> dpCtx's mapInputToDP is populated from SEL1 rather than SEL2, which causes an 
> error on the return path.
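The fix amounts to walking the operator chain and taking the SEL immediately preceding the FileSink rather than the first SEL encountered. A toy sketch over operator tags (a stand-in for Hive's actual Operator tree, purely illustrative):

```python
def last_select_before_sink(chain):
    """Return the SEL closest to the FS in a linear operator chain.

    chain is a list of operator tags, e.g. ["TS0", "SEL1", "SEL2", "FS3"].
    """
    # Locate the FileSink, then scan backwards for the nearest SEL.
    fs_idx = next(i for i, op in enumerate(chain) if op.startswith("FS"))
    for i in range(fs_idx - 1, -1, -1):
        if chain[i].startswith("SEL"):
            return chain[i]
    return None

chain = ["TS0", "SEL1", "SEL2", "FS3"]
print(last_select_before_sink(chain))  # SEL2 -- the one that should feed mapInputToDP
```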



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   3   4   5   6   7   8   9   10   >