[jira] [Commented] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer

2014-05-15 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995846#comment-13995846
 ] 

Sun Rui commented on HIVE-7012:
---

I will investigate the distinct issue later. If I can find a real test case, 
I will file a separate JIRA.

> Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
> 
>
> Key: HIVE-7012
> URL: https://issues.apache.org/jira/browse/HIVE-7012
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.13.0
> Reporter: Sun Rui
> Assignee: Navis
> Attachments: HIVE-7012.1.patch.txt, HIVE-7012.2.patch.txt
>
>
> With HIVE 0.13.0, run the following test case:
> {code:sql}
> create table src(key bigint, value string);
> select
>   count(distinct key) as col0
> from src
> order by col0;
> {code}
> The following exception will be thrown:
> {noformat}
> java.lang.RuntimeException: Error in configuring object
>   at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>   at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>   ... 9 more
> Caused by: java.lang.RuntimeException: Reduce operator initialization failed
>   at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173)
>   ... 14 more
> Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:reducesinkkey0]
>   at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
>   at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
>   at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79)
>   at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:288)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166)
>   ... 14 more
> {noformat}
> This issue is related to HIVE-6455. Setting hive.optimize.reducededuplication 
> to false makes the issue go away.
> Logical plan when hive.optimize.reducededuplication=false:
> {noformat}
> src
>   TableScan (TS_0)
>     alias: src
>     Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>     Select Operator (SEL_1)
>       expressions: key (type: bigint)
>       outputColumnNames: key
>       Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>       Group By Operator (GBY_2)
>         aggregations: count(DISTINCT key)
>         keys: key (type: bigint)
>         mode: hash
>         outputColumnNames: _col0, _col1
>         Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>         Reduce Output Operator (RS_3)
>           DistinctColumnIndices:
>           key expressions: _col0 (type: bigint)
>           DistributionKeys: 0
>           sort order: +
>           OutputKeyColumnNames: _col0
>           Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>           Group By Operator (GBY_4)
>             aggregations: count(DISTINCT KEY._col0:0._col0)
>             mode: mergepartial
>             outputColumnNames: _col0
>             Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
>             Select Operator (SEL_5)
>               expressions: _col0 (type: bigint)
>               outputColumnNames: _col0
>               Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NO

[jira] [Commented] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer

2014-05-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995204#comment-13995204
 ] 

Ashutosh Chauhan commented on HIVE-7012:


+1. The issue raised by [~sunrui], if it exists, will probably require a 
different fix, which we should take up in a separate JIRA.


[jira] [Commented] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer

2014-05-12 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994895#comment-13994895
 ] 

Navis commented on HIVE-7012:
-

[~ashutoshc] Yes, it's intended. In the query from ppd2.q,
{code}
select a.*
  from (
    select key, count(value) as cc
      from srcpart a
     where a.ds = '2008-04-08' and a.hr = '11'
     group by key
  ) a
  distribute by a.key
  sort by a.key, a.cc desc
{code}
cc is a field generated by the GBY operator, so it is semantically wrong to 
merge the RS for the GBY with the following RS. At the same time, sorting on 
"a.cc" is meaningless, so it could be removed during optimization, but not 
here (maybe in SemanticAnalyzer?).

[~sunrui] Yes, the RS for distinct should be excluded from any dedup process. 
Could you take this issue? I think you know it better than I do.


[jira] [Commented] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer

2014-05-11 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993941#comment-13993941
 ] 

Ashutosh Chauhan commented on HIVE-7012:


In ppd2.q.out it looks like a new MR stage was added, so the RS-dedup 
optimization appears to have been disabled for that query. That looks like a 
performance regression. Was that intentional?


[jira] [Commented] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer

2014-05-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994580#comment-13994580
 ] 

Hive QA commented on HIVE-7012:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12644076/HIVE-7012.2.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5503 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/174/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/174/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12644076


[jira] [Commented] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer

2014-05-10 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994368#comment-13994368
 ] 

Sun Rui commented on HIVE-7012:
---

[~navis] I verified that your patch solves my problem.

[~navis] and [~yhuai] However, I suspect that the optimizer may still have a 
bug when there are distinct expressions. It seems that support for distinct 
keys was not taken into consideration when the optimizer was implemented. 
Note that keyCols in ReduceSinkDesc is composed of the group-by keys and 
possibly the distinct keys. For example, assume cRS and pRS both have keyCols 
(a, b, c, d) and numDistributionKeys=2. cRS may have a distinct expression 
like distinct(c, d), while pRS may have distinct expressions like distinct(c), 
distinct(d). In this case, they have different sort keys even though their 
keyCols are the same. [~yhuai] what do you think?
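
The cRS/pRS example above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: RsSketch is a hypothetical, simplified stand-in and not Hive's ReduceSinkDesc, and the two comparison methods are invented here to contrast a check that looks only at the key columns with one that also compares the distinct expressions.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for a reduce-sink descriptor (not Hive's ReduceSinkDesc). */
final class RsSketch {
    final List<String> keyCols;                       // group-by keys followed by distinct keys
    final int numDistributionKeys;                    // leading keyCols used for partitioning
    final List<List<Integer>> distinctColumnIndices;  // one list of keyCols indices per distinct expression

    RsSketch(List<String> keyCols, int numDistributionKeys,
             List<List<Integer>> distinctColumnIndices) {
        this.keyCols = keyCols;
        this.numDistributionKeys = numDistributionKeys;
        this.distinctColumnIndices = distinctColumnIndices;
    }

    /** Comparison that looks only at keyCols and the number of distribution keys. */
    static boolean sameKeyColsOnly(RsSketch c, RsSketch p) {
        return c.keyCols.equals(p.keyCols)
            && c.numDistributionKeys == p.numDistributionKeys;
    }

    /** Stricter comparison that also requires identical distinct expressions. */
    static boolean sameKeysAndDistincts(RsSketch c, RsSketch p) {
        return sameKeyColsOnly(c, p)
            && c.distinctColumnIndices.equals(p.distinctColumnIndices);
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d");
        // cRS: a single distinct expression over (c, d)
        RsSketch cRS = new RsSketch(keys, 2, Arrays.asList(Arrays.asList(2, 3)));
        // pRS: two distinct expressions, distinct(c) and distinct(d)
        RsSketch pRS = new RsSketch(keys, 2,
            Arrays.asList(Arrays.asList(2), Arrays.asList(3)));
        System.out.println("keyCols-only check:   " + sameKeyColsOnly(cRS, pRS));
        System.out.println("with distinct check:  " + sameKeysAndDistincts(cRS, pRS));
    }
}
```

In this model the two sinks look identical to the keyCols-only check and differ only once the distinct expressions are compared, which is the situation the comment is worried about.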



[jira] [Commented] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer

2014-05-03 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988631#comment-13988631
 ] 

Sun Rui commented on HIVE-7012:
---

I am thinking about the following fix, but I am not sure it is right.

In sameKeys():

  ExprNodeDesc pexpr = pexprs.get(i);
  ExprNodeDesc cexpr = ExprNodeDescUtils.backtrack(cexprs.get(i), child, parent);
  // Check that cexpr actually comes from the parent.
  if (cexpr == null
      || (cexpr is not contained in the colExprMap of the parent operator)
      || !pexpr.isSame(cexpr)) {
    return null;
  }
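
The idea behind the extra condition can be shown with a small self-contained Java model. Everything in it is an assumption made for illustration: expressions are plain strings, a single map stands in for the column-expression maps between the child and the parent, and the lenient variant assumes a backtrack that passes an unmapped column name through unchanged. It does not use or reproduce Hive's ExprNodeDescUtils.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the proposed provenance check; not Hive code. */
final class SameKeysSketch {

    /** Assumed lenient backtrack: a column with no mapping is returned unchanged. */
    static String lenientBacktrack(String childCol, Map<String, String> colExprMap) {
        String mapped = colExprMap.get(childCol);
        return mapped != null ? mapped : childCol;
    }

    /** Name-only comparison: can be fooled when two unrelated columns share a name. */
    static boolean sameKeyLenient(String pexpr, String childCol,
                                  Map<String, String> colExprMap) {
        return pexpr.equals(lenientBacktrack(childCol, colExprMap));
    }

    /** Comparison with the extra condition: the child column must be mapped at all. */
    static boolean sameKeyStrict(String pexpr, String childCol,
                                 Map<String, String> colExprMap) {
        if (!colExprMap.containsKey(childCol)) {
            return false; // generated column (e.g. an aggregate), not forwarded from the parent
        }
        return pexpr.equals(colExprMap.get(childCol));
    }

    public static void main(String[] args) {
        // The child sorts on "_col0", which in this model is an aggregate result,
        // so the map has no entry for it. The parent's key is also named "_col0".
        Map<String, String> aggregateOnly = new HashMap<>();
        System.out.println("lenient: " + sameKeyLenient("_col0", "_col0", aggregateOnly));
        System.out.println("strict:  " + sameKeyStrict("_col0", "_col0", aggregateOnly));
    }
}
```

With no mapping for the child's column, the lenient comparison treats the two keys as the same purely because the names coincide, while the strict one rejects the merge. When the column is genuinely forwarded (the map contains "_col0" mapped to "_col0"), both variants agree.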
