[jira] [Commented] (HIVE-7217) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer
[ https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030226#comment-14030226 ]

Muthu commented on HIVE-7217:
-----------------------------

[~navis] Could you share more details about the issue? I would appreciate it if you could create a separate patch for 0.13; I could help test it.

Thanks,
Muthu

Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer
------------------------------------------------------------------------------------------

                Key: HIVE-7217
                URL: https://issues.apache.org/jira/browse/HIVE-7217
            Project: Hive
         Issue Type: Bug
   Affects Versions: 0.13.0, 0.13.1
           Reporter: Muthu
        Attachments: reducer.log

{code}
SELECT T1.userid, T2.video_title
FROM videoview T1
JOIN video T2 ON T1.video_id = T2.video_id
WHERE T1.hourid=389567

hive> show create table video;
OK
CREATE TABLE `video`(
  `video_id` int,
  `video_title` string,
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video'
TBLPROPERTIES (
  'numPartitions'='0',
  'numFiles'='1',
  'last_modified_by'='hadoop',
  'last_modified_time'='1336446601',
  'COLUMN_STATS_ACCURATE'='true',
  'transient_lastDdlTime'='1402514051',
  'numRows'='0',
  'totalSize'='586773666',
  'rawDataSize'='0')
Time taken: 0.249 seconds, Fetched: 98 row(s)
{code}

The reducer fails with the following exception:

{code}
2014-06-11 12:32:39,051 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 16000 rows for join key [663184]
2014-06-11 12:32:39,061 INFO org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /mnt/volume2/mapred/local/taskTracker/muthu.nivas/jobcache/job_201405301214_170634/attempt_201405301214_170634_r_00_0/work/tmp/hive-rowcontainer413460656723947992/RowContainer1053550561043043830.tmp
2014-06-11 12:32:39,237 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 2
2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
    ... 7 more
Caused by: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1714)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1728)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226)
    ... 12 more
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
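A note on the error in the HIVE-7217 report: the `video` table is declared with `TextInputFormat`, while the stack trace shows `RowContainer.first` going through `SequenceFileInputFormat` and opening the table's HDFS file. A SequenceFile starts with the bytes `SEQ` followed by a version byte, so a tab-delimited text file would be rejected as "not a SequenceFile". The sketch below is a plain-Java illustration of that header check; `SequenceFileMagic` and its methods are hypothetical names for this example, not Hive or Hadoop APIs, and it reads a local file rather than HDFS.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical diagnostic helper; not part of Hive or Hadoop. */
final class SequenceFileMagic {
    // A SequenceFile header begins with 'S', 'E', 'Q', followed by a version byte.
    private static final byte[] MAGIC = {'S', 'E', 'Q'};

    /** Returns true if the given leading bytes start with the SEQ magic. */
    static boolean looksLikeSequenceFile(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the first three bytes of a local file and checks them against the magic. */
    static boolean looksLikeSequenceFile(Path localFile) throws IOException {
        byte[] header = new byte[MAGIC.length];
        int filled = 0;
        try (InputStream in = Files.newInputStream(localFile)) {
            while (filled < header.length) {
                int n = in.read(header, filled, header.length - filled);
                if (n < 0) {
                    break; // file is shorter than the magic
                }
                filled += n;
            }
        }
        return filled == header.length && looksLikeSequenceFile(header);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SequenceFileMagic <local-file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + (looksLikeSequenceFile(file)
                ? " starts with the SEQ header"
                : " does not start with the SEQ header"));
    }
}
```

Running this against a local copy of a table file (for example one fetched with `hdfs dfs -get`) would show whether the file could ever be read as a SequenceFile; for a text-format table it is expected not to be.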
[jira] [Created] (HIVE-7217) Inner join query fails in the reducer
Muthu created HIVE-7217:
---------------------------

             Summary: Inner join query fails in the reducer
                 Key: HIVE-7217
                 URL: https://issues.apache.org/jira/browse/HIVE-7217
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.13.1, 0.13.0
            Reporter: Muthu
[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer
[ https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Muthu updated HIVE-7217:
------------------------

    Attachment: reducer.log
[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer
[ https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Muthu updated HIVE-7217:
------------------------

    Summary: Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer  (was: Inner join query fails in the reducer)
[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer
[ https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Muthu updated HIVE-7217:
------------------------

    Description: the query, the `video` table definition, and the stack trace are unchanged; the reducer log excerpt now includes the lines that precede the exception:

2014-06-11 12:32:39,051 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 16000 rows for join key [663184]
2014-06-11 12:32:39,061 INFO org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /mnt/volume2/mapred/local/taskTracker/muthu.nivas/jobcache/job_201405301214_170634/attempt_201405301214_170634_r_00_0/work/tmp/hive-rowcontainer413460656723947992/RowContainer1053550561043043830.tmp
2014-06-11 12:32:39,237 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 2
2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile
[jira] [Commented] (HIVE-5888) group by after join operation product no result when hive.optimize.skewjoin = true
[ https://issues.apache.org/jira/browse/HIVE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963454#comment-13963454 ]

Muthu commented on HIVE-5888:
-----------------------------

[~navis] After applying the patch from HIVE-6041 to Hive 0.12, queries with auto MAPJOIN fail with the following error. Are there any workarounds?

set hive.optimize.skewjoin=true;
set hive.auto.convert.join=true;

SELECT ru.userid, SUM(ru.total_count)
FROM BIGTABLE ru
JOIN SMALLTABLE c on c.creative_id = ru.creative_id
JOIN placement_dapi p ON p.placement_id = c.placement_id
WHERE ru.realdate = '2014-01-02' AND ru.userid 0
GROUP BY ru.userid;

Stage-1 is selected by condition resolver.
java.io.FileNotFoundException: java.io.FileNotFoundException: File does not exist: /tmp/hive-muthu.nivas/tmp/hive-muthu.nivas/hive_2014-02-26_18-17-04_075_3879899075227148508-1/-mr-10002
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58)
    at org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:917)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:232)
    at org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask(ConditionalResolverCommonJoin.java:185)
    at org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.getTasks(ConditionalResolverCommonJoin.java:117)
    at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:55)

group by after join operation product no result when hive.optimize.skewjoin = true
-----------------------------------------------------------------------------------

                Key: HIVE-5888
                URL: https://issues.apache.org/jira/browse/HIVE-5888
            Project: Hive
         Issue Type: Bug
   Affects Versions: 0.11.0, 0.12.0
           Reporter: cyril liao
           Priority: Critical
[jira] [Commented] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913950#comment-13913950 ]

Muthu commented on HIVE-6041:
-----------------------------

This patch doesn't seem to work on Hive 0.12 for queries with auto MAPJOIN.

set hive.optimize.skewjoin=true;
set hive.auto.convert.join=true;

SELECT ru.userid, SUM(ru.total_count)
FROM BIGTABLE ru
JOIN SMALLTABLE c on c.creative_id = ru.creative_id
JOIN placement_dapi p ON p.placement_id = c.placement_id
WHERE ru.realdate = '2014-01-02' AND ru.userid 0
GROUP BY ru.userid;

Stage-1 is selected by condition resolver.
java.io.FileNotFoundException: java.io.FileNotFoundException: File does not exist: /tmp/hive-muthu.nivas/tmp/hive-muthu.nivas/hive_2014-02-26_18-17-04_075_3879899075227148508-1/-mr-10002
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58)
    at org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:917)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:232)
    at org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask(ConditionalResolverCommonJoin.java:185)
    at org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.getTasks(ConditionalResolverCommonJoin.java:117)
    at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:55)

Incorrect task dependency graph for skewed join optimization
------------------------------------------------------------

                Key: HIVE-6041
                URL: https://issues.apache.org/jira/browse/HIVE-6041
            Project: Hive
         Issue Type: Bug
         Components: Query Processor
   Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0
        Environment: Hadoop 1.0.3
           Reporter: Adrian Popescu
           Assignee: Navis
           Priority: Critical
            Fix For: 0.13.0
        Attachments: HIVE-6041.1.patch.txt

The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. When no skewed keys exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task, which maintains all these dependencies, but when the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query (e.g., the move stage which writes the results into the result table) are skipped.

The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, in the processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
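The dependency problem described in HIVE-6041 can be illustrated with a toy task graph. The sketch below is only a model under stated assumptions: `Task`, `ConditionalTask`, `buildPlan` and `run` are hypothetical stand-ins written for this example, not Hive's classes, and the `linkChildrenToConditional` switch shows one way the lost dependency could be restored, not necessarily what HIVE-6041.1.patch.txt does. The model also ignores that a real scheduler waits for all parents of a task before running it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the skew-join plan; the names are hypothetical, not Hive's API. */
final class SkewJoinGraphSketch {

    /** A plan stage with the stages that depend on it. */
    static class Task {
        final String name;
        final List<Task> children = new ArrayList<>();

        Task(String name) {
            this.name = name;
        }
    }

    /** A stage whose candidate tasks are picked (or all filtered out) at runtime. */
    static final class ConditionalTask extends Task {
        final List<Task> candidates = new ArrayList<>();

        ConditionalTask(String name) {
            super(name);
        }
    }

    /** Builds common-join -> conditional{skew-map-join} with skew-map-join -> move. */
    static Task buildPlan(boolean linkChildrenToConditional) {
        Task commonJoin = new Task("common-join");
        Task mapJoin = new Task("skew-map-join");
        Task move = new Task("move");
        ConditionalTask conditional = new ConditionalTask("conditional");

        conditional.candidates.add(mapJoin);
        commonJoin.children.add(conditional);
        mapJoin.children.add(move); // only the map join knows about the move stage

        if (linkChildrenToConditional) {
            // Assumed repair: the conditional task itself also points at the move stage.
            conditional.children.add(move);
        }
        return commonJoin;
    }

    /** Returns the names of the stages that execute, in execution order. */
    static Set<String> run(Task root, boolean skewedKeysExist) {
        Set<String> executed = new LinkedHashSet<>();
        visit(root, skewedKeysExist, executed);
        return executed;
    }

    private static void visit(Task task, boolean skewedKeysExist, Set<String> executed) {
        executed.add(task.name);
        if (task instanceof ConditionalTask && skewedKeysExist) {
            // Candidates run only when the resolver selects them.
            for (Task candidate : ((ConditionalTask) task).candidates) {
                visit(candidate, skewedKeysExist, executed);
            }
        }
        for (Task child : task.children) {
            visit(child, skewedKeysExist, executed);
        }
    }

    public static void main(String[] args) {
        System.out.println("as reported, no skewed keys: " + run(buildPlan(false), false));
        System.out.println("as reported, skewed keys:    " + run(buildPlan(false), true));
        System.out.println("with extra link, no skew:    " + run(buildPlan(true), false));
    }
}
```

In the first configuration the `move` stage is reachable only through the map join, so it never runs once the map join is filtered out, which matches the skipped stages described in the report.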