[jira] [Commented] (HIVE-7217) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer

2014-06-12 Thread Muthu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030226#comment-14030226
 ] 

Muthu commented on HIVE-7217:
-

[~navis] Could you share more details about the issue? I would appreciate it 
if you could create a separate patch for 0.13; I can help test it.

Thanks,
Muthu

 Inner join query fails in the reducer when join key file is spilled to tmp by 
 RowContainer
 --

 Key: HIVE-7217
 URL: https://issues.apache.org/jira/browse/HIVE-7217
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.13.1
Reporter: Muthu
 Attachments: reducer.log


 {code}
 SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON 
 T1.video_id = T2.video_id WHERE T1.hourid=389567
 hive> show create table video;
 OK
 CREATE  TABLE `video`(
   `video_id` int,
   `video_title` string,
 )
 ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\t'
   LINES TERMINATED BY '\n'
 STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
 OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
 LOCATION
   'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video'
 TBLPROPERTIES (
   'numPartitions'='0',
   'numFiles'='1',
   'last_modified_by'='hadoop',
   'last_modified_time'='1336446601',
   'COLUMN_STATS_ACCURATE'='true',
   'transient_lastDdlTime'='1402514051',
   'numRows'='0',
   'totalSize'='586773666',
   'rawDataSize'='0')
 Time taken: 0.249 seconds, Fetched: 98 row(s)
 {code}
 The reducer fails with the following exception:
 {code}
 2014-06-11 12:32:39,051 INFO 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 16000 rows for 
 join key [663184]
 2014-06-11 12:32:39,061 INFO 
 org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created 
 temp file 
 /mnt/volume2/mapred/local/taskTracker/muthu.nivas/jobcache/job_201405301214_170634/attempt_201405301214_170634_r_00_0/work/tmp/hive-rowcontainer413460656723947992/RowContainer1053550561043043830.tmp
 2014-06-11 12:32:39,237 INFO org.apache.hadoop.mapred.FileInputFormat: Total 
 input paths to process : 2
 2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.io.IOException: 
 hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209
  not a SequenceFile
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
   at org.apache.hadoop.mapred.Child.main(Child.java:262)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.io.IOException: 
 hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209
  not a SequenceFile
   at 
 org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
   at 
 org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
   ... 7 more
 Caused by: java.io.IOException: 
 hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209
  not a SequenceFile
   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
   at 
 org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
   at 
 org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226)
   ... 12 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7217) Inner join query fails in the reducer

2014-06-11 Thread Muthu (JIRA)
Muthu created HIVE-7217:
---

 Summary: Inner join query fails in the reducer
 Key: HIVE-7217
 URL: https://issues.apache.org/jira/browse/HIVE-7217
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1, 0.13.0
Reporter: Muthu


SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id 
= T2.video_id WHERE T1.hourid=389567

hive> show create table video;
OK
CREATE  TABLE `video`(
  `video_id` int,
  `video_title` string,
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video'
TBLPROPERTIES (
  'numPartitions'='0',
  'numFiles'='1',
  'last_modified_by'='hadoop',
  'last_modified_time'='1336446601',
  'COLUMN_STATS_ACCURATE'='true',
  'transient_lastDdlTime'='1402514051',
  'numRows'='0',
  'totalSize'='586773666',
  'rawDataSize'='0')
Time taken: 0.249 seconds, Fetched: 98 row(s)

The reducer fails with the following exception:
2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: 
hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209
 not a SequenceFile
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: 
hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209
 not a SequenceFile
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
at 
org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
... 7 more
Caused by: java.io.IOException: 
hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209
 not a SequenceFile
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805)
at 
org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
at 
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
at 
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
at 
org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
at 
org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226)
... 12 more





[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer

2014-06-11 Thread Muthu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Muthu updated HIVE-7217:


Attachment: reducer.log



[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer

2014-06-11 Thread Muthu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Muthu updated HIVE-7217:


Summary: Inner join query fails in the reducer when join key file is 
spilled to tmp by RowContainer  (was: Inner join query fails in the reducer)



[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer

2014-06-11 Thread Muthu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Muthu updated HIVE-7217:


Description: 
SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id 
= T2.video_id WHERE T1.hourid=389567

hive> show create table video;
OK
CREATE  TABLE `video`(
  `video_id` int,
  `video_title` string,
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video'
TBLPROPERTIES (
  'numPartitions'='0',
  'numFiles'='1',
  'last_modified_by'='hadoop',
  'last_modified_time'='1336446601',
  'COLUMN_STATS_ACCURATE'='true',
  'transient_lastDdlTime'='1402514051',
  'numRows'='0',
  'totalSize'='586773666',
  'rawDataSize'='0')
Time taken: 0.249 seconds, Fetched: 98 row(s)

The reducer fails with the following exception:
2014-06-11 12:32:39,051 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: 
table 0 has 16000 rows for join key [663184]
2014-06-11 12:32:39,061 INFO 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created 
temp file 
/mnt/volume2/mapred/local/taskTracker/muthu.nivas/jobcache/job_201405301214_170634/attempt_201405301214_170634_r_00_0/work/tmp/hive-rowcontainer413460656723947992/RowContainer1053550561043043830.tmp
2014-06-11 12:32:39,237 INFO org.apache.hadoop.mapred.FileInputFormat: Total 
input paths to process : 2
2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: 
hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209
 not a SequenceFile
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: 
hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209
 not a SequenceFile
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
at 
org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
... 7 more
Caused by: java.io.IOException: 
hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209
 not a SequenceFile
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805)
at 
org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
at 
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
at 
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
at 
org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
at 
org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226)
... 12 more

  was:
SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id 
= T2.video_id WHERE T1.hourid=389567

hive> show create table video;
OK
CREATE  TABLE `video`(
  `video_id` int,
  `video_title` string,
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video'
TBLPROPERTIES (
  'numPartitions'='0',
  'numFiles'='1',
  'last_modified_by'='hadoop',
  'last_modified_time'='1336446601',
  'COLUMN_STATS_ACCURATE'='true',
  'transient_lastDdlTime'='1402514051',
  'numRows'='0',
  'totalSize'='586773666',
  'rawDataSize'='0')
Time taken: 0.249 seconds, Fetched: 98 row(s)

The reducer fails with the following exception:
2014-06-11 12:32:39,299 

[jira] [Commented] (HIVE-5888) group by after join operation product no result when hive.optimize.skewjoin = true

2014-04-08 Thread Muthu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963454#comment-13963454
 ] 

Muthu commented on HIVE-5888:
-

[~navis] After applying the patch from HIVE-6041 to Hive 0.12, queries with 
auto MAPJOIN fail with the following error. Are there any workarounds?
set hive.optimize.skewjoin=true; set hive.auto.convert.join=true; SELECT 
ru.userid, SUM(ru.total_count) FROM BIGTABLE ru JOIN SMALLTABLE c ON 
c.creative_id = ru.creative_id JOIN placement_dapi p ON p.placement_id = 
c.placement_id WHERE ru.realdate = '2014-01-02' AND ru.userid > 0 GROUP BY 
ru.userid;

Stage-1 is selected by condition resolver.
java.io.FileNotFoundException: java.io.FileNotFoundException: File does not 
exist: 
/tmp/hive-muthu.nivas/tmp/hive-muthu.nivas/hive_2014-02-26_18-17-04_075_3879899075227148508-1/-mr-10002
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58)
at org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:917)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:232)
at 
org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask(ConditionalResolverCommonJoin.java:185)
at 
org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.getTasks(ConditionalResolverCommonJoin.java:117)
at 
org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:55)


 group by after join operation product no result when  hive.optimize.skewjoin 
 = true 
 

 Key: HIVE-5888
 URL: https://issues.apache.org/jira/browse/HIVE-5888
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.12.0
Reporter: cyril liao
Priority: Critical







[jira] [Commented] (HIVE-6041) Incorrect task dependency graph for skewed join optimization

2014-02-26 Thread Muthu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913950#comment-13913950
 ] 

Muthu commented on HIVE-6041:
-

This patch doesn't seem to work on Hive 0.12 for queries with auto MAPJOIN.
set hive.optimize.skewjoin=true; set hive.auto.convert.join=true; SELECT 
ru.userid, SUM(ru.total_count) FROM BIGTABLE ru JOIN SMALLTABLE c ON 
c.creative_id = ru.creative_id JOIN placement_dapi p ON p.placement_id = 
c.placement_id WHERE ru.realdate = '2014-01-02' AND ru.userid > 0 GROUP BY 
ru.userid;

Stage-1 is selected by condition resolver.
java.io.FileNotFoundException: java.io.FileNotFoundException: File does not 
exist: 
/tmp/hive-muthu.nivas/tmp/hive-muthu.nivas/hive_2014-02-26_18-17-04_075_3879899075227148508-1/-mr-10002
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58)
at 
org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:917)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:232)
at 
org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask(ConditionalResolverCommonJoin.java:185)
at 
org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.getTasks(ConditionalResolverCommonJoin.java:117)
at 
org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:55)

 Incorrect task dependency graph for skewed join optimization
 

 Key: HIVE-6041
 URL: https://issues.apache.org/jira/browse/HIVE-6041
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0
 Environment: Hadoop 1.0.3
Reporter: Adrian Popescu
Assignee: Navis
Priority: Critical
 Fix For: 0.13.0

 Attachments: HIVE-6041.1.patch.txt


 The dependency graph among task stages is incorrect for the skewed join 
 optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. 
 When no skewed keys exist, all the tasks following the common join are 
 filtered out at runtime.
 In particular, the conditional task in the optimized plan maintains no 
 dependency on the child tasks of the common join task in the original plan. 
 The conditional task is composed of the map join task, which maintains all 
 these dependencies, but when the map join task is filtered out (i.e., 
 no skewed keys exist), all these dependencies are lost. Hence, all the other 
 task stages of the query (e.g., the move stage that writes the results into 
 the result table) are skipped.
 The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, in the 
 processSkewJoin() function, immediately after the ConditionalTask is created 
 and its dependencies are set.
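
 The lost-dependency behaviour described above can be illustrated with a toy
 model. This is a sketch under stated assumptions, not Hive's code: Task,
 ConditionalTask, run and build_plan are hypothetical names, written in Python
 rather than Java, and linking the conditional task directly to the downstream
 stage is just one possible way to preserve the dependency, not necessarily
 what the attached patch does.

```python
class Task:
    """Toy stand-in for a plan stage with downstream (child) stages."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add_dependent(self, child):
        self.children.append(child)


class ConditionalTask(Task):
    """Wraps candidate tasks; only those kept by the resolver run."""

    def __init__(self, name, candidates):
        super().__init__(name)
        self.candidates = candidates


def run(task, selected, executed):
    """Execute `task`, the candidates named in `selected`, then its children."""
    executed.append(task.name)
    if isinstance(task, ConditionalTask):
        for cand in task.candidates:
            if cand.name in selected:
                run(cand, selected, executed)
    for child in task.children:
        if child.name not in executed:
            run(child, selected, executed)
    return executed


def build_plan(link_conditional_to_children):
    """Build: common-join -> conditional[skew-map-join -> move]."""
    common_join = Task("common-join")
    map_join = Task("skew-map-join")
    move = Task("move")
    map_join.add_dependent(move)  # only the map join knows about the move stage
    cond = ConditionalTask("conditional", [map_join])
    common_join.add_dependent(cond)
    if link_conditional_to_children:
        cond.add_dependent(move)  # assumed fix: conditional keeps the dependency
    return common_join


# No skewed keys: the resolver filters out the map join (empty selection).
print(run(build_plan(False), set(), []))
print(run(build_plan(True), set(), []))
```

 In the first plan the move stage never runs once the map join is filtered
 out, which mirrors the skipped stages in the report; in the second plan the
 move stage still runs because the conditional task itself depends on it.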



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)