Hi Suman,

We've seen this happen due to a bug in Hive's CombineHiveInputFormat. Try disabling that before querying by issuing:
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

HTH,
Joe

On Fri, Aug 24, 2012 at 4:43 PM, <suman.adda...@sanofipasteur.com> wrote:
> Hi,
>
> I have setup a Hadoop cluster on Amazon EC2 with my data stored on S3. I
> would like to use Hive to process the data on S3.
>
> I created an external table in hive using the following:
>
> CREATE EXTERNAL TABLE mytable1
> (
> HIT_TIME_GMT string,
> SERVICE string
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LOCATION 's3n://com.xxxxx.webanalytics/hive/';
>
> I loaded a few records into the table (LOAD DATA LOCAL INPATH
> '/home/ubuntu/data/play/test' INTO TABLE mytable1;).
>
> Select * from mytable1; shows me the data in the table.
>
> When I try to run the query which requires a map-reduce job to be run, for
> example, select count(*) from mytable1; I see an exception thrown.
>
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> java.io.FileNotFoundException: File does not exist: /hive/test
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:527)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
>     at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileInputFormatShim.getSplits(Hadoop20SShims.java:347)
>     at org.apache.hadoop.hive.shims.Hadoop20SShims$CombineFileInputFormatShim.getSplits(Hadoop20SShims.java:313)
>     at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:377)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1026)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1018)
>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:929)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:882)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:882)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:856)
>     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:671)
>     at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:516)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Job Submission failed with exception 'java.io.FileNotFoundException(File
> does not exist: /hive/test)'
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
>
> The file does exist and I can see it on S3. Select * from table is
> returning the data in the table. I am not sure what is going wrong when a
> map-reduce job is being initiated by the hive query. Any pointer as to
> where I went wrong? Appreciate your help.
>
> Thank you
> Suman
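
For reference, here is a minimal sketch of the workaround from the reply as a single Hive CLI session. It assumes the table name mytable1 and the failing count query from the quoted message. The first statement is an optional check that only prints the current setting. The quoted trace shows the lookup for /hive/test going through DistributedFileSystem, which suggests (but does not prove) that the s3n:// location was resolved against HDFS while splits were being combined.

```sql
-- Optional: print the input format the session is currently using.
SET hive.input.format;

-- Switch to the non-combining input format for this session only.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- Re-run the query that previously failed at job submission
-- (mytable1 is the table from the quoted message).
SELECT COUNT(*) FROM mytable1;
```

The setting applies only to the current session, so it has to be issued again after restarting the CLI. Without split combining, a location holding many small files will likely launch more map tasks than before.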