I'm not sure what's happening here, but one suggestion: use s3n://... instead of s3://... The "new" s3n filesystem is supposed to provide better performance.
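For example, the table definition from your quoted message could be retried with the s3n scheme. This is just a sketch of that change; it assumes your S3 credentials are already configured for s3n (e.g. fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey in core-site.xml), which your message doesn't show:

```sql
-- Same table as in the quoted message below, only the LOCATION scheme changed.
-- Assumes fs.s3n.* credentials are configured; bucket/path taken from the quoted DDL.
CREATE EXTERNAL TABLE testtable(nyseVal STRING, cliVal STRING, dateVal STRING, number1Val STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 's3n://hadoop-bucket/data/';
```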
dean

On Thu, Apr 18, 2013 at 8:43 AM, Tim Bittersohl <t...@innoplexia.com> wrote:
> Hi,
>
> I just found out that I don't have to change the default file system of
> Hadoop. The location in the CREATE TABLE command just has to be changed:
>
> CREATE EXTERNAL TABLE testtable(nyseVal STRING, cliVal STRING, dateVal STRING, number1Val STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '\n'
> STORED AS TextFile LOCATION "s3://hadoop-bucket/data/"
>
> But when I try to access the table with a command that creates a Hadoop
> job, I get the following error:
>
> 13/04/18 15:29:36 ERROR security.UserGroupInformation: PriviledgedActionException as:tim (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
> java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:807)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:411)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:377)
>     at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1091)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1083)
>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:993)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:946)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:946)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:920)
>     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>     at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
>     at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
>     at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
>     at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:722)
>
> Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /data/NYSE_daily.txt)'
> 13/04/18 15:29:36 ERROR exec.Task: Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /data/NYSE_daily.txt)'
> java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
>     [stack trace identical to the one above]
>
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
> 13/04/18 15:29:36 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
>
> On the internet I found the hint to set this configuration to solve the problem:
>
> hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
>
> But I just get a RuntimeException doing so:
>
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.io.HiveInputFormat
>     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:333)
>     at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
>     at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
>     at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
>     at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:722)
> 13/04/18 15:37:14 ERROR exec.ExecDriver: Exception: org.apache.hadoop.hive.ql.io.HiveInputFormat
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
> 13/04/18 15:37:14 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
>
> I'm using the Cloudera "0.10.0-cdh4.2.0" version of the Hive libraries.
>
> Greetings
> Tim Bittersohl
> Software Engineer
>
> Innoplexia GmbH
> Mannheimer Str. 175
> 69123 Heidelberg
>
> Tel.: +49 (0) 6221 7198033
> Mobiltel.: +49 (0) 160 99186759
> Fax: +49 (0) 6221 7198034
> Web: www.innoplexia.com
>
> Sitz: 69123 Heidelberg, Mannheimer Str. 175 - Steuernummer 32494/62606 - USt. IdNr.: DE 272 871 728 - Geschäftsführer: Prof. Dr. Herbert Schuster

-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com