I'm not sure what's happening here, but one suggestion: use s3n://... instead of s3://... The "new" s3n filesystem is supposed to provide better performance.
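For example, the table definition from your quoted message could be retried with the s3n scheme. This is just a sketch of that change; it assumes your S3 credentials are already configured for s3n (e.g. fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey in core-site.xml), which your message doesn't show:

```sql
-- Same table as in the quoted message below, only the LOCATION scheme changed.
-- Assumes fs.s3n.* credentials are configured; bucket/path taken from the quoted DDL.
CREATE EXTERNAL TABLE testtable(nyseVal STRING, cliVal STRING, dateVal STRING, number1Val STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 's3n://hadoop-bucket/data/';
```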
dean

On Thu, Apr 18, 2013 at 8:43 AM, Tim Bittersohl <t...@innoplexia.com> wrote:
> Hi,
>
> I just found out that I don't have to change the default file system of
> Hadoop. The location in the CREATE TABLE command just has to be changed:
>
> CREATE EXTERNAL TABLE testtable(nyseVal STRING, cliVal STRING, dateVal STRING, number1Val STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '\n'
> STORED AS TextFile LOCATION "s3://hadoop-bucket/data/"
>
> But when I try to access the table with a command that creates a Hadoop
> job, I get the following error:
>
> 13/04/18 15:29:36 ERROR security.UserGroupInformation: PriviledgedActionException as:tim (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
> java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:807)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:411)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:377)
>     at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1091)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1083)
>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:993)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:946)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:946)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:920)
>     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>     at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
>     at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
>     at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
>     at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:722)
>
> Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /data/NYSE_daily.txt)'
> 13/04/18 15:29:36 ERROR exec.Task: Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /data/NYSE_daily.txt)'
> java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
>     [stack trace identical to the one above]
>
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
> 13/04/18 15:29:36 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
>
> On the internet I found the hint to set this configuration to solve the problem:
>
> hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
>
> But I just get a RuntimeException doing so:
>
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.io.HiveInputFormat
>     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:333)
>     at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
>     at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
>     at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
>     at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:722)
> 13/04/18 15:37:14 ERROR exec.ExecDriver: Exception: org.apache.hadoop.hive.ql.io.HiveInputFormat
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
> 13/04/18 15:37:14 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
>
> I'm using the Cloudera "0.10.0-cdh4.2.0" version of the Hive libraries.
>
> Greetings
> Tim Bittersohl
> Software Engineer
>
> Innoplexia GmbH
> Mannheimer Str. 175
> 69123 Heidelberg
>
> Tel.: +49 (0) 6221 7198033
> Mobiltel.: +49 (0) 160 99186759
> Fax: +49 (0) 6221 7198034
> Web: www.innoplexia.com
>
> Sitz: 69123 Heidelberg, Mannheimer Str. 175 - Steuernummer 32494/62606 - USt. IdNr.: DE 272 871 728 - Geschäftsführer: Prof. Dr. Herbert Schuster

-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com