Re: Trying to submit Pig job to Amazon EMR

Daniel Dai Fri, 02 Dec 2011 01:06:52 -0800

Pig should support this syntax. Is your S3 data shared publicly?
If not, do you have fs.s3.awsAccessKeyId/fs.s3.awsSecretAccessKey defined?


Daniel
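
For reference, a minimal sketch of how such credentials could be declared in
core-site.xml. This is an illustration, not a confirmed fix for the error
below. It assumes stock Hadoop behaviour, where the credential property names
follow the URI scheme, so s3n:// paths read the fs.s3n.* keys and s3:// paths
read the fs.s3.* keys named above. YOUR_ACCESS_KEY_ID and
YOUR_SECRET_ACCESS_KEY are placeholders.

```xml
<!-- core-site.xml (sketch): credentials for s3n:// URIs.
     Both values are placeholders, not real keys. -->
<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
```

If the paths use the s3:// scheme instead, the same two properties would be
named with the fs.s3. prefix.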

On Thu, Dec 1, 2011 at 4:27 PM, Ayon Sinha <[email protected]> wrote:

> Well, I should not need Pig to connect to HDFS. It should use S3, so I
> changed fs.default.name to
> s3n://<mybucketname> and now I get the Grunt prompt.
>
> The next problem I'm facing is when I say,
> a = load 's3n://<mydatabucket>/blah/foo/day=20111127' using PigStorage();
>
>
> I get
>
> 2011-12-01 16:22:01,948 [main] WARN
>  org.jets3t.service.impl.rest.httpclient.RestS3Service - Response
> '/user%2Fmymapred-user' - Unexpected response code 404, expected 200
> 2011-12-01 16:22:02,024 [main] WARN
>  org.jets3t.service.impl.rest.httpclient.RestS3Service - Response
> '/user%2Fmymapred-user_%24folder%24' - Unexpected response code 404,
> expected 200
> 2011-12-01 16:22:02,038 [main] WARN
>  org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/' -
> Unexpected response code 404, expected 200
> 2011-12-01 16:22:02,038 [main] WARN
>  org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/' -
> Received error response with XML message
> 2011-12-01 16:22:02,045 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 6007: Unable to check name s3n://<mybucketname>/user/mymapred-user
>
>
> What is it trying to check? Does it need some storage to write
> intermediate files to?
>
> -Ayon
>
>
>
> ________________________________
>  From: Jonathan Coveney <[email protected]>
> To: [email protected]; Ayon Sinha <[email protected]>
> Sent: Thursday, December 1, 2011 4:17 PM
> Subject: Re: Trying to submit Pig job to Amazon EMR
>
>
> Usually this means that the version of Hadoop bundled with Pig does not
> match the version of Hadoop you're running. I'd run ant jar-withouthadoop
> and point the resulting Hadoop-less Pig jar at the Hadoop on EC2.
>
>
> 2011/12/1 Ayon Sinha <[email protected]>
>
> Hi,
> >I have an EC2 box set up with Pig 0.8.1 which can run my jobs fine in local
> mode. So now I want to configure the NN & JT such that the job goes to the
> EMR cluster I've spun up.
> >I have a local pigconf directory with the Hadoop XML files and have pointed
> HADOOP_CONF_DIR and PIG_CLASSPATH to it.
> >
> >In core-site.xml I have:
> >
> > <property>
> >    <name>fs.default.name</name>
> >    <value>hdfs://10.116.83.74:9000</value>
> >  </property>
> >
> >
> >In mapred-site.xml I have:
> ><configuration>
> >  <property>
> >    <name>mapred.job.tracker</name>
> >    <value>10.116.83.74:9001</value>
> >  </property>
> ></configuration>
> >
> >Now Pig tries to connect and I get
> >2011-12-01 16:10:58,009 [main] INFO  org.apache.pig.Main - Logging error
> messages to: /home/mashlogic/ayon/pigconf/pig_1322784657959.log
> >2011-12-01 16:10:58,950 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to hadoop file system at: hdfs://10.116.83.74:9000
> >2011-12-01 16:10:59,814 [main] ERROR org.apache.pig.Main - ERROR 2999:
> Unexpected internal error. Failed to create DataStorage
> >
> >
> >log file says:
> >
> >Error before Pig is launched
> >----------------------------
> >ERROR 2999: Unexpected internal error. Failed to create DataStorage
> >
> >java.lang.RuntimeException: Failed to create DataStorage
> >at
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
> >at
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
> >at
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
> >at
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
> >at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
> >at org.apache.pig.PigServer.<init>(PigServer.java:226)
> >at org.apache.pig.PigServer.<init>(PigServer.java:215)
> >at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
> >at org.apache.pig.Main.run(Main.java:452)
> >at org.apache.pig.Main.main(Main.java:107)
> >Caused by: java.io.IOException: Call to /10.116.83.74:9000 failed on
> local exception: java.io.EOFException
> >at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
> >at org.apache.hadoop.ipc.Client.call(Client.java:1110)
> >at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
> >at $Proxy0.getProtocolVersion(Unknown Source)
> >at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
> >at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
> >at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
> >at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
> >at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
> >at
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
> >at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
> >at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
> >at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
> >at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
> >at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
> >at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
> >at
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
> >... 9 more
> >Caused by: java.io.EOFException
> >at java.io.DataInputStream.readInt(DataInputStream.java:375)
> >at
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:815)
> >at org.apache.hadoop.ipc.Client$Connection.run(Client.java:724)
>
> >================================================================================
> >
> >My EMR is running Hive jobs just fine. So if I can get it to run my Pig
> jobs, I'll be happy.
> >
> >-Ayon
> >
>
