Pig should support this syntax. Is your S3 data shared publicly? If not, do you have fs.s3.awsAccessKeyId/fs.s3.awsSecretAccessKey defined?
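
For reference, a minimal sketch of how those credentials might be declared in core-site.xml. It assumes stock Apache Hadoop, where the property prefix follows the URI scheme, so s3n:// paths would read fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey rather than the fs.s3.* names. YOUR_ACCESS_KEY_ID and YOUR_SECRET_ACCESS_KEY are placeholders, not values from this thread.

```xml
<!-- core-site.xml fragment (sketch only; placeholder values, not from this thread) -->
<configuration>
  <!-- Credentials read by the s3n:// (native S3) filesystem -->
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
```

If the paths used the s3:// scheme instead, the same two properties would take the fs.s3. prefix.
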

Daniel

On Thu, Dec 1, 2011 at 4:27 PM, Ayon Sinha <[email protected]> wrote:

> Well, I should not need Pig to connect to HDFS. It should use S3, so I
> changed fs.default.name to s3n://<mybucketname> and now I get the Grunt
> prompt.
>
> The next problem I'm facing is when I say,
> a = load 's3n://<mydatabucket>/blah/foo/day=20111127' using PigStorage();
>
> I get
>
> 2011-12-01 16:22:01,948 [main] WARN org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/user%2Fmymapred-user' - Unexpected response code 404, expected 200
> 2011-12-01 16:22:02,024 [main] WARN org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/user%2Fmymapred-user_%24folder%24' - Unexpected response code 404, expected 200
> 2011-12-01 16:22:02,038 [main] WARN org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/' - Unexpected response code 404, expected 200
> 2011-12-01 16:22:02,038 [main] WARN org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/' - Received error response with XML message
> 2011-12-01 16:22:02,045 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6007: Unable to check name s3n://<mybucketname/user/mymapred-user
>
> What is it trying to check? Does it need some storage to write
> intermediate files to?
>
> -Ayon
>
> ________________________________
> From: Jonathan Coveney <[email protected]>
> To: [email protected]; Ayon Sinha <[email protected]>
> Sent: Thursday, December 1, 2011 4:17 PM
> Subject: Re: Trying to submit Pig job to Amazon EMR
>
> Usually this means that the version of Hadoop bundled in Pig does not
> match the version of Hadoop you're running. I'd do ant jar-withouthadoop
> and point it at the Hadoop on EC2, using the Hadoop-less Pig jar.
>
> 2011/12/1 Ayon Sinha <[email protected]>
>
> > Hi,
> > I have an EC2 box set up with Pig 0.8.1, which can run my jobs fine in
> > local mode. So now I want to configure the NN & JT such that the job
> > goes to the EMR cluster I've spun up.
> > I have a local pigconf directory with the Hadoop XML files, and have
> > HADOOP_CONF_DIR and PIG_CLASSPATH set to it.
> >
> > In core-site.xml I have
> >
> >   <property>
> >     <name>fs.default.name</name>
> >     <value>hdfs://10.116.83.74:9000</value>
> >   </property>
> >
> > In mapred-site.xml I have:
> >
> >   <configuration>
> >     <property>
> >       <name>mapred.job.tracker</name>
> >       <value>10.116.83.74:9001</value>
> >     </property>
> >
> > Now Pig tries to connect and I get
> >
> > 2011-12-01 16:10:58,009 [main] INFO org.apache.pig.Main - Logging error messages to: /home/mashlogic/ayon/pigconf/pig_1322784657959.log
> > 2011-12-01 16:10:58,950 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://10.116.83.74:9000
> > 2011-12-01 16:10:59,814 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
> >
> > The log file says:
> >
> > Error before Pig is launched
> > ----------------------------
> > ERROR 2999: Unexpected internal error. Failed to create DataStorage
> >
> > java.lang.RuntimeException: Failed to create DataStorage
> >     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
> >     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
> >     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
> >     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
> >     at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
> >     at org.apache.pig.PigServer.<init>(PigServer.java:226)
> >     at org.apache.pig.PigServer.<init>(PigServer.java:215)
> >     at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
> >     at org.apache.pig.Main.run(Main.java:452)
> >     at org.apache.pig.Main.main(Main.java:107)
> > Caused by: java.io.IOException: Call to /10.116.83.74:9000 failed on local exception: java.io.EOFException
> >     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
> >     at org.apache.hadoop.ipc.Client.call(Client.java:1110)
> >     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
> >     at $Proxy0.getProtocolVersion(Unknown Source)
> >     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
> >     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
> >     at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
> >     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
> >     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
> >     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
> >     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
> >     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
> >     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
> >     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
> >     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
> >     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
> >     ... 9 more
> > Caused by: java.io.EOFException
> >     at java.io.DataInputStream.readInt(DataInputStream.java:375)
> >     at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:815)
> >     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:724)
> > ================================================================================
> >
> > My EMR cluster is running Hive jobs just fine. So if I can get it to run
> > my Pig jobs, I'll be happy.
> >
> > -Ayon
