Yes, I believe I do have the awsSecretAccessKey defined correctly.
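For reference, the relevant core-site.xml entries look roughly like this (a sketch with the key values redacted; I'm assuming the fs.s3n.* variants of the properties Daniel mentioned, since the URIs are s3n://):

  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>REDACTED</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>REDACTED</value>
  </property>
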
To test:

mashlogic@cruncher ~ [ 8:07AM] hadoop dfs -ls s3n://ml-weblogs/smartlinks/daytsvs/day=20111130/
Found 29 items
-rwxrwxrwx   1  139148530 2011-12-01 07:03 /smartlinks/daytsvs/day=20111130/xaa.tsv.gz
-rwxrwxrwx   1  138086136 2011-12-01 07:03 /smartlinks/daytsvs/day=20111130/xab.tsv.gz
-rwxrwxrwx   1  146165298 2011-12-01 07:03 /smartlinks/daytsvs/day=20111130/xac.tsv.gz
-rwxrwxrwx   1  152491197 2011-12-01 07:03 /smartlinks/daytsvs/day=20111130/xad.tsv.gz
-rwxrwxrwx   1  154673351 2011-12-01 07:03 /smartlinks/daytsvs/day=20111130/xae.tsv.gz
-rwxrwxrwx   1  155920643 2011-12-01 07:03 /smartlinks/daytsvs/day=20111130/xaf.tsv.gz
-rwxrwxrwx   1  156468098 2011-12-01 07:03 /smartlinks/daytsvs/day=20111130/xag.tsv.gz
-rwxrwxrwx   1  157626894 2011-12-01 07:03 /smartlinks/daytsvs/day=20111130/xah.tsv.gz
-rwxrwxrwx   1  158872953 2011-12-01 07:04 /smartlinks/daytsvs/day=20111130/xai.tsv.gz
-rwxrwxrwx   1  158108620 2011-12-01 07:04 /smartlinks/daytsvs/day=20111130/xaj.tsv.gz
-rwxrwxrwx   1  158439002 2011-12-01 07:04 /smartlinks/daytsvs/day=20111130/xak.tsv.gz
-rwxrwxrwx   1  158618811 2011-12-01 07:04 /smartlinks/daytsvs/day=20111130/xal.tsv.gz
-rwxrwxrwx   1  159421273 2011-12-01 07:04 /smartlinks/daytsvs/day=20111130/xam.tsv.gz
-rwxrwxrwx   1  158402981 2011-12-01 07:04 /smartlinks/daytsvs/day=20111130/xan.tsv.gz
-rwxrwxrwx   1  157375232 2011-12-01 07:04 /smartlinks/daytsvs/day=20111130/xao.tsv.gz
-rwxrwxrwx   1  158516929 2011-12-01 07:05 /smartlinks/daytsvs/day=20111130/xap.tsv.gz
-rwxrwxrwx   1  158029022 2011-12-01 07:05 /smartlinks/daytsvs/day=20111130/xaq.tsv.gz
-rwxrwxrwx   1  159808270 2011-12-01 07:05 /smartlinks/daytsvs/day=20111130/xar.tsv.gz
-rwxrwxrwx   1  160148777 2011-12-01 07:05 /smartlinks/daytsvs/day=20111130/xas.tsv.gz
-rwxrwxrwx   1  160844640 2011-12-01 07:05 /smartlinks/daytsvs/day=20111130/xat.tsv.gz
-rwxrwxrwx   1  161679424 2011-12-01 07:05 /smartlinks/daytsvs/day=20111130/xau.tsv.gz
-rwxrwxrwx   1  159240120 2011-12-01 07:05 /smartlinks/daytsvs/day=20111130/xav.tsv.gz
-rwxrwxrwx   1  160124996 2011-12-01 07:06 /smartlinks/daytsvs/day=20111130/xaw.tsv.gz
-rwxrwxrwx   1  159158447 2011-12-01 07:06 /smartlinks/daytsvs/day=20111130/xax.tsv.gz
-rwxrwxrwx   1  158436630 2011-12-01 07:06 /smartlinks/daytsvs/day=20111130/xay.tsv.gz
-rwxrwxrwx   1  158518938 2011-12-01 07:06 /smartlinks/daytsvs/day=20111130/xaz.tsv.gz
-rwxrwxrwx   1  156520868 2011-12-01 07:06 /smartlinks/daytsvs/day=20111130/xba.tsv.gz
-rwxrwxrwx   1  154253795 2011-12-01 07:06 /smartlinks/daytsvs/day=20111130/xbb.tsv.gz
-rwxrwxrwx   1  142244585 2011-12-01 07:06 /smartlinks/daytsvs/day=20111130/xbc.tsv.gz

 
Trying to run something as simple as:
a = load 's3n://ml-weblogs/smartlinks/daytsvs/day=20111130/' using PigStorage();
s = sample a 0.001;
dump s;

gives:
ERROR 2999: Unexpected internal error. Failed to create DataStorage

java.lang.RuntimeException: Failed to create DataStorage
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
    at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
    at org.apache.pig.PigServer.<init>(PigServer.java:226)
    at org.apache.pig.PigServer.<init>(PigServer.java:215)
    at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
    at org.apache.pig.Main.run(Main.java:452)
    at org.apache.pig.Main.main(Main.java:107)
Caused by: java.io.IOException: Call to /10.116.83.74:9000 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
    at org.apache.hadoop.ipc.Client.call(Client.java:1110)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
    ... 9 more
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:815)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:724)


-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.



________________________________
 From: Daniel Dai <[email protected]>
To: [email protected]; Ayon Sinha <[email protected]> 
Sent: Friday, December 2, 2011 1:06 AM
Subject: Re: Trying to submit Pig job to Amazon EMR
 

Pig should support this syntax. Is your S3 data shared publicly? Otherwise, do
you have fs.s3.awsAccessKeyId/fs.s3.awsSecretAccessKey defined?

Daniel


On Thu, Dec 1, 2011 at 4:27 PM, Ayon Sinha <[email protected]> wrote:

>Well, I should not need Pig to connect to HDFS. It should use S3, so I changed
>fs.default.name to s3n://<mybucketname>, and now I get the Grunt prompt.
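>
>For reference, the core-site.xml change was roughly this (a sketch; bucket
>name redacted):
>
>  <property>
>    <name>fs.default.name</name>
>    <value>s3n://<mybucketname></value>
>  </property>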
>
>The next problem I'm facing is when I say,
>a = load 's3n://<mydatabucket>/blah/foo/day=20111127' using PigStorage();
>
>
>I get 
>
>2011-12-01 16:22:01,948 [main] WARN  org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/user%2Fmymapred-user' - Unexpected response code 404, expected 200
>2011-12-01 16:22:02,024 [main] WARN  org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/user%2Fmymapred-user_%24folder%24' - Unexpected response code 404, expected 200
>2011-12-01 16:22:02,038 [main] WARN  org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/' - Unexpected response code 404, expected 200
>2011-12-01 16:22:02,038 [main] WARN  org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/' - Received error response with XML message
>2011-12-01 16:22:02,045 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6007: Unable to check name s3n://<mybucketname>/user/mymapred-user
>
>
>What is it trying to check? Does it need some storage to write intermediate 
>files to?
>
> 
>-Ayon
>See My Photos on Flickr
>Also check out my Blog for answers to commonly asked questions.
>
>
>
>
>________________________________
> From: Jonathan Coveney <[email protected]>
>To: [email protected]; Ayon Sinha <[email protected]>
>Sent: Thursday, December 1, 2011 4:17 PM
>Subject: Re: Trying to submit Pig job to Amazon EMR
>
>
>
>Usually this means that the version of Hadoop bundled in Pig mismatches the
>version of Hadoop you're running. I'd do ant jar-withouthadoop and point it at
>the Hadoop on EC2 using the hadoopless Pig jar.
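>
>Roughly something like this (a sketch, not exact commands; HADOOP_HOME,
>HADOOP_CONF_DIR, the jar names, and myscript.pig are placeholders that vary
>by install and Pig version):
>
>  # in the Pig source tree: build a Pig jar that doesn't bundle Hadoop
>  ant jar-withouthadoop
>  # run it against the cluster's own Hadoop jar and conf dir instead
>  java -cp pig-withouthadoop.jar:$HADOOP_CONF_DIR:$HADOOP_HOME/hadoop-core.jar \
>      org.apache.pig.Main myscript.pig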
>
>
>2011/12/1 Ayon Sinha <[email protected]>
>
>>Hi,
>>I have an EC2 box set up with Pig 0.8.1 which can run my jobs fine in local
>>mode. So now I want to configure the NN & JT so that jobs go to the EMR
>>cluster I've spun up.
>>I have a local pigconf directory with the Hadoop XML files, and
>>HADOOP_CONF_DIR and PIG_CLASSPATH are set to point to it.
>>
>>In core-site.xml I have:
>>
>> <property>
>>    <name>fs.default.name</name>
>>    <value>hdfs://10.116.83.74:9000</value>
>>  </property>
>>
>>
>>In mapred-site.xml I have:
>><configuration>
>>  <property>
>>    <name>mapred.job.tracker</name>
>>    <value>10.116.83.74:9001</value>
>>  </property>
>></configuration>
>>
>>
>>Now Pig tries to connect, and I get:
>>2011-12-01 16:10:58,009 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/mashlogic/ayon/pigconf/pig_1322784657959.log
>>2011-12-01 16:10:58,950 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://10.116.83.74:9000
>>2011-12-01 16:10:59,814 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
>>
>>
>>log file says:
>>
>>Error before Pig is launched
>>----------------------------
>>ERROR 2999: Unexpected internal error. Failed to create DataStorage
>>
>>java.lang.RuntimeException: Failed to create DataStorage
>>    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
>>    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
>>    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
>>    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
>>    at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
>>    at org.apache.pig.PigServer.<init>(PigServer.java:226)
>>    at org.apache.pig.PigServer.<init>(PigServer.java:215)
>>    at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
>>    at org.apache.pig.Main.run(Main.java:452)
>>    at org.apache.pig.Main.main(Main.java:107)
>>Caused by: java.io.IOException: Call to /10.116.83.74:9000 failed on local exception: java.io.EOFException
>>    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
>>    at org.apache.hadoop.ipc.Client.call(Client.java:1110)
>>    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>>    at $Proxy0.getProtocolVersion(Unknown Source)
>>    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
>>    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
>>    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
>>    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
>>    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
>>    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
>>    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
>>    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
>>    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
>>    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
>>    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
>>    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
>>    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
>>    ... 9 more
>>Caused by: java.io.EOFException
>>    at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:815)
>>    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:724)
>>================================================================================
>>
>>My EMR is running Hive jobs just fine. So if I can get it to run my Pig jobs, 
>>I'll be happy.
>> 
>>-Ayon
>>See My Photos on Flickr
>>Also check out my Blog for answers to commonly asked questions.
>>
