Hi Gary,

I originally used spark-ec2 to deploy it, which installs HDFS for me. I killed that and set fs.default.name to s3n://xxx, just like what I did in Hadoop,
so I thought the broken connection to ip:9000 was for HDFS. Then I realized that masterip:9000 is something related to the Spark daemons (but what is that? shouldn't the default master port be 7077?).

I didn't step into the details of spark-ec2; I just manually set up a cluster in EC2 and passed s3n:// paths directly as the input and output. Everything works now; for reference, a rough sketch of the resulting setup is at the bottom of this message.

Best,

--
Nan Zhu
School of Computer Science,
McGill University


On Sunday, December 15, 2013 at 1:04 PM, Gary Malouf wrote:

> Nan, if you solve stuff yourself, it would be good if you post your solution
> after asking a question. Might save someone else a few hours of work.
>
> Best,
>
> Gary
>
> On Dec 15, 2013 12:52 PM, "Nan Zhu" <[email protected]> wrote:
> > Finally understood it.
> >
> > Solved.
> >
> > --
> > Nan Zhu
> > School of Computer Science,
> > McGill University
> >
> > On Sunday, December 15, 2013 at 1:43 AM, Nan Zhu wrote:
> >
> > > Hi, all
> > >
> > > I'm trying to run Spark on EC2 and use S3 as the data storage service.
> > >
> > > I set fs.default.name to s3://myaccessid:mysecretid@bucketid, and I tried to load a local
> > > file with textFile.
> > >
> > > I found that Spark still tries to find http://mymasterip:9000
> > >
> > > I also tried to load a file stored in S3; same thing.
> > >
> > > Did I misunderstand something?
> > >
> > > I once set up a Hadoop cluster in EC2 using S3 to store data; it was
> > > straightforward in that I only needed to set fs.default.name.
> > >
> > > I assume that Spark uses the Hadoop file interfaces to interact with S3,
> > > so there should be no difference?
> > >
> > > Best,
> > >
> > > Nan
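
For anyone who hits the same thing, here is a minimal sketch of the kind of job that works once HDFS is out of the picture. The master address, bucket name, and credential values are placeholders, not my real setup, and passing the keys through hadoopConfiguration is just one way to do it rather than embedding them in the URI:

import org.apache.spark.SparkContext

object S3nExample {
  def main(args: Array[String]): Unit = {
    // Standalone master URL; replace "mymasterip" with the real master host.
    val sc = new SparkContext("spark://mymasterip:7077", "s3n-example")

    // Provide AWS credentials via the Hadoop configuration (placeholder values).
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "MY_ACCESS_KEY_ID")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "MY_SECRET_ACCESS_KEY")

    // Read and write s3n:// paths directly; no fs.default.name / HDFS needed.
    val lines = sc.textFile("s3n://my-bucket/input/")
    lines.filter(_.nonEmpty).saveAsTextFile("s3n://my-bucket/output/")

    sc.stop()
  }
}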
