Hi, Gary,  

I originally used spark-ec2 to deploy the cluster, which installs HDFS for me. I 
killed that and set fs.default.name to s3n://xxx, just like what I did in Hadoop 
(roughly the settings sketched below).
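
For reference, the settings I mean look roughly like this from the spark-shell 
(just a sketch: the bucket name and the two keys are placeholders, and I'm 
assuming the s3n connector that ships with the Hadoop client is on the classpath):

    // same properties as in core-site.xml, set on the Hadoop conf Spark uses underneath
    sc.hadoopConfiguration.set("fs.default.name", "s3n://my-bucket")
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "MY_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "MY_SECRET_KEY")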

So I thought the broken connection to ip:9000 was for HDFS… then I realized that 
masterip:9000 is something related to the Spark daemons (but what is that? the 
default port should be 7077?)

I didn't step into the details of spark-ec2; I just manually set up a cluster on 
EC2 and directly passed s3n:// paths as the input and output (rough sketch below).
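
A minimal sketch of what I mean (spark-shell again; bucket, paths, and keys are 
placeholders, and the word count is just to have something to write back out):

    // credentials for s3n (they can also be embedded in the URI as key:secret@bucket)
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "MY_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "MY_SECRET_KEY")

    // read and write directly against S3, no fs.default.name change needed
    val lines = sc.textFile("s3n://my-bucket/input/data.txt")
    val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.saveAsTextFile("s3n://my-bucket/output/wordcount")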

Everything works now.

Best,

--  
Nan Zhu
School of Computer Science,
McGill University


On Sunday, December 15, 2013 at 1:04 PM, Gary Malouf wrote:  
> Nan, if you solve the issue yourself, it would be good if you posted your 
> solution after asking the question. It might save someone else a few hours of work.  
> Best,
> Gary
> On Dec 15, 2013 12:52 PM, "Nan Zhu" <[email protected]> wrote:
> > finally understand it  
> >  
> > solved  
> >  
> > --  
> > Nan Zhu
> > School of Computer Science,
> > McGill University
> >  
> > On Sunday, December 15, 2013 at 1:43 AM, Nan Zhu wrote:
> >  
> > > Hi, all
> > >  
> > > I'm trying to run Spark on EC2, using S3 as the data storage service.  
> > >  
> > > I set fs.default.name to s3://myaccessid:mysecreteid@bucketid, and I tried 
> > > to load a local file with textFile  
> > >  
> > > I found that Spark still tries to connect to mymasterip:9000
> > >  
> > > I also tried to load a file stored in S3; the same thing happened.  
> > >  
> > > Did I misunderstand something?
> > >  
> > > I once set up a Hadoop cluster on EC2 using S3 to store data; it was 
> > > straightforward, and I only needed to set fs.default.name  
> > >  
> > > I assume that Spark uses the Hadoop file interfaces to interact with S3, 
> > > so there should be no difference?
> > >  
> > > Best,
> > >  
> > > Nan  
> >  
