Ankur -
To answer your specific question re:
Q: Is an S3 path considered non-HDFS?
A: At this time, no; it uses the HDFS layer to resolve it (for better or worse).
---------------------------------------------------------------------
// Grab the resource using the hadoop client if it's one of the known schemes
// TODO(tarnfeld): This isn't very scalable with hadoop's pluggable
// filesystem implementations.
// TODO(matei): Enforce some size limits on files we get from HDFS
if (strings::startsWith(uri, "hdfs://") ||
    strings::startsWith(uri, "hftp://") ||
    strings::startsWith(uri, "s3://") ||
    strings::startsWith(uri, "s3n://")) {
  Try<string> base = os::basename(uri);
  if (base.isError()) {
    LOG(ERROR) << "Invalid basename for URI: " << base.error();
    return Error("Invalid basename for URI");
  }

  string path = path::join(directory, base.get());

  HDFS hdfs;

  LOG(INFO) << "Downloading resource from '" << uri
            << "' to '" << path << "'";

  Try<Nothing> result = hdfs.copyToLocal(uri, path);
  if (result.isError()) {
    LOG(ERROR) << "HDFS copyToLocal failed: " << result.error();
    return Error(result.error());
  }
---------------------------------------------------------------------
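
Re: your follow-up about credentials: the HDFS helper above essentially just shells out to the hadoop client (hadoop fs -copyToLocal <uri> <path>), so the S3 credentials come from whatever hadoop configuration is present on the slave, not from Mesos itself. As a rough sketch (property names are the stock Hadoop ones for the s3n filesystem; the values are obviously placeholders), the slave's core-site.xml would carry something like:
---------------------------------------------------------------------
<!-- core-site.xml on each slave, read by the hadoop client that the
     fetcher shells out to. Placeholder values for illustration only. -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>
---------------------------------------------------------------------
For plain s3:// URIs the corresponding properties are fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey.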
----- Original Message -----
> From: "Ankur Chauhan" <[email protected]>
> To: [email protected]
> Sent: Tuesday, October 21, 2014 10:28:50 AM
> Subject: Re: Do i really need HDFS?
> This is what I also intend to do. Is a s3 path considered non-hdfs? If so,
> how does it know the credentials to use to fetch the file.
> Sent from my iPhone
> On Oct 21, 2014, at 5:16 AM, David Greenberg <[email protected]> wrote:
>
> > We use spark without HDFS--in our case, we just use ansible to copy the
> > spark executors onto all hosts at the same path. We also load and store
> > our spark data from non-HDFS sources.
> >
> > On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies <[email protected]> wrote:
> >
> > > I think Spark needs a way to send jobs to/from the workers - the Spark
> > > distro itself will pull down the executor ok, but in my (very basic)
> > > tests I got stuck without HDFS.
> > >
> > > So basically it depends on the framework. I think in Sparks case they
> > > assume most users are migrating from an existing Hadoop deployment, so
> > > HDFS is sort of assumed.
> > >
> > > On 20 October 2014 23:18, CCAAT <[email protected]> wrote:
> > >
> > > > On 10/20/14 11:46, Steven Schlansker wrote:
> > > >
> > > >> We are running Mesos entirely without HDFS with no problems. We use
> > > >> Docker to distribute our application to slave nodes, and keep no
> > > >> state on individual nodes.
> > > >
> > > > Background: I'm building up a 3 node cluster to run mesos and spark.
> > > > No legacy Hadoop needed or wanted. I am using btrfs for the local
> > > > file system, with (2) drives set up for raid1 on each system.
> > > >
> > > > So you are suggesting that I can install mesos + spark + docker
> > > > and not a DFS on these (3) machines?
> > > >
> > > > Will I need any other softwares? My application is a geophysical
> > > > fluid simulator, so scala, R, and all sorts of advanced math will
> > > > be required on the cluster for the Finite Element Methods.
> > > >
> > > > James
--
Cheers,
Timothy St. Clair
Red Hat Inc.