Note that you can also use http://, ftp://, or file:// for your executor (or
other) URIs, so if you have an FTP or web server (including an S3 HTTP link),
you can just let Mesos pull the binaries down from there. Alternatively, you
can use file:// to have each slave read the binaries from the same local path,
using something like Ansible to make sure they are already distributed to
every slave.
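
For Spark on Mesos, for example, the executor tarball location is given by the
spark.executor.uri property and fetched the same way. Here's a rough sketch in
Scala (the ZooKeeper address and download URL are just placeholders, not real
endpoints):

---------------------------------------------------------------------
import org.apache.spark.{SparkConf, SparkContext}

// Tell Mesos where to fetch the Spark distribution for each executor.
// Any scheme the fetcher understands works here: http://, ftp://,
// file://, hdfs://, s3://, s3n://. Hostnames and paths are made up.
val conf = new SparkConf()
  .setAppName("fetch-uri-example")
  .setMaster("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos")
  .set("spark.executor.uri",
       "http://downloads.example.com/spark-1.1.0-bin-hadoop2.4.tgz")

val sc = new SparkContext(conf)
---------------------------------------------------------------------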

In addition to the Spark executors, you can use the same strategy for your
R and Scala binaries, or you can bundle all of your dependencies into a
Docker image.
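
If you go the Docker route, note that newer Spark releases (not 1.1) let you
name the executor image directly; a sketch, with a placeholder image that
bundles Spark, R, and Scala:

---------------------------------------------------------------------
import org.apache.spark.{SparkConf, SparkContext}

// spark.mesos.executor.docker.image appeared in later Spark releases;
// the image name is a placeholder for whatever you push to your registry.
val conf = new SparkConf()
  .setAppName("docker-executor-example")
  .setMaster("mesos://zk://zk1:2181/mesos")
  .set("spark.mesos.executor.docker.image",
       "registry.example.com/spark-r-scala:latest")

val sc = new SparkContext(conf)
---------------------------------------------------------------------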

Spark should be able to run against S3 instead of HDFS, although you may want
or need the Tachyon in-memory layer on top of S3.
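
A rough sketch of reading straight from S3 (the bucket and credential handling
are placeholders, and the s3n client jars, e.g. jets3t, need to be on the
classpath); the same Hadoop config keys are also how the credentials question
further down gets answered:

---------------------------------------------------------------------
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("s3-instead-of-hdfs"))

// s3n:// credentials come from the Hadoop configuration, here pulled
// from environment variables (placeholders for your own key handling).
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

// Read directly from a bucket instead of hdfs:// paths.
val lines = sc.textFile("s3n://my-bucket/input/*.log")
println("line count: " + lines.count())
---------------------------------------------------------------------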

On Tue, Oct 21, 2014 at 9:20 AM, Tim St Clair <[email protected]> wrote:

> No, it just means you need the utility libraries to access the path.
>
> ----- Original Message -----
> > From: "Ankur Chauhan" <[email protected]>
> > To: [email protected]
> > Sent: Tuesday, October 21, 2014 11:18:11 AM
> > Subject: Re: Do i really need HDFS?
> >
> > So that means even if I don't use the DFS I would need the HDFS namenode
> > and datanode (and related config) to fetch s3 and s3n URIs.
> >
> > Sent from my iPhone
> >
> > > On Oct 21, 2014, at 8:40 AM, Tim St Clair <[email protected]> wrote:
> > >
> > > Ankur -
> > >
> > > To answer your specific question re:
> > > Q: Is an s3 path considered non-hdfs?
> > > A: At this time no, it uses the hdfs layer to resolve (for better or
> > > worse).
> > >
> > > ---------------------------------------------------------------------
> > >  // Grab the resource using the hadoop client if it's one of the
> > >  // known schemes
> > >  // TODO(tarnfeld): This isn't very scalable with hadoop's pluggable
> > >  // filesystem implementations.
> > >  // TODO(matei): Enforce some size limits on files we get from HDFS
> > >  if (strings::startsWith(uri, "hdfs://") ||
> > >      strings::startsWith(uri, "hftp://") ||
> > >      strings::startsWith(uri, "s3://") ||
> > >      strings::startsWith(uri, "s3n://")) {
> > >    Try<string> base = os::basename(uri);
> > >    if (base.isError()) {
> > >      LOG(ERROR) << "Invalid basename for URI: " << base.error();
> > >      return Error("Invalid basename for URI");
> > >    }
> > >    string path = path::join(directory, base.get());
> > >
> > >    HDFS hdfs;
> > >
> > >    LOG(INFO) << "Downloading resource from '" << uri
> > >              << "' to '" << path << "'";
> > >    Try<Nothing> result = hdfs.copyToLocal(uri, path);
> > >    if (result.isError()) {
> > >      LOG(ERROR) << "HDFS copyToLocal failed: " << result.error();
> > >      return Error(result.error());
> > >    }
> > > ---------------------------------------------------------------------
> > >
> > > ----- Original Message -----
> > >
> > >> From: "Ankur Chauhan" <[email protected]>
> > >> To: [email protected]
> > >> Sent: Tuesday, October 21, 2014 10:28:50 AM
> > >> Subject: Re: Do i really need HDFS?
> > >
> > >> This is what I also intend to do. Is an s3 path considered non-hdfs? If
> > >> so, how does it know the credentials to use to fetch the file?
> > >
> > >> Sent from my iPhone
> > >
> > >> On Oct 21, 2014, at 5:16 AM, David Greenberg <[email protected]> wrote:
> > >
> > >>> We use spark without HDFS--in our case, we just use ansible to copy the
> > >>> spark executors onto all hosts at the same path. We also load and store
> > >>> our spark data from non-HDFS sources.
> > >
> > >>> On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies <[email protected]> wrote:
> > >
> > >>>> I think Spark needs a way to send jobs to/from the workers - the Spark
> > >>>> distro itself will pull down the executor ok, but in my (very basic)
> > >>>> tests I got stuck without HDFS.
> > >
> > >>>> So basically it depends on the framework. I think in Spark's case they
> > >>>> assume most users are migrating from an existing Hadoop deployment, so
> > >>>> HDFS is sort of assumed.
> > >
> > >>>> On 20 October 2014 23:18, CCAAT <[email protected]> wrote:
> > >>>>> On 10/20/14 11:46, Steven Schlansker wrote:
> > >>>>>
> > >>>>>> We are running Mesos entirely without HDFS with no problems. We use
> > >>>>>> Docker to distribute our application to slave nodes, and keep no
> > >>>>>> state on individual nodes.
> > >>>>>
> > >>>>> Background: I'm building up a 3 node cluster to run mesos and spark.
> > >>>>> No legacy Hadoop needed or wanted. I am using btrfs for the local
> > >>>>> file system, with (2) drives set up for raid1 on each system.
> > >>>>>
> > >>>>> So you are suggesting that I can install mesos + spark + docker and
> > >>>>> not a DFS on these (3) machines?
> > >>>>>
> > >>>>> Will I need any other software? My application is a geophysical
> > >>>>> fluid simulator, so scala, R, and all sorts of advanced math will
> > >>>>> be required on the cluster for the Finite Element Methods.
> > >>>>>
> > >>>>> James
> > >
> > > --
> > > Cheers,
> > > Timothy St. Clair
> > > Red Hat Inc.
> >
>
> --
> Cheers,
> Timothy St. Clair
> Red Hat Inc.
>
