Ok, so I'd be curious to know your final architecture, D. Davies?
I was looking to put Ceph on top of the (3) btrfs nodes in case we need
a DFS at some later point. We're not really sure what software will be
in our final mix. Certainly installing Ceph should not hurt anything (?),
and I'm not sure we want to use Ceph from userspace only. We have had
excellent success with btrfs, so that is firm for us, short of some
gaping problem emerging. Growing the cluster will happen once we
establish the basic functionality of the cluster.
Right now the focus is on subsurface fluid simulations for carbon
sequestration, but using the cluster for general (cron/Chronos) batch
jobs is a secondary appeal for us. So, I guess my question is: knowing
that we want to avoid the HDFS/Hadoop setup entirely, will a local
FS/DFS built on btrfs/Ceph be sufficiently robust to test not only
Mesos + Spark but many other related pieces of software, such as (but
not limited to) R, Scala, SparkR, and SQL databases? We're just trying
to avoid some common mistakes as we move forward with Mesos.
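
To make it concrete, here is the kind of thing I have in mind (the
paths, hostnames, and Scala snippet below are purely illustrative, not
our actual setup). As far as I can tell, Spark itself should not care
whether a path lives on plain btrfs or on a CephFS mount, as long as
the same path is visible on every node:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical layout: each of the (3) nodes has the same mount point,
    // e.g. /mnt/shared, backed by local btrfs now or by CephFS later.
    val conf = new SparkConf()
      .setMaster("mesos://mesos-master.example.com:5050") // placeholder master URL
      .setAppName("fluid-sim-smoke-test")
    val sc = new SparkContext(conf)

    // Plain file:// URIs instead of hdfs:// -- this only works if the path
    // resolves to the same data on every slave (a shared mount, or copies).
    val readings = sc.textFile("file:///mnt/shared/wells/pressure-readings.csv")
    readings.map(_.split(",")(2).toDouble)
            .saveAsTextFile("file:///mnt/shared/out/pressures")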
James
On 10/22/14 02:29, Dick Davies wrote:
I'd be interested to know what that is, if you don't mind sharing.
We're thinking of deploying a Ceph cluster for another project anyway;
it seems to remove some of the chokepoints/points of failure HDFS
suffers from, but I've no idea how well it interoperates with the usual
HDFS clients (Spark in my particular case, but I'm trying to keep this
general).
On 21 October 2014 13:16, David Greenberg <[email protected]> wrote:
We use Spark without HDFS -- in our case, we just use Ansible to copy
the Spark executors onto all hosts at the same path. We also load and
store our Spark data from non-HDFS sources.
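
Roughly, the driver-side config looks like this (hostnames and paths
are placeholders, and the exact property for pointing at a
pre-installed Spark directory has moved around between Spark versions,
so treat this as a sketch rather than gospel):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos") // placeholder ZK ensemble
      .setAppName("no-hdfs-job")
      // Ansible has already unpacked the same Spark build at /opt/spark on
      // every host, so executors start from that local path instead of
      // fetching a distribution out of HDFS.
      .set("spark.mesos.executor.home", "/opt/spark")
    val sc = new SparkContext(conf)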
On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies <[email protected]> wrote:
I think Spark needs a way to send jobs to/from the workers - the Spark
distro itself will pull down the executor OK, but in my (very basic)
tests I got stuck without HDFS. So basically it depends on the
framework. I think in Spark's case they assume most users are migrating
from an existing Hadoop deployment, so HDFS is sort of assumed.
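
For what it's worth, the usual way around that assumption seems to be
hosting the Spark tarball somewhere the slaves can fetch it over plain
HTTP instead of HDFS (the URL and version below are only illustrative):

    import org.apache.spark.SparkConf

    // spark.executor.uri tells the Mesos slaves where to download the
    // Spark distribution from; it does not have to be an hdfs:// URI.
    val conf = new SparkConf()
      .setMaster("mesos://mesos-master.example.com:5050") // placeholder master
      .set("spark.executor.uri",
           "http://files.example.com/spark-1.1.0-bin-hadoop2.4.tgz") // any reachable HTTP server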
On 20 October 2014 23:18, CCAAT <[email protected]> wrote:
On 10/20/14 11:46, Steven Schlansker wrote:
We are running Mesos entirely without HDFS with no problems. We use
Docker to distribute our
application to slave nodes, and keep no state on individual nodes.
Background: I'm building up a 3-node cluster to run Mesos and Spark. No
legacy Hadoop needed or wanted. I am using btrfs for the local file
system, with (2) drives set up as RAID1 on each system.

So you are suggesting that I can install Mesos + Spark + Docker and no
DFS on these (3) machines? Will I need any other software? My
application is a geophysical fluid simulator, so Scala, R, and all
sorts of advanced math will be required on the cluster for the finite
element methods.
James