Hey Nitin, I'm not talking about the concept; I'm talking about how to actually do it technically and how to set it up. Imagine this: I have two clusters, both running fine and identical in setup, except that one has far more TaskTrackers/NodeManagers than the other. Now I want to incorporate some data from the small cluster into an analysis on the big cluster. Could I access the data natively (just by giving the job another HDFS folder as input)? In MapR I configure mapr-clusters.conf and then I have another folder in MapRFS with all the content from the other cluster ... Could I somehow tell one NameNode to look up another NameNode and incorporate all the uncommon files?
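To make it concrete: as far as I can tell, stock Hadoop already accepts a fully qualified hdfs:// URI as job input, because a Path carries its own filesystem scheme and authority. A rough sketch (the hostnames, ports and paths below are made up, and the small cluster's NameNode and DataNodes must be reachable from the compute cluster and run a compatible RPC version):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CrossClusterInput {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "cross-cluster-input");
            job.setJarByClass(CrossClusterInput.class);

            // Input on the big cluster, resolved against fs.defaultFS.
            FileInputFormat.addInputPath(job, new Path("/data/big-cluster-input"));

            // Input on the small cluster: a fully qualified URI pointing at
            // its NameNode (hostname and port are hypothetical).
            FileInputFormat.addInputPath(job,
                    new Path("hdfs://small-nn.example.com:8020/data/shared"));

            // Output goes to the big cluster's HDFS.
            FileOutputFormat.setOutputPath(job, new Path("/data/output"));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

If you'd rather copy the data over in bulk instead of reading it remotely at task time, distcp between the two hdfs:// URIs would be the usual tool.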
Cheers
Fabian

2014-07-03 17:09 GMT+02:00 Nitin Pawar <[email protected]>:

> Nothing is stopping you from implementing the cluster the way you want.
> You can have storage-only nodes for your HDFS and not run tasktrackers
> on them.
>
> Start a bunch of machines with high RAM and high CPU counts but no storage.
>
> The only thing to worry about then would be the network bandwidth to carry
> data from HDFS to the tasks and back to HDFS.
>
>
> On Thu, Jul 3, 2014 at 8:29 PM, fab wol <[email protected]> wrote:
>
>> Hey everyone,
>>
>> MapR offers the possibility to access another cluster's HDFS/MapRFS from
>> one cluster (e.g. a compute-only cluster without much storage capability);
>> see http://doc.mapr.com/display/MapR/mapr-clusters.conf. In times of
>> Hadoop-as-a-Service this becomes very interesting. Is this somehow
>> possible with the "normal" Hadoop distributions (CDH and HDP, I'm looking
>> at you ;-) ) or even without help from those distributors? Any hacks and
>> tricks or even specific functions are welcome. If this is not possible,
>> has anyone filed a ticket for it? Ticket number forwarding is also
>> appreciated ...
>>
>> Cheers
>> Wolli
>>
>
>
>
> --
> Nitin Pawar
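PS: The closest stock-Hadoop analog to MapR's mapr-clusters.conf that I'm aware of is ViewFS, the client-side mount table that ships with Hadoop 2.x (so CDH and HDP should carry it as well). A rough sketch in code; normally these keys would live in core-site.xml, and the mount-table name, hostnames and mount points here are just placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ViewFsMountSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Use a client-side mount table named "demo" (name is arbitrary).
            conf.set("fs.defaultFS", "viewfs://demo/");

            // /big maps to the big cluster, /small to the small one
            // (NameNode hostnames and ports are hypothetical).
            conf.set("fs.viewfs.mounttable.demo.link./big",
                     "hdfs://big-nn.example.com:8020/");
            conf.set("fs.viewfs.mounttable.demo.link./small",
                     "hdfs://small-nn.example.com:8020/");

            // Both clusters now appear under one namespace; listing /small
            // is served by the small cluster's NameNode.
            FileSystem fs = FileSystem.get(conf);
            for (FileStatus st : fs.listStatus(new Path("/small"))) {
                System.out.println(st.getPath());
            }
        }
    }

With the same keys in core-site.xml on the compute cluster, every job sees both namespaces and can take input paths like viewfs://demo/small/data without knowing which physical cluster actually holds the files.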
