Hey Nitin, I'm not talking about the concept; I'm talking about how to actually do it technically and how to set it up. Imagine this: I have two clusters, both running fine and identical in setup, except that one has far more TaskTrackers/NodeManagers than the other. Now I want to incorporate some data from the small cluster into an analysis on the big cluster. Could I access the data natively (just by giving the job another HDFS folder as input)? In MapR I configure mapr-clusters.conf and then I have another folder in MapRFS with all the content from the other cluster ... Could I somehow tell one NameNode to look up another NameNode and incorporate all the uncommon files?
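To make it concrete: as far as I can tell, stock Hadoop already accepts a fully qualified hdfs:// URI as job input, because a Path carries its own filesystem scheme and authority. A rough sketch (the hostnames, ports and paths below are made up, and the small cluster's NameNode and DataNodes must be reachable from the compute cluster and run a compatible RPC version):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CrossClusterInput {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "cross-cluster-input");
            job.setJarByClass(CrossClusterInput.class);

            // Input on the big cluster, resolved against fs.defaultFS.
            FileInputFormat.addInputPath(job, new Path("/data/big-cluster-input"));

            // Input on the small cluster: a fully qualified URI pointing at
            // its NameNode (hostname and port are hypothetical).
            FileInputFormat.addInputPath(job,
                    new Path("hdfs://small-nn.example.com:8020/data/shared"));

            // Output goes to the big cluster's HDFS.
            FileOutputFormat.setOutputPath(job, new Path("/data/output"));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

If you'd rather copy the data over in bulk instead of reading it remotely at task time, distcp between the two hdfs:// URIs would be the usual tool.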
Cheers
Fabian

2014-07-03 17:09 GMT+02:00 Nitin Pawar <[email protected]>:

> Nothing is stopping you from implementing the cluster the way you want.
> You can have storage-only nodes for your HDFS and not run tasktrackers
> on them.
>
> Start a bunch of machines with high RAM and high CPU counts but no storage.
>
> The only thing to worry about then would be the network bandwidth to carry
> data from HDFS to the tasks and back to HDFS.
>
>
> On Thu, Jul 3, 2014 at 8:29 PM, fab wol <[email protected]> wrote:
>
>> Hey everyone,
>>
>> MapR offers the possibility to access another cluster's HDFS/MapRFS from
>> one cluster (e.g. a compute-only cluster without much storage capability);
>> see http://doc.mapr.com/display/MapR/mapr-clusters.conf. In times of
>> Hadoop-as-a-Service this becomes very interesting. Is this somehow
>> possible with the "normal" Hadoop distributions (CDH and HDP, I'm looking
>> at you ;-) ) or even without help from those distributors? Any hacks and
>> tricks or even specific functions are welcome. If this is not possible,
>> has anyone filed a ticket for it? Ticket number forwarding is also
>> appreciated ...
>>
>> Cheers
>> Wolli
>>
>
>
>
> --
> Nitin Pawar
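PS: The closest stock-Hadoop analog to MapR's mapr-clusters.conf that I'm aware of is ViewFS, the client-side mount table that ships with Hadoop 2.x (so CDH and HDP should carry it as well). A rough sketch in code; normally these keys would live in core-site.xml, and the mount-table name, hostnames and mount points here are just placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ViewFsMountSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Use a client-side mount table named "demo" (name is arbitrary).
            conf.set("fs.defaultFS", "viewfs://demo/");

            // /big maps to the big cluster, /small to the small one
            // (NameNode hostnames and ports are hypothetical).
            conf.set("fs.viewfs.mounttable.demo.link./big",
                     "hdfs://big-nn.example.com:8020/");
            conf.set("fs.viewfs.mounttable.demo.link./small",
                     "hdfs://small-nn.example.com:8020/");

            // Both clusters now appear under one namespace; listing /small
            // is served by the small cluster's NameNode.
            FileSystem fs = FileSystem.get(conf);
            for (FileStatus st : fs.listStatus(new Path("/small"))) {
                System.out.println(st.getPath());
            }
        }
    }

With the same keys in core-site.xml on the compute cluster, every job sees both namespaces and can take input paths like viewfs://demo/small/data without knowing which physical cluster actually holds the files.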
