Thank you, Shahab, but it would be really nice if I could get some input on my initial question, as it would really help.
On Wed, Mar 12, 2014 at 3:11 PM, Shahab Yunus <[email protected]> wrote:

> I would suggest that, given the level of detail you are looking for and
> the fundamental nature of your questions, you should get hold of books or
> online documentation. Basically, some reading/research.
>
> The latest edition of
> http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520 is
> highly recommended to begin with.
>
> Regards,
> Shahab
>
>
> On Wed, Mar 12, 2014 at 3:07 PM, [email protected] <[email protected]> wrote:
>
>> Hello Team,
>>
>> I am starting off on the Hadoop ecosystem and wanted to learn first,
>> based on my use case, whether Hadoop is the right tool for me.
>>
>> I have only structured data, and my goal is to save this data into Hadoop
>> and take advantage of the replication factor. I am using Microsoft tools
>> for analysis; they give me good drag-and-drop functionality for creating
>> different kinds of analysis, and they also have Hadoop drivers, so they
>> can use Hadoop as a data source for analysis.
>>
>> My question here is what benefits the YARN architecture gives me in terms
>> of analysis that my Microsoft, Netezza or Tableau products are not giving
>> me. I am just trying to understand the value of introducing Hadoop into my
>> architecture in terms of analysis, apart from data replication. Any
>> insights would be very helpful.
>>
>> Also, my goal for the POC is related to efficient data storage/retrieval,
>> and so:
>>
>> 1. How does data retrieval work in Hadoop?
>> 2. Do I always need to have some kind of data store on top of HDFS,
>>    like HBase/Cassandra/Mongo, or is there no need for one, so that I can
>>    store all my data in HDFS directly and retrieve it when I need it
>>    using different analytic tools that have HDFS as a data source?
>> 3. Say I have a 3-node cluster, one master and 2 slaves, and I am
>>    trying to insert data into Hadoop. What cycle does the framework
>>    perform to insert my data into HDFS? Does my process read all the
>>    metadata from the master node about where my slave nodes are and what
>>    kind of data should go on which slave node, or is all the data sent to
>>    the master node, which then reads the metadata and decides what
>>    portion of the data should go to which node?
>> 4. Also, if I have a 3-node cluster with 1 master and 2 slaves, my
>>    data is equally distributed across the two nodes, and I have
>>    replication set to 2, then where and how will replication take place,
>>    as I do not have any node vacant for doing replication?
>> 5. Also, for the POC, does it make sense to go with a Cloudera 3-node
>>    free cluster or a Hortonworks 3-node free cluster, or does it make
>>    sense to go with the open source Hadoop version? If we go with the
>>    open source Hadoop version, where do we define which node is the
>>    master and which is a slave, and can we have all 3 nodes on the same
>>    machine, or do we need to have all 3 nodes on different machines?
>> 6. Also, what are the pros and cons of going with
>>    Hortonworks/Cloudera as opposed to Apache Hadoop from an initial POC
>>    point of view?
>> 7. Also, if we go with Hortonworks/Cloudera, which tools come
>>    clubbed together with the Hadoop framework, and if we go with Apache
>>    Hadoop, do we get any tools like Pig and Hive clubbed together, or do
>>    we have to install them separately?
>>
>> Since I am starting off on the Hadoop journey only recently, I would
>> really appreciate it if the community could point me in the right
>> direction.
>>
>> Regards, Andy.
