Hello,

As the Hadoop ecosystem moves fast and the YARN part was a mini revolution, I understand your confusion. To keep it simple: in Hadoop 1 there were two main things, Hadoop MapReduce and Hadoop HDFS. Hadoop MR was itself actually two things: a compute paradigm (map-reduce) and a distribution process for that paradigm. So MR not only ran the map and reduce phases, it also talked to all the machines to get compute slots in the right places. This meant that to use that distribution process you had to go through the map-reduce paradigm, since the two were bundled.
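To make the paradigm half concrete, here is a minimal word-count sketch, essentially the stock example from the Hadoop tutorial, written against the newer org.apache.hadoop.mapreduce API (org.apache.hadoop.mapred is the old-style package your last question mentions):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

You would compile this against the Hadoop jars and submit it with bin/hadoop jar, exactly like the bundled examples jar you ran.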
In Hadoop 2 you have MapReduce 2, which is just the paradigm, and YARN, which does the distribution. The added bonus is that now you can use whatever paradigm you want and talk to YARN to get the distribution. So you can still write MapReduce code if you want, but you can also run other things like Tez, Spark, Giraph, etc., and they all use YARN as the way to get distributed cleanly on the cluster.

On the API question, YARN has also changed the game: you now want to use the paradigm or engine of your choice, according to what best fits your calculations (DAG or not, in-memory or not, graph or not, etc.). I would advise going through higher-level APIs that let you write your logic and then choose the engine you need; Cascading, for example, is nice for that. Hive as well lets you write SQL code and then decide later what runs it: MapReduce, Tez, and in the near future Spark.
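To make that "write the logic once, pick the engine later" point concrete, here is a small Hive sketch. I am assuming Hive 0.13 or later, where the hive.execution.engine property was introduced, and a hypothetical table called docs:

-- The query itself says nothing about an execution engine.
SELECT word, count(*) AS freq
FROM docs
GROUP BY word;

-- Switch the engine per session; the query does not change.
SET hive.execution.engine=mr;   -- classic MapReduce
SET hive.execution.engine=tez;  -- Tez DAGs

Either way the work ends up being scheduled through YARN; only the engine doing the computation changes.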
I hope this helps.

On Sun, Aug 10, 2014 at 7:23 PM, Sebastiano Di Paola <[email protected]> wrote:

> Hi all,
> I'm a newbie hadoop user, and I started with hadoop 2.4.1 as my first
> installation.
> So now I'm struggling with mapred, mapreduce, yarn... MRv1, MRv2, YARN.
> I tried to read the documentation, but I couldn't find a clear
> answer... sometimes the documentation seems to assume that you know all
> the history of the hadoop framework... :(
>
> I started with a standalone node of course, but I have also deployed a
> cluster with 10 machines.
>
> Start with the example from the documentation.
>
> Cluster installed... dfs running with
> start-dfs.sh
>
> When I run
>
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar
> grep input output 'dfs[a-z.]+'
>
> what am I using? MRv1? MRv2?
> The job executes successfully and I can get the output in the HDFS output
> directory.
>
> Then on the same installation I start yarn with start-yarn.sh
> and run the same command after starting yarn:
>
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar
> grep input output 'dfs[a-z.]+'
>
> So what am I using in this case?
>
> I'm not sure about the difference between mapreduce and
> yarn... probably mapreduce is running on top of yarn? How does mapreduce
> interact with yarn? Is it completely transparent?
>
> What's the difference between a mapreduce and a yarn application? (Forgive
> me if it's not correct to talk about a mapreduce application.)
>
> Besides that: writing a completely new mapreduce application, which API
> should be used so as not to write deprecated/old-style hadoop code?
> mapred or mapreduce?
> Thanks a lot.
> Kind regards.
> Seba