Hi all,

I'm a new Hadoop user, and Hadoop 2.4.1 is my first installation. Right now I'm struggling to tell apart mapred, mapreduce, YARN, MRv1 and MRv2. I tried reading the documentation but couldn't find a clear answer; it sometimes seems to assume that you already know the whole history of the Hadoop framework... :(
I started with a standalone node, of course, but I have also deployed a cluster of 10 machines. I began with the example from the documentation. With the cluster installed and DFS running (started with start-dfs.sh), I run:

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'

What am I using here: MRv1 or MRv2? The job executes successfully and I can get the result from the output directory on HDFS.

Then, on the same installation, I start YARN with start-yarn.sh and run the same command again:

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'

So what am I using in this case?

I'm not sure what the difference between MapReduce and YARN is. Is MapReduce running on top of YARN? How does MapReduce interact with YARN? Is it completely transparent? What's the difference between a MapReduce application and a YARN application? (Forgive me if "MapReduce application" is not the correct term.)

Besides that, when writing a completely new MapReduce application, which API should be used to avoid writing deprecated, old-style Hadoop code: mapred or mapreduce?

Thanks a lot.

Kind regards,
Seba
