Hi,

There is a way, but it's not an easy one. You would have to replace the container request code in the MapReduce ApplicationMaster (MR_AM). By default each container in MapReduce gets the same amount of memory; OOM shouldn't be a problem with smaller containers, since the in-task "buffers" can be spilled to disk. I am no MapReduce (code) specialist, but I would start by finding MR_Driver.class and MR_AM.class. Then change the driver so that it launches your own class, Custom_MR_AM (C_MR_AM). C_MR_AM would be a copy of MR_AM with the container request code changed, so that you can allocate N containers with X memory and M containers with Y memory.
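To make the "N containers with X memory and M containers with Y memory" idea a bit more concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It only models the request plan; it is not the MapReduce AM code. The names ContainerPlanSketch, Group, Request, expand and totalMemoryMb are hypothetical. In a real custom AM each Request would presumably be turned into an AMRMClient.ContainerRequest and passed to AMRMClient.addContainerRequest. Giving each memory size its own priority is an assumption on my side: as far as I know, YARN releases of this era expect a single container size per priority.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: models a heterogeneous container plan, no Hadoop classes involved. */
final class ContainerPlanSketch {

    /** A group of identical containers, e.g. "2 containers with 100 MB" (hypothetical type). */
    static final class Group {
        final int count;
        final int memoryMb;

        Group(int count, int memoryMb) {
            if (count < 0 || memoryMb <= 0) {
                throw new IllegalArgumentException("count must be >= 0 and memoryMb > 0");
            }
            this.count = count;
            this.memoryMb = memoryMb;
        }
    }

    /** One individual container request the AM would send out (hypothetical type). */
    static final class Request {
        final int memoryMb;
        final int priority;

        Request(int memoryMb, int priority) {
            this.memoryMb = memoryMb;
            this.priority = priority;
        }
    }

    /**
     * Expands the groups into one request per container.
     * Assumption: every group gets its own priority value (0, 1, 2, ...),
     * so that different memory sizes never share a priority.
     */
    static List<Request> expand(List<Group> groups) {
        List<Request> requests = new ArrayList<>();
        int priority = 0;
        for (Group group : groups) {
            for (int i = 0; i < group.count; i++) {
                requests.add(new Request(group.memoryMb, priority));
            }
            priority++;
        }
        return requests;
    }

    /** Sums the memory asked for across all requests. */
    static int totalMemoryMb(List<Request> requests) {
        int total = 0;
        for (Request request : requests) {
            total += request.memoryMb;
        }
        return total;
    }

    public static void main(String[] args) {
        // The numbers from the question: two 100 MB and two 200 MB containers.
        List<Group> plan = Arrays.asList(new Group(2, 100), new Group(2, 200));
        for (Request request : expand(plan)) {
            System.out.println(request.memoryMb + " MB at priority " + request.priority);
        }
    }
}
```

The point of keeping the plan as data is that C_MR_AM could build it from job configuration and then loop over it where MR_AM currently issues its uniform requests.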
The hadoop-mapreduce-examples.jar is just a bunch of "Hello World" jobs, there so that a new user can pick up and "learn" MR quickly. Maybe a real MR specialist can give you better advice than I can.

regards
tmp


2013/12/5 Yue Wang <[email protected]>

> Hi,
>
> Thank you for your answer. Now I understand the connection between the two
> ways.
>
> I asked this question because I want to take benefit from the YARN
> architecture.
> If I understood correctly, I can let my ApplicationMaster request
> containers more flexibly. For example, I can request two containers with
> 100MB memory and two containers with 200MB memory for my mappers on YARN.
> However, I cannot do that on MRv1.
>
> So if I execute a WordCount program by typing "yarn jar
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount
> wordcount/ wc-output/", such flexibility is gone.
>
> Is there a way to let my ApplicationMaster execute WordCount on HDFS on
> containers?
>
>
> Thanks!
>
>
> On Thu, Dec 5, 2013 at 4:28 AM, Rob Blah <[email protected]> wrote:
>
>> Hi
>>
>> If I understood you correctly, you would like to run your AM with YARN
>> Client from shell as opposed to running the Driver like in MRv1. But it's the
>> same thing (more or less). In the example you provided
>> (org.apache.hadoop.yarn.applications.DistributedShell) the Client.class is
>> the "driver". However since distributed-shell is a "simple" application you
>> do not need a lot of configuration (setting fields in Configuration.class,
>> I/O formats etc.). The same goes for any other application. As for the
>> second example (org.apache.hadoop.examples.WordCount) MapReduce AM requires
>> certain configuration, thus you have to do it the "old-way". The main
>> difference would be: MR -> end-user-config -> driver, DS -> driver (but you
>> still can create your own end-user-config). Hope this answers your question
>> and that I understood it correctly.
>>
>> regards
>> tmp
>>
>>
>> 2013/12/5 Yue Wang <[email protected]>
>>
>>> Hi,
>>>
>>> I took a look at the codes and found some examples on the web.
>>> One example is: http://wiki.opf-labs.org/display/SP/Resource+management
>>>
>>> It seems that users can run simple shell commands using Client of YARN.
>>> But when it comes to a practical MapReduce example like WordCount,
>>> people still run commands in the old way as in MRv1.
>>>
>>> How can I run WordCount using Client and ApplicationMaster of YARN so
>>> that I can request resources flexibly?
>>>
>>>
>>> Thanks!
>>>
>>>
>>> On Mon, Dec 2, 2013 at 11:26 AM, Rob Blah <[email protected]> wrote:
>>>
>>>> Hi
>>>>
>>>> Follow the example provided in
>>>> Yarn_dist/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell.
>>>>
>>>> regards
>>>> tmp
>>>>
>>>>
>>>> 2013/12/1 Yue Wang <[email protected]>
>>>>
>>>>> Hi,
>>>>>
>>>>> I found the page (
>>>>> http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html)
>>>>> and know how to write an ApplicationMaster.
>>>>>
>>>>> However, is there a complete example showing how to run this
>>>>> ApplicationMaster with a real Hadoop Program (e.g. WordCount) on YARN?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>>>>> Yue
>>>>>
>>>>
>>>>
>>>
>>
>
