Digging a little further, I saw that the problem was with the config mapreduce.jobtracker.split.metainfo.maxsize. In the 2.0 documentation that config is documented as mapreduce.*job*.split.metainfo.maxsize, while the code refers to mapreduce.jobtracker.split.metainfo.maxsize. After setting mapreduce.jobtracker.split.metainfo.maxsize to a higher value I could get the job running. I will open a JIRA for this.
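For anyone else who hits this, the workaround is just to raise the limit under the name the code actually reads. A rough sketch of the 10 TB run with the override (jar name and output paths are placeholders for your environment; 10 TB of teragen data is 100 billion 100-byte rows; as far as I can tell the default limit is ~10 MB and setting the property to -1 disables the check entirely):

    # generate 10 TB of input (3-way replicated if that is your HDFS
    # default, otherwise add -D dfs.replication=3)
    hadoop jar hadoop-mapreduce-examples-*.jar teragen \
        100000000000 /benchmarks/teragen-10tb

    # run terasort with the split metainfo limit raised well above the
    # ~10 MB default (10 GB here is just an example value); the same
    # property can also be set cluster-wide in mapred-site.xml
    hadoop jar hadoop-mapreduce-examples-*.jar terasort \
        -D mapreduce.jobtracker.split.metainfo.maxsize=10000000000 \
        /benchmarks/teragen-10tb /benchmarks/terasort-10tb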
2013/1/7 Lohit <[email protected]>

> It is easily reproducible. Generate 10 TB of input data using
> teragen (replication 3) and try to run terasort on that input. The first
> container fails without any information in the logs and the job fails.
>
> Lohit
>
> On Jan 7, 2013, at 6:41 AM, Robert Evans <[email protected]> wrote:
>
> > We have run some very large jobs on top of YARN, but have not run into
> > this issue yet. The fact that the job.jar was not symlinked correctly
> > makes me think this is a YARN distributed cache issue and not really an
> > input split issue. How reproducible is this? Does it happen every time
> > you run the job, or did it just happen once? Could you take a look at
> > the node manager logs to see if anything shows issues while launching?
> > Sadly the node manager does not log everything when downloading the
> > application and private distributed caches, so there could be an error
> > in there where it did not create the symlink and failed to fail :).
> >
> > --Bobby
> >
> > On 1/5/13 2:44 PM, "lohit" <[email protected]> wrote:
> >
> >> Hi Devs,
> >>
> >> Has anyone seen issues when running big jobs on YARN?
> >> I am trying a 10 TB terasort where the input is 3-way replicated. This
> >> generates a job.split and job.splitmetainfo of more than 10 MB. I see
> >> that the first container launched crashes without any error files.
> >> Debugging a little bit, I see that the job.jar symlink is not created
> >> properly, which was strange.
> >> If I try the same 10 TB terasort but with the input one-way replicated,
> >> the job runs fine. job.split and job.splitmetainfo are much smaller in
> >> this case, which makes me believe there is some kind of limit I might
> >> be hitting.
> >> I tried to set mapreduce.job.split.metainfo.maxsize to 100M, but that
> >> did not help.
> >> Any experience running big jobs and any related configs you guys use?
> >>
> >> --
> >> Have a Nice Day!
> >> Lohit

--
Have a Nice Day!
Lohit
