Thanks a lot for the answer! If you don't mind helping a bit more with this, here is what I am seeing.
- The NameNode/DataNode and ResourceManager/NodeManager had been running for 6 months before I discovered that the job history server was not running. After bringing up the job history server, I saw 2k+ jobs in the history server web UI. But then the job history server was restarted, and I no longer see any jobs more than 7 days old in the history web UI.
- I've disabled the cleaner in the config file.

My question is: is there a way to find/recover the job history files that are more than 7 days old? I read that the container logs are stored locally in the NodeManager user log dir, and there are files there (I have not dug through them yet). I am not sure whether the job history files deleted by the history cleaner can be recovered easily.

Thanks in advance,
Boyu

On Mon, Sep 21, 2015 at 4:35 PM, Varun Saxena <vsaxena.va...@gmail.com> wrote:

> MR jobs will write history files to the path given by the config
> mapreduce.jobhistory.intermediate-done-dir.
> The history server will then move them to the done dir, which is given by the config
> mapreduce.jobhistory.done-dir.
>
> By default these config values
> are ${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate
> and ${yarn.app.mapreduce.am.staging-dir}/history/done respectively.
>
> 7 days is also configurable (config being mapreduce.jobhistory.max-age-ms).
> You can set this value according to your cluster.
>
> I hope this answers your question.
>
> Regards,
> Varun Saxena.
>
> On Tue, Sep 22, 2015 at 1:39 AM, Boyu Zhang <boyuzhan...@gmail.com> wrote:
>
>> Thanks a lot for the clarification!
>>
>> I tried to find the log and history information about finished jobs. But
>> they are not in hfs://xxx/user/myusername/output/_SUCCESS (0B). Can you
>> please give some pointers on where the statistical/job history files are
>> located? The hfs://xxxx/history/done only stores history files up to 7 days.
>>
>> Thanks,
>> Boyu
>>
>> On Mon, Sep 21, 2015 at 1:23 PM, Varun Saxena <vsaxena.va...@gmail.com>
>> wrote:
>>
>>> No, you can't show them in the RM UI then.
>>>
>>> However, if you can start another daemon, you can consider using YARN
>>> Application History/Timeline Server or MR Job History Server (only for MR
>>> jobs) to see information about completed jobs.
>>> You can look up the Hadoop documentation to learn more about them and how to
>>> configure them.
>>>
>>> Just to clarify though, the apps themselves are not lost, as in, the
>>> output is not lost. It's just the information about them which is no longer
>>> present on RM restart.
>>>
>>> Regards,
>>> Varun Saxena.
>>>
>>> On Mon, Sep 21, 2015 at 10:31 PM, Boyu Zhang <boyuzhan...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for the answer Varun.
>>>>
>>>> It is the case that yarn.resourcemanager.recovery.enabled is set to
>>>> false. Is there a way to show the jobs that were submitted before the
>>>> restart? We don't want to lose that data.
>>>>
>>>> Thanks,
>>>> Boyu
>>>>
>>>> On Mon, Sep 21, 2015 at 12:53 PM, Varun Saxena <vsaxena.va...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Boyu,
>>>>>
>>>>> RM stores apps in the state store if recovery is enabled. Only then will
>>>>> they be available on restart.
>>>>> Otherwise they are kept in memory and hence lost on restart.
>>>>>
>>>>> You may not have it enabled. Check the value of the config below. By
>>>>> default it's false.
>>>>> yarn.resourcemanager.recovery.enabled
>>>>>
>>>>> Regards,
>>>>> Varun.
>>>>>
>>>>> On Mon, Sep 21, 2015 at 10:01 PM, Boyu Zhang <boyuzhan...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello Everyone,
>>>>>>
>>>>>> I have a strange error regarding the ResourceManager web UI
>>>>>> (http://xx.xx:8088).
>>>>>>
>>>>>> Someone before me set up the hadoop + yarn cluster using Pivotal HD,
>>>>>> and it was running fine. Then today, the resource manager and node manager
>>>>>> disappeared; the logs did not record this. I restarted them, and they are up
>>>>>> and running, but the resource manager web UI does not show any jobs. We
>>>>>> had 700+ jobs in the past, and they were showing before.
>>>>>>
>>>>>> If I submit MapReduce jobs, the newly submitted ones show up. But they
>>>>>> disappear again after restarting the resource manager and node manager.
>>>>>>
>>>>>> Can anyone give any hint on where to look?
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Boyu
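
For reference, the retention setting mentioned in the thread (mapreduce.jobhistory.max-age-ms) takes a value in milliseconds, so a small helper can make the conversion from days less error-prone. The following is only a minimal sketch: the function names `retention_ms` and `property_xml` are hypothetical, the 180-day figure is an arbitrary illustration, and the assumption that the resulting element goes into mapred-site.xml should be checked against your distribution's documentation.

```python
from xml.sax.saxutils import escape

MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def retention_ms(days: int) -> int:
    """Convert a retention period in whole days to milliseconds."""
    if days <= 0:
        raise ValueError("days must be positive")
    return days * MS_PER_DAY


def property_xml(name: str, value: str) -> str:
    """Render one Hadoop-style <property> element as text (hypothetical helper)."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # The 7-day window discussed in the thread, expressed in milliseconds.
    print(retention_ms(7))
    # Illustrative only: keep history files for 180 days instead.
    print(property_xml("mapreduce.jobhistory.max-age-ms", str(retention_ms(180))))
```

The printed element can then be pasted into the site configuration by hand. The sketch only formats the value and does not touch a running cluster.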