Oh, good to know! It keeps track even of month-old entries? There is no TTL?
I was not able to find the documentation for local.cache.size or
mapreduce.tasktracker.cache.local.size in the 1.0.x branch. Do you know
where I can find that?

Thanks,

JM

2013/3/27 Koji Noguchi <[email protected]>:
>> Else, I will go for a custom script to delete all directories (and
>> content) older than 2 or 3 days...
>>
> The TaskTracker (or NodeManager in 2.*) keeps the list of dist cache
> entries in memory. So if an external process (like your script) starts
> deleting dist cache files, there will be an inconsistency and you'll
> start seeing task initialization failures due to "file not found"
> errors.
>
> Koji
>
>
> On Mar 26, 2013, at 9:00 PM, Jean-Marc Spaggiari wrote:
>
>> In the situation I faced, it was really a disk space issue, not
>> related to the number of files. It was writing to a small partition.
>>
>> I will try local.cache.size or
>> mapreduce.tasktracker.cache.local.size to see if I can keep the final
>> total size under 5GB... Else, I will go for a custom script to
>> delete all directories (and content) older than 2 or 3 days...
>>
>> Thanks,
>>
>> JM
>>
>> 2013/3/26 Abdelrahman Shettia <[email protected]>:
>>> Let me clarify: if there are lots of files or directories, up to 32K
>>> (depending on the OS's per-user limits), in those distributed cache
>>> dirs, the OS will not be able to create any more files/dirs, so M-R
>>> jobs won't get initiated on those TaskTracker machines. Hope this
>>> helps.
>>>
>>> Thanks
>>>
>>>
>>> On Tue, Mar 26, 2013 at 1:44 PM, Vinod Kumar Vavilapalli
>>> <[email protected]> wrote:
>>>>
>>>> All the files are never opened at the same time, so you shouldn't
>>>> see any "# of open files exceeded" error.
>>>>
>>>> Thanks,
>>>> +Vinod Kumar Vavilapalli
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>> On Mar 26, 2013, at 12:53 PM, Abdelrhman Shettia wrote:
>>>>
>>>> Hi JM,
>>>>
>>>> Actually these dirs need to be purged by a script that keeps the
>>>> last 2 days' worth of files; otherwise you may run into a "# of
>>>> open files exceeded" error.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Mar 25, 2013, at 5:16 PM, Jean-Marc Spaggiari <[email protected]>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Each time my MR job is run, a directory is created on the
>>>> TaskTracker under mapred/local/taskTracker/hadoop/distcache (based
>>>> on my configuration).
>>>>
>>>> I looked at the directory today, and it's hosting thousands of
>>>> directories and more than 8GB of data.
>>>>
>>>> Is there a way to automatically delete this directory when the job
>>>> is done?
>>>>
>>>> Thanks,
>>>>
>>>> JM
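[For reference, a minimal sketch of what capping the cache in mapred-site.xml could look like on a 1.x TaskTracker, assuming the 5GB target JM mentions. The property name is the one discussed in the thread; the value here (5GB expressed in bytes) is my own illustration, not something confirmed in the thread, so double-check it against your branch's mapred-default.xml before relying on it.]

```xml
<!-- mapred-site.xml (sketch, assumes Hadoop 1.x property name from the thread) -->
<property>
  <name>local.cache.size</name>
  <!-- 5 GB in bytes; the TaskTracker trims old dist cache entries
       once the total size crosses this limit -->
  <value>5368709120</value>
</property>
```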

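[For completeness, a sketch of the kind of age-based cleanup script JM was considering, using a throwaway demo directory. Note Koji's warning above applies: the TaskTracker tracks these entries in memory, so deleting behind its back can cause "file not found" task failures; treat this as illustration only. The `/tmp/distcache-demo` path and directory names are made up for the demo.]

```shell
# Demo setup: fake cache dir with one "old" and one "fresh" entry
CACHE_DIR=/tmp/distcache-demo
rm -rf "$CACHE_DIR"
mkdir -p "$CACHE_DIR/old-entry" "$CACHE_DIR/new-entry"
touch -d '4 days ago' "$CACHE_DIR/old-entry"

# Delete top-level directories (and their content) older than 3 days.
# WARNING: on a real TaskTracker this races with the in-memory cache
# bookkeeping (see Koji's note) -- prefer local.cache.size instead.
find "$CACHE_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +3 -exec rm -rf {} +

ls "$CACHE_DIR"
```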