For the situation I faced, it was really a disk space issue, not related to the number of files. It was writing to a small partition.
I will try with local.cache.size or mapreduce.tasktracker.cache.local.size
to see if I can keep the final total size under 5GB... Else, I will go for
a custom script to delete all directories (and content) older than 2 or 3
days (rough sketch below, after the quoted thread)...

Thanks,

JM

2013/3/26 Abdelrahman Shettia <[email protected]>:
> Let me clarify: if there are lots of files or directories, up to 32K
> (depending on the user's # of files limit in the OS config), in those
> distributed cache dirs, the OS will not be able to create any more
> files/dirs, so M-R jobs won't get initiated on those TaskTracker
> machines. Hope this helps.
>
> Thanks
>
> On Tue, Mar 26, 2013 at 1:44 PM, Vinod Kumar Vavilapalli
> <[email protected]> wrote:
>>
>> All the files are not opened at the same time ever, so you shouldn't see
>> any "# of open files exceeds" error.
>>
>> Thanks,
>> +Vinod Kumar Vavilapalli
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>> On Mar 26, 2013, at 12:53 PM, Abdelrhman Shettia wrote:
>>
>> Hi JM,
>>
>> Actually these dirs need to be purged by a script that keeps the last 2
>> days' worth of files; otherwise you may run into a "# of open files
>> exceeds" error.
>>
>> Thanks
>>
>> On Mar 25, 2013, at 5:16 PM, Jean-Marc Spaggiari <[email protected]>
>> wrote:
>>
>> Hi,
>>
>> Each time my MR job is run, a directory is created on the TaskTracker
>> under mapred/local/taskTracker/hadoop/distcache (based on my
>> configuration).
>>
>> I looked at the directory today, and it's hosting thousands of
>> directories and more than 8GB of data there.
>>
>> Is there a way to automatically delete this directory when the job is
>> done?
>>
>> Thanks,
>>
>> JM
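
For reference, the kind of purge script I have in mind (a rough, untested
sketch; the distcache path is the one from my configuration quoted above,
so it would need to be adjusted to the local mapred.local.dir, and the
2-day threshold is just the value I mentioned):

  #!/bin/sh
  # Purge TaskTracker distributed-cache entries older than 2 days.
  # Path below is from my configuration; adjust to your mapred.local.dir.
  CACHE_DIR=mapred/local/taskTracker/hadoop/distcache

  # Remove top-level cache directories (and their content) whose
  # modification time is more than 2 days old. Should only run when no
  # job still depends on those cache entries.
  find "$CACHE_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +2 -exec rm -rf {} +

And if the config route works instead, my understanding is that
local.cache.size / mapreduce.tasktracker.cache.local.size take a value in
bytes, so ~5GB would be 5368709120.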
