Let me clarify: if there are lots of files or directories in those distributed cache dirs (up to ~32K, depending on the filesystem's per-directory limit on the user's OS config), the OS will not be able to create any more files/dirs there, and thus MR jobs won't get initiated on those TaskTracker machines. Hope this helps.
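As a quick sanity check, something like this counts the entries under the distributed cache dir so you can see how close it is to that per-directory limit (the path here is the one from JM's configuration quoted below; substitute your own mapred.local.dir):

```shell
#!/bin/sh
# Count immediate subdirectories of the distcache dir. On ext3 a single
# directory can hold at most ~32,000 subdirectories, so a number near
# that is a warning sign. Path is an example from this thread.
CACHE_DIR="mapred/local/taskTracker/hadoop/distcache"

find "$CACHE_DIR" -mindepth 1 -maxdepth 1 -type d | wc -l
```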
Thanks

On Tue, Mar 26, 2013 at 1:44 PM, Vinod Kumar Vavilapalli <[email protected]> wrote:

> All the files are not opened at the same time ever, so you shouldn't see
> any "# of open files exceeds" error.
>
> Thanks,
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Mar 26, 2013, at 12:53 PM, Abdelrhman Shettia wrote:
>
> Hi JM,
>
> Actually these dirs need to be purged by a script that keeps the last 2
> days' worth of files; otherwise you may run into a "# of open files exceeds"
> error.
>
> Thanks
>
> On Mar 25, 2013, at 5:16 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>
> > Hi,
> >
> > Each time my MR job is run, a directory is created on the TaskTracker
> > under mapred/local/taskTracker/hadoop/distcache (based on my
> > configuration).
> >
> > I looked at the directory today, and it's hosting thousands of
> > directories and more than 8GB of data there.
> >
> > Is there a way to automatically delete this directory when the job is done?
> >
> > Thanks,
> >
> > JM
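For anyone looking for a concrete starting point, here is a minimal sketch of the kind of purge script mentioned above (keeping the last 2 days' worth of entries). The path is an assumption based on JM's configuration; adjust it to your mapred.local.dir, run it on each TaskTracker (e.g. from cron), and only while no running jobs still depend on the cached files:

```shell
#!/bin/sh
# Hypothetical distcache purge sketch based on the advice in this thread:
# delete top-level cache entries not modified in the last 2 days.
# CACHE_DIR is an assumed path; verify it against your own configuration.
CACHE_DIR="mapred/local/taskTracker/hadoop/distcache"

# -mindepth 1 keeps the cache dir itself; -maxdepth 1 removes whole
# per-job entries rather than walking into them file by file.
find "$CACHE_DIR" -mindepth 1 -maxdepth 1 -mtime +2 -exec rm -rf {} +
```

A cron entry such as a nightly run keeps the directory from ever approaching the per-directory limit, at the cost of re-localizing cache files for jobs that return after a quiet period.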
