Hi guys, after repeatedly crawling multiple sites for a long time, I've ended up with 31999 subdirectories in the hadoop/mapred/local/taskTracker/jobcache/ directory.
At that point the crawler stops, because ext3 limits each directory to roughly 32000 links, which caps how many subdirectories it can hold. I'm wondering what the solution to that is.

When running Nutch at a high level, I don't know which specific directories get created under jobcache for a given job, so I can't easily delete them myself (unless I'm mistaken?). Is there an option to have Hadoop delete these directories / temp files when a crawl completes, or a way to configure Hadoop so that it doesn't hit this limit? Or any other option to keep my crawler running?

Thanks - help much appreciated again.

Yann
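P.S. For concreteness, here's the kind of cleanup I'd like Hadoop to handle for me. This is just a minimal sketch in Python; the JOBCACHE path is an assumption based on my layout (point it at wherever your mapred.local.dir lives), and it assumes no job is currently running against those directories:

    #!/usr/bin/env python
    # Sketch: prune jobcache subdirectories not modified for a day.
    # JOBCACHE is an assumed path -- adjust it to your mapred.local.dir.
    import os
    import shutil
    import time

    JOBCACHE = "/path/to/hadoop/mapred/local/taskTracker/jobcache"
    MAX_AGE = 24 * 60 * 60  # one day, in seconds

    now = time.time()
    for name in os.listdir(JOBCACHE):
        path = os.path.join(JOBCACHE, name)
        # Only remove stale job directories; skip files and anything recent.
        if os.path.isdir(path) and now - os.path.getmtime(path) > MAX_AGE:
            shutil.rmtree(path)

The risk with a cron hack like this is obvious: if it removes the directory of a live job, that job fails. That's exactly why I'd prefer a built-in Hadoop option, if one exists.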

