Hi Markus, thanks for your answer - however, this isn't a good option for me, as I'm running a Nutch server with multiple instances crawling multiple sites.
From the Nutch API, I can't tell which folders under the "jobcache" directory belong to a crawl that has just completed and which belong to other crawls that are still running. Or can I?

Thanks,
Yann

