Hi Markus,

Thanks for your answer. However, this isn't a good option for me, as I'm
running a Nutch server with multiple instances crawling multiple sites.

From the Nutch API, I can't tell which folders under the "jobcache"
directory belong to a crawl that has just completed and which belong to
other crawls that are still running.

Or can I?

Thanks

Yann



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-links-in-hadoop-directory-tp4108378p4108393.html
Sent from the Nutch - User mailing list archive at Nabble.com.
