Alright, then you can safely remove directories older than 24 hours using the
find command, assuming no job runs for that ridiculous amount of time :)
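For the record, a minimal sketch of what I mean. The jobcache path here is an assumption; the real location depends on your mapred.local.dir setting, so adjust JOBCACHE_DIR accordingly:

```shell
# Hypothetical path -- check your mapred.local.dir configuration.
JOBCACHE_DIR="${JOBCACHE_DIR:-/tmp/hadoop/mapred/local/taskTracker/jobcache}"

# -mindepth 1 / -maxdepth 1: only the per-job directories directly under
# jobcache, never the jobcache directory itself.
# -mtime +0: last modified more than 24 hours ago.
find "$JOBCACHE_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +0 \
  -exec rm -rf {} + 2>/dev/null || true
```

You could drop that in a daily cron job; the `|| true` just keeps it quiet when the directory doesn't exist yet.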
-----Original message-----
> From: yann <[email protected]>
> Sent: Friday 27th December 2013 18:24
> To: [email protected]
> Subject: RE: Too many links in hadoop directory
>
> Hi Markus,
>
> thanks for your answer - however, this isn't a good option for me, as I'm
> running a Nutch server with multiple instances crawling multiple sites.
>
> From the Nutch API, I can't tell which folders under the "jobcache"
> directory belong to a crawl that has just completed, versus which
> belong to other, still-ongoing crawls.
>
> Or can I?
>
> Thanks
>
> Yann
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Too-many-links-in-hadoop-directory-tp4108378p4108393.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>