Hi Manish,
On Sat, Jan 16, 2016 at 5:19 AM, wrote:
> I was checking the Nutch logs, and I observed there are more fetching logs than
> parsing logs.
> I understand parsing does not happen for URLs whose fetch fails, but the
> difference is so high. Any idea?
How did
When we were doing billion-page crawls a while back, in 2006-2008, we had
the following setup:
1. Have a fixed number of shards to handle the full index. At that time
this was 40 shards of 25 million pages each, for a total of 1 billion
pages.
2. Crawl the pages for 1 shard. Update
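The sizing in step 1 is plain arithmetic: 40 shards x 25 million pages = 1 billion pages. The Python sketch below checks that figure and shows one possible way to assign a page to a shard. The function names and the MD5-hash assignment are assumptions for illustration only; the setup described above does not say how pages were mapped to shards.

```python
import hashlib

# Figures quoted in the message: 40 shards of 25 million pages each.
PAGES_PER_SHARD = 25_000_000
NUM_SHARDS = 40


def total_capacity(num_shards: int, pages_per_shard: int) -> int:
    """Total number of pages the full index can hold across all shards."""
    return num_shards * pages_per_shard


def shard_for_url(url: str, num_shards: int = NUM_SHARDS) -> int:
    """Hypothetical helper: map a URL to a shard index in [0, num_shards).

    Uses a stable MD5 hash of the URL modulo the shard count. This is an
    assumed scheme, not necessarily what the original crawl used.
    """
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    print(total_capacity(NUM_SHARDS, PAGES_PER_SHARD))  # 1000000000
    print(shard_for_url("http://example.com/"))
```

A stable hash keeps a given URL in the same shard across crawl cycles, so each shard can be crawled and updated independently; hashing on the host instead of the full URL would be an equally plausible alternative.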