Re: user Digest 16 Jan 2016 13:19:55 -0000 Issue 2520

2016-01-16 Thread Lewis John Mcgibbney
Hi Manish,

On Sat, Jan 16, 2016 at 5:19 AM, wrote:
> I was checking the Nutch logs, and I observed there are more fetching logs than parsed logs.
> I understand parsing does not happen for URLs whose fetch failed, but the difference is so high. Any idea?

How did

Re: Handling large scale incremental PageRank updates

2016-01-16 Thread Dennis Kubes
When we were doing billion-page crawls a while back, in 2006-2008, we had the following setup.

1. Have a given number of shards to handle the full index. At that time this was 25 million pages per shard for 40 shards, for a total of 1 billion pages.
2. Crawl the pages for 1 shard. Update