I tried switching to "protocol-httpclient" instead of "protocol-http", but it
did not help. Also, I do not see this problem with a small number of URLs
(6,000), but when I feed Nutch about 200K URLs, it crawls only 96K.
Mike Alulin <[EMAIL PROTECTED]> wrote:
I've switched from 0.6 to 0.71 and found that now, after crawling and
indexing, I have fewer than half of the pages in the index. What could be wrong?
It does not show any error messages, and I am sure that all the crawled pages
exist.