>> When I performed a whole-web crawl test according to the tutorial, I got
>> Number of pages: 36668
>> Number of links: 46721.
>> Then how many have you got?

>I only played around with Nutch some months ago, and I got as many as
>500,000 pages and several million links within a few days over my home
>DSL line. Your crawler might be stuck somewhere...?

"Number of pages" is probably the number of Page instances, i.e. the number
of successfully retrieved web pages.
"Number of links" is probably the total number of Link instances in the
WebDB. That includes links to pages that were never retrieved, and multiple
links to the same Page instance.
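To show why the link count can be much larger than the page count under that reading, here is a toy model in Java. It is only a sketch: the `Link` record and the sample URLs are hypothetical and are not Nutch's actual WebDB classes. It assumes "pages" means fetched URLs and "links" means every outlink record, whether or not its target was fetched (records need Java 16+).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WebDbCounts {
    // Hypothetical stand-in for a WebDB Link: source URL, target URL, anchor text.
    record Link(String from, String to, String anchor) {}

    // URLs assumed to have been fetched successfully ("pages" under the reading above).
    static Set<String> sampleFetched() {
        return new HashSet<>(List.of("http://a/", "http://b/"));
    }

    // Every outlink is its own record, even when two links reach the same content.
    static List<Link> sampleLinks() {
        return List.of(
            new Link("http://a/", "http://b/", "B home"),
            new Link("http://a/", "http://b/index.html", "B again"), // different URL, maybe same content
            new Link("http://b/", "http://c/", "C"),                 // target never fetched
            new Link("http://b/", "http://other.org/", "elsewhere")  // link to a foreign host
        );
    }

    // Counts links whose target URL is not among the fetched pages.
    static long linksToUnfetched(Set<String> fetched, List<Link> links) {
        return links.stream().filter(l -> !fetched.contains(l.to())).count();
    }

    public static void main(String[] args) {
        Set<String> fetched = sampleFetched();
        List<Link> links = sampleLinks();
        System.out.println("Number of pages: " + fetched.size());
        System.out.println("Number of links: " + links.size());
        System.out.println("Links to unfetched URLs: " + linksToUnfetched(fetched, links));
    }
}
```

With two fetched pages, the model already holds four links, three of which point at URLs that were never retrieved.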

Different pages may contain different links (with different anchor text and
even different URLs) to the same Page instance; page equality is defined by
the MD5 hash (a checksum of all bytes in the plain HTTP response).
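A minimal Java sketch of content-based page identity, assuming (as described above) that two URLs count as the same Page when the MD5 of their fetched bytes matches. The class, its methods and the sample URLs are hypothetical illustrations, not Nutch's own code; exactly which bytes Nutch 0.7.1 feeds into the hash is taken from the description above and not verified here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class Md5PageKey {
    // Hex-encoded MD5 of the raw bytes; identical bytes always give identical keys.
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Groups URLs by the MD5 of their fetched bytes: one group per "Page".
    static Map<String, TreeSet<String>> groupByContent(Map<String, byte[]> fetched) {
        Map<String, TreeSet<String>> pages = new HashMap<>();
        for (Map.Entry<String, byte[]> e : fetched.entrySet()) {
            pages.computeIfAbsent(md5Hex(e.getValue()), k -> new TreeSet<>()).add(e.getKey());
        }
        return pages;
    }

    public static void main(String[] args) {
        Map<String, byte[]> fetched = new HashMap<>();
        byte[] home = "<html>B home</html>".getBytes(StandardCharsets.UTF_8);
        fetched.put("http://b/", home);
        fetched.put("http://b/index.html", home.clone()); // different URL, same bytes
        fetched.put("http://c/", "<html>C</html>".getBytes(StandardCharsets.UTF_8));

        Map<String, TreeSet<String>> pages = groupByContent(fetched);
        System.out.println("Distinct pages: " + pages.size());
        pages.values().forEach(System.out::println);
    }
}
```

Here three URLs collapse into two pages, because `http://b/` and `http://b/index.html` return identical bytes; links to either URL would point at the same Page.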

A single page may have hundreds of links, including links to foreign hosts.

Nutch 0.7.1
