>> When I performed a whole-web crawl test according to the tutorial, I got
>> Number of pages: 36668
>> Number of links: 46721.
>> Then how many have you got?
> I only played around with Nutch some months ago, and I got as many as 500,000
> pages and several million links within a few days over my home DSL line. Your
> crawler might be stuck somewhere ...?

"Number of pages" is probably the number of Page instances, i.e. the number of successfully retrieved web pages.

"Number of links" is probably the total number of Link instances in the WebDB, including links to pages that have not been retrieved and multiple links to the same Page instance. Different pages may have different links (with different anchor text and even different URLs) to the same Page instance; page equality is defined by the MD5 hash (a checksum of all bytes in the plain HTTP response). A single page may have hundreds of links, including links to foreign hosts.

Nutch 0.7.1
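To make the difference between the two counts concrete, here is a toy sketch in Python. It is not Nutch's WebDB code; the names `Link`, `page_id` and `count_pages_and_links` and the example URLs are made up for illustration. It only assumes what is described above: a page is identified by the MD5 of its response bytes, and every link is counted, whether or not its target has been fetched.

```python
import hashlib
from collections import namedtuple

# Hypothetical stand-in for a link record: source URL, target URL, anchor text.
Link = namedtuple("Link", ["from_url", "to_url", "anchor"])


def page_id(body: bytes) -> str:
    """Identify a page by the MD5 checksum of its raw response bytes."""
    return hashlib.md5(body).hexdigest()


def count_pages_and_links(fetched, links):
    """Return (distinct fetched pages, total links).

    Pages with identical content collapse into one entry because they share
    an MD5; links are counted individually, even when they point to the same
    page or to a URL that was never fetched.
    """
    pages = {page_id(body) for body in fetched.values()}
    return len(pages), len(links)


# Two URLs return identical bytes, so they count as a single page.
fetched = {
    "http://example.com/": b"<html>home</html>",
    "http://example.com/index.html": b"<html>home</html>",
    "http://example.com/about": b"<html>about</html>",
}

links = [
    Link("http://example.com/", "http://example.com/about", "About"),
    Link("http://example.com/", "http://example.com/about", "About us"),
    Link("http://example.com/about", "http://example.com/index.html", "Home"),
    # Target on a foreign host that was never fetched; still counted as a link.
    Link("http://example.com/about", "http://other.example.org/", "Elsewhere"),
]

num_pages, num_links = count_pages_and_links(fetched, links)
print("Number of pages:", num_pages)
print("Number of links:", num_links)
```

In this toy data three URLs were fetched but only two distinct pages are counted, while all four links are counted. On a real crawl the link count can therefore run well ahead of the page count.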
