Hello,
I assume the second set of counts was printed by some tool reading the WebDB. Right?
If so, 2 250 000 is the number of pages generated to be fetched (that is,
all fetched pages plus fetch attempts that ended in an error), simply the
total number of pages in the segments. The second number is the number of
pages/links in the WebDB: pages and links known to Nutch, gathered by
extracting links from already fetched pages. Some of these pages have
already been fetched, but others are still to be fetched in the future.
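To make the arithmetic concrete, here is a small illustrative sketch using only the numbers from the original message. It is not Nutch code; the variable names are made up, and the last figure assumes that every segment entry is a distinct page that is also present in the WebDB, which may not hold exactly.

```python
# Illustrative arithmetic only; not Nutch code. All names are hypothetical.
segments = 90
pages_per_segment = 25_000

# Pages generated for fetching across all segments
# (successful fetches plus fetch attempts that ended in an error).
pages_in_segments = segments * pages_per_segment

# Counts reported from the WebDB in the original message.
webdb_pages = 4_318_557
webdb_links = 5_541_456

# Assumption: every segment entry is a distinct page that is also in the WebDB.
# Under that assumption, this is the number of known pages that have not yet
# been put into any segment.
known_but_not_in_segments = webdb_pages - pages_in_segments

print(f"pages in segments:         {pages_in_segments}")
print(f"pages known to the WebDB:  {webdb_pages}")
print(f"known but not in segments: {known_but_not_in_segments}")
print(f"ratio WebDB / segments:    {webdb_pages / pages_in_segments:.2f}")
```

So the WebDB figure being roughly twice the segment total is consistent with about two million known pages still waiting to be fetched.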
Regards
Piotr
Ilia S. Yatsenko wrote:
Hello,
Sorry for my limited English.
How does Nutch count the documents in the search index?
I have 90 segments with 25000 pages in each segment.
The total is 2 250 000 pages in the index (this is the number I see when I
run mergesegs).
But at the same time Nutch reports:
Number of pages: 4318557
Number of links: 5541456
Why do I see about twice as many pages as I actually have in the index?
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general