Hello,
I assume the second set of counts is printed by some tool accessing the WebDB. Right?
If so, 2 250 000 is the number of pages generated for fetching (all fetched pages plus fetch attempts that ended in an error), i.e. simply the total number of pages in the segments. The second set of numbers is the count of pages and links in the WebDB: pages and links known to Nutch, gathered by extracting links from already fetched pages. Some of these pages have already been fetched, but others are still to be fetched in the future.
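As a rough illustration of the arithmetic, here is a toy sketch in Python (this is not Nutch code; the variable names are made up for the example). It assumes every page in the segments is distinct and is also counted in the WebDB, so the difference is only an estimate of the pages that are known but not yet fetched.

```python
# Toy arithmetic only, not Nutch code. All names here are hypothetical.
segments = 90
pages_per_segment = 25_000

# Pages generated for fetching (successful fetches plus failed attempts).
pages_in_segments = segments * pages_per_segment

# "Number of pages" as reported from the WebDB in the original message.
webdb_pages = 4_318_557

# Assumption: every segment page is distinct and also present in the WebDB,
# so the remainder approximates pages known from links but not yet fetched.
not_yet_fetched = webdb_pages - pages_in_segments

print(pages_in_segments)  # 2250000
print(not_yet_fetched)    # 2068557
```

If the assumption holds, roughly half of the pages in the WebDB are still waiting to be fetched, which would explain why its page count is about twice the segment total.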
Regards
Piotr

Ilia S. Yatsenko wrote:
Hello. Sorry for my limited English.
How does Nutch count the documents in the search index?

I have 90 segments with 25000 pages in each segment.

The total is 2 250 000 pages in the index (this is the number I see when I run
mergesegs).

But at the same time Nutch reports:

Number of pages: 4318557
Number of links: 5541456
Why do I see twice as many pages as I actually have in the index?




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
