Hello,
I assume the second set of counts is printed by some tool that reads the WebDB. Right?
If so, 2 250 000 is the number of pages generated to be fetched, that is,
the total number of entries in your segments (both successfully fetched
pages and fetch attempts that ended in an error). The other numbers are
the counts of pages and links in the WebDB: everything Nutch knows about,
gathered by extracting links from already fetched pages. Some of these
pages have already been fetched, but others are still to be fetched in
the future.
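As a rough check of the arithmetic, here is a small Python sketch using only the figures from your message. It assumes that every segment entry is a distinct page that is also present in the WebDB, which may not hold exactly (failed or repeated fetches), so treat the second result as an estimate only. The variable names are mine, not Nutch's.

```python
# Figures quoted in the original message.
segments = 90
pages_per_segment = 25_000
webdb_pages = 4_318_557

# Entries generated for fetching across all segments
# (successful fetches and fetch errors alike).
segment_entries = segments * pages_per_segment

# Assumption: each segment entry is a distinct page that is also in the WebDB.
# Under that assumption, the remainder are pages known only through links
# and not yet fetched.
not_yet_fetched = webdb_pages - segment_entries

print(segment_entries)   # 2250000
print(not_yet_fetched)   # 2068557
```

So roughly half of the pages in the WebDB would still be waiting to be fetched, which matches the "2 times more" you are seeing.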
Regards
Piotr
Ilia S. Yatsenko wrote:
Hello
Sorry for my poor English.
How does Nutch count documents in the search index?
I have 90 segments with 25000 pages in each segment.
The total is 2 250 000 pages in the index (this is the number I see when
I run mergesegs).
But at the same time Nutch reports:
Number of pages: 4318557
Number of links: 5541456
Why do I see twice as many pages as I really have in the index?