> in the Database all (let's say 24) pages are stored.
The database stores 24 URLs: the one URL that has been fetched and indexed,
plus the 23 URLs linked from that page, which Nutch will fetch and index in
the next crawl round.
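To make the arithmetic above concrete, here is a toy model (not Nutch internals, just an illustration) of a breadth-first crawl with a uniform fan-out, where each round fetches and indexes the URLs discovered in the previous round and stores their outlinks in the WebDB:

```python
# Illustrative model of why the index lags the WebDB by one crawl round.
# Assumption (not from Nutch source): every fetched page links to the same
# number of previously unseen URLs.

def simulate(outlinks_per_page, depth):
    """Return (indexed_count, webdb_count) after `depth` crawl rounds."""
    frontier = 1          # start with the single root URL
    indexed = 0           # docs that would be visible in the index (e.g. via Luke)
    webdb = 1             # URLs known to the WebDB
    for _ in range(depth):
        indexed += frontier                   # this round's pages get fetched and indexed
        discovered = frontier * outlinks_per_page
        webdb += discovered                   # outlinks are stored, but not yet indexed
        frontier = discovered                 # they become the next round's fetch list
    return indexed, webdb

print(simulate(23, 1))   # (1, 24): one indexed doc, 24 URLs in the WebDB
print(simulate(23, 2))   # (24, 553): the 24 get indexed, hundreds more are queued
```

With a uniform fan-out of 23 the depth-2 WebDB count comes out in the same ballpark as the ~400 reported below; real pages of course have varying numbers of outlinks.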
Best regards from Germany
Michael
Nils Hoeller wrote:
Hi,
I've got the following problem.
When I crawl and index a site with, for example, depth 1,
it works perfectly for the WebDB, which means
all (let's say 24) pages are stored in the database.
But when I look at the index dir with Luke,
I see only one page/doc (the root page of the crawl).
Now when I increase the crawl depth
to 2, I have about 400 pages in the
WebDB but only the 24 in the index.
So the index seems to contain the number of pages for depth-1?
Why is that? Is it a configuration problem?
Thanks for your help
Nils
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers