Vincent155 wrote:
I have a virtual machine running (VMware 1.0.7). Both host and guest run on
Fedora 10. In the virtual machine, I have Nutch installed. I can index
directories on my host as if they were websites.

Now I want to compare Nutch with another search engine. For that, I want to
index some 2,500 files in a directory. But when I execute a command like
"crawl urls -dir crawl.test -depth 3 -topN 2500", or leave out the
-topN option, still only some 50 to 75 files are indexed.

Check the value of db.max.outlinks.per.page in your nutch-site.xml. The default is 100. When crawling filesystems, each file in a directory is treated as an outlink, so this limit caps how many files from a single directory listing are followed.
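
As a rough sketch, an override in conf/nutch-site.xml could look like the fragment below. The value 3000 is only an assumption, chosen to sit above the roughly 2,500 files mentioned; pick whatever fits your directory. As far as I recall from the description in nutch-default.xml, a negative value such as -1 removes the limit altogether, but please verify that against the nutch-default.xml shipped with your version.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Raise the per-page outlink cap so that a directory listing with
       about 2,500 entries is not cut off at the default of 100.
       3000 is an illustrative value, not a recommendation. -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>3000</value>
  </property>
</configuration>
```

If nutch-site.xml already contains a configuration element, add only the property element inside it, then rerun the crawl into a fresh directory so the new limit applies from the start.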

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
