Vincent155 wrote:
I have a virtual machine running (VMware 1.0.7). Both host and guest run on
Fedora 10. In the virtual machine, I have Nutch installed. I can index
directories on my host as if they were websites.
Now I want to compare Nutch with another search engine. For that, I want to
index some 2,500 files in a directory. But when I execute a command like
"crawl urls -dir crawl.test -depth 3 -topN 2500", or leave out the
-topN option, there are still only some 50 to 75 files indexed.
Check the value of db.max.outlinks.per.page in your nutch-site.xml. The
default is 100. When crawling filesystems, each file in a directory is
treated as an outlink, so this limit caps how many files are picked up
from each directory listing.
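
For illustration, an override in nutch-site.xml might look like the sketch
below. The value 3000 is only an assumption, chosen to exceed the 2,500
files mentioned above. As far as I recall from the property description in
nutch-default.xml, a negative value removes the limit entirely, but please
verify that against your Nutch version.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the default of 100 outlinks per page. -->
  <!-- 3000 is an illustrative value, not a recommendation. -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>3000</value>
  </property>
</configuration>
```

If nutch-site.xml already contains a <configuration> element, only the
<property> block needs to be added inside it. Re-run the crawl afterwards so
the new limit takes effect.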
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com