Sorry, please disregard my last message: I found the crawl-tool.xml file, which I think will help.

Given that this is an intranet and all pages are "trusted" (i.e. no page is an authority over any other, and there is no spam in the index at all), I wonder whether I can simply remove the inbound link score factor entirely and rely on basic on-page factors?

In other words, I would turn this off:

<property>
 <name>indexer.boost.by.link.count</name>
 <value>true</value>
<description>When true scores for a page are multiplied by the log of
 the number of incoming links to the page.</description>
</property>
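
To be concrete, the change I have in mind would look something like the following. This is only an untested sketch: I'm assuming the value can simply be flipped to false in crawl-tool.xml, where I found the property, and that nothing else overrides it.

<property>
 <name>indexer.boost.by.link.count</name>
 <!-- assumption: false disables the link-count boost -->
 <value>false</value>
</property>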

Any thoughts on doing this?

Also, is it possible to reindex without doing a re-crawl, so that I can do some testing?

I'm using the basic process for an intranet crawl, and I'm very new to Nutch!

Thanks,

Dean
