1. From my understanding, link analysis scores are computed during the analysis phase and stored in the boost field of the Lucene index during the indexing phase. Is that correct?
Yes.
2. How are the link analysis scores computed during the analysis phase? Is it simply link popularity, or is there more to it?
There is more to it. It uses a PageRank-like algorithm.
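Doug's answer only says the analyzer uses a PageRank-like algorithm, so the following is a minimal sketch of textbook PageRank by power iteration, not Nutch's actual analyzer code. The damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages with no outlinks, and the example graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        dangling = 0.0
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Pages with no outlinks: spread their rank over all pages.
                dangling += damping * rank[page]
        for p in pages:
            new_rank[p] += dangling / n
        rank = new_rank
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
print(max(scores, key=scores.get))  # c
```

In the sample graph, c ranks highest because three pages link to it, and a also ranks well because its single inbound link comes from the highly ranked c. Iterating, rather than just counting, is what lets a link from an important page count for more than a link from an obscure one.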
I've also noticed the "indexer.boost.by.link.count" property. Why does this property exist? Isn't this what the analyzer is doing anyway?
No. This is much simpler: it just counts the number of known incoming links. It is much faster, since the analyze step need not be run. However, it is more susceptible to link spam, so it is best used in applications (e.g., intranets) where link spam is not an issue.
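By contrast, a link-count boost needs only a single pass over the known links, with no iteration. The sketch below counts distinct inbound links per page; ignoring self-links and repeated links from the same source is an assumption, and how Nutch turns the count into a boost when indexer.boost.by.link.count is set is not shown here. The last lines add a hypothetical "link farm" of five pages to show why raw counts are easy to inflate.

```python
from collections import Counter


def inbound_link_counts(links):
    """Count the distinct pages that link to each page.

    links: dict mapping each page to the list of pages it links to.
    Self-links and repeated links from the same source are ignored
    (an assumption for this sketch).
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(inbound_link_counts(graph)["c"])  # 3

# Hypothetical spam: five throwaway pages that all link to "d".
for i in range(5):
    graph[f"spam{i}"] = ["d"]
print(inbound_link_counts(graph)["d"])  # 5, now ahead of "c"
```

Each throwaway page adds as much to the count as a genuine link would, which is the weakness described above; in an intranet, where nobody builds such pages, the cheaper count can be good enough.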
3. One of our ideas for improving search results is to crawl an additional set of related sites, giving the analyzer more pages from which to compute link analysis scores. However, we don't want these sites to show up in search results. What's the best/easiest way to do this? Do I have to write a plugin, or is there a better way?
If you keep these sites in separate segments, then you can easily exclude them from search. Alternatively, you could write a tool that marks them as deleted in the index.
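One way to picture the separate-segment approach: run link analysis over every crawled page, including the extra sites, so their links still feed the scores, then keep only pages from the segments you want searchable. This is a minimal sketch; the function name, segment names, URLs, and score values are hypothetical, and it does not use Nutch's actual segment or index APIs.

```python
def searchable_scores(scores, segment_of, searchable_segments):
    """Keep link-analysis scores only for pages in searchable segments.

    scores: page -> score from link analysis over ALL crawled pages,
        including the extra sites fetched only for their links.
    segment_of: page -> name of the segment the page was fetched into.
    searchable_segments: set of segment names to expose in search.
    """
    return {
        page: score
        for page, score in scores.items()
        if segment_of.get(page) in searchable_segments
    }


# Hypothetical scores and segment assignments.
scores = {
    "http://ours.example/a": 0.40,
    "http://ours.example/b": 0.25,
    "http://related.example/x": 0.35,
}
segment_of = {
    "http://ours.example/a": "main",
    "http://ours.example/b": "main",
    "http://related.example/x": "aux",
}
print(sorted(searchable_scores(scores, segment_of, {"main"})))
```

The related site's page would still have contributed its links when the scores were computed; it simply never reaches the searchable set. Marking those documents as deleted in an existing index, the alternative mentioned above, has the same effect after the fact.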
Doug
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
