UpdateDB copies link information and the score from the WebDB to the segments, so it is important to have the score calculated before updatedb is run. One can use the current standard Nutch score (based on the number of inlinks) or try analyze. Some time ago I committed a patch for it that might help a bit with its disk space requirements, so the best approach would be to test it (it worked OK for me) and, if it works for you, report back so others can try it out too.
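
To illustrate why the order matters, here is a minimal toy sketch in Python. It is not Nutch code: the dictionaries standing in for the WebDB and a segment, the function names, and the example URLs are all hypothetical, and the score is simplified to a plain inlink count. It only shows that whatever score is in the WebDB at copy time is what ends up in the segment.

```python
# Toy model only: these structures and functions are hypothetical, not Nutch APIs.

def score_by_inlinks(webdb):
    """Set each page's score to its inlink count (a simplified stand-in)."""
    for page in webdb.values():
        page["score"] = float(len(page["inlinks"]))


def update_segment_from_db(webdb, segment):
    """Copy the inlinks and the score currently stored in the WebDB into the segment."""
    for url, entry in segment.items():
        page = webdb[url]
        entry["inlinks"] = list(page["inlinks"])
        entry["score"] = page["score"]


def make_webdb():
    """Build a tiny WebDB whose scores have not been calculated yet."""
    return {
        "http://a.example/": {
            "inlinks": ["http://b.example/", "http://c.example/"],
            "score": 0.0,
        },
        "http://b.example/": {"inlinks": ["http://a.example/"], "score": 0.0},
    }


def run(score_first):
    """Return the scores that land in the segment for the chosen step order."""
    webdb = make_webdb()
    segment = {url: {} for url in webdb}
    if score_first:
        score_by_inlinks(webdb)
        update_segment_from_db(webdb, segment)
    else:
        update_segment_from_db(webdb, segment)
        score_by_inlinks(webdb)  # too late: the segment already holds stale scores
    return {url: entry["score"] for url, entry in segment.items()}


right = run(score_first=True)
wrong = run(score_first=False)
print(right)
print(wrong)
```

With scoring first, the segment carries the calculated scores; with the steps reversed, it keeps the uncalculated defaults, and that is what indexing would then see.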
Regards
Piotr
AJ Chen wrote:
In a whole-web or vertical crawling setting, is it right that link analysis and update segment from DB should be performed, in that order, before indexing the segments?

There's not much talk about update segment from DB. I think it should be an important step. Could someone point out when it should be run and what the benefits are?

I remember it was mentioned some time ago that the link analysis tool does not work yet and that the number of in-links should be used instead. Any update? If it's still not working, how do I set it to use the number of in-links?

Thanks,
AJ


