UpdateDB copies link information and scores from the WebDB to the segments,
so it is important to have the scores calculated before updatedb is run.
You can use the current standard Nutch score (based on the number of
inlinks) or try the analyze tool. Some time ago I committed a patch for it
that should reduce its disk space requirements a bit, so the best approach
would be to test it (it worked OK for me) and, if it works for you too,
report back so others can also try it out.
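To illustrate why the order matters, here is a toy model in Python. It is a hypothetical sketch, not Nutch code: the class and function names are made up. The `1 + log(1 + inlinks)` formula and the default score of 1.0 are assumptions standing in for Nutch's inlink-based score. It shows that if segments are updated from the DB before scores exist, they only receive the default score.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

# Hypothetical toy model: these classes are NOT Nutch's API.

DEFAULT_SCORE = 1.0  # assumed default for pages that have not been scored yet


@dataclass
class WebDB:
    # url -> set of URLs that link to it
    inlinks: dict[str, set[str]] = field(default_factory=dict)
    # url -> page score, filled in by a scoring step
    scores: dict[str, float] = field(default_factory=dict)


@dataclass
class SegmentEntry:
    url: str
    score: float = DEFAULT_SCORE
    inlinks: list[str] = field(default_factory=list)


def score_by_inlink_count(db: WebDB) -> None:
    """Assumed inlink-based score: 1 + log(1 + number of inlinks)."""
    for url, sources in db.inlinks.items():
        db.scores[url] = 1.0 + math.log1p(len(sources))


def update_segment_from_db(db: WebDB, segment: list[SegmentEntry]) -> None:
    """Copy the current score and link information from the DB into each entry."""
    for entry in segment:
        entry.score = db.scores.get(entry.url, DEFAULT_SCORE)
        entry.inlinks = sorted(db.inlinks.get(entry.url, set()))


if __name__ == "__main__":
    db = WebDB(inlinks={"a": {"b", "c"}, "b": {"a"}, "c": set()})
    seg = [SegmentEntry("a"), SegmentEntry("c")]

    update_segment_from_db(db, seg)  # too early: nothing scored yet
    print([e.score for e in seg])    # both entries still have the default

    score_by_inlink_count(db)        # score first...
    update_segment_from_db(db, seg)  # ...then copy into the segment
    print([(e.url, round(e.score, 3), e.inlinks) for e in seg])
```

Only the second call to `update_segment_from_db` gives page "a" (two inlinks) a higher score than page "c" (none), which is the point about running the scoring step before updating the segments.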
Regards
Piotr
AJ Chen wrote:
In a whole-web or vertical crawling setting, is it right that link
analysis and updating segments from the DB should be performed, in the
right order, before indexing the segments?
There's not much discussion of updating segments from the DB. I think it
should be an important step. Could someone point out when it should be
run and what the benefits are?
I remember it was mentioned some time ago that the link analysis tool
does not work yet and that the number of inlinks should be used instead.
Any update? If it's still not working, how do I set it to use link counts?
Thanks,
AJ