I assume you mean UpdateSegmentFromDB, and that there is no need to run the link
analysis tool if I want to use the number of inlinks for the Nutch score.
Is that right? I tried to find your patch but couldn't. Where can I find it?
-AJ
Piotr Kosiorowski wrote:
UpdateDB copies link information and scores from the WebDB to the segments,
so it is important to have the scores calculated before updatedb is run.
You can use the current standard Nutch score (based on the number of inlinks)
or try the analyze tool. Some time ago I committed a patch for it that
should reduce its disk space requirements somewhat, so the best approach
would be to test it (it worked OK for me) and, if it works for you, report
back so others can try it out too.
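To make the inlink-based option concrete, here is a minimal Java sketch,
assuming a page's score is simply its raw inlink count. The class, record,
and method names are hypothetical; this is not Nutch's actual scoring code
or API, and the real formula may differ. It only illustrates that scores
have to be computed before anything can copy them into the segments.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InlinkScoreSketch {
    // Hypothetical link record: one hyperlink from page `from` to page `to`.
    public record Link(String from, String to) {}

    // Counts how many links point at each page.
    // Pages that no link points at do not appear in the result.
    public static Map<String, Integer> inlinkCounts(List<Link> links) {
        Map<String, Integer> counts = new HashMap<>();
        for (Link link : links) {
            counts.merge(link.to(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
            new Link("a.html", "b.html"),
            new Link("c.html", "b.html"),
            new Link("b.html", "a.html"));
        // Assumption: score == inlink count. Compute scores first, since a
        // later step can only copy scores that already exist.
        Map<String, Integer> scores = inlinkCounts(links);
        System.out.println(scores.get("b.html")); // prints 2
    }
}
```

Using a raw count keeps the sketch minimal; a real ranking would likely
dampen it (for example with a logarithm) so heavily linked pages do not
dominate.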
Regards
Piotr
AJ Chen wrote:
In a whole-web or vertical crawling setup, is it correct that link
analysis and updating segments from the DB should be performed in the
right order before indexing the segments?
There hasn't been much discussion of updating segments from the DB,
though I think it is an important step. Could someone explain when it
should be run and what the benefits are?
I remember it was mentioned some time ago that the link analysis tool
did not work yet and that the number of inlinks should be used instead.
Is there any update? If it still doesn't work, how do I configure Nutch
to use the inlink count?
Thanks,
AJ