Hi,
As I understand it, link analysis needs to be run
whenever the results of a new fetch are written to the
webdb. Because the link graph changes (new links may
be added and old links deleted), the score of each
node changes, so a recalculation is necessary.
Link analysis updates the score of each node (that is,
each page) in the webdb; updatesegmentfromdb then needs
to run to copy the recalculated scores to the segments.
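
To make the dependency concrete, here is a toy sketch in Python. It is
not Nutch code: the function names inlink_scores and
copy_scores_to_segment are hypothetical, and the score is assumed to be
a plain count of distinct inlinks (the real link analysis tool is
iterative). It only illustrates that a changed link graph changes the
scores, and that a segment keeps stale scores until they are copied
over again.

```python
from collections import defaultdict


def inlink_scores(links):
    """Toy score: number of distinct pages linking to each target page."""
    sources = defaultdict(set)
    for src, dst in links:
        if src != dst:  # ignore self-links
            sources[dst].add(src)
    return {page: len(srcs) for page, srcs in sources.items()}


def copy_scores_to_segment(segment_pages, db_scores):
    """Stand-in for the 'update segment from db' step: copy scores per page."""
    return {page: db_scores.get(page, 0) for page in segment_pages}


# Link graph before and after a new fetch is merged into the db.
old_links = [("a", "c"), ("b", "c"), ("a", "b")]
new_links = [("a", "c"), ("a", "b"), ("d", "b"), ("d", "a")]  # b->c gone, d added

segment_pages = ["a", "b", "c"]

stale = copy_scores_to_segment(segment_pages, inlink_scores(old_links))
fresh = copy_scores_to_segment(segment_pages, inlink_scores(new_links))

print("before:", stale)
print("after: ", fresh)
```

In this sketch the segment only sees the new values after both steps
have been run again, in that order.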
I can't see a case where we can skip link analysis. Am
I missing something important? Let me know.
Thanks,
Michael Ji
--- AJ Chen <[EMAIL PROTECTED]> wrote:
> I assume you mean UpdateSegmentFromDB, and there is no need to run link
> analysis tool if I want to use the number of inlinks for nutch score.
> Right? I tried to find your patch, but couldn't find it. How to find it?
> -AJ
>
> Piotr Kosiorowski wrote:
>
> > UpdateDB copies link information and score from the WebDB to segments
> > so it is important to have score calculated before updatedb is run.
> > One can use current standard nutch score (based on number of inlinks)
> > or try to use analyze - I have committed a patch for it some time ago
> > that might help a bit with it disk space requirements so the best
> > approach would be to test it (it worked ok for me) and if it is ok for
> > you - report it so others can also try it out.
> > Regards
> > Piotr
> > AJ Chen wrote:
> >
> >> In a whole-web or vertical crawling setting, is it right that link
> >> analysis and update segment from DB should be performed in right
> >> order before indexing the segments?
> >>
> >> There's not much talk about update segment from DB. I think it should
> >> be an important step. Could someone point out when it should be run
> >> and what the benefits are?
> >>
> >> I remember it was mentioned sometime ago that the link analysis tool
> >> does not work yet and the number of in-links should be used instead.
> >> Any update? If it's still not working, how to set it to use link
> >> numbers?
> >>
> >> Thanks,
> >> AJ
> >>
> >>
> >
> >
>
>