1) Does that mean a page's outlink count will contribute to its score? (I seem to have seen this logic in the code, but can't remember where.)
Then my question is: how accurate will the score from this method be? In theory, a page's score depends on the number of in-links and on the scores of the pages those in-links come from.

2) How expensive is the link analysis tool? For example, if I have 10 million pages in the WebDB, how long will it take to run?

thanks,

Michael Ji

--- "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> The link analysis tool needs a long time to process.
> Doug wrote some comments about it: set fetchlist.score.by.link.count and indexer.boost.by.link.count to true, and forget about using the link analysis tool.
> I have used this method since June 2005 without problems.
> With the link analysis tool the scoring is better, but with the setup explained above the scoring comes close, without much resource usage.
>
> Michael Ji wrote:
>
> >Hi,
> >
> >As I understand it, link analysis needs to be run whenever a new fetch is updated into the webdb.
> >Because the link graph has changed (new links may have been added and old links deleted), the score for each node changes, so a recalculation is necessary.
> >
> >Link analysis will update the score for each node (by page) in the webdb; then updatesegmentfromdb needs to run to copy the recalculated scores to the segment.
> >
> >I can't see how we can skip link analysis. Am I missing something important? Let me know.
> >
> >thanks,
> >
> >Michael Ji
> >
> >--- AJ Chen <[EMAIL PROTECTED]> wrote:
> >
> >>I assume you mean UpdateSegmentFromDB, and there is no need to run the link analysis tool if I want to use the number of inlinks for the nutch score. Right? I tried to find your patch, but couldn't find it. How can I find it?
> >>-AJ
> >>
> >>Piotr Kosiorowski wrote:
> >>
> >>>UpdateDB copies link information and score from the WebDB to segments, so it is important to have the score calculated before updatedb is run.
> >>>One can use the current standard nutch score (based on number of inlinks) or try to use analyze - I committed a patch for it some time ago that might help a bit with its disk space requirements, so the best approach would be to test it (it worked ok for me) and, if it is ok for you, report it so others can also try it out.
> >>>Regards
> >>>Piotr
> >>>AJ Chen wrote:
> >>>
> >>>>In a whole-web or vertical crawling setting, is it right that link analysis and update segment from DB should be performed in the right order before indexing the segments?
> >>>>
> >>>>There's not much talk about update segment from DB. I think it should be an important step. Could someone point out when it should be run and what the benefits are?
> >>>>
> >>>>I remember it was mentioned some time ago that the link analysis tool does not work yet and the number of in-links should be used instead. Any update? If it's still not working, how do I set it to use link numbers?
> >>>>
> >>>>Thanks,
> >>>>AJ
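
To make the two scoring approaches discussed in this thread concrete, here is a minimal sketch on a toy three-page graph. It is not Nutch's code: the class name LinkScoreSketch, both method names, the damping value 0.85 and the iteration count are assumptions chosen for illustration, and the iterative method is a generic PageRank-style calculation rather than whatever the link analysis tool actually does. It only shows why in-link counting needs one pass over the links, while a score that depends on the scores of the linking pages needs repeated passes, and where a page's outlink count enters (each source splits its score across its outlinks).

```java
import java.util.Arrays;

public class LinkScoreSketch {

    /** Score by in-link count only: a single pass over all links. */
    static int[] inlinkCounts(int[][] outlinks) {
        int[] counts = new int[outlinks.length];
        for (int[] targets : outlinks) {
            for (int target : targets) {
                counts[target]++;
            }
        }
        return counts;
    }

    /**
     * Generic PageRank-style iteration (an illustrative assumption, not
     * Nutch's implementation). outlinks[i] lists the pages that page i links to.
     */
    static double[] pageRank(int[][] outlinks, double damping, int iterations) {
        int n = outlinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by pages with no outlinks
            for (int src = 0; src < n; src++) {
                if (outlinks[src].length == 0) {
                    dangling += rank[src];
                    continue;
                }
                // A source page splits its score evenly across its outlinks,
                // so more outlinks means a smaller share per target.
                double share = rank[src] / outlinks[src].length;
                for (int target : outlinks[src]) {
                    next[target] += share;
                }
            }
            for (int page = 0; page < n; page++) {
                // Dangling rank is spread uniformly so the scores keep summing to 1.
                next[page] = (1.0 - damping) / n
                        + damping * (next[page] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Toy graph: page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
        int[][] outlinks = { {1, 2}, {2}, {0} };
        System.out.println(Arrays.toString(inlinkCounts(outlinks)));
        System.out.println(Arrays.toString(pageRank(outlinks, 0.85, 50)));
    }
}
```

In this toy graph pages 0 and 1 each have exactly one in-link, so in-link counting cannot tell them apart. The iterative score ranks page 0 above page 1, because page 0's single in-link comes from the highest-scoring page and carries that page's whole score, while page 1 receives only half of page 0's. The cost difference is also visible: counting touches every link once, whereas the iterative version touches every link once per iteration.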
