1)
Does that mean a page's outlink count will contribute to
its score? (I seem to remember seeing this logic in the
code, but can't remember where.)

Then my question is: how accurate will the score from
this method be?

I mean, theoretically, a page's score depends on the
number of in-links and on the scores of the source
pages of those in-links.
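
To make the difference concrete, here is a minimal Java sketch. It is
not Nutch's actual code: the class name LinkScoreSketch, both method
names, and the damping value are made up for illustration. It compares
a score based only on in-link counts with a PageRank-style iteration in
which each page splits its score evenly over its outlinks, so the
source page's outlink count affects what each target receives.
Dangling pages simply drop their score in this sketch.

```java
import java.util.Arrays;

class LinkScoreSketch {

    // Cheap approximation: a page's score is just its number of in-links.
    // outlinks[i] lists the page indices that page i links to.
    static double[] scoreByInlinkCount(int[][] outlinks) {
        double[] score = new double[outlinks.length];
        for (int[] targets : outlinks) {
            for (int dst : targets) {
                score[dst] += 1.0;
            }
        }
        return score;
    }

    // PageRank-style iteration (an illustrative assumption, not Nutch's
    // algorithm): each page passes a damped share of its own score to
    // every page it links to, divided by its number of outlinks.
    static double[] scoreByIteration(int[][] outlinks, int iterations, double damping) {
        int n = outlinks.length;
        double[] score = new double[n];
        Arrays.fill(score, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n);
            for (int src = 0; src < n; src++) {
                if (outlinks[src].length == 0) {
                    continue; // dangling page: contributes nothing here
                }
                double share = damping * score[src] / outlinks[src].length;
                for (int dst : outlinks[src]) {
                    next[dst] += share;
                }
            }
            score = next;
        }
        return score;
    }

    public static void main(String[] args) {
        // Pages 0 and 1 link to page 2; page 2 links to page 3.
        int[][] graph = { {2}, {2}, {3}, {} };
        System.out.println(Arrays.toString(scoreByInlinkCount(graph)));
        System.out.println(Arrays.toString(scoreByIteration(graph, 20, 0.85)));
    }
}
```

On the small graph in main, the count-based score ranks page 2 above
page 3, while the iterative score ranks page 3 above page 2, because
page 3's single in-link comes from a well-linked page.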

2) 
How much does the link analysis tool cost? For example, if I
have 10 million pages in the WebDB, how long will it take
to run?

thanks,

Michael Ji,

--- "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
wrote:

> The link analysis tool takes a long time to run.
> Doug wrote some comments about it:
> set fetchlist.score.by.link.count and
> indexer.boost.by.link.count to
> true, and forget about using the link analysis tool.
> I have used this method since June 2005 without
> problems.
> With the link analysis tool the scoring is better,
> but with the setup described above the scoring comes
> close, without heavy resource usage.
> 
> Michael Ji wrote:
> 
> >Hi,
> >
> >As I understand it, link analysis needs to be run
> >whenever a new fetch is updated into the webdb.
> >Because the link graph has changed (new links may
> >have been added and old links deleted), the score
> >for each node changes, so a recalculation is
> >necessary.
> >
> >Link analysis will update the score for each node
> >(i.e. each page) in the webdb; then updatesegmentfromdb
> >needs to run to copy the recalculated scores to the
> >segments.
> >
> >I can't see how we can skip link analysis. Am I
> >missing something important? Let me know.
> >
> >thanks,
> >
> >Michael Ji,
> >
> >
> >--- AJ Chen <[EMAIL PROTECTED]> wrote:
> >
> >>I assume you mean UpdateSegmentFromDB, and there is
> >>no need to run the link analysis tool if I want to use
> >>the number of inlinks for the nutch score. Right?
> >>I tried to find your patch, but couldn't find it.
> >>How can I find it?
> >>-AJ
> >>
> >>Piotr Kosiorowski wrote:
> >>
> >>>UpdateDB copies link information and scores from the
> >>>WebDB to segments, so it is important to have the scores
> >>>calculated before updatedb is run. One can use the current
> >>>standard nutch score (based on the number of inlinks) or
> >>>try to use analyze. I committed a patch for it some time
> >>>ago that might help a bit with its disk space
> >>>requirements, so the best approach would be to test it
> >>>(it worked ok for me) and, if it is ok for you, report it
> >>>so others can also try it out.
> >>>Regards
> >>>Piotr
> >>>AJ Chen wrote:
> >>>
> >>>>In a whole-web or vertical crawling setting, is it right
> >>>>that link analysis and update segment from DB should be
> >>>>performed in the right order before indexing the segments?
> >>>>
> >>>>There's not much talk about update segment from DB. I think
> >>>>it should be an important step. Could someone point out
> >>>>when it should be run and what the benefits are?
> >>>>
> >>>>I remember it was mentioned some time ago that the link
> >>>>analysis tool does not work yet and the number of in-links
> >>>>should be used instead. Any update? If it's still not
> >>>>working, how do I set it to use link numbers?
> >>>>
> >>>>Thanks,
> >>>>AJ
> >
> >
> >
> >             
> >__________________________________ 
> >Yahoo! Mail - PC Magazine Editors' Choice 2005 
> >http://mail.yahoo.com
> >
> >
> >  
> >
> 
> 



                