Xin-Yi,

As Doug recently pointed out to me, it is an overstatement to say that
"link analysis is not necessary" for intranet crawling. Links can have
analytic value in an intranet crawl. Frequently referred-to pages,
such as a help page, would likely be highly relevant to users and would
also be surfaced by link analysis.

That said, link analysis is presently an expensive operation in Nutch,
and I don't believe this cost is justified in all situations. Nutch
installers will need to make a cost/benefit analysis to decide if the
current link analyzer is worth running to get better boost values.
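
To make the trade-off concrete, here is a rough sketch (not actual Nutch
code) of how a crawl driver could make the link-analysis pass optional.
The class CrawlPlan, its plan() method, and the "iterations <= 0 means
skip" convention are my own assumptions for illustration. The argument
shapes simply mirror the snippet Vikas posted below, and the step names
are plain labels rather than real tool invocations.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: plans the steps of one crawl round without running them. */
public class CrawlPlan {

    // Labels standing in for the tools named in the quoted snippet.
    static final String LINK_ANALYSIS = "LinkAnalysisTool";
    static final String FETCHLIST = "FetchListTool";

    /**
     * Returns the ordered steps for one crawl round, each as
     * { label, arg1, arg2, ... }.
     * Hypothetical convention: iterations <= 0 skips link analysis.
     */
    static List<String[]> plan(String db, String segments, int iterations) {
        List<String[]> steps = new ArrayList<>();
        if (iterations > 0) {
            // The expensive pass: only scheduled when the installer asks for it.
            steps.add(new String[] { LINK_ANALYSIS, db, Integer.toString(iterations) });
        }
        // The fetchlist is always generated, with or without fresh scores.
        steps.add(new String[] { FETCHLIST, db, segments,
                                 "-adddays", "" + Integer.MAX_VALUE });
        return steps;
    }

    public static void main(String[] args) {
        // Print the plan for a round that includes 75 iterations of link analysis.
        for (String[] step : plan("crawl/db", "crawl/segments", 75)) {
            System.out.println(String.join(" ", step));
        }
    }
}
```

With a switch like this, a small intranet install could pass 0 and skip
the cost entirely, while a larger one could opt in to better boost
values.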

(ps: yes, documentation could be clearer on this matter...)

--Matt Kangas

On Thu, 30 Dec 2004 08:48:51 -0800 (PST), Xin-Yi Liu
<[EMAIL PROTECTED]> wrote:
> crawltool is intended for intranet crawling, where
> link analysis is not necessary.  if you need to use
> link analysis, it is best to go through all the
> different phases (whole internet crawling in the
> tutorial.)
> 
> --- Vikas Gupta <[EMAIL PROTECTED]> wrote:
> 
> > I think we need the following 2 lines before we
> > re-generate the fetchlist.
> > This is to ensure that the new fetchlist has the
> > correct link analysis
> > score. Worked for me.
> >
> > -Vikas
> >
> > In CrawlTool.java::main()
> >
> >     ...
> >
> >     // need these 2 lines - compute link analysis score
> >     String argv2[] = {db, "75"}; // 75 iterations of link analysis
> >     net.nutch.tools.LinkAnalysisTool.main(argv2);
> >
> >     // generate a single segment containing all pages in the db
> >     FetchListTool.main(new String[] { db, segments, "-adddays",
> >                                       "" + Integer.MAX_VALUE });
> >     String segment = getLatestSegment(segments);
> >
> >     ...
> >
> >


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
