Thanks, Howie.

That points me in the right direction.

Michael

--- Howie Wang <[EMAIL PROTECTED]> wrote:

> There are probably two settings you'll need to tweak
> in nutch-default.xml:
> 
> http.content.limit -- by default it's 64K. If the page is
> larger than that, Nutch essentially truncates the file, so
> you could be missing lots of links that appear later in
> the page.
> 
> max.outlinks.per.page -- by default it's 100. You might
> want to increase this: on pages with something like a
> nested navigation sidebar with tons of links, the sidebar
> alone can use up the limit, so Nutch won't get any links
> from the main part of the page.
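> 
> As a rough sketch of what those two overrides might look like
> (an assumption, not taken from a tested setup): the values
> 262144 and 500 are only illustrative, the entries are assumed
> to go inside the root element of conf/nutch-site.xml so that
> nutch-default.xml itself stays untouched, and the outlink
> limit is written with a db. prefix (db.max.outlinks.per.page)
> on the assumption that this is its full name in
> nutch-default.xml. Check both names against your own copy of
> the file before relying on them.
> 
> ```xml
> <!-- Raise the per-page download cap above the 64K default (value in bytes). -->
> <property>
>   <name>http.content.limit</name>
>   <value>262144</value>
> </property>
> 
> <!-- Allow more outlinks per page than the default of 100. -->
> <property>
>   <name>db.max.outlinks.per.page</name>
>   <value>500</value>
> </property>
> ```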
> 
> The *.xml files are fairly descriptive, so just reading
> through them can be pretty helpful. I don't know if there
> is a full guide to the config files.
> 
> Howie
> 
> 
> 
> >
> >1)
> >I ran several test crawls to fetch pages from two
> >websites. The fetch depth was 10.
> >
> >After checking the log files, I found that the number of
> >links actually fetched is very different for the two sites.
> >
> >On one site with lots of news, only the first two depth
> >levels ran properly, and only 5 links were fetched.
> >The site actually has far more links than that.
> >
> >The other site was fetched through all 10 rounds, and
> >hundreds of links were fetched.
> >
> >I wonder if anyone has had a similar experience. Should I
> >set up the configuration files in /conf/?
> >
> >2)
> >Also, in the Nutch/conf/ directory, I found several
> >configuration files. So far I have only modified
> >crawl-urlfilter.txt to make it accept all URLs
> >(*.*).
> >
> >Is that the right approach?
> >
> >I haven't touched the other conf files. Is there a
> >guideline for how to use these files?
> >
> >thanks,
> >
> >Michael,
> >


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
