thanks Howie, that guides me,
Michael,

--- Howie Wang <[EMAIL PROTECTED]> wrote:

> There are probably two settings you'll need to tweak
> in nutch-default.xml:
>
> http.content.limit -- by default it's 64K. If the page is
> larger than that, then it essentially truncates the file.
> You could be missing lots of links that appear later in
> the page.
>
> max.outlinks.per.page -- by default it's 100. You might
> want to increase this, since for pages with something like
> a nested navigation sidebar with tons of links, it won't
> get any links from the main part of the page.
>
> The *.xml files are fairly descriptive, so just reading
> through them can be pretty helpful. I don't know if there
> is a full guide to the config files.
>
> Howie
>
> >1)
> >I did several test runs fetching pages from two
> >websites. The fetching depth is 10.
> >
> >After checking the log files, I found the number of links
> >actually fetched is very different for the two sites.
> >
> >On one site with lots of news, only the first two depths
> >of fetching ran well, and only 5 links were fetched.
> >The actual number of links on that site is far beyond that.
> >
> >The other site can fetch through all 10 rounds and fetched
> >hundreds of links.
> >
> >I wonder if anyone has had a similar experience. Should I
> >set up the configuration files in /conf/?
> >
> >2)
> >Also, in the Nutch/conf/ directory, I found several
> >configuration files. Actually, I only modified
> >crawl-urlfilter.txt to let it accept all URLs
> >(*.*).
> >
> >Is that proper?
> >
> >I really haven't touched the other conf files. Is there a
> >guideline for how to use these files?
> >
> >thanks,
> >
> >Michael,

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
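
For anyone finding this thread later, here is a rough sketch of how the two settings Howie mentions might be overridden. It assumes the usual Nutch convention of leaving nutch-default.xml untouched and putting overrides in conf/nutch-site.xml. The values are only illustrative, not recommendations. The root element and the "db." prefix on the outlinks property are assumptions based on older releases; check the names and defaults in your own nutch-default.xml before copying anything.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: overrides for values in nutch-default.xml.
     Assumption: older releases use <nutch-conf> as the root element;
     newer ones use <configuration>. Match whatever your
     nutch-default.xml uses. -->
<nutch-conf>

  <!-- Maximum number of bytes kept per fetched page. The default
       mentioned in the thread is 64K (65536); content past the
       limit is truncated, so links late in the page are lost.
       262144 (256K) is an arbitrary illustrative value. -->
  <property>
    <name>http.content.limit</name>
    <value>262144</value>
  </property>

  <!-- Maximum number of outlinks recorded per page. The default
       mentioned in the thread is 100. Assumption: the full property
       name is db.max.outlinks.per.page; the thread gives it as
       max.outlinks.per.page. 500 is an arbitrary illustrative value. -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>500</value>
  </property>

</nutch-conf>
```

After changing these, the affected pages need to be fetched again for the new limits to have any effect on the links that were collected.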
