Hi Fredrik,

Actually, I run the one-step nutch crawl command as follows:

    bin/nutch crawl urls -dir crawl-s -depth 1 >& crawl-s.log

I assume I don't need to run indexing explicitly after the crawl. Is that right?
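As a rough illustration of what -depth controls: the one-step crawl is generally described as running one generate/fetch/update round per depth level, with each round writing one timestamped segment. In Nutch releases of this period it then indexes those segments itself, so a separate index step should not normally be needed (worth verifying against your version). The Python below is a hypothetical, simplified model, not Nutch's code. The function name crawl, the outlinks dictionary and the example.com URLs are all made up for illustration.

```python
# Hypothetical, simplified model of a depth-limited crawl (not Nutch's code).
# Each round "generates" the URLs that are due, "fetches" them into one
# segment, and "updates" the frontier with the links found on those pages.


def crawl(seeds, outlinks, depth):
    """Return a list of segments: at most one list of fetched URLs per round.

    seeds    -- starting URLs (like the entries in the urls file)
    outlinks -- dict mapping a page URL to the URLs it links to
    depth    -- maximum number of rounds (like the -depth option)
    """
    fetched = set()
    frontier = list(seeds)
    segments = []
    for _ in range(depth):
        # generate: keep unfetched URLs, dropping duplicates but preserving order
        batch = [u for u in dict.fromkeys(frontier) if u not in fetched]
        if not batch:
            break  # nothing new to fetch, so no further segments are created
        # fetch: this round's pages form one segment
        segments.append(batch)
        fetched.update(batch)
        # update: links discovered now are only fetched in the next round
        frontier = [link for u in batch for link in outlinks.get(u, [])]
    return segments


# Hypothetical three-level site: home page -> a.asp, b.asp; a.asp -> c.asp
site = {
    "http://example.com/": ["http://example.com/a.asp", "http://example.com/b.asp"],
    "http://example.com/a.asp": ["http://example.com/c.asp"],
}

for d in (1, 2, 3):
    segments = crawl(["http://example.com/"], site, d)
    print(d, len(segments), segments[-1])
```

With depth 1 only the seed page is fetched and only one segment is written, which would explain a crawl that stops at the home page; each extra round adds one segment until no new links remain.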
My sample crawl doesn't go deep; it stops at the home page of the URL. I assume -depth controls how far the crawler follows links out from the initial page. Is that right? One thing I noticed: the segments/ directory contains as many sub-directories (all named with timestamps) as the -depth value.

Thanks a lot,
Michael

--- Fredrik Andersson <[EMAIL PROTECTED]> wrote:
> Hi Michael.
>
> Have you indexed the crawl/segment? Easy to forget
> sometimes : ) Also, check the crawler-tools.xml or
> whatever it's called, so that ASP pages aren't
> blocked or anything. The Nutch crawler doesn't by
> default handle parameters (committees.asp?viewPerson=Ji),
> I guess that could be an issue as well. No errors or
> funny stuff in the logs?
>
> Fredrik
>
> On 7/23/05, Feng (Michael) Ji <[EMAIL PROTECTED]> wrote:
> > Hi there:
> >
> > I have a question about the crawling depth VS search
> > result. I attached part of my log information;
> >
> > "
> > 050722 181508 fetching http://www.committemuse.com/content/committees.asp
> > :
> > :
> > 050722 181508 fetching
> > :
> > 050722 181508 status: segment 20050722181440, 100
> > pages, 4 errors, 1952888 bytes, 26204 ms
> > "
> >
> > And I see segment in my tomcat box.
> >
> > But when I do search the specific word in that page,
> > it return 0.
> >
> > Is that because the page is written in "asp"?
> >
> > thanks,
> >
> > Michael,
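On Fredrik's point about parameters: in Nutch releases of this era the crawl command's URL filter file (conf/crawl-urlfilter.txt, probably the file he has in mind) shipped with a rule along the lines of `-[?*!@=]`, which skips any URL containing a query string such as committees.asp?viewPerson=Ji. The sketch below is a hypothetical re-creation of first-match-wins +/- regex filtering to show the effect; the RULES list and the accept function are illustrative assumptions, not Nutch's implementation.

```python
import re

# Hypothetical +/- rule list in the spirit of Nutch's crawl URL filter.
# The patterns are illustrative assumptions, not copied from any Nutch release.
RULES = [
    ("-", r"[?*!@=]"),  # reject anything that looks like a query URL
    ("+", r"^http://([a-z0-9]*\.)*committemuse\.com/"),  # accept this site
]


def accept(url, rules=RULES):
    """Apply rules in order; the first matching pattern decides."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # assumption: a URL that matches no rule is ignored


for url in (
    "http://www.committemuse.com/content/committees.asp",
    "http://www.committemuse.com/content/committees.asp?viewPerson=Ji",
):
    print(accept(url), url)
```

Under a rule like this the parameterised committees pages are never fetched, so words that appear only on them cannot be found by a search. A common workaround is to drop `?` and `=` from that rule, at the cost of possibly crawling many near-duplicate pages.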
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
