Hi Fredrik,

Actually, I use the nutch crawl command as follows:
"
bin/nutch crawl urls -dir crawl-s -depth 1 >& crawl-s.log
"
I assume I don't need to run indexing explicitly after
the crawl. Is that right?

My sample crawl doesn't go deep; it stops at the home
page of the URL.

I assume -depth controls how far the crawler follows
links out from the initial page. Is that right?

One thing I noticed: /segments/ ends up with the same
number of subdirectories (each named with a timestamp)
as the -depth parameter.

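The -depth question and the segment count can be illustrated with a toy model. This is a minimal sketch, assuming the crawl command runs one generate/fetch/update round per unit of -depth and writes one timestamp-named segment per round; the link graph, the crawl() function, and the example.com URLs are hypothetical, and nothing here calls Nutch itself. Under that assumption, -depth 1 fetches only the seed URLs, which would explain a crawl that stops at the home page.

```python
from datetime import datetime, timedelta

# Hypothetical link graph (page -> outlinks); not real site data.
LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/c"],
    "http://example.com/b": [],
    "http://example.com/c": [],
}


def crawl(seeds, depth, start=datetime(2005, 7, 22, 18, 14, 40)):
    """Toy depth-limited crawl: one round, and one segment, per depth level.

    Returns {segment_name: [urls fetched in that round]}, with segment names
    in the yyyyMMddHHmmss style seen in the fetcher log.
    """
    known = set(seeds)      # every URL discovered so far
    frontier = list(seeds)  # URLs to fetch in the next round
    segments = {}
    for round_no in range(depth):
        if not frontier:
            break  # in this toy model, no work means no new segment
        name = (start + timedelta(seconds=round_no)).strftime("%Y%m%d%H%M%S")
        segments[name] = frontier
        # "Update" step: queue newly discovered outlinks for the next round.
        next_frontier = []
        for url in frontier:
            for link in LINKS.get(url, []):
                if link not in known:
                    known.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return segments


if __name__ == "__main__":
    for depth in (1, 3):
        print(depth, crawl(["http://example.com/"], depth))
```

In this model, raising -depth adds one round (and one segment) per level of links followed, until no unfetched links remain. The default start time is borrowed from the segment name in the quoted log purely for illustration.
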
Thanks a lot,

Michael,

--- Fredrik Andersson <[EMAIL PROTECTED]> wrote:

> Hi Michael.
> 
> Have you indexed the crawl/segment? Easy to forget
> sometimes : ) Also, check the crawler-tools.xml or
> whatever it's called, so that ASP pages aren't blocked
> or anything. The Nutch crawler doesn't by default
> handle parameters (committees.asp?viewPerson=Ji), I
> guess that could be an issue as well. No errors or
> funny stuff in the logs?
> 
> Fredrik
> 
> On 7/23/05, Feng (Michael) Ji <[EMAIL PROTECTED]> wrote:
> > Hi there:
> > 
> > I have a question about the crawling depth VS search
> > result. I attached part of my log information;
> > 
> > "
> > 050722 181508 fetching http://www.committemuse.com/content/committees.asp
> > :
> > :
> > 050722 181508 fetching
> > :
> > 050722 181508 status: segment 20050722181440, 100 pages, 4 errors, 1952888 bytes, 26204 ms
> > "
> > 
> > And I see segment in my tomcat box.
> > 
> > But when I do search the specific word in that page,
> > it return 0.
> > 
> > Is that because the page is written in "asp"?
> > 
> > thanks,
> > 
> > Michael,
> > 
> > 
> > 
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam
> protection around
> > http://mail.yahoo.com
> >
> 

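Fredrik's point about URL filtering can be checked offline before re-crawling. In Nutch 0.7-era releases the file the crawl command consults is, as far as I know, conf/crawl-urlfilter.txt: each line is a regular expression prefixed with + or -, and the first matching pattern decides whether a URL is kept. The sketch below is a hypothetical re-implementation of that first-match rule in Python, not Nutch's own code; the url_allowed() function is made up for illustration, and the rule list is an abbreviated approximation of the default file with MY.DOMAIN.NAME replaced by the site from the log, so compare it with your own copy.

```python
import re

# Hypothetical, abbreviated rules modeled on the default crawl-urlfilter.txt
# (Nutch 0.7 era), with MY.DOMAIN.NAME replaced by the crawled site.
# Compare against conf/crawl-urlfilter.txt in your own installation.
RULES = [
    ("-", r"^(file|ftp|mailto):"),                       # skip non-HTTP schemes
    ("-", r"\.(gif|GIF|jpg|JPG|png|PNG|css|zip|exe)$"),  # skip some binary suffixes
    ("-", r"[?*!@=]"),                                   # skip probable query URLs
    ("+", r"^http://([a-z0-9]*\.)*committemuse\.com/"),  # accept the crawled site
    ("-", r"."),                                         # skip everything else
]


def url_allowed(url, rules=RULES):
    """Return True if the first rule whose pattern occurs in `url` is a '+' rule.

    Patterns are searched anywhere in the URL (re.search); a URL that matches
    no rule is rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    page = "http://www.committemuse.com/content/committees.asp"
    print(url_allowed(page))                     # True
    print(url_allowed(page + "?viewPerson=Ji"))  # False: contains '?' and '='
```

Under these assumed rules the plain committees.asp URL passes, so the .asp extension by itself is not what blocks it, while committees.asp?viewPerson=Ji is dropped by the query-character rule. If your own file has that rule, it would be the first line to look at.
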



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
