Hi Fredrik:

does the command
"
bin/nutch crawl * -dir * -depth d
"
only work for intranet crawls? That is, can it only
fetch within one particular domain?

I want to do a whole-web fetch, but restricted to a
limited list of sites. Should I use the whole-web
command set
"
bin/nutch admin db -create
...
"
instead?
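
For reference, the whole-web sequence I plan to try
looks roughly like the following. This is only my
reading of the tutorial for this Nutch version, and
the seed file name, directory names, inject flag and
single fetch round are my own assumptions, so please
correct me if any step is wrong:
"
# create an empty web database
bin/nutch admin db -create
# inject my seed URL list (assumed file name: urls)
bin/nutch inject db -urlfile urls
# one generate/fetch/update round
mkdir segments
bin/nutch generate db segments
s=`ls -d segments/2* | tail -1`
bin/nutch fetch $s
bin/nutch updatedb db $s
# finally index the fetched segment
bin/nutch index $s
"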

But first, I will try your "bin/nutch index" command.
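
I assume the usage is something like this, with the
segment name taken from my log below (please correct
me if the path is wrong):
"
bin/nutch index crawl-s/segments/20050722181440
"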

thanks, 

Michael,

--- Fredrik Andersson <[EMAIL PROTECTED]>
wrote:

> No, I think you're right that indexing is done
> automatically after intranet crawls. Just try
> "bin/nutch index yourSegment"; if it says that
> 'index.done exists already', then, well... you get
> the point. I don't know what platform you're using,
> but try doing a "grep -r '<some text in your crawled
> site>' *". The grep should match both your segment
> data and the binary index that has been built.
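> For example, from inside your crawl directory, using
> some word you know is on the fetched page (I am just
> guessing 'committees' from your URL here):
> "
> cd crawl-s
> grep -r 'committees' *
> "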
> I have run into a similar problem, where the web
> search front end does not work, but a manual search
> using the IndexSearcher class does. Also, try opening
> your index in Luke if you haven't already. It's a
> very handy tool for validating and test-searching
> your data.
> 
> Good luck,
> Fredrik
> 
> On 7/23/05, Feng (Michael) Ji <[EMAIL PROTECTED]>
> wrote:
> > Hi Fredrik:
> > 
> > Actually, I used the nutch crawl command as follows:
> > "
> > bin/nutch crawl urls -dir crawl-s -depth 1 >&
> > crawl-s.log
> > "
> > I guess I don't need to run the index step
> > explicitly after the crawl. Is that right?
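> > (If indexing is indeed automatic, I assume I should
> > end up with an index directory next to db and
> > segments, i.e. roughly:
> > "
> > ls crawl-s
> > db  index  segments
> > "
> > but please correct me if that layout is wrong.)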
> > 
> > My sample crawl doesn't go deep and only stops at
> > the home page of the URL.
> > 
> > I guess -depth controls how many link levels are
> > crawled out from the initial page, is that right?
> > 
> > One thing I found: /segments/ ends up with the same
> > number of sub-directories (each named with a
> > timestamp) as the value of the -depth parameter.
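> > For example, with -depth 3 I would expect three
> > timestamped segment directories, roughly like this
> > (the names below are made up for illustration):
> > "
> > ls crawl-s/segments/
> > 20050722181440  20050722181630  20050722181815
> > "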
> > 
> > thanks a lot,
> > 
> > Michael,
> > 
> > --- Fredrik Andersson <[EMAIL PROTECTED]>
> > wrote:
> > 
> > > Hi Michael.
> > > 
> > > Have you indexed the crawl/segment? Easy to
> > > forget sometimes : ) Also, check the
> > > crawler-tools.xml or whatever it's called, so
> > > that ASP pages aren't blocked or anything. The
> > > Nutch crawler doesn't by default handle
> > > parameters (committees.asp?viewPerson=Ji), I
> > > guess that could be an issue as well. No errors
> > > or funny stuff in the logs?
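> > > If I remember the default configuration right,
> > > the file the crawl command reads for URL
> > > filtering is conf/crawl-urlfilter.txt, and it
> > > ships with a line roughly like this that makes
> > > the crawler skip any URL containing query
> > > parameters:
> > > "
> > > # skip URLs containing certain characters as
> > > # probable queries, etc.
> > > -[?*!@=]
> > > "
> > > Commenting that line out (or at least removing
> > > the '?' and '=' characters) should let URLs like
> > > committees.asp?viewPerson=Ji through.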
> > > 
> > > Fredrik
> > > 
> > > On 7/23/05, Feng (Michael) Ji <[EMAIL PROTECTED]>
> > > wrote:
> > > > Hi there:
> > > > 
> > > > I have a question about crawling depth vs.
> > > > search results. I've attached part of my log:
> > > > 
> > > > "
> > > > 050722 181508 fetching
> > > > http://www.committemuse.com/content/committees.asp
> > > > :
> > > > :
> > > > 050722 181508 fetching
> > > > :
> > > > 050722 181508 status: segment 20050722181440,
> > > > 100 pages, 4 errors, 1952888 bytes, 26204 ms
> > > > "
> > > > 
> > > > And I can see the segment on my Tomcat box.
> > > > 
> > > > But when I search for a specific word from that
> > > > page, it returns 0 hits.
> > > > 
> > > > Is that because the page is written in ASP?
> > > > 
> > > > thanks,
> > > > 
> > > > Michael,
> > > > 
> > > > 
> > > > 
> > > >
> > > 
> > 
> > 
> >
> 

