Michael,
You DON'T need to copy the segments or db to the root of tomcat, but you DO
need to start tomcat from the directory directly above the segments
directory (or from the crawl directory if you've done intranet crawling).
For example, if you have /usr/local/nutch/segments, you might type:
cd /usr/local/nutch
/usr/local/tomcat/bin/catalina.sh start
to start Tomcat.
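In case it helps, here is a minimal sketch of that start-up step. It checks that the directory you are about to start from really contains a segments subdirectory before launching Tomcat. The function names has_segments and start_tomcat_from and the CATALINA variable are made up for illustration, and the default path is just the one from the example above, so adjust it for your install.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if DIR holds a "segments" subdirectory,
# i.e. DIR looks like the right place to start Tomcat from.
has_segments() {
    [ -d "$1/segments" ]
}

# Hypothetical wrapper: cd into the Nutch data directory, then start Tomcat.
# CATALINA is an assumed override; the default is the path used in this mail.
start_tomcat_from() {
    dir=$1
    catalina=${CATALINA:-/usr/local/tomcat/bin/catalina.sh}
    if ! has_segments "$dir"; then
        echo "no segments directory under $dir" >&2
        return 1
    fi
    # Subshell so the caller's working directory is left unchanged.
    (cd "$dir" && "$catalina" start)
}
```

You would call it as "start_tomcat_from /usr/local/nutch", or pass the crawl directory instead if you used the intranet crawl command.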
It's all explained in the tutorial at
http://lucene.apache.org/nutch/tutorial.html
Just follow it step by step and you should be ok.
Cheers...
Roger
----- Original Message -----
From: "Feng (Michael) Ji" <[EMAIL PROTECTED]>
To: <[email protected]>; "Fredrik Andersson"
<[EMAIL PROTECTED]>
Sent: Sunday, July 24, 2005 1:36 AM
Subject: Re: search result
hi Fredrik:
After crawling with Nutch, I copied the segments to the root
of Tomcat.
I wonder whether I need to do the same for the index and
db directories.
thanks,
Michael,
--- Fredrik Andersson <[EMAIL PROTECTED]> wrote:
No, I think you're right that indexing is done automatically after
intranet crawls. Just try "bin/nutch index yourSegment"; if it says
that 'index.done exists already', then well... you get the point. I
don't know what platform you're using, but try doing a
"grep -r <some text in your crawled site> *". The grep command should
match on both your segment data and the binary index that has been
built.
I have run into a similar problem, where the Websearch thingie does
not work, but a manual search using the IndexSearcher class does work.
Also, try opening your index in the Luke program if you haven't
already. It's a very handy tool for validating and test-searching your
data.
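For anyone following along, the two checks above can be sketched roughly as below. It assumes the index.done marker sits directly inside each segment directory, which is only a reading of the error message quoted above and is not confirmed in this thread. report_indexed and files_mentioning are made-up helper names.

```shell
#!/bin/sh
# Hypothetical helper: print each segment under SEGMENTS_DIR and whether it
# carries an index.done marker (marker location is an assumption).
report_indexed() {
    segments_dir=$1
    for seg in "$segments_dir"/*/; do
        # Skip the unexpanded pattern when there are no segments at all.
        [ -d "$seg" ] || continue
        name=$(basename "$seg")
        if [ -e "${seg}index.done" ]; then
            echo "$name indexed"
        else
            echo "$name not-indexed"
        fi
    done
}

# Hypothetical helper: list the files under DIR that contain TEXT,
# the same idea as the "grep -r" suggestion above.
files_mentioning() {
    grep -rl -- "$1" "$2"
}
```

With the crawl directory from this thread that would be "report_indexed crawl-s/segments", followed by "files_mentioning 'some text' crawl-s" to see whether the text turns up in the segment data, the index files, or neither.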
Good luck,
Fredrik
On 7/23/05, Feng (Michael) Ji <[EMAIL PROTECTED]> wrote:
> hi Fredrik:
>
> Actually, I use the nutch crawl command as follows:
> "
> bin/nutch crawl urls -dir crawl-s -depth 1 >& crawl-s.log
> "
> I guess I don't need to index explicitly after the
> crawl. Is that right?
>
> My sample crawl doesn't go deep and stops at
> the home page of the URL.
>
> I guess -depth is meant for crawling the pages that are
> linked from the initial page. Is that right?
>
> One thing I found: /segments/ ends up with the same
> number of sub-directories (all named with timestamps)
> as the -depth parameter.
>
> thanks a lot,
>
> Michael,
>
> --- Fredrik Andersson <[EMAIL PROTECTED]> wrote:
>
> > Hi Michael.
> >
> > Have you indexed the crawl/segment? Easy to forget sometimes : )
> > Also, check the crawler-tools.xml or whatever it's called, so that
> > ASP pages aren't blocked or anything. The Nutch crawler doesn't by
> > default handle parameters (committees.asp?viewPerson=Ji), I guess
> > that could be an issue as well. No errors or funny stuff in the
> > logs?
> >
> > Fredrik
> >
> > On 7/23/05, Feng (Michael) Ji <[EMAIL PROTECTED]> wrote:
> > > Hi there:
> > >
> > > I have a question about crawling depth vs. search
> > > results. I have attached part of my log:
> > >
> > > "
> > > 050722 181508 fetching http://www.committemuse.com/content/committees.asp
> > > :
> > > :
> > > 050722 181508 fetching
> > > :
> > > 050722 181508 status: segment 20050722181440, 100 pages, 4 errors, 1952888 bytes, 26204 ms
> > > "
> > >
> > > And I see the segment on my Tomcat box.
> > >
> > > But when I search for a specific word from that page,
> > > it returns 0 results.
> > >
> > > Is that because the page is written in "asp"?
> > >
> > > thanks,
> > >
> > > Michael,
> > >
> > >
> >
>
>
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers