Re: nutch questions

Peter W. Mon, 15 Dec 2008 14:14:55 -0800

Howdy,

I can fetch several test URLs and invert the links, and the
segments created look similar to a populated Lucene index.


The nutch war file was then placed in the regular tomcat webapps
directory (not ROOT) and the nutch-site.xml properties point to the
crawl directory.

However, NutchBean doesn't see the segments and web browser
searches return no hits. I also tried to access the crawl data directly
using the API but am getting "can't find plugin" error messages.

Maybe I need to place the plugin path in nutch-site.xml.
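
Something like the following is what I have in mind. This is only a
sketch: it assumes the standard plugin.folders property, and
/path/to/nutch/plugins is a placeholder for wherever the plugins
directory actually lives on this machine.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: overrides for nutch-default.xml -->
<configuration>
  <property>
    <!-- Directory (or comma-separated directories) holding Nutch plugins.
         An absolute path is used as is; a relative one is looked up on
         the classpath. The path below is a hypothetical placeholder. -->
    <name>plugin.folders</name>
    <value>/path/to/nutch/plugins</value>
  </property>
</configuration>
```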

Peter



On Dec 12, 2008, at 6:30 PM, elangovan anbalahan wrote:

Hi there,
I am assuming that you have successfully configured Nutch and are able to
crawl websites.

Before I suggest a solution, let me know the following:
1) Have you deployed nutch-XX.war on Tomcat? (XX is the Nutch version number.)
2) After deployment, you have to configure nutch-site.xml inside the
WEB-INF/classes folder to tell Tomcat where to look for the crawled data.
If you have done this, let me know.
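
For step 2, a minimal sketch of WEB-INF/classes/nutch-site.xml might look
like this. It assumes the searcher.dir property is what the webapp reads
to locate the crawl, and /path/to/crawl is a placeholder for your own
crawl directory (the one that holds the segments).

```xml
<?xml version="1.0"?>
<!-- WEB-INF/classes/nutch-site.xml inside the deployed Nutch webapp -->
<configuration>
  <property>
    <!-- Root of the crawl output that the search webapp should open.
         The path below is a hypothetical placeholder. -->
    <name>searcher.dir</name>
    <value>/path/to/crawl</value>
  </property>
</configuration>
```

Tomcat most likely needs a restart (or a webapp reload) before the
change is picked up.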
On Fri, Dec 12, 2008 at 6:42 PM, Peter W. <[email protected]> wrote:
Hello,
I'm new to Nutch and have successfully configured the fetching application
but had some questions about its Tomcat search component:

a. should indexes be stored under the webapps dir?
b. can these segments be read with a Luke type application?
c. are the pages being stored as html? if so how do you filter out tags
with an analyzer?
d. is it possible to only check for http status code 200's?
e. how do you customize the search results templates?

Thanks,

Peter
