Howdy,
I can fetch several test URLs and invert the links, and the
segments created look similar to a populated Lucene index.
I then placed the Nutch war file in the regular Tomcat webapps
directory (not ROOT), and the nutch-site.xml properties point to the
crawl directory.
However, NutchBean doesn't see the segments, and searches from the web
browser return no hits. I also tried to access the crawl data directly
using the API, but I get "can't find plugin" error messages.
Maybe I need to set the plugin path in nutch-site.xml.
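
For reference, a minimal sketch of what that might look like. The
plugin.folders property is the one Nutch reads to locate its plugins;
the path below is only a placeholder and would need to be the real
plugins directory on the machine running the code:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: /path/to/nutch/plugins is a placeholder, not a real path.
       It should point at the directory that contains the plugin folders. -->
  <property>
    <name>plugin.folders</name>
    <value>/path/to/nutch/plugins</value>
  </property>
</configuration>
```
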
Peter
On Dec 12, 2008, at 6:30 PM, elangovan anbalahan wrote:
Hi there,
I am assuming that you have successfully configured Nutch and are
able to crawl websites.
Before I suggest a solution, let me know the following:
1) Have you deployed nutch-XX.war on Tomcat? (XX is the Nutch
version number.)
2) After deployment, you have to configure nutch-site.xml inside the
WEB-INF/classes folder to tell Tomcat where to look for the crawled
data.
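
As a rough sketch of step 2, the nutch-site.xml under WEB-INF/classes
could carry a searcher.dir property, which is the setting the search
webapp uses to find the crawl root. The path shown is a placeholder
and must be replaced with the actual crawl directory:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: /path/to/crawl is a placeholder for the crawl root,
       i.e. the directory that holds the crawl output such as segments. -->
  <property>
    <name>searcher.dir</name>
    <value>/path/to/crawl</value>
  </property>
</configuration>
```

Tomcat usually needs a restart (or the webapp a reload) before a
change to this file takes effect.
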
If you have done this, let me know.
On Fri, Dec 12, 2008 at 6:42 PM, Peter W. wrote:
Hello,
I'm new to Nutch and have successfully configured the fetching
application, but I have some questions about its Tomcat search component:
a. Should indexes be stored under the webapps dir?
b. Can these segments be read with a Luke-type application?
c. Are the pages stored as HTML? If so, how do you filter out
tags with an analyzer?
d. Is it possible to only check for HTTP status code 200s?
e. How do you customize the search results templates?
Thanks,
Peter