Hi,

After following the Nutch 0.8 tutorial, when I try to search with

    bin/nutch org.apache.nutch.searcher.NutchBean apache

I receive "Total Hits: 0".

I have followed all the steps:

1. Create a directory with a flat file of root urls. For example, to crawl the nutch site you might start with a file named urls/nutch containing the url of just the Nutch home page. All other Nutch pages should be reachable from this page. The urls/nutch file would thus contain:

    http://lucene.apache.org/nutch/

2. Edit the file conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME with the name of the domain you wish to crawl. For example, if you wished to limit the crawl to the apache.org domain, the line should read:

    +^http://([a-z0-9]*\.)*apache.org/

This will include any url in the domain apache.org.

3. Edit the file conf/nutch-site.xml, insert at minimum following properties into it and edit in proper values for the properties....

Then I executed:

    bin/nutch crawl urls -dir crawl -depth 3 -topN 50

The only problem I can find is that a java.lang.NullPointerException is thrown during fetching.

My questions are:

1. Is this exception the cause of the empty search results? How can I fix it?

2. Is it also the reason why I always get "HTTP Status 500, No Context configured to process this request" at http://localhost:8080? See "HTTP Status 500" <http://www.mail-archive.com/[email protected]/msg09150.html>

Thanks a lot,
Fabian
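
For step 2, one quick check is whether the seed URL actually passes the filter pattern. The following is a minimal sketch in Python, not part of Nutch: it takes the pattern from step 2 without its leading "+" (the include marker in crawl-urlfilter.txt) and tests it against the seed URL. The helper name passes_filter is hypothetical, and the use of search semantics is an assumption about how Nutch applies the pattern. It only exercises the regex itself, so it cannot explain the NullPointerException or the HTTP Status 500.

```python
import re

# Pattern from step 2 of the tutorial, with the leading "+" removed.
# The "+" marks an include rule in crawl-urlfilter.txt and is not regex syntax.
PATTERN = re.compile(r"^http://([a-z0-9]*\.)*apache.org/")


def passes_filter(url: str) -> bool:
    """Hypothetical helper: True if the include pattern is found in the url.

    Assumption: the pattern is applied with "find" semantics. Because the
    pattern starts with "^", it can only match at the start of the url.
    """
    return PATTERN.search(url) is not None


if __name__ == "__main__":
    for url in ("http://lucene.apache.org/nutch/", "http://example.com/"):
        print(url, passes_filter(url))
```

If the seed URL did not pass, the crawl would have nothing to fetch and an empty index would be expected; if it passes, as it should for http://lucene.apache.org/nutch/, the filter is probably not the reason for the zero hits.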
