Hi Fabian,

Sorry, but I can only "reply" with a bunch of questions . . .

On 8/14/07, Fabian López <[EMAIL PROTECTED]> wrote:
>
> Hi,
> after following the tutorial of Nutch 0.8, when I try to search with
>
> bin/nutch org.apache.nutch.searcher.NutchBean apache
>
> I receive "Total Hits:0"
>
> I have followed all the steps:
>
>
>    1. Create a directory with a flat file of root urls. For example, to
>    crawl the nutch site you might start with a file named urls/nutch
>    containing the url of just the Nutch home page. All other Nutch pages
>    should be reachable from this page. The urls/nutch file would thus
>    contain:
>
>    http://lucene.apache.org/nutch/
>
>    2. Edit the file conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME
>    with the name of the domain you wish to crawl. For example, if you
>    wished to limit the crawl to the apache.org domain, the line should
>    read:
>
>    +^http://([a-z0-9]*\.)*apache.org/
>
>    This will include any url in the domain apache.org.
>    3. Edit the file conf/nutch-site.xml, insert at minimum following
>    properties into it and edit in proper values for the properties....
>
> Then I executed:
>
> bin/nutch crawl urls -dir crawl -depth 3 -topN 50
>
> Maybe the only problem that I find is that, when fetching, there is a
> java.lang.NullPointerException.
> Questions are:
>
> 1.- Is this the cause of the problem? How can I solve it?


Can you be a little bit more specific about the NPE? What is its
stack trace? Did you have a look at hadoop.log (located in
"path_to_nutch/logs/")? You can probably find a hint there . . .

> 2.- Is this the reason why I always get the HTTP Status 500 at
> http://localhost:8080:
> No Context configured to process this request - HTTP Status 500
> <http://www.mail-archive.com/[email protected]/msg09150.html>


I don't think so . . . these two errors are not related to each other. The
"crawl" job has no dependency on Tomcat. Did you use the Tomcat package
from the Ubuntu repository? You could try things out with a version
downloaded from Apache instead.
I tried out Nutch on Ubuntu as well (with Tomcat from the Ubuntu repository)
and ran into trouble too . . . , but it was too long ago to remember the details.

> Thanks a lot
> Fabian
>

hope it helps at least a little bit,

martin
