On 9/18/06, Zaheed Haque <[EMAIL PROTECTED]> wrote:
Hi:

I have just watched your Flash movie. A quick observation: you are
running Tomcat 4.1.31, and nothing you are doing seems wrong. Anyway,
after starting the servers, can you search using the following
command?

bin/nutch org.apache.nutch.search.NutchBean bobdocs

What do you get, and what's in the log file?

If you get results, then Tomcat 4.1.31 is probably the problem.

[EMAIL PROTECTED] ~/posao/nutch/novo/nutch-0.8 $ ./bin/nutch org.apache.nutch.search.NutchBean bobdocs
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/nutch/search/NutchBean
[EMAIL PROTECTED] ~/posao/nutch/novo/nutch-0.8 $

That doesn't really tell me whether Tomcat is the problem, does it? I've
added debug statements to the nutch script so I can check whether my
CLASSPATH is correct, and I have no idea why nutch can't find the
NutchBean class.
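
A NoClassDefFoundError for the main class usually means that no classpath entry contains the matching .class file, here `org/apache/nutch/search/NutchBean.class`. As a quick diagnostic, the following is a minimal sketch of a hypothetical helper script (not part of Nutch) that takes the CLASSPATH printed by the debug statements and reports which jar or class directory, if any, contains a given class. Note that the log output quoted further down names the package `searcher` (as in `searcher.DistributedSearch`), so it may be worth checking `org.apache.nutch.searcher.NutchBean` as well as `org.apache.nutch.search.NutchBean`.

```python
import os
import sys
import zipfile
from pathlib import Path


def class_entry(class_name: str) -> str:
    """Map a fully qualified class name to its path inside a jar or class directory."""
    return class_name.replace(".", "/") + ".class"


def classpath_hits(class_name: str, classpath: str) -> list:
    """Return the classpath entries (jars or directories) that contain class_name."""
    entry = class_entry(class_name)
    hits = []
    for item in classpath.split(os.pathsep):
        if not item:
            continue  # skip empty segments, e.g. from a trailing separator
        path = Path(item)
        if path.is_dir():
            # Class directories (e.g. build/classes): look for the .class file on disk.
            if (path / entry).is_file():
                hits.append(item)
        elif path.is_file():
            # Jars are zip archives; getinfo raises KeyError if the entry is absent.
            try:
                with zipfile.ZipFile(path) as jar:
                    jar.getinfo(entry)
                hits.append(item)
            except (zipfile.BadZipFile, KeyError, OSError):
                continue
        # Entries that do not exist at all are silently ignored, as the JVM does.
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_class.py <class name> [<classpath>]
    name = sys.argv[1]
    cp = sys.argv[2] if len(sys.argv) > 2 else os.environ.get("CLASSPATH", "")
    found = classpath_hits(name, cp)
    print("\n".join(found) if found else name + " was not found on the classpath")
```

For example, `python find_class.py org.apache.nutch.searcher.NutchBean "$CLASSPATH"` (the file name `find_class.py` is only an assumption). If neither package name turns up in any entry, the classpath built by the script is the problem; if only one does, the class name in the command is.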

I have, however, checked out the Nutch 0.8 and Hadoop 0.5 sources from
the SVN repository, imported them into an Eclipse project, and run the
"public static void main" methods of the DistributedSearch Client and
Server directly. These experiments showed that my problem is not with
Tomcat or the Nutch web UI, because DistributedSearch.Client also
returned 0 results regardless of the query or combination of indexes.
I've confirmed that the Client sees all the search servers; it simply
fails to return any results.

I also noticed something in the logs that I hadn't seen before. The
following is output periodically (regardless of what I'm doing in
Eclipse, as long as the Client thread is active):

2006-09-18 13:55:30,352 INFO  searcher.DistributedSearch - STATS: 2 servers, 2 segments.
2006-09-18 13:55:40,539 INFO  searcher.DistributedSearch - Querying segments from search servers...
2006-09-18 13:55:40,559 INFO  searcher.DistributedSearch - STATS: 2 servers, 2 segments.
2006-09-18 13:55:50,564 INFO  searcher.DistributedSearch - Querying segments from search servers...

Going back to square one: am I building the crawls correctly? I run:
./bin/nutch crawl urls -threads 15 -topN 10 -depth 3

Is the problem that I'm doing an intranet crawl every time, instead of
the multi-step whole-web crawl? What else could I be missing?

t.n.a.
