Hello Team, 

I am having a lot of fun evaluating 0.8-dev. After following Stefan's
and the doc team's tutorials, I have everything working in both local
and multi-machine modes using Hadoop.

In single-machine mode, though, I have come unstuck trying to expose
"nutch server" on port 8081, so that I can eventually deploy multiple
searchers.

In summary, the *-site.xml files, host folder and search-servers.txt
are configured, and the server is running on port 8081.  However, when
I perform a search from the front-end webapp, errors appear in the
server's console output.

Here are the details:


Conf files hadoop-site.xml / nutch-site.xml contain:
===================================================

<property>
  <name>searcher.dir</name>
  <value>/usr/local/nutch/nutch-2006-03-02/monu-conf</value>
  <description>
  Path to root of index directories.  This directory is searched (in
  order) for either the file search-servers.txt, containing a list of
  distributed search servers, or the directory "index" containing
  merged indexes, or the directory "segments" containing segment
  indexes.
  </description>
</property>
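
My reading of that description is that the lookup order amounts to
something like the sketch below. It only illustrates the order stated in
the description and is not Nutch's actual code; SearcherDirProbe and
describe are names I made up for the purpose.

```java
import java.io.File;

// Hypothetical helper, not part of Nutch: reports which of the three
// sources named in the searcher.dir description is found first.
class SearcherDirProbe {
    static String describe(File searcherDir) {
        if (new File(searcherDir, "search-servers.txt").isFile()) {
            return "search-servers.txt";   // list of distributed search servers
        }
        if (new File(searcherDir, "index").isDirectory()) {
            return "index";                // merged indexes
        }
        if (new File(searcherDir, "segments").isDirectory()) {
            return "segments";             // segment indexes
        }
        return "none";
    }

    public static void main(String[] args) {
        // Defaults to the current directory when no path is given.
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(dir + " -> " + describe(dir));
    }
}
```

Run against my monu-conf directory, I would expect it to report
search-servers.txt, since that file is present there.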

search-servers.txt contains:
===========================
[EMAIL PROTECTED] monu-conf]# cat /usr/local/nutch/nutch-2006-03-02/monu-conf/search-servers.txt
193.203.244.233 8081
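
Judging by my own file, I am assuming each line is a host name or IP
address followed by a port, separated by whitespace. To rule out a
malformed entry, a line can be sanity-checked with a small sketch like
this one (ServerLine and parse are invented names, not Nutch classes):

```java
import java.net.InetSocketAddress;

// Hypothetical checker for one "host port" line; not Nutch code.
class ServerLine {
    // Returns null for a blank line, otherwise the parsed host and port.
    static InetSocketAddress parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        String[] parts = trimmed.split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'host port': " + line);
        }
        int port = Integer.parseInt(parts[1]);  // NumberFormatException if not numeric
        // createUnresolved avoids a DNS lookup while parsing.
        return InetSocketAddress.createUnresolved(parts[0], port);
    }

    public static void main(String[] args) {
        InetSocketAddress addr = parse("193.203.244.233 8081");
        System.out.println(addr.getHostName() + " : " + addr.getPort());
    }
}
```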

Issuing "bin/nutch server" on its own produces:
==============================================

DistributedSearch$Server <port> <index dir>

When "bin/nutch server" is started:
==================================

I usually pass the relative path of the crawl directory, but the full
path works too.  The output below suggests that the server is looking
for crawldb, indexes, plugins, linkdb and segments in the right places.

# bin/nutch server 8081 /usr/local/nutch/nutch-2006-03-02/crawl

[EMAIL PROTECTED] nutch-2006-03-02]# bin/nutch server 8081 /usr/local/nutch/nutch-2006-03-02/crawl
060306 191228 10 parsing jar:file:/usr/local/nutch/nutch-2006-03-02/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060306 191228 10 parsing file:/usr/local/nutch/nutch-2006-03-02/conf/nutch-default.xml
060306 191228 10 parsing file:/usr/local/nutch/nutch-2006-03-02/conf/nutch-site.xml
060306 191228 10 parsing file:/usr/local/nutch/nutch-2006-03-02/conf/hadoop-site.xml
060306 191228 10 opening indexes in /usr/local/nutch/nutch-2006-03-02/crawl/indexes
060306 191228 10 Plugins: looking in: /usr/local/nutch/nutch-2006-03-02/plugins
060306 191228 10 Plugin Auto-activation mode: [true]
060306 191228 10 Registered Plugins:
060306 191228 10        HTTP Framework (lib-http)
060306 191228 10        CyberNeko HTML Parser (lib-nekohtml)
060306 191228 10        URL Query Filter (query-url)
060306 191228 10        Site Query Filter (query-site)
060306 191228 10        Html Parse Plug-in (parse-html)
060306 191228 10        Http Protocol Plug-in (protocol-http)
060306 191228 10        the nutch core extension points (nutch-extensionpoints)
060306 191228 10        Basic Indexing Filter (index-basic)
060306 191228 10        Text Parse Plug-in (parse-text)
060306 191228 10        JavaScript Parser (parse-js)
060306 191228 10        Regex URL Filter (urlfilter-regex)
060306 191228 10        Basic Query Filter (query-basic)
060306 191228 10 Registered Extension-Points:
060306 191228 10        Nutch Protocol (org.apache.nutch.protocol.Protocol)
060306 191228 10        Nutch URL Filter (org.apache.nutch.net.URLFilter)
060306 191228 10        HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
060306 191228 10        Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
060306 191228 10        Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
060306 191228 10        Nutch Content Parser (org.apache.nutch.parse.Parser)
060306 191228 10        Ontology Model Loader (org.apache.nutch.ontology.Ontology)
060306 191228 10        Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
060306 191228 10        Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
060306 191228 10 opening segments in /usr/local/nutch/nutch-2006-03-02/crawl/segments
060306 191228 10 found resource common-terms.utf8 at file:/usr/local/nutch/nutch-2006-03-02/conf/common-terms.utf8
060306 191228 10 opening linkdb in /usr/local/nutch/nutch-2006-03-02/crawl/linkdb
060306 191228 11 Server listener on port 8081: starting
060306 191228 12 Server handler 0 on 8081: starting
060306 191228 13 Server handler 1 on 8081: starting
060306 191228 14 Server handler 2 on 8081: starting
060306 191228 15 Server handler 3 on 8081: starting
060306 191228 16 Server handler 4 on 8081: starting
060306 191228 17 Server handler 5 on 8081: starting
060306 191228 18 Server handler 6 on 8081: starting
060306 191228 19 Server handler 7 on 8081: starting
060306 191228 20 Server handler 8 on 8081: starting
060306 191228 21 Server handler 9 on 8081: starting

When a search is initiated in the webapp:
========================================

060306 191615 22 Server connection on port 8081 from 193.203.244.233: starting
060306 191615 12 Call: getSegmentNames()
060306 191615 12 Return: [Ljava.lang.String;@1e859c0
060306 191615 22 Server connection on port 8081 from 193.203.244.233 caught: java.lang.RuntimeException: java.lang.InstantiationException: org.apache.nutch.searcher.Query
java.lang.RuntimeException: java.lang.InstantiationException: org.apache.nutch.searcher.Query
        at org.apache.hadoop.io.WritableFactories.newInstance(WritableFactories.java:47)
        at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:230)
        at org.apache.hadoop.ipc.RPC$Invocation.readFields(RPC.java:88)
        at org.apache.hadoop.ipc.Server$Connection.run(Server.java:138)
Caused by: java.lang.InstantiationException: org.apache.nutch.searcher.Query
        at java.lang.Class.newInstance0(Unknown Source)
        at java.lang.Class.newInstance(Unknown Source)
        at org.apache.hadoop.io.WritableFactories.newInstance(WritableFactories.java:45)
        ... 3 more
060306 191615 22 Server connection on port 8081 from 193.203.244.233: exiting
060306 191625 23 Server connection on port 8081 from 193.203.244.233: starting
060306 191625 13 Call: getSegmentNames()
060306 191625 13 Return: [Ljava.lang.String;@1e859c0
060306 191635 12 Call: getSegmentNames()
060306 191635 12 Return: [Ljava.lang.String;@1e859c0
060306 191645 13 Call: getSegmentNames()
060306 191645 13 Return: [Ljava.lang.String;@1e859c0
060306 191655 16 Call: getSegmentNames()
060306 191655 16 Return: [Ljava.lang.String;@1e859c0
060306 191705 14 Call: getSegmentNames()
060306 191705 14 Return: [Ljava.lang.String;@1e859c0
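
As far as I can tell from the trace, the exception is raised by
Class.newInstance(), which throws InstantiationException when a class
cannot be created reflectively through a no-argument constructor (for
example when it is abstract or has no such constructor). I have not
confirmed which case, if any, applies to org.apache.nutch.searcher.Query
in 0.8-dev. The stand-alone sketch below only reproduces the general
behaviour; NeedsArg, HasDefault and ReflectiveFactory are made-up
classes, not Nutch or Hadoop code.

```java
// Made-up class with only a one-argument constructor.
class NeedsArg {
    final String text;
    NeedsArg(String text) { this.text = text; }
}

// Made-up class with a no-argument constructor.
class HasDefault {
    HasDefault() {}
}

class ReflectiveFactory {
    // Creates an instance from a Class object, the same general pattern
    // as the newInstance frames in the trace, and reports the outcome.
    @SuppressWarnings("deprecation")  // Class.newInstance is deprecated on newer JDKs
    static String tryCreate(Class<?> type) {
        try {
            type.newInstance();
            return "ok";
        } catch (InstantiationException e) {
            return "InstantiationException";
        } catch (IllegalAccessException e) {
            return "IllegalAccessException";
        }
    }

    public static void main(String[] args) {
        System.out.println("HasDefault: " + tryCreate(HasDefault.class));
        System.out.println("NeedsArg:   " + tryCreate(NeedsArg.class));
    }
}
```

If the same thing is happening on the server side, the question becomes
whether Query is expected to be constructed this way in 0.8-dev.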

Everything else has worked so well, and the self-same experiment works
fine under 0.7.1 - could this be a bug?

Can someone advise what to do?

Many thanks, 

Monu Ogbe
