Hello Team,
I am having a lot of fun evaluating 0.8-dev, and after following
Stefan's and the doc team's tutorials, have got everything working in
both local and multi-machine modes using hadoop.
In single-machine mode, though, I have come unstuck trying to expose
"nutch server" on port 8081, so that I can eventually deploy multiple
searchers.
In summary, the *-site.xml files, host folder and search-servers.txt are
configured, and the server is running on port 8081. However, when I
perform a search from the front-end webapp, errors appear in the
server's console output.
Here are the details:
Conf files hadoop-site.xml / nutch-site.xml contain:
===================================================
<property>
<name>searcher.dir</name>
<value>/usr/local/nutch/nutch-2006-03-02/monu-conf</value>
<description>
Path to root of index directories. This directory is searched (in
order) for either the file search-servers.txt, containing a list of
distributed search servers, or the directory "index" containing
merged indexes, or the directory "segments" containing segment
indexes.
</description>
</property>
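The lookup order in that description can be sketched as below. This is only an illustration of the documented order (search-servers.txt, then "index", then "segments"); the class name, method names and returned labels are hypothetical, and it is not the code Nutch itself runs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Nutch: mirrors the order given in the
// searcher.dir description above.
class SearcherDirSketch {

    /** Returns a label for the first entry found under searcherDir, in the documented order. */
    static String resolve(Path searcherDir) {
        if (Files.isRegularFile(searcherDir.resolve("search-servers.txt"))) {
            return "distributed";   // a list of search servers was found
        }
        if (Files.isDirectory(searcherDir.resolve("index"))) {
            return "index";         // merged indexes
        }
        if (Files.isDirectory(searcherDir.resolve("segments"))) {
            return "segments";      // per-segment indexes
        }
        return "none";
    }

    /** Builds a throwaway directory and resolves it before and after adding search-servers.txt. */
    static String[] demo() {
        try {
            Path dir = Files.createTempDirectory("searcher-dir-sketch");
            Files.createDirectory(dir.resolve("segments"));
            String before = resolve(dir);
            Files.createFile(dir.resolve("search-servers.txt"));
            String after = resolve(dir);
            return new String[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String label : demo()) {
            System.out.println(label);
        }
    }
}
```

The point of the sketch is that once search-servers.txt exists in searcher.dir, it takes precedence over any local index or segments directory.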
search-servers.txt contains:
===========================
[EMAIL PROTECTED] monu-conf]# cat /usr/local/nutch/nutch-2006-03-02/monu-conf/search-servers.txt
193.203.244.233 8081
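A quick way to sanity-check a file like this is to parse it the way a client might. The sketch below assumes one whitespace-separated "host port" pair per line, as in the single entry above; the class and method names are hypothetical and this is not Nutch's own parser.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: reads "host port" lines.
class SearchServersSketch {

    /** Parses the file contents into unresolved socket addresses, skipping blank lines. */
    static List<InetSocketAddress> parse(String contents) {
        List<InetSocketAddress> servers = new ArrayList<>();
        for (String line : contents.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected '<host> <port>': " + line);
            }
            int port = Integer.parseInt(parts[1]);
            // createUnresolved avoids a DNS lookup and rejects ports outside 0-65535
            servers.add(InetSocketAddress.createUnresolved(parts[0], port));
        }
        return servers;
    }

    public static void main(String[] args) {
        for (InetSocketAddress server : parse("193.203.244.233 8081\n")) {
            System.out.println(server.getHostString() + " port " + server.getPort());
        }
    }
}
```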
Issuing "bin/nutch server" on its own produces:
==============================================
DistributedSearch$Server <port> <index dir>
When "bin/nutch server" is started:
==================================
I usually pass the crawl directory as a relative path; the full path
works too. The output below suggests that the server is looking for
crawldb, indexes, plugins, linkdb and segments in the right places.
[EMAIL PROTECTED] nutch-2006-03-02]# bin/nutch server 8081 /usr/local/nutch/nutch-2006-03-02/crawl
060306 191228 10 parsing
jar:file:/usr/local/nutch/nutch-2006-03-02/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060306 191228 10 parsing
file:/usr/local/nutch/nutch-2006-03-02/conf/nutch-default.xml
060306 191228 10 parsing
file:/usr/local/nutch/nutch-2006-03-02/conf/nutch-site.xml
060306 191228 10 parsing
file:/usr/local/nutch/nutch-2006-03-02/conf/hadoop-site.xml
060306 191228 10 opening indexes in
/usr/local/nutch/nutch-2006-03-02/crawl/indexes
060306 191228 10 Plugins: looking in:
/usr/local/nutch/nutch-2006-03-02/plugins
060306 191228 10 Plugin Auto-activation mode: [true]
060306 191228 10 Registered Plugins:
060306 191228 10 HTTP Framework (lib-http)
060306 191228 10 CyberNeko HTML Parser (lib-nekohtml)
060306 191228 10 URL Query Filter (query-url)
060306 191228 10 Site Query Filter (query-site)
060306 191228 10 Html Parse Plug-in (parse-html)
060306 191228 10 Http Protocol Plug-in (protocol-http)
060306 191228 10 the nutch core extension points
(nutch-extensionpoints)
060306 191228 10 Basic Indexing Filter (index-basic)
060306 191228 10 Text Parse Plug-in (parse-text)
060306 191228 10 JavaScript Parser (parse-js)
060306 191228 10 Regex URL Filter (urlfilter-regex)
060306 191228 10 Basic Query Filter (query-basic)
060306 191228 10 Registered Extension-Points:
060306 191228 10 Nutch Protocol
(org.apache.nutch.protocol.Protocol)
060306 191228 10 Nutch URL Filter
(org.apache.nutch.net.URLFilter)
060306 191228 10 HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
060306 191228 10 Nutch Online Search Results Clustering Plugin
(org.apache.nutch.clustering.OnlineClusterer)
060306 191228 10 Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
060306 191228 10 Nutch Content Parser
(org.apache.nutch.parse.Parser)
060306 191228 10 Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
060306 191228 10 Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
060306 191228 10 Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
060306 191228 10 opening segments in
/usr/local/nutch/nutch-2006-03-02/crawl/segments
060306 191228 10 found resource common-terms.utf8 at
file:/usr/local/nutch/nutch-2006-03-02/conf/common-terms.utf8
060306 191228 10 opening linkdb in
/usr/local/nutch/nutch-2006-03-02/crawl/linkdb
060306 191228 11 Server listener on port 8081: starting
060306 191228 12 Server handler 0 on 8081: starting
060306 191228 13 Server handler 1 on 8081: starting
060306 191228 14 Server handler 2 on 8081: starting
060306 191228 15 Server handler 3 on 8081: starting
060306 191228 16 Server handler 4 on 8081: starting
060306 191228 17 Server handler 5 on 8081: starting
060306 191228 18 Server handler 6 on 8081: starting
060306 191228 19 Server handler 7 on 8081: starting
060306 191228 20 Server handler 8 on 8081: starting
060306 191228 21 Server handler 9 on 8081: starting
When a search is initiated in the webapp:
========================================
060306 191615 22 Server connection on port 8081 from 193.203.244.233:
starting
060306 191615 12 Call: getSegmentNames()
060306 191615 12 Return: [Ljava.lang.String;@1e859c0
060306 191615 22 Server connection on port 8081 from 193.203.244.233
caught: java.lang.RuntimeException: java.lang.InstantiationException:
org.apache.nutch.searcher.Query
java.lang.RuntimeException: java.lang.InstantiationException:
org.apache.nutch.searcher.Query
at org.apache.hadoop.io.WritableFactories.newInstance(WritableFactories.java:47)
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:230)
at org.apache.hadoop.ipc.RPC$Invocation.readFields(RPC.java:88)
at org.apache.hadoop.ipc.Server$Connection.run(Server.java:138)
Caused by: java.lang.InstantiationException:
org.apache.nutch.searcher.Query
at java.lang.Class.newInstance0(Unknown Source)
at java.lang.Class.newInstance(Unknown Source)
at org.apache.hadoop.io.WritableFactories.newInstance(WritableFactories.java:45)
... 3 more
060306 191615 22 Server connection on port 8081 from 193.203.244.233:
exiting
060306 191625 23 Server connection on port 8081 from 193.203.244.233:
starting
060306 191625 13 Call: getSegmentNames()
060306 191625 13 Return: [Ljava.lang.String;@1e859c0
060306 191635 12 Call: getSegmentNames()
060306 191635 12 Return: [Ljava.lang.String;@1e859c0
060306 191645 13 Call: getSegmentNames()
060306 191645 13 Return: [Ljava.lang.String;@1e859c0
060306 191655 16 Call: getSegmentNames()
060306 191655 16 Return: [Ljava.lang.String;@1e859c0
060306 191705 14 Call: getSegmentNames()
060306 191705 14 Return: [Ljava.lang.String;@1e859c0
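For reference, java.lang.InstantiationException from Class.newInstance() is what the JVM raises when the target class is abstract, is an interface, or has no no-argument constructor. The self-contained sketch below reproduces that failure mode with hypothetical stand-in classes. It uses no Nutch or Hadoop code, and whether org.apache.nutch.searcher.Query in this build falls into one of those cases is an assumption that the trace alone does not confirm.

```java
// Hypothetical stand-ins; none of these are Nutch or Hadoop classes.
class InstantiationDemo {

    /** Only a one-argument constructor, so there is no no-argument constructor. */
    static class NoDefaultCtor {
        final String text;
        NoDefaultCtor(String text) { this.text = text; }
    }

    /** Abstract, so it can never be instantiated directly. */
    static abstract class AbstractThing {
        AbstractThing() {}
    }

    /** Explicit no-argument constructor, so reflection can create it. */
    static class WithDefaultCtor {
        WithDefaultCtor() {}
    }

    /** Returns "ok" on success, otherwise the simple name of the exception thrown. */
    @SuppressWarnings("deprecation") // Class.newInstance() is deprecated since Java 9
    static String tryNewInstance(Class<?> type) {
        try {
            type.newInstance();
            return "ok";
        } catch (InstantiationException | IllegalAccessException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryNewInstance(WithDefaultCtor.class)); // ok
        System.out.println(tryNewInstance(NoDefaultCtor.class));   // InstantiationException
        System.out.println(tryNewInstance(AbstractThing.class));   // InstantiationException
    }
}
```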
Everything else has worked so well, and the self-same experiment works
fine under 0.7.1 - could this be a bug?
Can someone advise what to do?
Many thanks,
Monu Ogbe