Re: Query on merged indexes returned 0 hit - test case included (Nutch 0.8)

Andrzej Bialecki Tue, 04 Apr 2006 09:29:51 -0700

Olive g wrote:

Hi Andrzej & other gurus who might be reading this message :-):
I ran some tests and somehow my query returned 0 hits against merged indexes. Here is my test case. It's a bit long, so thank you in advance for your patience:
1. crawled the first 100 urls
~/nutch/search/bin/nutch crawl urls-001-100 -dir test1 -depth 1 >&test1.log&
2. set searcher.dir to test1

3. query for "movie"
~/nutch/search/bin/nutch org.apache.nutch.searcher.NutchBean movie
it returned 64 hits (a web search with Tomcat returned the same result)
4. crawled the second 100 urls
~/nutch/search/bin/nutch crawl urls-101-200 -dir test2 -depth 1 >&test2.log&
5. set searcher.dir to test2

6. query for "movie"
 ~/nutch/search/bin/nutch org.apache.nutch.searcher.NutchBean movie
it returned 55 hits (a web search with Tomcat returned the same result)
7.  attempted to merge using the following command:
 ../search/bin/nutch merge test3 test1 test2 >& merge-test3&
 returned error:
Exception in thread "main" java.rmi.RemoteException:java.io.IOException: Cannot
open filename /user/root/test1/crawldb/segments
       at org.apache.hadoop.dfs.NameNode.open(NameNode.java:120)

8.  attempted to merge again using the following command:
../search/bin/nutch merge test4 test1/indexes test2/indexes >&merge-test4&
  merged successfully with no errors

9. set searcher.dir to test4

10.  query for "movie" by:
  ~/nutch/search/bin/nutch org.apache.nutch.searcher.NutchBean movie
and it returned 0 hits (a web search with Tomcat returned the same result)
 060403 201545 10 opening segments in test4/segments
 060403 201545 10 found resource common-terms.utf8 at
 file:/root/nutch/search/conf/common-terms.utf8
 060403 201545 10 opening linkdb in test4/linkdb
 Total hits: 0
It appeared to be looking for test4/segments and test4/linkdb, which did not exist?

Well, the short answer is that you cannot at the moment merge crawldbs or linkdbs. As a consequence, you cannot use multiple outputs of 'nutch crawl' together (because NutchBean needs to reference a single linkdb during searching).


This is technically possible, but simply not implemented (yet).
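
As a quick sanity check before pointing searcher.dir at a directory, something like the following sketch could report which of the subdirectories named in the log above are absent. The function name missing_subdirs is hypothetical, the list of subdirectories is taken only from the two "opening ... in" log lines (it is not a complete list of what NutchBean reads), and the check assumes the directory lives on the local filesystem rather than in DFS.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nutch: print the subdirectories that
# are missing from a candidate searcher.dir on the local filesystem.
# Assumption: "segments" and "linkdb" come from the log output quoted
# above; NutchBean may need more than these two.
missing_subdirs() {
    dir=$1
    missing=""
    for sub in segments linkdb; do
        if [ ! -d "$dir/$sub" ]; then
            missing="$missing $sub"
        fi
    done
    # Drop the leading space before printing the result.
    echo "${missing# }"
}
```

Calling `missing_subdirs test4` prints the names that are absent, or an empty line if both are present. A non-empty result for a merged-index-only directory would be consistent with the 0 hits reported above.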

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
