Index and segments are the minimum, yes. You only need the segments for the indexes that you are serving on the local box.
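
For example, a minimal sketch of pulling just those pieces out of HDFS
with the standard Hadoop shell (paths are illustrative, assuming a
stock crawl directory layout):

 # copy the Lucene indexes and the segments they were built from
 bin/hadoop fs -copyToLocal crawl/indexes /local/search/crawl/indexes
 bin/hadoop fs -copyToLocal crawl/segments /local/search/crawl/segments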

Dennis

MilleBii wrote:
OK, I don't per se need distributed search.
I was trying to avoid a copy to the local file system, to save on
resources by working off HDFS.

What is the minimum to copy over: index and segments? Not crawldb?
All the data in segments?

2009/12/13, Dennis Kubes <ku...@apache.org>:
The assumption is wrong.  Distributed search is done from indexes on
local file systems, not HDFS.

It doesn't return because Lucene is trying to search across the indexes
in HDFS in real time, which doesn't work because of network overhead.
Depending on the size of the indexes it may actually return after some
time, but I have seen it time out even for small indexes.

The short of it is: move the indexes and segments to a local file
system, then point the distributed search server at their parent directory.
Something like this:

bin/nutch server 8100 /full/path/to/parent/of/local/indexes

It technically doesn't have to be a full path.  Then point
searcher.dir to a directory containing search-servers.txt, as you have
done.  The entries in search-servers.txt stay as you have them.
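
As a sketch of how the pieces fit together, assuming everything was
copied under /local/search/crawl (illustrative paths, not the only
layout that works):

 /local/search/crawl/indexes    <- Lucene indexes
 /local/search/crawl/segments   <- segments those indexes were built from

 # serve them on port 8100; the argument is the parent of indexes/
 bin/nutch server 8100 /local/search/crawl

 # nutch-site.xml on the search front end then points searcher.dir
 # at the directory holding search-servers.txt:
 <property>
   <name>searcher.dir</name>
   <value>/path/to/dir/with/search-servers.txt</value>
 </property>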

Dennis

MilleBii wrote:
I'm trying to search directly from the index in HDFS, i.e. in
distributed mode.

What am I doing wrong?

Created nutch/conf/search-servers.txt with:
 localhost 8100

Pointed searcher.dir in nutch-site.xml to nutch/conf.

Tried to start the search server with either:
 + nutch server 8100 crawl
 + nutch server 8100 hdfs://localhost:9000/user/nutch/crawl

The nutch server command doesn't return to the prompt.
Is this normal? Should I wait?

And of course, if I try a search, it doesn't work.


