Hi Doug,
In the future I would like to implement a more automated
distributed search system than Nutch currently has. One way to do
this might be to use MapReduce. Each map task's input could be an
index and some segment data. The map method would serve queries,
i.e., run a Nutch DistributedSearch.Server. It would first copy
the index out of NDFS to the local disk, for better performance.
I have 2 questions regarding this mechanism.
First, how do you plan to make the running search servers known to
the master (search client)? I can imagine a mechanism similar to the
one the tasktracker and jobtracker use, a kind of heartbeat message.
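To make the heartbeat idea concrete, here is a minimal sketch of what the master side might keep. It assumes each search server periodically reports its "host:port" address and that the master drops any server it has not heard from within a timeout. The class and method names (SearchServerRegistry, heartbeat, liveServers) are made up for illustration and are not existing Nutch code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry kept by the master (search client); not part of Nutch.
public class SearchServerRegistry {
    // Address of each search server -> time of its last heartbeat (ms).
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    public SearchServerRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a search server reports in, e.g. with "host:port".
    public void heartbeat(String address, long nowMillis) {
        lastSeen.put(address, nowMillis);
    }

    // Drops servers whose last heartbeat is older than the timeout and
    // returns the remaining addresses in sorted order.
    public List<String> liveServers(long nowMillis) {
        lastSeen.values().removeIf(seen -> nowMillis - seen > timeoutMillis);
        List<String> live = new ArrayList<>(lastSeen.keySet());
        Collections.sort(live);
        return live;
    }
}
```

The time is passed in as a parameter only to keep the sketch easy to test; a real server would presumably use the system clock.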
Second, wouldn't it also be possible to solve nutch-92
(DistributedSearch incorrectly scores results) by first running a
MapReduce job over the indexes that counts terms, and then holding
the counts somehow in the memory of the master (search server
client)? But I'm not sure whether that would be too much data.
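As a rough sketch of the counting step I have in mind: assume each index can report, per term, the number of documents containing it, and the reduce step simply sums those numbers into one global table that the master holds. GlobalTermStats and merge are hypothetical names, and this is plain Java rather than the actual MapReduce API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not existing Nutch code.
public class GlobalTermStats {
    // The "reduce" step: sum the per-index document frequencies of each
    // term into a single global table.
    public static Map<String, Long> merge(List<Map<String, Long>> perIndexDocFreqs) {
        Map<String, Long> global = new HashMap<>();
        for (Map<String, Long> docFreqs : perIndexDocFreqs) {
            for (Map.Entry<String, Long> e : docFreqs.entrySet()) {
                global.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return global;
    }
}
```

The resulting table has one entry per distinct term across all indexes, which is exactly the size I am worried about.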
Stefan