I was able to set up Nutch searchers in a distributed fashion by creating
the search-server.txt files at the root of the data directory where Tomcat
was running. I had a total of 1.9 MM URLs, split in half with one half per
searcher.
I was very surprised to see that the performance numbers I got for this
setup were not as good as I was expecting. Before I ran this setup, I ran
the same test on a single searcher with all 1.9 MM URLs.
The results for the distributed setup were the same or even worse.
One thing I suspect is that Tomcat is querying the Nutch search servers
synchronously, one server at a time, instead of asynchronously. That
would explain a lot.
Can somebody tell me if this is true?
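
To make the suspicion concrete, here is a minimal Java sketch of the two query patterns. It is not Nutch's code and does not show what Nutch 0.5 actually does. The class FanOutSketch, the method querySearcher, and the simulated latency are all hypothetical stand-ins. If the front end queries servers one at a time, the total time is roughly the sum of the per-server times. If it queries them concurrently, the total time is roughly that of the slowest server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FanOutSketch {

    // Hypothetical stand-in for one remote search server: sleeps to
    // simulate query latency, then returns a made-up hit count.
    static int querySearcher(int id, long latencyMs) {
        try {
            Thread.sleep(latencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return 10 * (id + 1);
    }

    // One server at a time: elapsed time grows with the number of servers.
    static long timeSequential(int searchers, long latencyMs) {
        long start = System.nanoTime();
        int totalHits = 0;
        for (int i = 0; i < searchers; i++) {
            totalHits += querySearcher(i, latencyMs);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // All servers at once: elapsed time is close to the slowest server.
    static long timeParallel(int searchers, long latencyMs) {
        ExecutorService pool = Executors.newFixedThreadPool(searchers);
        try {
            long start = System.nanoTime();
            List<Future<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < searchers; i++) {
                final int id = i;
                Callable<Integer> task = () -> querySearcher(id, latencyMs);
                pending.add(pool.submit(task));
            }
            int totalHits = 0;
            for (Future<Integer> f : pending) {
                totalHits += f.get(); // wait for each server's result
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Two simulated searchers, 100 ms each (assumed numbers).
        System.out.println("sequential ms: " + timeSequential(2, 100));
        System.out.println("parallel ms:   " + timeParallel(2, 100));
    }
}
```

With two searchers, the sequential version should take about twice as long as the parallel one. That pattern would be consistent with a split index showing no improvement over a single searcher, though other causes are possible.
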
I'm running Nutch 0.5 on very beefy machines.
Thanks,
Ledio
