Re: How can i make a distribute search on Solr?

2007-09-20 Thread David Welton
 Maybe I got this wrong... but isn't this what MapReduce is meant to deal with?
 E.g.,

 1) get the job (a query)
 2) map it to workers (servers that provide search results from their own
 indexes)
 3) wait for the results from all workers that reply within an acceptable
 timeframe
 4) comb through the results from all workers and reduce them according to
 your own business rules (e.g., remove dupes, sort them by quality / priority,
 possibly relying on the original parameters of the query in step 1)
 5) return the reduced results to the frontend.

That seems to be how Sphinx works:

http://www.sphinxsearch.com/doc.html#distributed

Of course, the details of this are far over my head for either system,
so I don't really know if that's a sensible way of doing things or
not.
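
For what it's worth, the five quoted steps can be sketched in a few lines
of Python. This is only an illustration of the scatter/gather idea, not how
Solr or Sphinx actually implement it. Everything here is an assumption made
up for the example: the names scatter_gather, shard_a, shard_b and
slow_shard are hypothetical, each "worker" is a plain function returning
(doc_id, score) pairs, and the "business rules" are simply "keep the best
score per document, then sort by score".

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def scatter_gather(query, shards, timeout=0.5, limit=10):
    """Send one query to every shard and merge whatever returns in time."""
    # Step 2: map the query to the workers, one thread per shard.
    pool = ThreadPoolExecutor(max_workers=max(1, len(shards)))
    futures = [pool.submit(shard, query) for shard in shards]

    # Step 3: wait only for the shards that answer within the timeout.
    done, _not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on the slow shards

    # Step 4: reduce. Drop duplicates by keeping the best score per doc.
    best = {}
    for future in done:
        if future.exception() is not None:
            continue  # a failed shard contributes nothing
        for doc_id, score in future.result():
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score

    # Sort by descending score, breaking ties on doc_id for stable output.
    ranked = sorted(best.items(), key=lambda hit: (-hit[1], hit[0]))

    # Step 5: return the reduced results to the caller.
    return ranked[:limit]


# Hypothetical shards standing in for real search servers.
def shard_a(query):
    return [("doc1", 0.9), ("doc2", 0.4)]


def shard_b(query):
    return [("doc2", 0.7), ("doc3", 0.5)]


def slow_shard(query):
    time.sleep(1.0)  # misses the deadline used below
    return [("doc4", 1.0)]


results = scatter_gather("solr", [shard_a, shard_b, slow_shard], timeout=0.2)
```

In this toy run the slow shard misses the deadline, so its document is left
out of the merged list, and doc2 keeps the higher of its two scores. A real
setup would talk to the workers over the network and would need a scoring
scheme that is comparable across shards, which this sketch just assumes.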

Ciao,
-- 
David N. Welton
http://www.welton.it/davidw/


largish test data set?

2007-09-17 Thread David Welton
Hi,

I'm in the process of evaluating solr and sphinx, and have come to
realize that actually having a large data set to run them against
would be handy.  However, I'm pretty new to both systems, so I thought
that perhaps asking around might produce something useful.

What *I* mean by largish is something that won't fit into memory - say
5 or 6 GB, which is probably puny for some and huge for others.

BTW, I would also welcome any input from others who have done the
above comparison, although what we'll be using it for is specific
enough that of course I'll need to do my own testing.

Thanks!
-- 
David N. Welton
http://www.welton.it/davidw/