On 9/19/07, Norberto Meijome <[EMAIL PROTECTED]> wrote:
> Maybe I got this wrong... but isn't this what map-reduce is meant to deal with?
Not really... you could force a *lot* of different problems into map-reduce (that's sort of the point: being able to automatically parallelize many different problems). It really isn't the best fit here, though, and would end up being much slower than a custom job.

Then there is the issue that map-reduce implementations (like Hadoop) are also tuned for longer-running batch jobs over huge data sets: temporary files are used, sorts are external, and initial input and final output go through files. Check out the Google map-reduce paper - they don't use it on their search side either.

Things are already progressing in the distributed search area:
https://issues.apache.org/jira/browse/SOLR-303

Hopefully I'll have time to dig into it more myself in a few weeks.

-Yonik
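[Editor's note: a minimal in-memory sketch of the map-reduce model discussed above, in Python for brevity. The function names are hypothetical; this only illustrates the map/shuffle/reduce phases, whereas real frameworks like Hadoop insert file-based I/O and external sorts between phases, which is exactly the batch overhead the reply points to.]

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Emit (key, value) pairs: one ("word", 1) per token.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Group values by key, as the framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)

def map_reduce(docs):
    pairs = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = map_reduce(["solr search", "search index"])
# counts -> {"solr": 1, "search": 2, "index": 1}
```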