The reason I would expect a major speed win from indexing to local disk and copying later is that you get much more efficient reading of documents with normal Hadoop mechanisms. Throwing documents at the various Solr master indexers is bound to be slower than having 20 machines reading at local disk speeds, if only because of network delays.
On Fri, Jan 15, 2010 at 12:09 PM, Grant Ingersoll <gsing...@apache.org> wrote:

> I can see why that is a win over the existing, but I still don't get why it
> wouldn't be faster just to index to a suite of Solr master indexers and save
> all this file slogging around. But, I guess that is a separate patch all
> together.