I can see why that is a win over the existing approach, but I still don't get why it wouldn't be faster just to index to a suite of Solr master indexers and skip all this shuffling of files around. But I guess that is a separate patch altogether.
On Jan 15, 2010, at 2:35 PM, Jason Rutherglen wrote:

> Zipping cores/shards is in the latest patch...
>
> On Fri, Jan 15, 2010 at 11:22 AM, Andrzej Bialecki <a...@getopt.org> wrote:
>> On 2010-01-15 20:13, Ted Dunning wrote:
>>>
>>> This can also be a big performance win. Jason Venner reports significant
>>> index and cluster start time improvements by indexing to local disk,
>>> zipping and then uploading the resulting zip file. Hadoop has significant
>>> file open overhead, so moving one zip file wins big over many index
>>> component files. There is a secondary bandwidth win as well.
>>
>> Indeed, this one should be easy to add to this patch. Unless Jason & Jason
>> already cooked a patch for this? ;)
>>
>>> On Fri, Jan 15, 2010 at 8:34 AM, Andrzej Bialecki (JIRA) <j...@apache.org> wrote:
>>>
>>>> HDFS doesn't support enough POSIX to support writing Lucene indexes
>>>> directly to HDFS - for this reason indexes are always created on local
>>>> storage of each node, and then after closing they are copied to HDFS.
>>
>> --
>> Best regards,
>> Andrzej Bialecki <><
>> Information Retrieval, Semantic Web
>> Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene:
http://www.lucidimagination.com/search
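For what it's worth, the zip-and-upload step Ted describes boils down to something like the sketch below: build the Lucene index on local disk, zip the index component files into one archive, and copy that single file to HDFS so Hadoop only pays the file-open overhead once. This is only an illustration, not code from the patch; the class name, paths, and HDFS target are made-up assumptions.

import java.io.*;
import java.util.zip.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper, not part of the patch: zip a locally built Lucene
// index directory and upload the single archive to HDFS.
public class ZipAndUploadIndex {

  public static void zipAndUpload(File indexDir, File zipFile, String hdfsTarget)
      throws IOException {
    // Zip every index component file (segments, .fdt, .tis, etc.) into one archive.
    try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zipFile))) {
      for (File f : indexDir.listFiles()) {
        zos.putNextEntry(new ZipEntry(f.getName()));
        try (InputStream in = new FileInputStream(f)) {
          byte[] buf = new byte[8192];
          int n;
          while ((n = in.read(buf)) > 0) {
            zos.write(buf, 0, n);
          }
        }
        zos.closeEntry();
      }
    }

    // One copyFromLocalFile call moves one file, instead of one open per segment file.
    FileSystem fs = FileSystem.get(new Configuration());
    fs.copyFromLocalFile(new Path(zipFile.getAbsolutePath()), new Path(hdfsTarget));
  }

  public static void main(String[] args) throws IOException {
    // Illustrative paths only.
    zipAndUpload(new File("/tmp/shard0-index"),
                 new File("/tmp/shard0-index.zip"),
                 "/indexes/shard0.zip");
  }
}

The local-then-copy step exists at all for the reason Andrzej quotes: HDFS doesn't provide the write semantics Lucene's IndexWriter needs, so the index has to be built and closed on local disk before it is moved.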