I can see why that is a win over the existing approach, but I still don't get why it 
wouldn't be faster just to index to a suite of Solr master indexers and avoid 
all this slogging of files around.  But I guess that is a separate patch 
altogether.



On Jan 15, 2010, at 2:35 PM, Jason Rutherglen wrote:

> Zipping cores/shards is in the latest patch...
> 
> On Fri, Jan 15, 2010 at 11:22 AM, Andrzej Bialecki <a...@getopt.org> wrote:
>> On 2010-01-15 20:13, Ted Dunning wrote:
>>> 
>>> This can also be a big performance win.  Jason Venner reports significant
>>> index and cluster start time improvements by indexing to local disk,
>>> zipping, and then uploading the resulting zip file.  Hadoop has
>>> significant file-open overhead, so moving one zip file wins big over
>>> many index component files.  There is a secondary bandwidth win as well.
>> 
>> Indeed, this one should be easy to add to this patch. Unless Jason & Jason
>> already cooked a patch for this? ;)
>> 
>>> 
>>> On Fri, Jan 15, 2010 at 8:34 AM, Andrzej Bialecki
>>> (JIRA)<j...@apache.org>wrote:
>>> 
>>>> 
>>>> HDFS doesn't support enough POSIX semantics to allow writing Lucene indexes
>>>> directly to HDFS - for this reason indexes are always created on the local
>>>> storage of each node, and then after closing they are copied to HDFS.
>> 
>> 
>> 
>> 
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>>  ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>> 
>> 
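
For what it's worth, here is a rough sketch of the "index locally, zip, upload
once" pattern being discussed above: build the index with Lucene on local disk,
zip the closed index directory, then push the single zip to HDFS with the
Hadoop FileSystem API.  The paths, field name, and shard layout are
hypothetical placeholders, and it assumes a recent Lucene and the Hadoop
client on the classpath; it illustrates the shape of the approach, not the
actual code in this patch.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class LocalIndexThenUpload {
  public static void main(String[] args) throws Exception {
    File localIndexDir = new File("/tmp/shard-0/index");   // hypothetical local scratch dir
    File localZip      = new File("/tmp/shard-0.zip");
    String hdfsDest    = "/indexes/shard-0.zip";           // one HDFS file per shard

    // 1. Build the Lucene index on local disk (HDFS lacks the POSIX semantics
    //    IndexWriter needs, so the index is written locally first).
    IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
    try (IndexWriter writer = new IndexWriter(FSDirectory.open(localIndexDir.toPath()), cfg)) {
      Document doc = new Document();
      doc.add(new TextField("body", "example document text", Field.Store.NO));
      writer.addDocument(doc);
      writer.commit();
    }

    // 2. Zip the closed index so HDFS sees a single file instead of many small
    //    segment files (this is where the per-file open overhead is saved).
    try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(localZip))) {
      for (File f : localIndexDir.listFiles()) {           // a fresh index dir is flat
        zip.putNextEntry(new ZipEntry(f.getName()));
        try (FileInputStream in = new FileInputStream(f)) {
          in.transferTo(zip);                              // Java 9+ stream copy
        }
        zip.closeEntry();
      }
    }

    // 3. Upload the single zip to HDFS in one copy.
    FileSystem fs = FileSystem.get(new Configuration());
    fs.copyFromLocalFile(new Path(localZip.getAbsolutePath()), new Path(hdfsDest));
  }
}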

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search
