Makes sense.  Interesting exercise to think about.
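
For reference, the zip-then-upload idea Ted describes further down the thread could be sketched roughly like this. It is only a minimal Python sketch under stated assumptions, not what the patch does: `zip_index_dir`, `publish_index` and the `upload` callable are hypothetical names, and the upload step is left as a stub because the thread doesn't say how the copy to HDFS is performed.

```python
import os
import zipfile


def zip_index_dir(index_dir, zip_path):
    """Pack every file under index_dir into one zip archive.

    zip_path should live outside index_dir so the archive does not
    try to include itself. Returns the sorted archive member names.
    """
    names = []
    # ZIP_DEFLATED trades some CPU for a smaller upload; ZIP_STORED
    # would skip compression and only bundle the files.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(index_dir):
            for fname in files:
                full = os.path.join(root, fname)
                # Store paths relative to the index directory.
                arcname = os.path.relpath(full, index_dir)
                zf.write(full, arcname)
                names.append(arcname)
    return sorted(names)


def publish_index(index_dir, zip_path, upload):
    """Zip a closed local index, then hand the single archive to `upload`.

    `upload` is a hypothetical callable (for example a thin wrapper
    around whatever HDFS client is in use); it receives the zip path.
    """
    names = zip_index_dir(index_dir, zip_path)
    upload(zip_path)  # one file transfer instead of one per index file
    return names
```

The point of the sketch is just the shape of the flow: the index is written and closed on local disk first, and only a single archive crosses over to HDFS.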

On Jan 15, 2010, at 4:08 PM, Jason Rutherglen wrote:

> Copying files à la HDFS is trivial because it's sequential;
> Lucene merging isn't, so scaling merging over 20 machines vs. 4 Solr
> masters has clear advantages. That, plus on-demand expandability:
> being able to reindex 2 terabytes of data in half a day vs. weeks or
> more with 4 Solr masters is compelling.
> 
> On Fri, Jan 15, 2010 at 12:09 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>> I can see why that is a win over the existing approach, but I still don't
>> get why it wouldn't be faster just to index to a suite of Solr master
>> indexers and save all this file slogging around.  But I guess that is a
>> separate patch altogether.
>> 
>> 
>> 
>> On Jan 15, 2010, at 2:35 PM, Jason Rutherglen wrote:
>> 
>>> Zipping cores/shards is in the latest patch...
>>> 
>>> On Fri, Jan 15, 2010 at 11:22 AM, Andrzej Bialecki <a...@getopt.org> wrote:
>>>> On 2010-01-15 20:13, Ted Dunning wrote:
>>>>> 
>>>>> This can also be a big performance win.  Jason Venner reports significant
>>>>> index and cluster start time improvements by indexing to local disk,
>>>>> zipping and then uploading the resulting zip file.  Hadoop has significant
>>>>> file open overhead so moving one zip file wins big over many index
>>>>> component files.  There is a secondary bandwidth win as well.
>>>> 
>>>> Indeed, this one should be easy to add to this patch. Unless Jason & Jason
>>>> already cooked a patch for this? ;)
>>>> 
>>>>> 
>>>>> On Fri, Jan 15, 2010 at 8:34 AM, Andrzej Bialecki (JIRA) <j...@apache.org> wrote:
>>>>> 
>>>>>> 
>>>>>> HDFS doesn't support enough of POSIX to allow writing Lucene indexes
>>>>>> directly to it. For this reason indexes are always created on the local
>>>>>> storage of each node, and copied to HDFS after they are closed.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best regards,
>>>> Andrzej Bialecki     <><
>>>>  ___. ___ ___ ___ _ _   __________________________________
>>>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>>>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>>>> http://www.sigram.com  Contact: info at sigram dot com
>>>> 
>>>> 
>> 
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem using Solr/Lucene: 
>> http://www.lucidimagination.com/search
>> 
>> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search
