Makes sense. Interesting exercise to think about.

On Jan 15, 2010, at 4:08 PM, Jason Rutherglen wrote:

> Copying files ala HDFS is trivial because it's sequential,
> Lucene merging isn't, so scaling merging over 20 machines vs 4 Solr
> has clear advantages... That and on-demand expandability, so I
> can reindex 2 terabytes of data in half a day vs weeks or more
> with 4 Solr masters has compelling advantages.
>
> On Fri, Jan 15, 2010 at 12:09 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>> I can see why that is a win over the existing, but I still don't get why it
>> wouldn't be faster just to index to a suite of Solr master indexers and save
>> all this file slogging around. But, I guess that is a separate patch all
>> together.
>>
>> On Jan 15, 2010, at 2:35 PM, Jason Rutherglen wrote:
>>
>>> Zipping cores/shards is in the latest patch...
>>>
>>> On Fri, Jan 15, 2010 at 11:22 AM, Andrzej Bialecki <a...@getopt.org> wrote:
>>>> On 2010-01-15 20:13, Ted Dunning wrote:
>>>>>
>>>>> This can also be a big performance win. Jason Venner reports significant
>>>>> index and cluster start time improvements by indexing to local disk,
>>>>> zipping and then uploading the resulting zip file. Hadoop has significant
>>>>> file open overhead so moving one zip file wins big over many index
>>>>> component files. There is a secondary bandwidth win as well.
>>>>
>>>> Indeed, this one should be easy to add to this patch. Unless Jason & Jason
>>>> already cooked a patch for this? ;)
>>>>
>>>>>
>>>>> On Fri, Jan 15, 2010 at 8:34 AM, Andrzej Bialecki
>>>>> (JIRA) <j...@apache.org> wrote:
>>>>>
>>>>>>
>>>>>> HDFS doesn't support enough POSIX to support writing Lucene indexes
>>>>>> directly to HDFS - for this reason indexes are always created on local
>>>>>> storage of each node, and then after closing they are copied to HDFS.
>>>>
>>>> --
>>>> Best regards,
>>>> Andrzej Bialecki <><
>>>> ___. ___ ___ ___ _ _ __________________________________
>>>> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
>>>> ___|||__|| \| || | Embedded Unix, System Integration
>>>> http://www.sigram.com Contact: info at sigram dot com
>>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene:
http://www.lucidimagination.com/search
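
For readers following the thread, the "index to local disk, zip, then upload one file" idea that Ted Dunning describes might look roughly like the sketch below. It is not the code from the patch under discussion; the function name `zip_index_dir` and its parameters are made up for illustration, and it covers only the local packing step, assuming the index has already been written and closed on local storage.

```python
import os
import zipfile


def zip_index_dir(index_dir, zip_path):
    """Pack every file under index_dir into a single zip archive.

    Hypothetical helper for illustration. zip_path should live outside
    index_dir so the archive does not try to include itself.
    Returns the sorted list of member names written to the archive.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(index_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the index directory so the archive
                # unpacks into a clean index layout on the receiving node.
                arcname = os.path.relpath(full_path, index_dir)
                archive.write(full_path, arcname)
                names.append(arcname)
    return sorted(names)
```

The single archive can then be copied to HDFS in one operation (for example with `hadoop fs -put`), which is where the saving on per-file open overhead mentioned in the thread would come from. The upload step is deliberately left out of the sketch.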