Re: Date faceting - howto improve performance

2009-04-27 Thread Ning Li
You mean doc A and doc B will become one doc after adding index 2 to index 1? I don't think this is currently supported at either the Lucene level or the Solr level. If index 1 has m docs and index 2 has n docs, index 1 will have m+n docs after adding index 2 to index 1. Documents themselves are not
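
The m+n behavior described in this reply can be illustrated with a toy model. This is only a sketch: an index is modeled as a plain Python list of dicts, and `add_index` is a hypothetical helper, not a Lucene or Solr API. It assumes only what the reply states, namely that adding one index to another appends documents and does not combine documents that share an identifier.

```python
# Toy model: an "index" is a list of documents (dicts).
# add_index is a hypothetical helper, not part of Lucene or Solr.
def add_index(index1, index2):
    """Return index1 with every document of index2 appended.

    Documents are never combined, even if they share an "id" value.
    """
    return index1 + index2


index1 = [{"id": "A", "title": "doc A"}]                                # m = 1
index2 = [{"id": "A", "price": 10}, {"id": "C", "title": "doc C"}]      # n = 2

merged = add_index(index1, index2)

# The result holds m + n documents; the two entries with id "A" stay separate.
doc_count = len(merged)
docs_with_id_a = [d for d in merged if d["id"] == "A"]
```

Here `doc_count` equals `len(index1) + len(index2)`, and `docs_with_id_a` still contains two separate entries.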

Re: solr index size

2009-04-03 Thread Ning Li
Slightly different index sizes (even when optimized) are normal - the same document may get different internal docids in different runs. I don't know why the number of terms is slightly different. On Fri, Apr 3, 2009 at 7:21 PM, Jun Rao jun...@almaden.ibm.com wrote: Hi, We built a Solr index on a

Re: Merging Solr Indexes

2009-04-01 Thread Ning Li
There is a JIRA issue on supporting index merge: https://issues.apache.org/jira/browse/SOLR-1051. But I agree with Otis that you should go with a single index first. Cheers, Ning On Wed, Apr 1, 2009 at 12:06 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, Yes, you can write to

Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
that an application has more control over where to store the primary and replicas of an HDFS block. This feature may be useful for other HDFS applications (e.g., HBase). We would like to collaborate with other people who are interested in adding this feature to HDFS. Regards, Ning Li

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
I work for IBM Research. I read the Rackspace article. Rackspace's Mailtrust has a similar design. Happy to see an existing application on such a system. Do they plan to open-source it? Is the AOL project an open source project? On Feb 6, 2008 11:33 AM, Clay Webster [EMAIL PROTECTED] wrote:

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
No. I'm curious too. :) On Feb 6, 2008 11:44 AM, J. Delgado [EMAIL PROTECTED] wrote: I assume that Google also has distributed index over their GFS/MapReduce implementation. Any idea how they achieve this? J.D.

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
One main focus is to provide fault tolerance in this distributed index system. Correct me if I'm wrong, but I think SOLR-303 is focusing on merging results from multiple shards right now. We'd like to start an open source project for a fault-tolerant distributed index system (or join if one already