Re: Question on Solr Scalability
Thanks, that is a really useful article. I am wondering about this statement in it:

> Keep in mind that Solr does not calculate universal term/doc frequencies. At a large scale, it's not likely to matter that tf/idf is calculated at the shard level - however, if your collection is heavily skewed in its distribution across servers, you might take issue with the relevance results. It's probably best to randomly distribute documents to your shards.

So if no universal tf/idf is kept, how does Solr determine the relative rank of two documents that came from different shards in a distributed search query?

Regards,
Abhishek

Juan Pedro Danculovic-2 wrote:
> To scale Solr, take a look at this article: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr
>
> Juan Pedro Danculovic
> CTO - www.linebee.com

--
View this message in context: http://old.nabble.com/Question-on-Solr-Scalability-tp27543068p27544436.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question on Solr Scalability
There is already a patch available to address that shortcoming in distributed search: http://issues.apache.org/jira/browse/SOLR-1632

On Feb 11, 2010, at 6:56 AM, abhishes wrote:
> Thanks, that is a really useful article. I am wondering about this statement in it:
>
> > Keep in mind that Solr does not calculate universal term/doc frequencies. At a large scale, it's not likely to matter that tf/idf is calculated at the shard level - however, if your collection is heavily skewed in its distribution across servers, you might take issue with the relevance results. It's probably best to randomly distribute documents to your shards.
>
> So if no universal tf/idf is kept, how does Solr determine the relative rank of two documents that came from different shards in a distributed search query?
>
> Regards,
> Abhishek
Re: Question on Solr Scalability
On Thu, Feb 11, 2010 at 6:56 AM, abhishes abhis...@gmail.com wrote:
> Thanks, that is a really useful article. I am wondering about this statement in it:
>
> > Keep in mind that Solr does not calculate universal term/doc frequencies. At a large scale, it's not likely to matter that tf/idf is calculated at the shard level - however, if your collection is heavily skewed in its distribution across servers, you might take issue with the relevance results. It's probably best to randomly distribute documents to your shards.
>
> So if no universal tf/idf is kept, how does Solr determine the relative rank of two documents that came from different shards in a distributed search query?

tf is per document, so it is the same whether the search is distributed or not.

idf (inverse document frequency) is a measure of the rareness of a term. Scoring in distributed search only considers the term's rareness within the shard. Solr still orders documents from different shards by this score.

Even after we integrate distributed idf, it will be optional because it comes with a cost and is often unnecessary.

-Yonik
http://www.lucidimagination.com
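To make the point about shard-level idf concrete, here is a minimal sketch in Python. It assumes the classic Lucene-style formula idf = 1 + ln(numDocs / (docFreq + 1)); the shard names and document counts are invented for illustration and are not taken from any real index. It compares the idf each shard would compute on its own with the idf computed from the combined statistics.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Classic Lucene-style idf (assumed here): 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for a single term:
# shard name -> (documents in the shard, documents containing the term)
shard_stats = {
    "shard_a": (1_000_000, 10),       # the term is rare on this shard
    "shard_b": (1_000_000, 100_000),  # the term is common on this shard
}

# What each shard computes locally during a distributed search.
per_shard_idf = {name: idf(n, df) for name, (n, df) in shard_stats.items()}

# What a "universal" idf would be if the statistics were combined.
total_docs = sum(n for n, _ in shard_stats.values())
total_df = sum(df for _, df in shard_stats.values())
global_idf = idf(total_docs, total_df)

for name, value in per_shard_idf.items():
    print(f"{name}: local idf = {value:.2f}")
print(f"combined: global idf = {global_idf:.2f}")
```

With a skewed distribution like this one, a match on shard_a gets a much larger idf contribution than an equivalent match on shard_b, which is the relevance effect the article warns about. If documents are spread randomly, the per-shard values should end up close to the global one.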
Question on Solr Scalability
Suppose I am indexing a very large data set (5 billion rows in a database). I want to use the Solr core feature to split the index into manageable chunks.

However, I have two questions:

1. Can cores reside on different physical servers?
2. When a query comes in, will it be answered by the index in one core, or will it be sent to all the cores?

My desire is to have a system that appears from the outside as a single large index, but inside consists of multiple small indexes running on different hardware machines.

--
View this message in context: http://old.nabble.com/Question-on-Solr-Scalability-tp27543068p27543068.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question on Solr Scalability
To scale Solr, take a look at this article: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

Juan Pedro Danculovic
CTO - www.linebee.com

On Thu, Feb 11, 2010 at 4:12 AM, abhishes abhis...@gmail.com wrote:
> Suppose I am indexing a very large data set (5 billion rows in a database). I want to use the Solr core feature to split the index into manageable chunks.
>
> However, I have two questions:
>
> 1. Can cores reside on different physical servers?
> 2. When a query comes in, will it be answered by the index in one core, or will it be sent to all the cores?
>
> My desire is to have a system that appears from the outside as a single large index, but inside consists of multiple small indexes running on different hardware machines.
Re: Question on Solr Scalability
Hi,

I think your needs would be better met by Distributed Search (http://wiki.apache.org/solr/DistributedSearch), which allows shards to live on different servers and searches across all of those shards when a query comes in.

There are a few patches, which will hopefully be available in the Solr 1.5 release, that will improve this, including distributed tf-idf across shards.

Regards,
David

On 11 Feb 2010, at 07:12, abhishes abhis...@gmail.com wrote:
> Suppose I am indexing a very large data set (5 billion rows in a database). I want to use the Solr core feature to split the index into manageable chunks.
>
> However, I have two questions:
>
> 1. Can cores reside on different physical servers?
> 2. When a query comes in, will it be answered by the index in one core, or will it be sent to all the cores?
>
> My desire is to have a system that appears from the outside as a single large index, but inside consists of multiple small indexes running on different hardware machines.
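As a rough illustration of the Distributed Search approach David mentions, the sketch below builds a query URL that uses Solr's `shards` request parameter, so that one Solr instance fans the query out to every listed shard and merges the results. The host names, the `/solr` path, and the helper name `distributed_query_url` are assumptions made up for this example; adjust them to your own deployment.

```python
from typing import List
from urllib.parse import urlencode


def distributed_query_url(coordinator: str, shards: List[str], query: str) -> str:
    """Build a Solr select URL that searches all the given shards.

    `coordinator` is the instance that receives the request; `shards` is a
    list of host:port/path entries, passed without the http:// prefix.
    """
    params = {"q": query, "shards": ",".join(shards)}
    # Keep ':', '/' and ',' unescaped so the URL stays readable.
    return f"http://{coordinator}/select?{urlencode(params, safe=':/,')}"


# Hypothetical shard locations on two different physical servers.
shard_hosts = [
    "solr1.example.com:8983/solr",
    "solr2.example.com:8983/solr",
]

url = distributed_query_url(shard_hosts[0], shard_hosts, "title:scalability")
print(url)
```

From the client's point of view this behaves like a query against a single large index, while each shard only holds and searches its own portion of the documents.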