Re: Question on Solr Scalability

2010-02-11 Thread abhishes

Thanks, really useful article.

I am wondering about this statement in the article:

Keep in mind that Solr does not calculate universal term/doc frequencies.
At a large scale, it's not likely to matter that tf/idf is calculated at the
shard level; however, if your collection is heavily skewed in its
distribution across servers, you might take issue with the relevance
results. It's probably best to randomly distribute documents to your shards.
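A minimal Python sketch of spreading documents evenly across shards, assuming the indexing client decides which shard receives each document and that every row has a unique key. Instead of drawing a random number, it hashes the key: the spread is still roughly uniform, and a re-indexed row lands on the same shard as before. `shard_for` is a hypothetical helper, not a Solr API.

```python
import hashlib
from collections import Counter

def shard_for(doc_id: str, num_shards: int) -> int:
    """Hypothetical helper: map a document's unique key to a shard index.

    Hashing the key spreads documents roughly uniformly (like random
    assignment), but the same key always maps to the same shard, so a
    re-indexed row overwrites its old copy instead of duplicating it.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

if __name__ == "__main__":
    # Count how 100,000 hypothetical row keys spread over 4 shards.
    counts = Counter(shard_for(f"row-{i}", 4) for i in range(100_000))
    for shard in sorted(counts):
        print(shard, counts[shard])
```

Each shard should end up with close to a quarter of the rows, which keeps per-shard term statistics similar to the collection-wide ones.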

So if there is no universal tf/idf kept, then how does Solr determine the
rank of two documents which came from different shards in a distributed
search query?

Regards,
Abhishek





Juan Pedro Danculovic-2 wrote:
 
 To scale Solr, take a look at this article:
 
 http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr
 
 
 
 Juan Pedro Danculovic
 CTO - www.linebee.com
 
 
 On Thu, Feb 11, 2010 at 4:12 AM, abhishes abhis...@gmail.com wrote:
 

 Suppose I am indexing very large data (5 billion rows in a database).

 Now I want to use the Solr Core feature to split the index into manageable
 chunks.

 However, I have two questions:


 1. Can cores reside on different physical servers?

 2. When a query comes in, will it be answered by the index in one core, or
 will the query be sent to all the cores?

 My desire is to have a system which from outside appears as a single large
 index... but inside it is multiple small indexes running on different
 hardware machines.
 --
 View this message in context:
 http://old.nabble.com/Question-on-Solr-Scalability-tp27543068p27543068.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://old.nabble.com/Question-on-Solr-Scalability-tp27543068p27544436.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question on Solr Scalability

2010-02-11 Thread Erik Hatcher
There is already a patch available to address that shortcoming in
distributed search:


http://issues.apache.org/jira/browse/SOLR-1632
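Conceptually, distributed idf means collecting each shard's document count and the term's document frequency, summing them, and computing one idf that every shard then uses. The Python sketch below is not the SOLR-1632 code; it assumes the classic Lucene-style formula idf = 1 + ln(numDocs / (docFreq + 1)), and the shard statistics are hypothetical numbers for a skewed collection.

```python
import math
from dataclasses import dataclass

@dataclass
class ShardStats:
    num_docs: int  # documents in this shard
    doc_freq: int  # documents in this shard that contain the term

def idf(doc_freq: int, num_docs: int) -> float:
    # Assumed classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def global_idf(shards: list) -> float:
    """Sum the per-shard statistics and compute one idf shared by all shards."""
    total_docs = sum(s.num_docs for s in shards)
    total_df = sum(s.doc_freq for s in shards)
    return idf(total_df, total_docs)

if __name__ == "__main__":
    # Hypothetical skewed collection: the term is common on shard A, rare on shard B.
    shards = [ShardStats(1_000_000, 50_000), ShardStats(1_000_000, 10)]
    for name, s in zip("AB", shards):
        print(f"shard {name} local idf: {idf(s.doc_freq, s.num_docs):.3f}")
    print(f"global idf: {global_idf(shards):.3f}")
```

With local idf, a match on shard B gets a much larger idf than the same match on shard A; with the summed statistics both shards weight the term identically. In this scheme, gathering the statistics means an extra round of requests to the shards before scoring.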


On Feb 11, 2010, at 6:56 AM, abhishes wrote:



Thanks, really useful article.

I am wondering about this statement in the article:

Keep in mind that Solr does not calculate universal term/doc frequencies.
At a large scale, it's not likely to matter that tf/idf is calculated at the
shard level; however, if your collection is heavily skewed in its
distribution across servers, you might take issue with the relevance
results. It's probably best to randomly distribute documents to your shards.

So if there is no universal tf/idf kept, then how does Solr determine the
rank of two documents which came from different shards in a distributed
search query?

Regards,
Abhishek





Re: Question on Solr Scalability

2010-02-11 Thread Yonik Seeley
On Thu, Feb 11, 2010 at 6:56 AM, abhishes abhis...@gmail.com wrote:

 Thanks, really useful article.

 I am wondering about this statement in the article:

 Keep in mind that Solr does not calculate universal term/doc frequencies.
 At a large scale, it's not likely to matter that tf/idf is calculated at the
 shard level; however, if your collection is heavily skewed in its
 distribution across servers, you might take issue with the relevance
 results. It's probably best to randomly distribute documents to your shards.

 So if there is no universal tf/idf kept, then how does Solr determine the
 rank of two documents which came from different shards in a distributed
 search query?

tf (term frequency) is per document, so it's the same distributed or non-distributed.
idf (inverse document frequency) is the measure of the rareness of a term.
Scoring in distributed search only considers the term rareness within
each shard. Solr still merges and orders documents from different shards
by this shard-local score.

Even after we integrate distributed idf, it will be optional because
it comes with a cost and is often unnecessary.
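A toy Python sketch of what that means for ranking: each shard scores its own documents using an idf computed from its own statistics, and the coordinating node merges the shards' top hits by those scores. The score is simplified to tf × idf (not Lucene's full formula), idf is assumed to be the classic Lucene-style 1 + ln(numDocs / (docFreq + 1)), and the shard contents are hypothetical.

```python
import heapq
import math

# Hypothetical shards: doc_id -> {term: term frequency}. "solr" is common on
# shard A and rare on shard B, a deliberately skewed distribution.
SHARDS = [
    {"a1": {"solr": 2}, "a2": {"solr": 2}, "a3": {"solr": 2}, "a4": {"lucene": 1}},
    {"b1": {"solr": 2}, "b2": {"lucene": 1}, "b3": {"lucene": 3}, "b4": {"index": 1}},
]

def idf(doc_freq: int, num_docs: int) -> float:
    # Assumed classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def shard_top_k(shard: dict, term: str, k: int) -> list:
    """Score one shard's documents with that shard's own idf; return its top k."""
    matches = {doc_id: tfs[term] for doc_id, tfs in shard.items() if tfs.get(term, 0) > 0}
    # Document frequency and document count come from this shard only.
    local_idf = idf(len(matches), len(shard))
    # Simplified score: tf * idf.
    scored = [(tf * local_idf, doc_id) for doc_id, tf in matches.items()]
    return heapq.nlargest(k, scored)

def distributed_search(shards: list, term: str, k: int) -> list:
    """Fan the query out to every shard and merge the hits by their shard-local scores."""
    candidates = [hit for shard in shards for hit in shard_top_k(shard, term, k)]
    return heapq.nlargest(k, candidates)

if __name__ == "__main__":
    for score, doc_id in distributed_search(SHARDS, "solr", 3):
        print(f"{doc_id}: {score:.3f}")
```

All four matching documents have the same tf, yet b1 ranks first only because "solr" is rarer on its shard; that is the skew effect the article warns about.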

-Yonik
http://www.lucidimagination.com


Question on Solr Scalability

2010-02-10 Thread abhishes

Suppose I am indexing very large data (5 billion rows in a database).

Now I want to use the Solr Core feature to split the index into manageable
chunks.

However, I have two questions:


1. Can cores reside on different physical servers?

2. When a query comes in, will it be answered by the index in one core, or
will the query be sent to all the cores?

My desire is to have a system which from outside appears as a single large
index... but inside it is multiple small indexes running on different
hardware machines.



Re: Question on Solr Scalability

2010-02-10 Thread Juan Pedro Danculovic
To scale Solr, take a look at this article:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr



Juan Pedro Danculovic
CTO - www.linebee.com






Re: Question on Solr Scalability

2010-02-10 Thread David Stuart

Hi,

I think Distributed Search (http://wiki.apache.org/solr/DistributedSearch)
would meet your needs better. It allows shards to live on different servers
and searches across all of those shards when a query comes in. There are a
few patches, which will hopefully be available in the Solr 1.5 release, that
will improve this, including distributed tf/idf across shards.
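A small Python sketch of how a client might issue such a query, assuming the `shards` request parameter described on the DistributedSearch wiki page: a comma-separated list of host:port/path entries without `http://`. Host names and ports are hypothetical; the instance receiving the request fans the query out to the listed shards and merges their results, and it can itself be one of the shards.

```python
from typing import Sequence
from urllib.parse import urlencode

def distributed_query_url(coordinator: str, shards: Sequence[str], query: str, rows: int = 10) -> str:
    """Build a select URL that asks `coordinator` to search every listed shard.

    Each shard is given as host:port/path (no "http://"), joined with commas.
    """
    params = {"q": query, "rows": rows, "shards": ",".join(shards)}
    # Keep ':', '/' and ',' readable in the shards list; spaces in q become '+'.
    return f"{coordinator}/select?{urlencode(params, safe=':/,')}"

if __name__ == "__main__":
    # Hypothetical hosts: one Solr instance per machine, each holding one shard.
    print(distributed_query_url(
        "http://host1:8983/solr",
        ["host1:8983/solr", "host2:8983/solr", "host3:8983/solr"],
        "ipod solr",
    ))
```

To actually run the query, the URL could be passed to `urllib.request.urlopen`; from the client's side the shards then look like one large index.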


Regards,

David