Thanks Otis. This is starting to make more sense to me. I will go through the links in your signature and dig into it.
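
For reference, here is a minimal sketch of what I understand the "fix the
shard count up front" step to look like, taken from the SolrCloud wiki
example (the collection name, config name, and shard count below are just
placeholders):

    cd example
    # first node: embedded ZooKeeper plus a fixed shard count
    java -Dbootstrap_confdir=./solr/collection1/conf \
         -Dcollection.configName=myconf \
         -DzkRun -DnumShards=4 \
         -jar start.jar

If I read the thread below correctly, numShards is then fixed for the
collection, and changing it later means reindexing.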
Still learning, but this is a good direction. Thanks!

Jason

On Thu, Oct 4, 2012 at 2:55 PM, Otis Gospodnetic
<otis.gospodne...@gmail.com> wrote:
> Hi,
>
> You could start with one node, beginning with # shards == # CPU cores.
> Then, while running a stress/performance test, observe the latency
> and other metrics you care about.
> Keep increasing the number of shards and keep observing.
>
> SPM for Solr (see signature) will help with the observing part.
> JMeter or SolrMeter (hi Tomás ;)) will help with the stress-testing part.
>
> You cannot change the number of shards on the fly; reindexing is needed.
> The above also doesn't take index/shard size into account, but that is
> a dimension to experiment with, too.
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Thu, Oct 4, 2012 at 2:43 PM, Jason Huang <jason.hu...@icare.com> wrote:
>> Tomás,
>>
>> Thanks for the response.
>>
>> So basically, at this point, what I could do is make a "best guess" at
>> my estimated index size and specify a few shards to start with. I am
>> guessing that if I assign too many shards, the "join" across the
>> different shards may become the bottleneck? On the other hand, if I
>> assign only one or two shards, each shard may become too big and the
>> I/O within each shard will be the bottleneck?
>>
>> Then, after the system has been deployed for a while, if we find out
>> where the bottleneck is, do we have a way to adjust the number of
>> shards without breaking the indexing and without requiring any downtime
>> in the production system? Say I have 4 shards and each of them is
>> 100GB, and I find that I/O is the bottleneck and want to use 8 shards
>> instead - is there a good way to redistribute the whole index from the
>> 4 existing shards to 8 shards without breaking anything (and without
>> downtime)?
>>
>> thanks!
>>
>> Jason
>>
>>
>>
>> On Thu, Oct 4, 2012 at 1:36 PM, Tomás Fernández Löbbe
>> <tomasflo...@gmail.com> wrote:
>>> SolrCloud doesn't auto-shard at this point. It doesn't split indexes
>>> either (there is an open issue for this:
>>> https://issues.apache.org/jira/browse/SOLR-3755 )
>>>
>>> At this point you need to specify the number of shards for a collection
>>> in advance, with the numShards parameter. When you have more than one
>>> shard for a collection, SolrCloud automatically distributes the query to
>>> one replica of each shard and joins the results for you.
>>>
>>> The most reliable documentation about SolrCloud can be found here:
>>> http://wiki.apache.org/solr/SolrCloud
>>>
>>> Tomás
>>>
>>> On Thu, Oct 4, 2012 at 12:02 PM, Jason Huang <jason.hu...@icare.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am exploring SolrCloud and have a few questions about SolrCloud's
>>>> auto-sharding functionality. I couldn't find any good answers in my
>>>> online searches - if anyone knows the answers to these questions or can
>>>> point me to the right documentation, that would be great!
>>>>
>>>> (1) Does SolrCloud offer auto-sharding functionality? If we
>>>> continuously feed documents to a single index, eventually the shard
>>>> will grow to a huge size and queries will be slow. How does
>>>> SolrCloud handle this situation?
>>>>
>>>> (2) If SolrCloud auto-splits a big shard into two smaller shards, then
>>>> shard 1 will have part of the index and shard 2 will have some other
>>>> part of the index. Is this correct? If so, when we perform a query, do
>>>> we need to go through both shards in order to get a good response?
>>>> Will this be slow (because we need to go through two shards, or more
>>>> shards later if we need to split the shards again when the size is
>>>> too big)?
>>>>
>>>> thanks!
>>>>
>>>> Jason
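
PS: if I understand Tomás's point about distributed queries correctly, once
the collection is sharded an ordinary query to any node covers all shards,
something like this (host, port, and collection name are the defaults from
the wiki example):

    curl 'http://localhost:8983/solr/collection1/select?q=*:*&wt=json'

SolrCloud forwards the request to one replica of each shard and merges the
results before responding, so the client never has to query the shards
individually.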