Thanks Otis. This is starting to make more sense to me. I will go through the links in your signature and dig into it.
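
For reference, here is a minimal sketch of what I understand the "fix the
shard count up front" step to look like, taken from the SolrCloud wiki
example (the collection name, config name, and shard count below are just
placeholders):

    cd example
    # first node: embedded ZooKeeper plus a fixed shard count
    java -Dbootstrap_confdir=./solr/collection1/conf \
         -Dcollection.configName=myconf \
         -DzkRun -DnumShards=4 \
         -jar start.jar

If I read the thread below correctly, numShards is then fixed for the
collection, and changing it later means reindexing.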
Still learning, but this is a good direction. Thanks!

Jason

On Thu, Oct 4, 2012 at 2:55 PM, Otis Gospodnetic
<otis.gospodne...@gmail.com> wrote:
> Hi,
>
> You could start with one node, beginning with # shards == # CPU cores.
> Then, while running a stress/performance test, observe the latency
> and other metrics you care about.
> Keep increasing the number of shards and keep observing.
>
> SPM for Solr (see signature) will help with the observing part.
> JMeter or SolrMeter (hi Tomás ;)) will help with the stress-testing part.
>
> You cannot change the number of shards on the fly; reindexing is needed.
> The above also doesn't take index/shard size into account, but that is
> a dimension to experiment with, too.
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Thu, Oct 4, 2012 at 2:43 PM, Jason Huang <jason.hu...@icare.com> wrote:
>> Tomás,
>>
>> Thanks for the response.
>>
>> So basically, at this point, what I could do is make a "best guess" at
>> my estimated index size and specify a few shards to start with. I am
>> guessing that if I assign too many shards, the "join" across the
>> different shards may become the bottleneck? On the other hand, if I
>> assign only one or two shards, each shard may become too big and the
>> I/O within each shard will be the bottleneck?
>>
>> Then, after the system has been deployed for a while, if we find out
>> where the bottleneck is, do we have a way to adjust the number of
>> shards without breaking the indexing and without requiring any downtime
>> in the production system? Say I have 4 shards and each of them is
>> 100GB, and I find that I/O is the bottleneck and want to use 8 shards
>> instead - is there a good way to redistribute the whole index from the
>> 4 existing shards to 8 shards without breaking anything (and without
>> downtime)?
>>
>> thanks!
>>
>> Jason
>>
>>
>>
>> On Thu, Oct 4, 2012 at 1:36 PM, Tomás Fernández Löbbe
>> <tomasflo...@gmail.com> wrote:
>>> SolrCloud doesn't auto-shard at this point. It doesn't split indexes
>>> either (there is an open issue for this:
>>> https://issues.apache.org/jira/browse/SOLR-3755 )
>>>
>>> At this point you need to specify the number of shards for a collection
>>> in advance, with the numShards parameter. When you have more than one
>>> shard for a collection, SolrCloud automatically distributes the query to
>>> one replica of each shard and joins the results for you.
>>>
>>> The most reliable documentation about SolrCloud can be found here:
>>> http://wiki.apache.org/solr/SolrCloud
>>>
>>> Tomás
>>>
>>> On Thu, Oct 4, 2012 at 12:02 PM, Jason Huang <jason.hu...@icare.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am exploring SolrCloud and have a few questions about SolrCloud's
>>>> auto-sharding functionality. I couldn't find any good answers in my
>>>> online searches - if anyone knows the answers to these questions or can
>>>> point me to the right documentation, that would be great!
>>>>
>>>> (1) Does SolrCloud offer auto-sharding functionality? If we
>>>> continuously feed documents to a single index, eventually the shard
>>>> will grow to a huge size and queries will be slow. How does
>>>> SolrCloud handle this situation?
>>>>
>>>> (2) If SolrCloud auto-splits a big shard into two smaller shards, then
>>>> shard 1 will have part of the index and shard 2 will have some other
>>>> part of the index. Is this correct? If so, when we perform a query, do
>>>> we need to go through both shards in order to get a good response?
>>>> Will this be slow (because we need to go through two shards, or more
>>>> shards later if we need to split the shards again when the size is
>>>> too big)?
>>>>
>>>> thanks!
>>>>
>>>> Jason
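
PS: if I understand Tomás's point about distributed queries correctly, once
the collection is sharded an ordinary query to any node covers all shards,
something like this (host, port, and collection name are the defaults from
the wiki example):

    curl 'http://localhost:8983/solr/collection1/select?q=*:*&wt=json'

SolrCloud forwards the request to one replica of each shard and merges the
results before responding, so the client never has to query the shards
individually.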