This is a complex setup, all right. A pluggable sharding strategy is definitely something that is on the roadmap for SolrCloud, but hasn't made it into the code base yet.
Keep in mind, though, that all the SolrCloud goodness centers around the idea of a single index that may be sharded. I don't think the SolrCloud developers have really had time to think through the situation in which you have a bunch of cores, each of which may or may not be sharded, all running on the same server. I don't know that it _doesn't_ work, mind you, but that scenario isn't the prime use case for SolrCloud, and I haven't explored that kind of functionality myself yet. Not much help, I know. I suspect this is one of those cases where _we_ will learn from _you_ if you try to meld SolrCloud with your setup. Sounds like a great Wiki page if you do pursue this!

Best,
Erick

On Tue, Nov 6, 2012 at 4:58 PM, Jie Sun <jsun5...@yahoo.com> wrote:

> Hi Erick,
> thanks for your information. I read all the issues related to SOLR-1293
> that you just pointed me to.
>
> It seems they are not very suitable for our scenario.
>
> We do have a couple of hundred cores (you are right, each customer
> corresponds to a core), typically on one Solr instance, and all of them
> need to be actively indexing and serving queries. So we don't have tens
> of thousands of cores of which only some need to be loaded.
>
> Our issue is that some servers hosting very large customers run out of
> disk space after a while because of the large amount of index data. I
> have written a RESTful service, deployed alongside Solr on Tomcat, that
> identifies indexing requests for a large customer (core) and consults a
> DNS service; it then offloads those indexing requests to additional Solr
> servers and, going forward, supports queries using Solr shards on these
> servers.
>
> We also have replicas for each shard, managed by our own software using
> a peer model (I am thinking about using Solr replication after 1.4).
>
> To me, SolrCloud is like sharding + replication + ZooKeeper. I could be
> wrong.
> But if I am right, given the very large existing data in our service and
> the amount of software we already have working well on Solr 1.4, I am
> just trying to figure out whether it is worth migrating the production
> system to SolrCloud.
>
> The problem we need to fix is in one area: I need to automate the
> offload (sharding) process. Right now we use a monitoring system to
> watch the growth on each server. When we find a fast-growing large core
> (customer), we manually configure our DNS directory and start adding
> shard(s) to it (basically we create a core with the same name on a
> different Solr server/instance). Going forward, my RESTful service then
> directs the queries for that customer to these sharded cores using Solr
> shards.
>
> If SolrCloud cannot really help me automate this process, it is not very
> attractive to me right now. I have read some of the topics and looked
> into distributed indexing and the distributed update processor, but none
> of them helps the way I have been looking for. So I guess, whether or not
> I use SolrCloud, I will need to write my own kind of 'load balancer' for
> indexing, unless I am wrong.
>
> I did come across Jon's white paper on Loggly, and I have designed a
> model based on what he has done. The solution should be able to create
> shards automatically, but it will need to rsync a core's index files to a
> different server and use Solr merge to merge small cores into larger
> cores, or use core admin to add a new core on the fly.
>
> Does this approach sound like something someone is already familiar with
> and has an out-of-the-box solution for? When I looked into SolrCloud, I
> was expecting some pluggable index-distribution policy factory I could
> customize. The closest thing I found was SOLR-2593 (a new core admin
> action 'split' for splitting an index), but it is not exactly what I
> wanted. Let me know if you can advise me further on this.
>
> thanks
> Jie
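For anyone weighing the same trade-off, here is a minimal Python sketch of the off-load routing Jie describes: watch a core's size, add a same-named core on a spare server once it grows too large, send new documents to the newest shard, and fan queries out over all shards. It is hypothetical throughout. A plain dict stands in for the DNS-backed directory, and the names `directory`, `maybe_offload`, `indexing_target`, `query_url`, the 200 GiB threshold, and the host names are all invented for illustration. The one real Solr detail it relies on is that a distributed query lists its shards in the `shards` request parameter as comma-separated `host:port/path` entries.

```python
from urllib.parse import urlencode

# Hypothetical stand-in for the DNS-backed directory: maps each customer's
# core name to the Solr servers that hold a shard of that core.
directory = {
    "customer_a": ["solr1:8080/solr"],
    "customer_b": ["solr1:8080/solr", "solr7:8080/solr"],
}

# Hypothetical size (200 GiB) at which a core's newest shard is considered full.
OFFLOAD_THRESHOLD_BYTES = 200 * 1024**3


def maybe_offload(core, index_size_bytes, spare_servers):
    """Add a shard on a spare server once the core's newest shard is too big.

    Returns the new server, or None if no off-load was needed or possible.
    """
    if index_size_bytes < OFFLOAD_THRESHOLD_BYTES or not spare_servers:
        return None
    new_server = spare_servers.pop(0)
    # Same core name on a different server, as in the manual process.
    directory[core].append(new_server)
    return new_server


def indexing_target(core):
    """Send new documents to the most recently added shard of the core."""
    return directory[core][-1]


def query_url(core, q):
    """Build a distributed query that fans out over every shard of the core."""
    servers = directory[core]
    # Solr's `shards` parameter: comma-separated host:port/path entries.
    shards = ",".join(f"{server}/{core}" for server in servers)
    params = {"q": q, "shards": shards}
    # Any shard can receive the request and merge the results.
    return f"http://{servers[0]}/{core}/select?" + urlencode(params)
```

Sending new documents only to the newest shard is just one possible policy, chosen here because it stops the original server from growing further. One caveat: an update to a document that already lives on an older shard has to be routed back to that shard, since distributed search expects each unique key to exist on only one shard.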