I'm working on a product hosted with AWS that uses Elastic Beanstalk
auto-scaling to good effect and we are trying to set up similar (more or
less) runtime scaling support with Solr. I think I understand how to set
this up, and wanted to check I was on the right track.

We currently run 3 cores on a single host / Solr server / shard. This is
just fine for now, and we have overhead for the near future. However, I
need to have a plan, and then test, for a higher capacity future.

1) I gather that if I set up SolrCloud, and then later load increases, I
can spin up a second host / Solr server and split the first shard (the
split itself creates the new sub-shards, which can then be moved):

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3

Doing this, we no longer have to commit to a shard count out of the gate.
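For reference, here is a sketch of the Collections API calls I'm expecting to use. The collection name ("mycollection"), host/port, and shard names are placeholders of mine, and the commands are shown as echoed dry-runs since they assume a running SolrCloud cluster:

```shell
# Sketch only: "mycollection", localhost:8983, and the shard names are
# placeholders; echoed rather than executed since a live cluster is assumed.
SOLR="http://localhost:8983/solr"

# 1. Split shard1 into two sub-shards (shard1_0, shard1_1):
SPLIT="$SOLR/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1"
echo "curl '$SPLIT'"

# 2. Once the split finishes, place a replica of a sub-shard on the new host:
ADD="$SOLR/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1_1&node=newhost:8983_solr"
echo "curl '$ADD'"
```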

2) I'm not clear whether there's a big advantage to splitting up the cores
or not. Two of the three cores will have about the same number of documents,
though only one contains large amounts of text. The third core is much
smaller in both bytes and documents (2 orders of magnitude).

3) We are also looking at going multi-lingual. The current plan is to store
the localized text in per-language fields within the same core. Languages
will be added over time, and we can update the schema as we go (each
language field will be optional). This seems easier than adding a core per
language. Is there a downside?
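Concretely, this is the kind of schema I have in mind. The field and type names are mine, and the per-language field types (text_en, text_fr) assume analyzer chains defined elsewhere in the schema:

```xml
<!-- Sketch: one optional field per language in the same core.
     Field types text_en / text_fr are assumed to be defined elsewhere
     with language-specific analyzer chains. -->
<field name="body_en" type="text_en" indexed="true" stored="true" required="false"/>
<field name="body_fr" type="text_fr" indexed="true" stored="true" required="false"/>

<!-- Alternatively, cover future languages without further schema edits: -->
<dynamicField name="body_*" type="text_general" indexed="true" stored="true"/>
```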

Thanks for any pointers.
