I am running a multi-tenant system where tenants can upload and import
their own data into their respective cores. Fortunately, Solr makes it easy
to ensure that the search indices don't mix and that clients can only
access their own cores.
Isolating resource consumption, however, seems trickier. Of course it's
fairly easy to limit the number of documents and the queries per second
for each tenant, but what if they add a few GB of text to their documents?
What if they use millions of distinct filter values? That can quickly fill
up the JVM heap and degrade the other tenants (I'm perfectly fine with
search going down for that one tenant).
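The closest per-core knob I'm aware of is capping cache sizes in each core's solrconfig.xml. A sketch (exact class names and attributes depend on the Solr version; this only bounds the caches, not the index data structures themselves):

```xml
<!-- solrconfig.xml sketch: bound the filter cache per core.
     maxRamMB evicts entries once the cache exceeds the given RAM. -->
<filterCache class="solr.CaffeineCache"
             size="512"
             maxRamMB="64"
             autowarmCount="0"/>
```

But docValues, field caches, and the index structures can still grow with the data, so this doesn't answer the general question.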
Of course I could validate their input data and apply a seemingly endless
number of limits for all kinds of cases, but that smells. Is there a more
general way to limit resource consumption per core, something along the
lines of "each core may use up to 5% of the heap"?
One suggestion I found on the mailing list was to run a separate Solr
instance per tenant. That is certainly possible, but it carries significant
administrative and resource overhead.
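For what it's worth, the instance-per-tenant route could at least get hard memory caps from the container runtime. A sketch assuming Docker Compose and the official Solr image (service names are illustrative; `SOLR_HEAP` is the image's heap-size variable):

```yaml
# docker-compose.yml sketch: one Solr container per tenant,
# each with its own heap and a container-level memory cap.
services:
  solr-tenant-a:
    image: solr:9
    ports: ["8983:8983"]
    environment:
      SOLR_HEAP: 512m   # JVM heap for this tenant's instance
    mem_limit: 1g       # hard cap enforced by the runtime
```

That gives the "one tenant's search can die without taking the others down" behavior, just at the cost of N instances.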
Another option may be to go all-in on SolrCloud and add shards and replicas
as needed, but the resources I can dedicate to this are limited.