Hi,

Thanks a lot Erick and Shawn for your answers.
I am aware that this is a very specific issue and not a common use of
Solr. I just wondered whether other people had a similar business case. For
information, we need a very large number of collections with the same
configuration for legal reasons: each collection represents one of our
customers, and by contract we have to keep each customer's data separate.
If we had the choice, we would just have one collection with a field named
'Customers' and run filter queries on it, but we can't!
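
For illustration only, the single-collection approach that we cannot use
would look roughly like the sketch below. The host, the collection name
`all_customers` and the customer value `acme` are hypothetical; it only
assumes the standard `/select` handler with the `q` and `fq` parameters.

```python
from urllib.parse import urlencode

def customer_query_url(base_url, collection, customer, user_query="*:*"):
    """Build a Solr select URL restricted to one customer via a filter query."""
    params = {
        "q": user_query,                  # the end user's actual query
        "fq": f'Customers:"{customer}"',  # filter on the (hypothetical) Customers field
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"

# Hypothetical host, collection and customer value.
url = customer_query_url("http://localhost:8983/solr", "all_customers", "acme")
print(url)
```

The sketch does not escape quotes inside the customer value; real code
would have to do that before building the filter.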

Anyway, thanks again for your answers. For now, we have ended up not adding
the per-language dictionaries to each collection, and it works fine for 1K+
customers with more resources added to the servers.
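
The rough arithmetic behind the numbers in my first message can be sketched
as follows. It assumes one in-memory copy of the configuration per collection
and that the in-memory size equals the on-disk folder size, which is only an
approximation; the function name is hypothetical.

```python
def config_heap_estimate_mb(collections, config_size_kb):
    """Estimated heap (MB, decimal units) if every collection keeps its own config copy."""
    return collections * config_size_kb / 1000

small = config_heap_estimate_mb(1000, 500)    # 500 KB config folder
large = config_heap_estimate_mb(1000, 6000)   # 6 MB config folder with dictionaries

print(f"small config: {small:.0f} MB, large config: {large / 1000:.0f} GB")
```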

Best,

Olivier Tavard



2015-07-27 17:53 GMT+02:00 Shawn Heisey <apa...@elyograg.org>:

> On 7/27/2015 9:16 AM, Olivier wrote:
> > I have a SolrCloud cluster with 3 nodes :  3 shards per node and
> > replication factor at 3.
> > The collections number is around 1000. All the collections use the same
> > Zookeeper configuration.
> > So when I create each collection, the ZK configuration is pulled from ZK
> > and the configuration files are stored in the JVM.
> > I thought that if the configuration was the same for each collection, the
> > impact on the JVM would be insignificant because the configuration should
> > be loaded only once. But that is not the case: for each collection created,
> > the JVM size increases because the configuration is loaded again. Am I
> > correct?
> >
> > If I have a small configuration folder, I have no problem: the folder
> > size is less than 500 KB, so with 1000 collections x 500 KB,
> > the JVM impact is 500 MB.
> > But we manage a lot of languages with some dictionaries, so the
> > configuration folder size is about 6 MB. The JVM impact is now very large
> > because it can be more than 6 GB (1000 x 6 MB).
> >
> > So I would like to have feedback from people who also have a cluster with
> > a large number of collections. Do I have to change some settings to
> > handle this case better? What can I do to optimize this behaviour?
> > For now, we have just increased the RAM per node to 16 GB, but we plan to
> > increase the number of collections.
>
> Severe issues were noticed when dealing with many collections, and this
> was with a simple config, and completely empty indexes.  A complex
> config and actual index data would make it run that much more slowly.
>
> https://issues.apache.org/jira/browse/SOLR-7191
>
> Memory usage for the config wasn't even considered when I was working on
> reporting that issue.
>
> SolrCloud is highly optimized to work well when there are a relatively
> small number of collections.  I think there is work that we can do which
> will optimize operations to the point where thousands of collections
> will work well, especially if they all share the same config/schema ...
> but this is likely to be a fair amount of work, which will only benefit
> a handful of users who are pushing the boundaries of what Solr can do.
> In the open source world, a problem like that doesn't normally receive a
> lot of developer attention, and we rely much more on help from the
> community, specifically from knowledgeable users who are having the
> problem and know enough to try and fix it.
>
> FYI -- 16GB of RAM per machine is quite small for Solr, particularly
> when pushing the envelope.  My Solr machines are maxed at 64GB, and I
> frequently wish I could install more.
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
>
> One possible solution for your dilemma is simply adding more machines
> and spreading your collections out so each machine's memory requirements
> go down.
>
> Thanks,
> Shawn
>
>
