With multiple copies of the data (shard replicas) spread across multiple nodes,
nodes must be up and discoverable (and in SolrCloud, cores must participate
in a shard leader election) so that any other node in the cluster can get
the latest copy of the data if needed.
With separation of compute and storage, and a single stored copy of each
shard (*not* a copy per replica), any node can fetch the data it needs, and
only the cores/replicas actually in use need to be open or even known.
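
To make the contrast concrete, here is a minimal sketch of the lazy path
(the class and method names are invented for illustration; this is not the
API of the SOLR-17125 branch):

    import java.nio.file.Path;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch: open a core only when a request actually needs it, pulling the
    // single stored copy of the shard from shared storage instead of keeping
    // every replica open (and elected) on some node.
    public class LazyCoreLoader {

        // Stand-ins for the real abstractions (names are made up).
        interface SharedStore { Path pullLatest(String shardName); } // e.g. download from object storage
        interface Core { }                                           // an opened local index
        interface CoreFactory { Core open(Path indexDir); }

        private final SharedStore store;
        private final CoreFactory factory;
        private final Map<String, Core> openCores = new ConcurrentHashMap<>();

        public LazyCoreLoader(SharedStore store, CoreFactory factory) {
            this.store = store;
            this.factory = factory;
        }

        // First request touching the shard fetches and opens it; later requests reuse it.
        public Core coreFor(String shardName) {
            return openCores.computeIfAbsent(shardName, name -> {
                Path indexDir = store.pullLatest(name); // any node can fetch the latest copy
                return factory.open(indexDir);
            });
        }
    }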

Fat data nodes with fast disks and the current SolrCloud code would not support
thousands or tens of thousands of cores per node (unless they also have very
fat memory).

Is Varnish in front of Solr meant to cache query results? In enterprise search
(Salesforce), each user has different access rights and sees a different subset
of documents, so cache hit rates are pretty low.
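
For example, a minimal SolrJ sketch (the acl_groups field and the group values
are made up) of how per-user filters make each request distinct:

    import org.apache.solr.client.solrj.SolrQuery;

    public class AclCacheMissExample {
        public static void main(String[] args) {
            // Two users run the same search, but each carries their own ACL filter
            // (the "acl_groups" field and group names are hypothetical).
            SolrQuery alice = new SolrQuery("quarterly report");
            alice.addFilterQuery("acl_groups:(sales OR emea_mgmt)");

            SolrQuery bob = new SolrQuery("quarterly report");
            bob.addFilterQuery("acl_groups:(support)");

            // The encoded parameters (and thus the URL a cache like Varnish keys on)
            // differ, so the two requests cannot share a cached response.
            System.out.println(alice.toQueryString());
            System.out.println(bob.toQueryString());
        }
    }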

On Sat, Nov 29, 2025 at 8:23 PM matthew sporleder <[email protected]>
wrote:

> Can't you get something close-ish to this with solrcloud as-is?
>
> Use a traffic routing layer with no replicas and then some fat data
> nodes with fast disks + well-tuned cache settings.
>
> My thinking is the cache settings would do the magic of the hot/cold
> stuff being described and the routing layer could pretty much
> auto-scale (with its own query caching). The data layer could even be
> segmented with specific collections on different hardware.
>
> Speaking of caching - we used to run varnish in front of solr with
> good effect as well.
>
> On Sat, Nov 29, 2025 at 2:11 PM Walter Underwood <[email protected]>
> wrote:
> >
> > MarkLogic had this as a feature early on, E Nodes (execute) and D nodes
> (data). I don’t remember anybody using it. It was probably a special for
> some customer. Once it was built, it wasn’t a big deal to maintain, but it
> was extra code that wasn’t adding much value.
> >
> > wunder
> > Walter Underwood
> > [email protected]
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Nov 29, 2025, at 9:34 AM, Ilan Ginzburg <[email protected]> wrote:
> > >
> > > The only code drop was the initial branch
> > > https://github.com/apache/solr/tree/jira/solr-17125-zero-replicas
> > > That branch is a cleaned up version (and a better one really) of the
> > > production code Salesforce was running back then.
> > > Changes done since were not ported.
> > >
> > > Since any Solr node can get the latest copy of a shard, a node no longer
> > > has to open or even discover all of its cores up front; it discovers and
> > > opens them lazily when needed (our clusters now scale to 100 000+
> > > collections). We no longer run shard leader elections and instead make a
> > > best effort to index on the same replica, limit the number of open cores
> > > by using transient cores in SolrCloud mode, etc.
> > >
> > > A clear benefit of such a separation of compute and storage is when
> there's
> > > a high number of indexes, with only a small subset active at any given
> > > time. This meshes well with hosting scenarios with a lot of customers
> but
> > > few active at any given time.
> > > When all indexes are active, they have to be loaded on nodes anyway.
> > >
> > > Ilan
> > >
> > > On Sat, Nov 29, 2025 at 12:52 AM Matt Kuiper <[email protected]>
> wrote:
> > >
> > >> Thanks for your reply. What you say makes sense.
> > >>
> > >> Is there perhaps a fork of the Solr baseline with your changes
> available
> > >> for others to use?
> > >>
> > >> Your solution is very compelling!
> > >>
> > >> Matt
> > >>
> > >> On Thu, Nov 27, 2025 at 3:39 AM Ilan Ginzburg <[email protected]>
> wrote:
> > >>
> > >>> I don't believe there will be future work on this topic in the
> context of
> > >>> the Solr project.
> > >>>
> > >>> With the experience of running a modified Solr with separation of
> > >>> compute and storage in production at high scale for a few years now,
> > >>> the changes (to the Cloud part of Solr, though there's unfortunately no
> > >>> real separation between single-node Solr and SolrCloud code) are too
> > >>> big to make this approach optional. Efficiently implementing such a
> > >>> separation requires it to be the only storage/persistence layer. It
> > >>> changes durability/availability and cluster management assumptions in
> > >>> fundamental ways.
> > >>>
> > >>> Ilan
> > >>>
> > >>> On Fri, Nov 21, 2025 at 9:37 PM mtn search <[email protected]>
> wrote:
> > >>>
> > >>>> Hello,
> > >>>>
> > >>>> I am curious if there is current/future work planned for:
> > >>>>
> > >>>> https://issues.apache.org/jira/browse/SOLR-17125
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> https://cwiki.apache.org/confluence/display/SOLR/SIP-20%3A+Separation+of+Compute+and+Storage+in+SolrCloud
> > >>>>
> > >>>> Thanks,
> > >>>> Matt
> > >>>>
> > >>>
> > >>
> >
>
