Can't you get something reasonably close to this with SolrCloud as-is? Use a traffic routing layer of nodes that host no replicas, backed by some fat data nodes with fast disks and well-tuned cache settings.

My thinking is that the cache settings would handle the hot/cold behavior being described, and the routing layer could pretty much auto-scale (with its own query caching). The data layer could even be segmented, with specific collections on different hardware. Speaking of caching, we used to run Varnish in front of Solr to good effect as well.
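To make the cache-tuning part concrete, this is roughly the sort of thing I have in mind in solrconfig.xml. The sizes and autowarm counts below are placeholder numbers for illustration only, not recommendations; you'd size them against your own hit-rate and eviction stats:

  <query>
    <!-- filterCache: cached doc sets for fq clauses; the main lever for
         keeping hot filters in memory -->
    <filterCache class="solr.CaffeineCache"
                 size="4096"
                 initialSize="1024"
                 autowarmCount="256"/>

    <!-- queryResultCache: cached ordered doc-id lists for whole queries;
         helps repeated hot queries and shallow paging -->
    <queryResultCache class="solr.CaffeineCache"
                      size="2048"
                      initialSize="512"
                      autowarmCount="128"/>

    <!-- documentCache: cached stored fields; not autowarmed because internal
         doc ids change between searchers -->
    <documentCache class="solr.CaffeineCache"
                   size="8192"
                   initialSize="2048"/>

    <!-- cache a window of result rows per query so paging a few pages deep
         doesn't re-run the search -->
    <queryResultWindowSize>50</queryResultWindowSize>
  </query>

With autowarming on the filter and query result caches, a new searcher comes up with the hottest entries already populated, which is most of the hot/cold effect being described, without a special storage tier.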
On Sat, Nov 29, 2025 at 2:11 PM Walter Underwood <[email protected]> wrote:
>
> MarkLogic had this as a feature early on, E Nodes (execute) and D nodes
> (data). I don’t remember anybody using it. It was probably a special for some
> customer. Once it was built, it wasn’t a big deal to maintain, but it was
> extra code that wasn’t adding much value.
>
> wunder
> Walter Underwood
> [email protected]
> http://observer.wunderwood.org/ (my blog)
>
> > On Nov 29, 2025, at 9:34 AM, Ilan Ginzburg <[email protected]> wrote:
> >
> > The only code drop was the initial branch
> > https://github.com/apache/solr/tree/jira/solr-17125-zero-replicas
> > That branch is a cleaned up version (and a better one really) of the
> > production code Salesforce was running back then.
> > Changes done since were not ported.
> >
> > Any Solr node being able to get the latest copy of a shard allows no longer
> > opening nor discovering all cores on a node but discovering and opening
> > them lazily when needed (our clusters now scale to 100 000+ collections),
> > no longer doing shard leader elections and instead doing a best effort to
> > index on the same replica, limiting the number of open cores by using
> > transient cores in SolrCloud mode etc.
> >
> > A clear benefit of such a separation of compute and storage is when there's
> > a high number of indexes, with only a small subset active at any given
> > time. This meshes well with hosting scenarios with a lot of customers but
> > few active at any given time.
> > When all indexes are active, they have to be loaded on nodes anyway.
> >
> > Ilan
> >
> > On Sat, Nov 29, 2025 at 12:52 AM Matt Kuiper <[email protected]> wrote:
> >
> >> Thanks for your reply. What you say makes sense.
> >>
> >> Is there perhaps a fork of the Solr baseline with your changes available
> >> for others to use?
> >>
> >> Your solution is very compelling!
> >>
> >> Matt
> >>
> >> On Thu, Nov 27, 2025 at 3:39 AM Ilan Ginzburg <[email protected]> wrote:
> >>
> >>> I don't believe there will be future work on this topic in the context of
> >>> the Solr project.
> >>>
> >>> With the experience of running in production at high scale for a few years
> >>> now a modified Solr with separation of compute and storage, the changes
> >>> (to the Cloud part of Solr, but there's unfortunately no real separation
> >>> between single node Solr and SolrCloud code) are too big to make this
> >>> approach optional. Efficiently implementing such a separation requires it
> >>> to be the only storage/persistence layer. It changes
> >>> durability/availability and cluster management assumptions in fundamental
> >>> ways.
> >>>
> >>> Ilan
> >>>
> >>> On Fri, Nov 21, 2025 at 9:37 PM mtn search <[email protected]> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> I am curious if there is current/future work planned for:
> >>>>
> >>>> https://issues.apache.org/jira/browse/SOLR-17125
> >>>>
> >>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-20%3A+Separation+of+Compute+and+Storage+in+SolrCloud
> >>>>
> >>>> Thanks,
> >>>> Matt
