Hi,
Thanks for asking that question.

The separation of compute and storage would be relevant for the nodes
having the "data" role, i.e. nodes that host indexes.

SIP-20 offers a way to keep these indexes on shared storage (S3, GCS,
etc.) instead of persisting them long term on each individual node.
This makes the nodes themselves stateless: a node can lose all of its
disk content on restart and everything will still work.
Given that the coordinator and overseer roles do not require local state
(persistent storage on the node's local disk), SIP-20 makes all nodes
stateless, just as it does when node roles are not used. State is then
maintained only in ZooKeeper and the shared storage backend.

If a specific assignment of node roles works for a given cluster or use
case, adopting SIP-20 in that cluster would change how indexes are
stored and how each update is handled. Without SIP-20, an update is
distributed to multiple replicas; with SIP-20, it is processed by a
single replica and persisted to shared storage. The roles would likely
stay unchanged: some nodes will still be preferred for hosting the
Overseer or for coordinating queries, and the same subset of nodes will
still handle indexes (although in a different way).
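
To make the difference in update handling concrete, here is a toy
Python model. It is only a sketch and not Solr code: the Replica class,
the function names, and the use of a plain list as the "shared store"
are all made up for illustration, and real indexing involves far more
than appending to a list.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one replica hosted on a 'data' node."""
    name: str
    local_index: list = field(default_factory=list)


def apply_update_replicated(replicas, doc):
    """Without shared storage: the update is distributed to every replica."""
    for replica in replicas:
        replica.local_index.append(doc)


def apply_update_shared(replicas, shared_store, doc):
    """With shared storage: one replica processes the update, then the
    result is persisted to the shared store (S3/GCS in a real cluster)."""
    writer = replicas[0]  # assumption: any single replica may do the work
    writer.local_index.append(doc)
    shared_store.append(doc)


def restart_with_empty_disk(replica, shared_store):
    """A stateless node loses its disk and repopulates from shared storage."""
    replica.local_index = list(shared_store)
```

In this model the set of replicas (and so the nodes with the "data"
role) is the same in both cases; only the path an update takes and the
place where the durable copy lives are different.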

Hope that helps,
Ilan




On Tue, Jan 16, 2024 at 8:57 AM rajani m <rajinima...@gmail.com> wrote:
>
> Hi All,
>
>    Saw a post on the dev-mailing list about  SIP-20 Separation of Compute
> and Storage
> <https://cwiki.apache.org/confluence/display/SOLR/SIP-20%3A+Separation+of+Compute+and+Storage+in+SolrCloud>.
> Trying to understand what extra features it adds when compared to
> configuring a solrcloud cluster by leveraging node roles
> <https://solr.apache.org/guide/solr/latest/deployment-guide/node-roles.html>
> ?
>
> Thanks,
> Rajani
