[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-6220:
    Fix Version/s: Trunk

> Replica placement strategy for solrcloud
> ----------------------------------------
>                 Key: SOLR-6220
>                 URL: https://issues.apache.org/jira/browse/SOLR-6220
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>             Fix For: 5.2, Trunk
>         Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
>
> h1. Objective
> Most cloud-based systems allow users to specify rules for how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control the allocation of replicas, or later change it to suit the needs of the system.
>
> All configuration is on a per-collection basis. The rules are applied whenever a replica is created in any of the shards of a given collection, during:
> * collection creation
> * shard splitting
> * add replica
> * createshard
>
> There are two aspects to how replicas are placed: snitch and placement.
>
> h2. snitch
> How to identify the tags of nodes. Snitches are configured through the collection create command with the {{snitch}} param, e.g. {{snitch=EC2Snitch}} or {{snitch=class:EC2Snitch}}.
>
> h2. ImplicitSnitch
> This is shipped by default with Solr; the user does not need to specify {{ImplicitSnitch}} in the configuration. If tags known to ImplicitSnitch are present in the rules, it is used automatically.
> Tags provided by ImplicitSnitch:
> # cores: number of cores on the node
> # disk: disk space available on the node
> # host: host name of the node
> # node: node name
> # D.*: values available from system properties. {{D.key}} means a value that is passed to the node as {{-Dkey=keyValue}} during node startup. It is possible to use rules like {{D.key:expectedVal,shard:*}}
>
> h2. Rules
> A rule tells how many replicas of a given shard need to be assigned to nodes with the given key-value pairs. These parameters are passed to the collection CREATE API as a multivalued parameter "rule". The values are saved in the state of the collection as follows:
> {code:Javascript}
> {
>   "mycollection": {
>     "snitch": {
>       "class": "ImplicitSnitch"
>     },
>     "rules": [{"cores": "4-"},
>               {"replica": "1", "shard": "*", "node": "*"},
>               {"disk": ">100"}]
>   }
> }
> {code}
> A rule is specified in a pseudo-JSON syntax, which is a map of keys and values.
> * Each collection can have any number of rules. As long as the rules do not conflict with each other, that is OK; otherwise an error is thrown.
> * In each rule, shard and replica can be omitted:
> ** The default value of replica is {{\*}}, meaning ANY; or you can specify a count and an operand such as {{<}} (less than) or {{>}} (greater than).
> ** The value of shard can be a shard name, or {{\*}} meaning EACH, or {{**}} meaning ANY. The default value is {{\*\*}} (ANY).
> * There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}.
> * All keys other than {{shard}} and {{replica}} are called tags, and the tags are nothing but values provided by the snitch for each node.
> * By default, certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly.
>
> h3. How are nodes picked up?
> Nodes are not picked at random. The rules are used to first sort the nodes according to affinity. For example, if there is a rule that says {{disk:100+}}, nodes with more disk space are given higher preference; and if the rule is {{disk:100-}}, nodes with less disk space are given priority. If everything else is equal, nodes with fewer cores are given higher priority.
>
> h3. Fuzzy match
> Fuzzy match can be applied when strict matches fail. The values can be suffixed with {{~}} to specify fuzziness.
> Example rules:
> {noformat}
> #Example requirement: "use only one replica of a shard on a host if possible; if no matches are found, relax that rule"
> rack:*,shard:*,replica:<2~
> #Another example: assign all replicas to nodes with disk space of 100GB or more, or relax the rule if not possible. This ensures that if no node with a 100GB disk exists, nodes are picked in order of size; say, an 85GB node would be picked over an 80GB one
> disk:>100~
> {noformat}
> Examples:
> {noformat}
> #in each rack there can be a max of two replicas of a given shard
> rack:*,shard:*,replica:<3
> #in each rack there can be a max of two replicas of ANY shard
> rack:*,shard:**,replica:2
> rack:*,replica:<3
> #in each node there should be a max of one replica of EACH shard
> node:*,shard:*,replica:1-
> #in each node there should be a max of one replica of ANY shard
> node:*,shard:**,replica:1-
> node:*,replica:1-
> #in rack 738 and shard=shard1, there can be a max of 0 replicas
> rack:738,shard:shard1,replica:<1
> #all replicas of shard1 should go to rack 730
> shard:shard1,replica:*,rack:730
> shard:shard1,rack:730
> #all replicas must be created on a node with at least 20GB disk
> replica:*,shard:*,disk:>20
> replica:*,disk:>20
> disk:>20
> #all replicas should be created on nodes with fewer than 5 cores
> #(in this case ANY and EACH for shard have the same meaning)
> replica:*,shard:**,cores:<5
> replica:*,cores:<5
> cores:<5
> #one replica of shard1 must go to node 192.168.1.2:8080_solr
> node:"192.168.1.2:8080_solr",shard:shard1,replica:1
> {noformat}
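Since "rule" is a multivalued parameter to the collection CREATE command, a request simply repeats it once per rule. A minimal sketch of building such a request, assuming a Solr node at localhost:8983; the collection name and rule values here are illustrative:

```python
# Sketch: building a Collections API CREATE request with placement rules.
# The "rule" parameter is multivalued, so it appears once per rule.
from urllib.parse import urlencode

params = [
    ("action", "CREATE"),
    ("name", "mycollection"),
    ("numShards", "2"),
    ("replicationFactor", "2"),
    ("snitch", "class:ImplicitSnitch"),
    # one "rule" entry per placement rule
    ("rule", "shard:*,replica:<2,node:*"),  # at most one replica of each shard per node
    ("rule", "disk:>100"),                  # only nodes with >100GB free disk
]
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
# The request itself would then be sent with e.g. urllib.request.urlopen(url)
```

Passing a list of tuples to `urlencode` keeps the repeated "rule" entries, which a dict would collapse.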
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

Noble Paul updated SOLR-6220:
    Attachment: (was: SOLR-6220.patch)
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

Noble Paul updated SOLR-6220:
    Attachment: SOLR-6220.patch
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

Noble Paul updated SOLR-6220:
    Attachment: SOLR-6220.patch

Updated patch to trunk.
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

Noble Paul updated SOLR-6220:
    Attachment: SOLR-6220.patch

More tests.
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

Noble Paul updated SOLR-6220:
    Description: (edited)
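The affinity ordering described under "How are nodes picked up?" can be sketched as follows. This is an illustrative toy, not Solr's actual implementation; the node names and tag values are made up, and the tags are the kind of values a snitch would report:

```python
# Illustrative sketch of affinity-based node ordering for a disk rule:
# nodes are sorted by free disk, with ties broken by fewer cores,
# mirroring "if everything else is equal, nodes with fewer cores win".
nodes = {
    "node1:8983_solr": {"disk": 80,  "cores": 2},
    "node2:8983_solr": {"disk": 100, "cores": 5},
    "node3:8983_solr": {"disk": 85,  "cores": 1},
}

def affinity_order(nodes, more_is_better=True):
    """Order node names by disk space; ties broken by fewer cores."""
    return sorted(
        nodes,
        key=lambda n: (-nodes[n]["disk"] if more_is_better else nodes[n]["disk"],
                       nodes[n]["cores"]),
    )

# For a rule preferring larger disks (e.g. disk:>100~), the 100GB node
# comes first, then 85GB, then 80GB:
print(affinity_order(nodes))
# → ['node2:8983_solr', 'node3:8983_solr', 'node1:8983_solr']
```

With `more_is_better=False` the order reverses on disk, matching the behavior described for rules that prefer nodes with less disk space.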
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6220: - Attachment: SOLR-6220.patch
Operators {{+}} and {{-}} replaced with {{<}} and {{>}}. This is now feature complete. I'll add some more tests and commit this.
> Replica placement strategy for solrcloud
> Key: SOLR-6220
> URL: https://issues.apache.org/jira/browse/SOLR-6220
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Noble Paul
> Assignee: Noble Paul
> Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
>
> h1.Objective
> Most cloud-based systems allow specifying rules on how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control the allocation of replicas, or later change it to suit the needs of the system.
> All configuration is per collection. The rules are applied whenever a replica is created in any of the shards in a given collection, during:
> * collection creation
> * shard splitting
> * add replica
> * CREATESHARD
> There are two aspects to how replicas are placed: snitch and placement.
> h2.Snitch
> How to identify the tags of nodes. Snitches are configured through the collection create command with the snitch param, e.g. snitch=EC2Snitch or snitch=class:EC2Snitch
> h2.ImplicitSnitch
> This is shipped by default with Solr. The user does not need to specify {{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are present in the rules, it is used automatically.
> Tags provided by ImplicitSnitch:
> # cores : number of cores in the node
> # disk : disk space available in the node
> # host : host name of the node
> # node : node name
> # D.* : values available from system properties. {{D.key}} means a value that is passed to the node as {{-Dkey=keyValue}} during node startup. It is possible to use rules like {{D.key:expectedVal,shard:*}}
> h2.Rules
> A rule tells how many replicas of a given shard need to be assigned to nodes with the given key-value pairs. These parameters are passed to the collection CREATE API as a multivalued parameter "rule". The values are saved in the state of the collection as follows:
> {code:Javascript}
> {
>   "mycollection":{
>     "snitch": {
>       class:"ImplicitSnitch"
>     }
>     "rules":[{"cores":"4-"},
>       {"replica":"1", "shard":"*", "node":"*"},
>       {"disk":"1+"}]
> }
> {code}
> A rule is specified in a pseudo-JSON syntax, which is a map of keys and values.
> * Each collection can have any number of rules. As long as the rules do not conflict with each other it is OK; otherwise an error is thrown.
> * In each rule, shard and replica can be omitted
> ** the default value of replica is {{\*}}, meaning ANY, or you can specify a count and an operand such as {{+}} or {{-}}
> ** the value of shard can be a shard name, or {{\*}} meaning EACH, or {{**}} meaning ANY. The default value is {{\*\*}} (ANY)
> * There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}.
> * All keys other than {{shard}} and {{replica}} are called tags, and the tags are simply values provided by the snitch for each node
> * By default certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly
> h3.How are nodes picked?
> Nodes are not picked at random. The rules are used to first sort the nodes according to affinity. For example, if there is a rule that says {{disk:100+}}, nodes with more disk space are given higher preference, and if the rule is {{disk:100-}}, nodes with less disk space are given priority. If everything else is equal, nodes with fewer cores are given higher priority.
> h3.Fuzzy match
> Fuzzy match can be applied when strict matches fail. Values can be prefixed with {{~}} to specify fuzziness.
> Example rules:
> {noformat}
> #Example requirement: "use only one replica of a shard in a host if possible; if no matches are found, relax that rule"
> rack:*,shard:*,replica:~1-
> #Another example: assign all replicas to nodes with disk space of 100GB or more, or relax the rule if not possible. This ensures that if no node with a 100GB disk exists, nodes are picked in order of size, say an 85GB node would be picked over an 80GB node
> disk:~100+
> {noformat}
> Examples:
> {noformat}
> #in each rack there can be a max of two replicas of a given shard
> rack:*,shard:*,replica:2-
> #in each rack there can be a max of two replicas of ANY shard
> rack:*,shard:**,replica:2
> rack:*,replica:2-
> #in each node there should be a max of one replica of EACH shard
> node:*,shard:*,replica:1-
> {noformat}
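The rule strings shown above follow a small, regular syntax. As a rough illustration (this is a hypothetical sketch, not Solr's actual parser; the function and field names are invented), a rule like {{rack:*,shard:*,replica:~1-}} could be parsed as:

```python
# Hypothetical sketch of parsing the rule syntax described above.
# A trailing "-" means "at most", a trailing "+" means "at least",
# and a leading "~" marks the condition as fuzzy (relaxable).
def parse_rule(rule):
    conditions = {}
    for pair in rule.split(","):
        key, _, value = pair.partition(":")
        value = value.strip()
        fuzzy = value.startswith("~")
        if fuzzy:
            value = value[1:]
        op = "="
        if value.endswith("-"):
            op, value = "<=", value[:-1]
        elif value.endswith("+"):
            op, value = ">=", value[:-1]
        conditions[key.strip()] = {"value": value, "op": op, "fuzzy": fuzzy}
    return conditions
```

Under this reading, {{replica:~1-}} becomes an "at most 1, relaxable" condition, while {{disk:20+}} becomes "at least 20".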
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6220: - Attachment: SOLR-6220.patch
* Added fuzzy match option {{~}}
* Nodes are presorted based on rules instead of being picked randomly
* {{ImplicitSnitch}} can support per-node system properties
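The presorting mentioned in this update (nodes sorted by rule affinity rather than picked randomly) can be illustrated with a small sketch. The node representation and function name here are assumptions, not the actual patch code: for a {{disk:100+}}-style rule, nodes with more disk sort first; all else being equal, fewer cores win.

```python
# Hypothetical sketch of presorting nodes by rule affinity.
# Each node is a dict like {"name": ..., "disk": <GB>, "cores": <n>}.
def sort_nodes(nodes, prefer_more_disk=True):
    # More disk first (or less disk first for a "disk:N-" rule);
    # fewer cores breaks ties, as the issue description states.
    return sorted(
        nodes,
        key=lambda n: (-n["disk"] if prefer_more_disk else n["disk"], n["cores"]),
    )
```

So for a cluster with nodes of (disk, cores) = (80, 3), (100, 5), (100, 1), a {{disk:100+}} rule would place the 100GB/1-core node first.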
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6220: - Description:
h1.Objective
Most cloud-based systems allow specifying rules on how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control the allocation of replicas, or later change it to suit the needs of the system.
All configuration is per collection. The rules are applied whenever a replica is created in any of the shards in a given collection, during:
* collection creation
* shard splitting
* add replica
* CREATESHARD
There are two aspects to how replicas are placed: snitch and placement.
h2.Snitch
How to identify the tags of nodes. Snitches are configured through the collection create command with the snitch param, e.g. snitch=EC2Snitch or snitch=class:EC2Snitch
h2.ImplicitSnitch
This is shipped by default with Solr. The user does not need to specify {{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are present in the rules, it is used automatically.
Tags provided by ImplicitSnitch:
# cores : number of cores in the node
# disk : disk space available in the node
# host : host name of the node
# node : node name
# D.* : values available from system properties. {{D.key}} means a value that is passed to the node as {{-Dkey=keyValue}} during node startup. It is possible to use rules like {{D.key:expectedVal,shard:*}}
h2.Rules
A rule tells how many replicas of a given shard need to be assigned to nodes with the given key-value pairs. These parameters are passed to the collection CREATE API as a multivalued parameter "rule". The values are saved in the state of the collection as follows:
{code:Javascript}
{
  "mycollection":{
    "snitch": {
      class:"ImplicitSnitch"
    }
    "rules":[{"cores":"4-"},
      {"replica":"1", "shard":"*", "node":"*"},
      {"disk":"1+"}]
}
{code}
A rule is specified in a pseudo-JSON syntax, which is a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not conflict with each other it is OK; otherwise an error is thrown.
* In each rule, shard and replica can be omitted
** the default value of replica is {{\*}}, meaning ANY, or you can specify a count and an operand such as {{+}} or {{-}}
** the value of shard can be a shard name, or {{\*}} meaning EACH, or {{**}} meaning ANY. The default value is {{\*\*}} (ANY)
* There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}.
* All keys other than {{shard}} and {{replica}} are called tags, and the tags are simply values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly
h3.How are nodes picked?
Nodes are not picked at random. The rules are used to first sort the nodes according to affinity. For example, if there is a rule that says {{disk:100+}}, nodes with 100GB or more are given higher preference. If all else is equal, nodes with fewer cores are given higher priority.
h3.Fuzzy match
Fuzzy match can be applied when strict matches fail. Values can be prefixed with {{~}} to specify fuzziness.
Example rules:
{noformat}
#Example requirement: "use only one replica of a shard in a host if possible; if no matches are found, relax that rule"
rack:*,shard:*,replica:~1-
#Another example: assign all replicas to nodes with disk space of 100GB or more, or relax the rule if not possible
disk:~100+
{noformat}
Examples:
{noformat}
#in each rack there can be a max of two replicas of a given shard
rack:*,shard:*,replica:2-
#in each rack there can be a max of two replicas of ANY shard
rack:*,shard:**,replica:2-
rack:*,replica:2-
#in each node there should be a max of one replica of EACH shard
node:*,shard:*,replica:1-
#in each node there should be a max of one replica of ANY shard
node:*,shard:**,replica:1-
node:*,replica:1-
#in rack 738 and shard=shard1, there can be a max of 0 replicas
rack:738,shard:shard1,replica:0-
#all replicas of shard1 should go to rack 730
shard:shard1,replica:*,rack:730
shard:shard1,rack:730
#all replicas must be created on a node with at least 20GB disk
replica:*,shard:*,disk:20+
replica:*,disk:20+
disk:20+
#all replicas should be created on nodes with fewer than 5 cores
#here ANY and EACH for shard have the same meaning
replica:*,shard:**,cores:5-
replica:*,cores:5-
cores:5-
#one replica of shard1 must go to node 192.168.1.2:8080_solr
node:"192.168.1.2:8080_solr",shard:shard1,replica:1
#no replica of shard1 should go to rack 738
rack:!738,shard:shard1,replica:*
rack:!738,shard:shard1
#no replica of ANY shard should go to rack 738
rack:!738,shard:**,replica:*
rack:!738,shard:*
rack:!738
{noformat}
In the collection create API, all the placement rules are provided as a parameter called {{rule}}.
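Since the rules are passed to the collection CREATE API as a multivalued {{rule}} parameter, a request could be assembled as in this sketch (the host, port, collection name, and helper name are hypothetical; only the {{action}}, {{name}}, {{rule}}, and {{snitch}} parameters come from the description above):

```python
from urllib.parse import urlencode

# Hypothetical sketch: build a Collections API CREATE call that passes
# several placement rules as a repeated "rule" query parameter.
def create_collection_url(base, name, rules, snitch=None):
    params = [("action", "CREATE"), ("name", name)]
    params += [("rule", r) for r in rules]  # one pair per rule -> rule=...&rule=...
    if snitch:
        params.append(("snitch", snitch))
    return f"{base}/admin/collections?{urlencode(params)}"
```

For example, passing {{["disk:20+", "cores:5-"]}} yields a query string with two {{rule=}} parameters, which is how a multivalued parameter is conventionally encoded.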
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6220: - Description: h1.Objective Most cloud based systems allow to specify rules on how the replicas/nodes of a cluster are allocated . Solr should have a flexible mechanism through which we should be able to control allocation of replicas or later change it to suit the needs of the system All configurations are per collection basis. The rules are applied whenever a replica is created in any of the shards in a given collection during * collection creation * shard splitting * add replica * createsshard There are two aspects to how replicas are placed: snitch and placement. h2.snitch How to identify the tags of nodes. Snitches are configured through collection create command with the snitch prefix . eg: snitch.type=EC2Snitch. The system provides the following implicit tag names which cannot be used by other snitches * node : The solr nodename * host : The hostname * ip : The ip address of the host * cores : This is a dynamic varibale which gives the core count at any given point * disk : This is a dynamic variable which gives the available disk space at any given point There will a few snitches provided by the system such as h3.EC2Snitch Provides two tags called dc, rack from the region and zone values in EC2 h3.IPSnitch Use the IP to infer the “dc” and “rack” values h3.NodePropertySnitch This lets users provide system properties to each node with tagname and value . example : -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this particular node will have two tags “tag-x” and “tag-y” . h3.RestSnitch Which lets the user configure a url which the server can invoke and get all the tags for a given node. 
This takes extra parameters in the create command, for example: {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}}
The response of the REST call {{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}} must be in JSON format, e.g.:
{code:JavaScript}
{
  "tag-x": "x-val",
  "tag-y": "y-val"
}
{code}
h3.ManagedSnitch
This snitch keeps a list of nodes and their tag/value pairs in ZooKeeper. The user should be able to manage the tags and values of each node through a collection API.
h2.Rules
A rule tells how many replicas of a given shard need to be assigned to nodes with the given key-value pairs. These parameters are passed to the collection CREATE API as a multivalued parameter "rule". The values are saved in the state of the collection as follows:
{code:JavaScript}
{
  "mycollection": {
    "snitch": {"class": "ImplicitTagsSnitch"},
    "rules": [
      {"cores": "4-"},
      {"replica": "1", "shard": "*", "node": "*"},
      {"disk": "1+"}
    ]
  }
}
{code}
A rule is specified in a pseudo-JSON syntax, which is a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not conflict with each other, it is OK; otherwise an error is thrown.
* In each rule, shard and replica can be omitted:
** the default value of replica is {{\*}}, meaning ANY; or you can specify a count and an operand such as {{+}} or {{-}}
** the value of shard can be a shard name, {{\*}} meaning EACH, or {{**}} meaning ANY; the default value is {{\*\*}} (ANY)
* There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}.
* All keys other than {{shard}} and {{replica}} are called tags, and the tags are nothing but values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly
Examples:
{noformat}
//in each rack there can be a max of two replicas of EACH shard
{rack:*,shard:*,replica:2-}
//in each rack there can be a max of two replicas of ANY shard
{rack:*,shard:**,replica:2-}
{rack:*,replica:2-}
//in each node there should be a max of one replica of EACH shard
{node:*,shard:*,replica:1-}
//in each node there should be a max of one replica of ANY shard
{node:*,shard:**,replica:1-}
{node:*,replica:1-}
//in rack 738, shard1 can have a max of 0 replicas
{rack:738,shard:shard1,replica:0-}
//all replicas of shard1 should go to rack 730
{shard:shard1,replica:*,rack:730}
{shard:shard1,rack:730}
//all replicas must be created on a node with at least 20GB of disk
{replica:*,shard:*,disk:20+}
{replica:*,disk:20+}
{disk:20+}
//all replicas should be created on nodes with fewer than 5 cores
//here ANY and EACH for shard have the same meaning
{replica:*,shard:**,cores:5-}
{replica:*,cores:5-}
{cores:5-}
//one replica of shard1 must go to node 192.168.1.2:8080_solr
{node:"192.168.1.2:8080_solr",shard:shard1,replica:1}
//no replica of shard1 should go to rack 738
{rack:!738,shard:shard1,replica:*}
{rack:!738,shard:shard1}
//no replica of ANY shard should go to rack 738
{rack:!738,shard:**,replica:*}
{rack:!738,shard:*}
{rack:!738}
{noformat}
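The first example above ("max two replicas of EACH shard in each rack") can be checked mechanically by counting replicas per (rack, shard) pair; this is an illustrative helper, not Solr's placement code:

```python
# Check the rule {rack:*,shard:*,replica:2-}: in EACH rack, EACH shard may
# have at most two replicas.
from collections import Counter

def rack_rule_ok(placements, max_per_rack=2):
    """placements: list of (rack, shard) pairs, one entry per replica."""
    counts = Counter(placements)            # replicas per (rack, shard) pair
    return all(n <= max_per_rack for n in counts.values())

layout = [("rack1", "shard1"), ("rack1", "shard1"),
          ("rack2", "shard1"), ("rack1", "shard2")]
print(rack_rule_ok(layout))                          # True
print(rack_rule_ok(layout + [("rack1", "shard1")]))  # False: 3 in rack1
```

Rules with {{shard:**}} (ANY) would instead count replicas per rack regardless of shard, which is the difference between the EACH and ANY forms above.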
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6220:
- Attachment: (was: SOLR-6220.patch)
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6220:
- Attachment: SOLR-6220.patch
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6220:
- Attachment: SOLR-6220.patch
More tests.
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6220:
- Attachment: SOLR-6220.patch
All planned features included. Tests will come next.
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6220:
- Attachment: SOLR-6220.patch
First cut with some basic tests.
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6220:
- Description:
h1.Objective
Most cloud-based systems allow users to specify rules on how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control the allocation of replicas, or later change it to suit the needs of the system.
All configurations are on a per-collection basis. The rules are applied whenever a replica is created in any of the shards of a given collection during:
* collection creation
* shard splitting
* add replica
* createshard
There are two aspects to how replicas are placed: snitch and placement.
h2.snitch
A snitch identifies the tags of nodes. Snitches are configured through the collection create command with the snitch prefix, e.g. snitch.type=EC2Snitch.
The system provides the following implicit tag names, which cannot be used by other snitches:
* node : the Solr node name
* host : the hostname
* ip : the IP address of the host
* cores : a dynamic variable which gives the core count at any given point
* disk : a dynamic variable which gives the available disk space at any given point
There will be a few snitches provided by the system, such as:
h3.EC2Snitch
Provides two tags called dc and rack, from the region and zone values in EC2.
h3.IPSnitch
Uses the IP to infer the "dc" and "rack" values.
h3.NodePropertySnitch
This lets users provide system properties to each node with tag names and values.
Example: -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this particular node will have two tags, "tag-x" and "tag-y".
h3.RestSnitch
Lets the user configure a URL which the server can invoke to get all the tags for a given node.
This takes extra parameters in the create command, for example: {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}}
The response of the REST call {{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}} must be in JSON format, e.g.:
{code:JavaScript}
{
  "tag-x": "x-val",
  "tag-y": "y-val"
}
{code}
h3.ManagedSnitch
This snitch keeps a list of nodes and their tag/value pairs in ZooKeeper. The user should be able to manage the tags and values of each node through a collection API.
h2.Rules
A rule tells how many replicas of a given shard need to be assigned to nodes with the given key-value pairs. These parameters are passed to the collection CREATE API as a multivalued parameter "rule". The values are saved in the state of the collection as follows:
{code:JavaScript}
{
  "mycollection": {
    "snitch": {"type": "EC2Snitch"},
    "rules": [
      {"shard": "value1", "replica": "value2", "tag1": "val1"},
      {"shard": "value1", "replica": "value2", "tag2": "val2"}
    ]
  }
}
{code}
A rule is specified in a pseudo-JSON syntax, which is a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not conflict with each other, it is OK; otherwise an error is thrown.
* In each rule, shard and replica can be omitted:
** the default value of replica is {{\*}}, meaning ANY; or you can specify a count and an operand such as {{+}} or {{-}}
** the value of shard can be a shard name, {{\*}} meaning EACH, or {{**}} meaning ANY; the default value is {{\*\*}} (ANY)
* There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}.
* All keys other than {{shard}} and {{replica}} are called tags, and the tags are nothing but values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly
Examples:
{noformat}
//in each rack there can be a max of two replicas of EACH shard
{rack:*,shard:*,replica:2-}
//in each rack there can be a max of two replicas of ANY shard
{rack:*,shard:**,replica:2-}
{rack:*,replica:2-}
//in each node there should be a max of one replica of EACH shard
{node:*,shard:*,replica:1-}
//in each node there should be a max of one replica of ANY shard
{node:*,shard:**,replica:1-}
{node:*,replica:1-}
//in rack 738, shard1 can have a max of 0 replicas
{rack:738,shard:shard1,replica:0-}
//all replicas of shard1 should go to rack 730
{shard:shard1,replica:*,rack:730}
{shard:shard1,rack:730}
//all replicas must be created on a node with at least 20GB of disk
{replica:*,shard:*,disk:20+}
{replica:*,disk:20+}
{disk:20+}
//all replicas should be created on nodes with fewer than 5 cores
//here ANY and EACH for shard have the same meaning
{replica:*,shard:**,cores:5-}
{replica:*,cores:5-}
{cores:5-}
//one replica of shard1 must go to node 192.168.1.2:8080_solr
{node:"192.168.1.2:8080_solr",shard:shard1,replica:1}
//no replica of shard1 should go to rack 738
{rack:!738,shard:shard1,replica:*}
{rack:!738,shard:shard1}
//no replica of ANY shard should go to rack 738
{rack:!738,shard:**,replica:*}
{rack:!738,shard:*}
{rack:!738}
{noformat}
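The RestSnitch contract described earlier (a GET returning a flat JSON map of tag names to values) could be consumed by a client along these lines; the endpoint shape comes from the example in the description, and this helper is purely illustrative, not part of Solr:

```python
# Fetch and decode a RestSnitch-style response for one node.
import json
from urllib.request import urlopen
from urllib.parse import urlencode

def fetch_tags(base_url, nodename):
    """GET base_url?nodename=... and decode the JSON tag map for that node."""
    with urlopen(base_url + "?" + urlencode({"nodename": nodename})) as resp:
        return json.loads(resp.read().decode("utf-8"))

# The response body must be a flat JSON object of tag names to values:
payload = '{"tag-x": "x-val", "tag-y": "y-val"}'
print(json.loads(payload))  # {'tag-x': 'x-val', 'tag-y': 'y-val'}
```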
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6220:
- Description:
h1.Objective
Most cloud-based systems allow users to specify rules on how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control the allocation of replicas, or later change it to suit the needs of the system.
All configurations are on a per-collection basis. The rules are applied whenever a replica is created in any of the shards of a given collection during:
* collection creation
* shard splitting
* add replica
* createshard
There are two aspects to how replicas are placed: snitch and placement.
h2.snitch
A snitch identifies the tags of nodes. Snitches are configured through the collection create command with the snitch prefix, e.g. snitch.type=EC2Snitch.
The system provides the following implicit tag names, which cannot be used by other snitches:
* node : the Solr node name
* host : the hostname
* ip : the IP address of the host
* cores : a dynamic variable which gives the core count at any given point
* disk : a dynamic variable which gives the available disk space at any given point
There will be a few snitches provided by the system, such as:
h3.EC2Snitch
Provides two tags called dc and rack, from the region and zone values in EC2.
h3.IPSnitch
Uses the IP to infer the "dc" and "rack" values.
h3.NodePropertySnitch
This lets users provide system properties to each node with tag names and values.
Example: -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this particular node will have two tags, "tag-x" and "tag-y".
h3.RestSnitch
Lets the user configure a URL which the server can invoke to get all the tags for a given node.
This takes extra parameters in the create command.
example: {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}}
The response of the rest call {{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}} must be in JSON format, eg:
{code:JavaScript}
{
  "tag-x":"x-val",
  "tag-y":"y-val"
}
{code}
h3.ManagedSnitch
This snitch keeps a list of nodes and their tag/value pairs in ZooKeeper. The user should be able to manage the tags and values of each node through a collection API.
h2.Rules
A rule tells how many replicas for a given shard need to be assigned to nodes with the given key/value pairs. These parameters are passed to the collection CREATE API as a multivalued parameter "rule". The values will be saved in the state of the collection as follows:
{code:JavaScript}
{
  "mycollection":{
    "snitch": {
      type:"EC2Snitch"
    },
    "rules":[
      {"key1": "value1", "key2": "value2"},
      {"key1":"x", "a":"b"}
    ]
  }
}
{code}
A rule is specified in a pseudo-JSON syntax, which is a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not conflict with each other it should be OK; otherwise an error is thrown
* In each rule, shard and replica are optional and default to {{\*}}
* There should be at least one non-wildcard ({{\*}}) condition in a rule, and at least one condition other than {{shard}} and {{replica}}
* All keys other than {{shard}} and {{replica}} are called tags, and the tags are nothing but values provided by the snitch for each node
* By default, certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly

Examples:
{noformat}
//in each rack there can be a max of two replicas of each shard
{rack:*,shard:*,replica:2-}
{rack:*,replica:2-}

//in each node there should be a max of one replica of each shard
{node:*,shard:*,replica:1-}
{node:*,replica:1-}

//in rack 738, shard1 can have a max of 0 replicas
{rack:738,shard:shard1,replica:0-}

//all replicas of shard1 should go to rack 730
{shard:shard1,replica:*,rack:730}
{shard:shard1,rack:730}

//all replicas must be created on a node with at least 20GB of disk
{replica:*,shard:*,disk:20+}
{replica:*,disk:20+}
{disk:20+}

//all replicas should be created on nodes with fewer than 5 cores
{replica:*,shard:*,cores:5-}
{replica:*,cores:5-}
{cores:5-}

//one replica of shard1 must go to node 192.168.1.2:8080_solr
{node:"192.168.1.2:8080_solr",shard:shard1,replica:1}

//no replica of shard1 should go to rack 738
{rack:!738,shard:shard1,replica:*}
{rack:!738,shard:shard1}

//no replica should go to rack 738
{rack:!738,shard:*,replica:*}
{rack:!738,shard:*}
{rack:!738}
{noformat}
In the collection create API, all the placement rules are provided as a multivalued parameter called rule, with one rule per parameter. example:
{noformat}
snitch={type:EC2Snitch}&rule={shard:*,replica:1,dc:dc1}&rule={shard:*,replica:2-,dc:dc3}&rule={shard:shard1,replica:*,rack:!738}
{noformat}
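The multivalued rule parameter on the CREATE call could be composed as below. This is a sketch using values from the example above; the host, port and collection name are made-up for illustration.

```python
# Sketch of building the Collections API CREATE request described above.
# Host, port and collection name are hypothetical; the snitch and rule
# values are taken from the example in the text.
from urllib.parse import urlencode

params = [
    ("action", "CREATE"),
    ("name", "mycollection"),  # hypothetical collection name
    ("snitch", "{type:EC2Snitch}"),
    ("rule", "{shard:*,replica:1,dc:dc1}"),
    ("rule", "{shard:*,replica:2-,dc:dc3}"),
    ("rule", "{shard:shard1,replica:*,rack:!738}"),
]
# urlencode keeps repeated keys, producing rule=...&rule=...&rule=...
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
```

Passing `rule` as repeated query parameters is what makes it multivalued; each rule string is URL-encoded as-is.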
[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-6220: - Summary: Replica placement strategy for solrcloud (was: Replica placement startegy for solrcloud)
> Replica placement strategy for solrcloud
>
> Key: SOLR-6220
> URL: https://issues.apache.org/jira/browse/SOLR-6220
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Noble Paul
> Assignee: Noble Paul
>
> h1.Objective
> Most cloud based systems allow specifying rules on how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control the allocation of replicas, or later change it to suit the needs of the system.
> All configurations are on a per-collection basis. The rules are applied whenever a replica is created in any of the shards in a given collection during
> * collection creation
> * shard splitting
> * add replica
> * createshard
> There are two aspects to how replicas are placed: snitch and placement.
> h2.snitch
> How to identify the tags of nodes. Snitches are configured through the collection create command with the snitch prefix, e.g. snitch.type=EC2Snitch.
> The system provides the following implicit tag names, which cannot be used by other snitches:
> * node : the Solr node name
> * host : the hostname
> * ip : the IP address of the host
> * cores : a dynamic variable which gives the core count at any given point
> * disk : a dynamic variable which gives the available disk space at any given point
> There will be a few snitches provided by the system, such as:
> h3.EC2Snitch
> Provides two tags called dc and rack, from the region and zone values in EC2
> h3.IPSnitch
> Uses the IP to infer the "dc" and "rack" values
> h3.NodePropertySnitch
> This lets users provide system properties to each node with a tag name and value.
> example: -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b.
> This means this particular node will have two tags, "tag-x" and "tag-y".
>
> h3.RestSnitch
> Lets the user configure a URL which the server can invoke to get all the tags for a given node.
> This takes extra parameters in the create command,
> example:
> {{snitch.type=RestSnitch&snitch.url=http://snitchserverhost:port?nodename={}}}
> The response of the rest call
> {{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}}
> must be in either JSON or properties format.
> eg:
> {code:JavaScript}
> {
> "tag-x":"x-val",
> "tag-y":"y-val"
> }
> {code}
> or
> {noformat}
> tag-x=x-val
> tag-y=y-val
> {noformat}
> h3.ManagedSnitch
> This snitch keeps a list of nodes and their tag/value pairs in ZooKeeper. The user should be able to manage the tags and values of each node through a collection API.
> h2.Placement
> This tells how many replicas for a given shard need to be assigned to nodes with the given key/value pairs. These parameters will be passed to the collection CREATE API as a parameter "placement". The values will be saved in the state of the collection as follows:
> {code:JavaScript}
> {
> "mycollection":{
> "snitch": {
> type:"EC2Snitch"
> },
> "placement":{
> "key1": "value1",
> "key2": "value2"
> }
> }
> }
> {code}
> A rule consists of 2 parts:
> * LHS or the qualifier. The format is \{shardname}.\{replicacount}\{quantifier}. Use the wild card "*" for qualifying all. Quantifiers are:
> ** no value means exactly equal. e.g: 2 means exactly 2
> ** "+" means greater than or equal. e.g: 2+ means 2 or more
> ** "\-" means less than. e.g: 2- means less than 2
> * RHS or conditions: The format is \{tagname}\{operand}\{value}. The tag names and values are provided by the snitch. The supported operands are:
> ** -> : equals
> ** > : greater than. Only applicable to numeric tags
> ** < : less than. Only applicable to numeric tags
> ** ! : NOT, or not equals
> Each collection can have any number of rules.
> As long as the rules do not conflict with each other it should be OK. Or else an error is thrown.
> Example rules:
> * "shard1.1":"dc->dc1,rack->168" : This would assign exactly 1 replica for shard1 to nodes having tags "dc=dc1,rack=168".
> * "shard1.1+":"dc->dc1,rack->168" : Same as above, but assigns at least one replica to the tag/value combination
> * "*.1":"dc->dc1" : For all shards, keep exactly one replica in dc:dc1
> * "*.1+":"dc->dc2" : At least one replica needs to be in dc:dc2
> * "*.2-":"dc->dc3" : Keep a maximum of 2 replicas in dc:dc3 for all shards
> * "shard1.*":"rack->730" : All replicas of shard1 will go to rack 730
> * "shard1.1":"node->192.167.1.2:8983_solr" : 1 replica of shard1 must go to the node 192.167.1.2:8983_solr
> * "shard1.*