[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-06-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-6220:

Fix Version/s: Trunk

> Replica placement strategy for solrcloud
> 
>
> Key: SOLR-6220
> URL: https://issues.apache.org/jira/browse/SOLR-6220
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.2, Trunk
>
> Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, 
> SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
>
>

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-28 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Attachment: (was: SOLR-6220.patch)


[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-28 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Attachment: SOLR-6220.patch


[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-28 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Attachment: SOLR-6220.patch

Updated patch to trunk


[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-22 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Attachment: SOLR-6220.patch

More tests 


[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-22 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Description: 
h1.Objective
Most cloud-based systems allow specifying rules for how the replicas/nodes of a 
cluster are allocated. Solr should have a flexible mechanism through which we 
can control the allocation of replicas, or later change it to suit the 
needs of the system.

All configuration is on a per-collection basis. The rules are applied whenever a 
replica is created in any of the shards of a given collection during

 * collection creation
 * shard splitting
 * add replica
 * createshard

There are two aspects to how replicas are placed: snitch and placement. 

h2.snitch 
A snitch identifies the tags of a node. Snitches are configured through the collection 
create command with the {{snitch}} param, e.g. snitch=EC2Snitch or 
snitch=class:EC2Snitch


h2.ImplicitSnitch 
This is shipped by default with Solr; the user does not need to specify 
{{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are 
present in the rules, it is used automatically.
Tags provided by ImplicitSnitch:
# cores : number of cores in the node
# disk : disk space available in the node
# host : host name of the node
# node : node name
# D.* : values available from system properties. {{D.key}} means a 
value that is passed to the node as {{-Dkey=keyValue}} during node startup. 
It is possible to use rules like {{D.key:expectedVal,shard:*}}
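For instance, a rack identifier could be passed to each node as a system property and then referenced through the {{D.*}} tag. A sketch, where the property name {{rack}} and the rack values are purely hypothetical:
{noformat}
# suppose each node's JVM is started with -Drack=730 (or -Drack=738, and so on)
# then a rule such as the following keeps every replica of shard1 on rack 730
shard:shard1,D.rack:730
{noformat}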

h2.Rules 

A rule tells how many replicas of a given shard need to be assigned to nodes 
with the given key-value pairs. These parameters are passed to the 
collection CREATE API as a multivalued parameter "rule". The values are 
saved in the state of the collection as follows:
{code:Javascript}
{
  "mycollection": {
    "snitch": {
      "class": "ImplicitSnitch"
    },
    "rules": [
      {"cores": "4-"},
      {"replica": "1", "shard": "*", "node": "*"},
      {"disk": ">100"}
    ]
  }
}
{code}
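For context, a create call that produces a state like the one above might look as follows. This is only a sketch: the host, port, collection name and shard counts are placeholders, the parameters simply follow this description ({{snitch}} plus a multivalued {{rule}}), and special characters such as {{>}} would need URL encoding in a real request:
{noformat}
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&rule=cores:4-&rule=replica:1,shard:*,node:*&rule=disk:>100&snitch=class:ImplicitSnitch'
{noformat}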

A rule is specified in a pseudo-JSON syntax, i.e. a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not 
conflict with each other it is fine; otherwise an error is thrown.
* In each rule, shard and replica can be omitted.
** The default value of replica is {{\*}}, meaning ANY, or you can specify a count and 
an operand such as {{<}} (less than) or {{>}} (greater than).
** The value of shard can be a shard name, {{\*}} meaning EACH, or {{**}} 
meaning ANY. The default value is {{\*\*}} (ANY).
* There should be exactly one extra condition in a rule other than {{shard}} 
and {{replica}}.
* All keys other than {{shard}} and {{replica}} are called tags, and the tags 
are nothing but values provided by the snitch for each node.
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by 
the system implicitly.

h3.How are nodes picked? 
Nodes are not picked at random. The rules are used to first sort the nodes 
according to affinity. For example, if there is a rule that says {{disk:>100}}, 
nodes with more disk space are given higher preference, and if the rule is 
{{disk:<100}}, nodes with less disk space are given priority. If 
everything else is equal, nodes with fewer cores are given higher priority.
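A small worked illustration of that ordering, with made-up node names and sizes:
{noformat}
# rule: disk:>100   (prefer nodes with more free disk)
# node1: disk=120GB, cores=6
# node2: disk=120GB, cores=2
# node3: disk=90GB,  cores=1
# resulting preference order: node2, node1, node3
# node2 is ahead of node1 because, with equal disk, fewer cores win;
# node3 is last because it has the least disk, even though it has the fewest cores
{noformat}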

h3.Fuzzy match
Fuzzy match can be applied when strict matches fail. A value can be suffixed 
with {{~}} to specify fuzziness.

Example rules:
{noformat}
#Example requirement: "use at most one replica of a shard per rack if possible;
#if no matches are found, relax that rule"
rack:*,shard:*,replica:<2~

#Another example: assign all replicas to nodes with disk space of 100GB or
#more, or relax the rule if not possible. This ensures that if no node with
#100GB of disk exists, nodes are picked in order of size, e.g. an 85GB node
#would be picked over an 80GB node
disk:>100~
{noformat}
Examples:
{noformat}
#in each rack there can be at most two replicas of a given shard
rack:*,shard:*,replica:<3
#in each rack there can be at most two replicas of ANY shard
rack:*,shard:**,replica:2
rack:*,replica:<3

#in each node there should be at most one replica of EACH shard
node:*,shard:*,replica:1-
#in each node there should be at most one replica of ANY shard
node:*,shard:**,replica:1-
node:*,replica:1-

#in rack 738 there can be at most 0 replicas of shard1
rack:738,shard:shard1,replica:<1

#all replicas of shard1 should go to rack 730
shard:shard1,replica:*,rack:730
shard:shard1,rack:730

#all replicas must be created on a node with at least 20GB of disk
replica:*,shard:*,disk:>20
replica:*,disk:>20
disk:>20

#all replicas should be created on nodes with less than 5 cores
#here ANY and EACH for shard have the same meaning
replica:*,shard:**,cores:<5
replica:*,cores:<5
cores:<5

#one replica of shard1 must go to node 192.168.1.2:8080_solr
node:"192.168.1.2:8080_solr",shard:shard1,replica:1

#no replica of shard1 should go to rack 738
rack:!738,shard:shard1,replica:*
rack:!738,shard:shard1

#no replica of ANY shard should go to rack 738
rack:!738,shard:**,replica:*
rack:!738,shard:*
rack:!738
{noformat}
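As a final sketch, several of the rules above could be combined on a single create call; as before, the parameter spelling follows this description rather than any final committed syntax:
{noformat}
# at most one replica of each shard per host, and replicas only on nodes with
# more than 100GB of disk, relaxing the disk constraint if it cannot be met
rule=host:*,shard:*,replica:<2
rule=disk:>100~
snitch=class:ImplicitSnitch
{noformat}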

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-22 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Attachment: SOLR-6220.patch

Operators {{+}} and {{-}} replaced with {{<}} and {{>}}
This is now feature complete.
I'll add some more tests and commit this


[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-21 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Description: 

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-21 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Description: 

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-21 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Description: 

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-21 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Attachment: SOLR-6220.patch

* Added fuzzy match option {{~}}
* Nodes are presorted based on the rules instead of being picked randomly
* {{ImplicitSnitch}} can support per-node system properties


[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-21 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-6220:

Description: 
ol < 2ol < 2h1.Objective
Most cloud based systems allow to specify rules on how the replicas/nodes of a 
cluster are allocated . Solr should have a flexible mechanism through which we 
should be able to control allocation of replicas or later change it to suit the 
needs of the system

All configurations are per collection basis. The rules are applied whenever a 
replica is created in any of the shards in a given collection during

 * collection creation
 * shard splitting
 * add replica
 * createsshard

There are two aspects to how replicas are placed: snitch and placement. 

h2.snitch 
How to identify the tags of nodes. Snitches are configured through collection 
create command with the snitch param  . eg: snitch=EC2Snitch or 
snitch=class:EC2Snitch


h2.ImplicitSnitch 
This is shipped by default with Solr. user does not need to specify 
{{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are 
present in the rules , it is automatically used,
tags provided by ImplicitSnitch
# cores :  No:of cores in the node
# disk : Disk cpace available in the nodeol < 2
# host : host name of the node
# node: node name 
# D.* : These are values available from systrem propertes. {{D.key}} means a 
value that is passed to the node as {{-Dkey=keyValue}} during the node startup. 
It is possible to use rules like {{D.key:expectedVal,shard:*}}

h2.Rules 

This tells how many replicas for a given shard needs to be assigned to nodes 
with the given key value pairs. These parameters will be passed on to the 
collection CREATE api as a multivalued parameter  "rule" . The values will be 
saved in the state of the collection as follows
{code:Javascript}
{
 “mycollection”:{
  “snitch”: {
  class:“ImplicitSnitch”
}
  “rules”:[{"cores":"4-"}, 
 {"replica":"1" ,"shard" :"*" ,"node":"*"},
 {"disk":"1+"}]ol < 2
}
{code}

A rule is specified in a pseudo-JSON syntax, which is a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not 
conflict with each other it is OK; otherwise an error is thrown
* In each rule, shard and replica can be omitted
** the default value of replica is {{\*}}, meaning ANY, or you can specify a count and 
an operand such as {{+}} or {{-}}
** the value of shard can be a shard name, or {{\*}} meaning EACH, or {{**}} 
meaning ANY. The default value is {{\*\*}} (ANY)
* There should be exactly one extra condition in a rule other than {{shard}} 
and {{replica}}.
* All keys other than {{shard}} and {{replica}} are called tags, and the tags 
are nothing but values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by 
the system implicitly

h3.How are nodes picked up? 
Nodes are not picked at random. The rules are used to first sort the nodes 
according to affinity. For example, if there is a rule that says 
{{disk:100+}}, nodes with 100GB or more are given higher preference. If all 
else is equal, nodes with fewer cores are given higher priority

h3.Fuzzy match
Fuzzy match can be applied when strict matches fail. The values can be prefixed 
with {{~}} to specify fuzziness

example rule
{noformat}
 #Example requirement: "use only one replica of a shard in a host if possible; 
if no matches are found, relax that rule". 
rack:*,shard:*,replica:~1-

#Another example: assign all replicas to nodes with disk space of 100GB or 
more, or relax the rule if not possible
disk:~100+
{noformat}
Examples:
{noformat}
#in each rack there can be a max of two replicas of a given shard
 rack:*,shard:*,replica:2-
#in each rack there can be a max of two replicas of ANY shard
 rack:*,shard:**,replica:2-
 rack:*,replica:2-

 #in each node there should be a max of one replica of EACH shard
 node:*,shard:*,replica:1-
 #in each node there should be a max of one replica of ANY shard
 node:*,shard:**,replica:1-
 node:*,replica:1-
 
#In rack 738 and shard=shard1, there can be a max of 0 replicas
 rack:738,shard:shard1,replica:0-
 
 #All replicas of shard1 should go to rack 730
 shard:shard1,replica:*,rack:730
 shard:shard1,rack:730

 #all replicas must be created in a node with at least 20GB disk
 replica:*,shard:*,disk:20+
 replica:*,disk:20+
 disk:20+
#All replicas should be created in nodes with less than 5 cores
#In this case, ANY and EACH for shard have the same meaning
replica:*,shard:**,cores:5-
replica:*,cores:5-
cores:5-
#one replica of shard1 must go to node 192.168.1.2:8080_solr
node:"192.168.1.2:8080_solr", shard:shard1, replica:1
#No replica of shard1 should go to rack 738
rack:!738,shard:shard1,replica:*
rack:!738,shard:shard1
#No replica  of ANY shard should go to rack 738
rack:!738,shard:**,replica:*
rack:!738,shard:*
rack:!738
{noformat}



In the collection create API all the placement rules are provided as a 
multivalued parameter called "rule".
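
A minimal sketch of such a CREATE call (hypothetical host, port, and collection name; the {{rule}} parameter is repeated once per rule and the {{+}} operand is URL-encoded as {{%2B}}):
{noformat}
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&snitch=class:ImplicitSnitch&rule=node:*,shard:*,replica:1-&rule=disk:100%2B'
{noformat}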

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-21 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Description: 
h1.Objective
Most cloud based systems allow to specify rules on how the replicas/nodes of a 
cluster are allocated . Solr should have a flexible mechanism through which we 
should be able to control allocation of replicas or later change it to suit the 
needs of the system

All configurations are per collection basis. The rules are applied whenever a 
replica is created in any of the shards in a given collection during

 * collection creation
 * shard splitting
 * add replica
 * createsshard

There are two aspects to how replicas are placed: snitch and placement. 

h2.snitch 
How to identify the tags of nodes. Snitches are configured through collection 
create command with the snitch param  . eg: snitch=EC2Snitch or 
snitch=class:EC2Snitch


h2.ImplicitSnitch 
This is shipped by default with Solr. user does not need to specify 
{{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are 
present in the rules , it is automatically used,
tags provided by ImplicitSnitch
# cores : Number of cores in the node
# disk : Disk space available in the node
# host : Host name of the node
# node : Node name
# D.* : These are values available from system properties. {{D.key}} means a 
value that is passed to the node as {{-Dkey=keyValue}} during node startup. 
It is possible to use rules like {{D.key:expectedVal,shard:*}}

h2.Rules 

This tells how many replicas for a given shard needs to be assigned to nodes 
with the given key value pairs. These parameters will be passed on to the 
collection CREATE api as a multivalued parameter  "rule" . The values will be 
saved in the state of the collection as follows
{code:Javascript}
{
 “mycollection”:{
  “snitch”: {
  class:“ImplicitSnitch”
}
  “rules”:[{"cores":"4-"}, 
 {"replica":"1" ,"shard" :"*" ,"node":"*"},
 {"disk":"1+"}]
}
{code}

A rule is specified as a pseudo JSON syntax . which is a map of keys and values
*Each collection can have any number of rules. As long as the rules do not 
conflict with each other it should be OK. Or else an error is thrown
* In each rule , shard and replica can be omitted
** default value of  replica is {{\*}} means ANY or you can specify a count and 
an operand such as {{+}} or {{-}}
** and the value of shard can be a shard name or  {{\*}} means EACH  or {{**}} 
means ANY.  default value is {{\*\*}} (ANY)
* There should be exactly one extra condition in a rule other than {{shard}} 
and {{replica}}.  
* all keys other than {{shard}} and {{replica}} are called tags and the tags 
are nothing but values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by 
the system implicitly 

h3.How are nodes picked up? 
Nodes are not picked at random. The rules are used to first sort the nodes 
according to affinity. For example, if there is a rule that says {{disk:100+}}, 
nodes with 100GB or more are given higher preference. If all else is equal, 
nodes with fewer cores are given higher priority

h3.Fuzzy match
Fuzzy match can be applied when strict matches fail. The values can be prefixed 
with {{~}} to specify fuzziness

example rule
{noformat}
 #Example requirement: "use only one replica of a shard in a host if possible; 
if no matches are found, relax that rule". 
rack:*,shard:*,replica:~1-

#Another example: assign all replicas to nodes with disk space of 100GB or 
more, or relax the rule if not possible
disk:~100+
{noformat}
Examples:
{noformat}
#in each rack there can be a max of two replicas of a given shard
 rack:*,shard:*,replica:2-
#in each rack there can be a max of two replicas of ANY shard
 rack:*,shard:**,replica:2-
 rack:*,replica:2-

 #in each node there should be a max one replica of EACH shard
 node:*,shard:*,replica:1-
 #in each node there should be a max one replica of ANY shard
 node:*,shard:**,replica:1-
 node:*,replica:1-
 
#In rack 738 and shard=shard1, there can be a max 0 replica
 rack:738,shard:shard1,replica:0-
 
 #All replicas of shard1 should go to rack 730
 shard:shard1,replica:*,rack:730
 shard:shard1,rack:730

 #all replicas must be created in a node with at least 20GB disk
 replica:*,shard:*,disk:20+
 replica:*,disk:20+
 disk:20+
#All replicas should be created in nodes with less than 5 cores
#In this ANY AND each for shard have same meaning
replica:*,shard:**,cores:5-
replica:*,cores:5-
cores:5-
#one replica of shard1 must go to node 192.168.1.2:8080_solr
node:”192.168.1.2:8080_solr”, shard:shard1, replica:1 
#No replica of shard1 should go to rack 738
rack:!738,shard:shard1,replica:*
rack:!738,shard:shard1
#No replica  of ANY shard should go to rack 738
rack:!738,shard:**,replica:*
rack:!738,shard:*
rack:!738
{noformat}



In the collection create API all the placement rules are provided as a 
multivalued parameter called "rule".

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-20 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Description: 
h1.Objective
Most cloud based systems allow to specify rules on how the replicas/nodes of a 
cluster are allocated . Solr should have a flexible mechanism through which we 
should be able to control allocation of replicas or later change it to suit the 
needs of the system

All configurations are per collection basis. The rules are applied whenever a 
replica is created in any of the shards in a given collection during

 * collection creation
 * shard splitting
 * add replica
 * createsshard

There are two aspects to how replicas are placed: snitch and placement. 

h2.snitch 
How to identify the tags of nodes. Snitches are configured through collection 
create command with the snitch prefix  . eg: snitch.type=EC2Snitch.

The system provides the following implicit tag names which cannot be used by 
other snitches
 * node : The solr nodename
 * host : The hostname
 * ip : The ip address of the host
 * cores : This is a dynamic variable which gives the core count at any given 
point 
 * disk : This is a dynamic variable  which gives the available disk space at 
any given point


There will be a few snitches provided by the system, such as 

h3.EC2Snitch
Provides two tags called dc, rack from the region and zone values in EC2

h3.IPSnitch 
Use the IP to infer the “dc” and “rack” values

h3.NodePropertySnitch 
This lets users provide system properties to each node with tagname and value .

example : -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this 
particular node will have two tags “tag-x” and “tag-y” .
 
h3.RestSnitch 
This lets the user configure a URL which the server can invoke to get all 
the tags for a given node. 

This takes extra parameters in the create command.
example: {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}}
The response of the REST call 
{{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}}

must be in json format 
eg: 
{code:JavaScript}
{
“tag-x”:”x-val”,
“tag-y”:”y-val”
}
{code}
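
For illustration only (a sketch; the snitch server host/port and the tag names are just the placeholders used above), the call and a conforming response would look like:
{noformat}
curl 'http://snitchserverhost:port/?nodename=192.168.1:8080_solr'

{"tag-x":"x-val","tag-y":"y-val"}
{noformat}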
h3.ManagedSnitch
This snitch keeps a list of nodes and their tag value pairs in Zookeeper. The 
user should be able to manage the tags and values of each node through a 
collection API 


h2.Rules 

This tells how many replicas for a given shard needs to be assigned to nodes 
with the given key value pairs. These parameters will be passed on to the 
collection CREATE api as a multivalued parameter  "rule" . The values will be 
saved in the state of the collection as follows
{code:Javascript}
{
 “mycollection”:{
  “snitch”: {
  class:“ImplicitTagsSnitch”
}
  “rules”:[{"cores":"4-"}, 
 {"replica":"1" ,"shard" :"*" ,"node":"*"},
 {"disk":"1+"}]
}
{code}

A rule is specified as a pseudo JSON syntax . which is a map of keys and values
*Each collection can have any number of rules. As long as the rules do not 
conflict with each other it should be OK. Or else an error is thrown
* In each rule , shard and replica can be omitted
** default value of  replica is {{\*}} means ANY or you can specify a count and 
an operand such as {{+}} or {{-}}
** and the value of shard can be a shard name or  {{\*}} means EACH  or {{**}} 
means ANY.  default value is {{\*\*}} (ANY)
* There should be exactly one extra condition in a rule other than {{shard}} 
and {{replica}}.  
* all keys other than {{shard}} and {{replica}} are called tags and the tags 
are nothing but values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by 
the system implicitly 

Examples:
{noformat}
//in each rack there can be a max of two replicas of a given shard
 {rack:*,shard:*,replica:2-}
//in each rack there can be a max of two replicas of ANY shard
 {rack:*,shard:**,replica:2-}
 {rack:*,replica:2-}

 //in each node there should be a max one replica of EACH shard
 {node:*,shard:*,replica:1-}
 //in each node there should be a max one replica of ANY shard
 {node:*,shard:**,replica:1-}
 {node:*,replica:1-}
 
//In rack 738 and shard=shard1, there can be a max 0 replica
 {rack:738,shard:shard1,replica:0-}
 
 //All replicas of shard1 should go to rack 730
 {shard:shard1,replica:*,rack:730}
 {shard:shard1,rack:730}

 // all replicas must be created in a node with at least 20GB disk
 {replica:*,shard:*,disk:20+}
 {replica:*,disk:20+}
 {disk:20+}
 // All replicas should be created in nodes with less than 5 cores
//In this case, ANY and EACH for shard have the same meaning
 {replica:*,shard:**,cores:5-}
 {replica:*,cores:5-}
 {cores:5-}
//one replica of shard1 must go to node 192.168.1.2:8080_solr
{node:”192.168.1.2:8080_solr”, shard:shard1, replica:1} 
//No replica of shard1 should go to rack 738
{rack:!738,shard:shard1,replica:*}
{rack:!738,shard:shard1}
//No replica  of ANY shard should go to rack 738
{rack:!738,shard:**,replica:*}
{rack:!738,shard

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-20 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Attachment: (was: SOLR-6220.patch)

> Replica placement strategy for solrcloud
> 
>
> Key: SOLR-6220
> URL: https://issues.apache.org/jira/browse/SOLR-6220
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
>
>
> h1.Objective
> Most cloud based systems allow to specify rules on how the replicas/nodes of 
> a cluster are allocated . Solr should have a flexible mechanism through which 
> we should be able to control allocation of replicas or later change it to 
> suit the needs of the system
> All configurations are per collection basis. The rules are applied whenever a 
> replica is created in any of the shards in a given collection during
>  * collection creation
>  * shard splitting
>  * add replica
>  * createsshard
> There are two aspects to how replicas are placed: snitch and placement. 
> h2.snitch 
> How to identify the tags of nodes. Snitches are configured through collection 
> create command with the snitch prefix  . eg: snitch.type=EC2Snitch.
> The system provides the following implicit tag names which cannot be used by 
> other snitches
>  * node : The solr nodename
>  * host : The hostname
>  * ip : The ip address of the host
>  * cores : This is a dynamic varibale which gives the core count at any given 
> point 
>  * disk : This is a dynamic variable  which gives the available disk space at 
> any given point
> There will a few snitches provided by the system such as 
> h3.EC2Snitch
> Provides two tags called dc, rack from the region and zone values in EC2
> h3.IPSnitch 
> Use the IP to infer the “dc” and “rack” values
> h3.NodePropertySnitch 
> This lets users provide system properties to each node with tagname and value 
> .
> example : -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this 
> particular node will have two tags “tag-x” and “tag-y” .
>  
> h3.RestSnitch 
>  Which lets the user configure a url which the server can invoke and get all 
> the tags for a given node. 
> This takes extra parameters in create command
> example:  {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}
> The response of the  rest call   
> {{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}}
> must be in json format 
> eg: 
> {code:JavaScript}
> {
> “tag-x”:”x-val”,
> “tag-y”:”y-val”
> }
> {code}
> h3.ManagedSnitch
> This snitch keeps a list of nodes and their tag value pairs in Zookeeper. The 
> user should be able to manage the tags and values of each node through a 
> collection API 
> h2.Rules 
> This tells how many replicas for a given shard needs to be assigned to nodes 
> with the given key value pairs. These parameters will be passed on to the 
> collection CREATE api as a multivalued parameter  "rule" . The values will be 
> saved in the state of the collection as follows
> {code:Javascript}
> {
>  “mycollection”:{
>   “snitch”: {
>   type:“EC2Snitch”
> }
>   “rules”:[
>{“shard”: “value1”, “replica”: “value2”, "tag1":"val1"},
>{“shard”: “value1”, “replica”: “value2”, "tag2":"val2"}
>]
> }
> {code}
> A rule is specified as a pseudo JSON syntax . which is a map of keys and 
> values
> *Each collection can have any number of rules. As long as the rules do not 
> conflict with each other it should be OK. Or else an error is thrown
> * In each rule , shard and replica can be omitted
> ** default value of  replica is {{\*}} means ANY or you can specify a count 
> and an operand such as {{+}} or {{-}}
> ** and the value of shard can be a shard name or  {{\*}} means EACH  or 
> {{**}} means ANY.  default value is {{\*\*}} (ANY)
> * There should be exactly one extra condition in a rule other than {{shard}} 
> and {{replica}}.  
> * all keys other than {{shard}} and {{replica}} are called tags and the tags 
> are nothing but values provided by the snitch for each node
> * By default certain tags such as {{node}}, {{host}}, {{port}} are provided 
> by the system implicitly 
> Examples:
> {noformat}
> //in each rack there can be max two replicas of A given shard
>  {rack:*,shard:*,replica:2-}
> //in each rack there can be max two replicas of ANY replica
>  {rack:*,shard:**,replica:2-}
>  {rack:*,replica:2-}
>  //in each node there should be a max one replica of EACH shard
>  {node:*,shard:*,replica:1-}
>  //in each node there should be a max one replica of ANY shard
>  {node:*,shard:**,replica:1-}
>  {node:*,replica:1-}
>  
> //In rack 738 and shard=shard1, there can be a max 0 replica
>  {rack:738,shard:shard1,replica:0-}
>  
>  //All replicas of shard1 should go to rack 730
>  {shard:shard1,replica:*,rack:730}
>  {

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-20 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Attachment: SOLR-6220.patch

> Replica placement strategy for solrcloud
> 
>
> Key: SOLR-6220
> URL: https://issues.apache.org/jira/browse/SOLR-6220
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
>
>
> h1.Objective
> Most cloud based systems allow to specify rules on how the replicas/nodes of 
> a cluster are allocated . Solr should have a flexible mechanism through which 
> we should be able to control allocation of replicas or later change it to 
> suit the needs of the system
> All configurations are per collection basis. The rules are applied whenever a 
> replica is created in any of the shards in a given collection during
>  * collection creation
>  * shard splitting
>  * add replica
>  * createsshard
> There are two aspects to how replicas are placed: snitch and placement. 
> h2.snitch 
> How to identify the tags of nodes. Snitches are configured through collection 
> create command with the snitch prefix  . eg: snitch.type=EC2Snitch.
> The system provides the following implicit tag names which cannot be used by 
> other snitches
>  * node : The solr nodename
>  * host : The hostname
>  * ip : The ip address of the host
>  * cores : This is a dynamic varibale which gives the core count at any given 
> point 
>  * disk : This is a dynamic variable  which gives the available disk space at 
> any given point
> There will a few snitches provided by the system such as 
> h3.EC2Snitch
> Provides two tags called dc, rack from the region and zone values in EC2
> h3.IPSnitch 
> Use the IP to infer the “dc” and “rack” values
> h3.NodePropertySnitch 
> This lets users provide system properties to each node with tagname and value 
> .
> example : -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this 
> particular node will have two tags “tag-x” and “tag-y” .
>  
> h3.RestSnitch 
>  Which lets the user configure a url which the server can invoke and get all 
> the tags for a given node. 
> This takes extra parameters in create command
> example:  {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}
> The response of the  rest call   
> {{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}}
> must be in json format 
> eg: 
> {code:JavaScript}
> {
> “tag-x”:”x-val”,
> “tag-y”:”y-val”
> }
> {code}
> h3.ManagedSnitch
> This snitch keeps a list of nodes and their tag value pairs in Zookeeper. The 
> user should be able to manage the tags and values of each node through a 
> collection API 
> h2.Rules 
> This tells how many replicas for a given shard needs to be assigned to nodes 
> with the given key value pairs. These parameters will be passed on to the 
> collection CREATE api as a multivalued parameter  "rule" . The values will be 
> saved in the state of the collection as follows
> {code:Javascript}
> {
>  “mycollection”:{
>   “snitch”: {
>   type:“EC2Snitch”
> }
>   “rules”:[
>{“shard”: “value1”, “replica”: “value2”, "tag1":"val1"},
>{“shard”: “value1”, “replica”: “value2”, "tag2":"val2"}
>]
> }
> {code}
> A rule is specified as a pseudo JSON syntax . which is a map of keys and 
> values
> *Each collection can have any number of rules. As long as the rules do not 
> conflict with each other it should be OK. Or else an error is thrown
> * In each rule , shard and replica can be omitted
> ** default value of  replica is {{\*}} means ANY or you can specify a count 
> and an operand such as {{+}} or {{-}}
> ** and the value of shard can be a shard name or  {{\*}} means EACH  or 
> {{**}} means ANY.  default value is {{\*\*}} (ANY)
> * There should be exactly one extra condition in a rule other than {{shard}} 
> and {{replica}}.  
> * all keys other than {{shard}} and {{replica}} are called tags and the tags 
> are nothing but values provided by the snitch for each node
> * By default certain tags such as {{node}}, {{host}}, {{port}} are provided 
> by the system implicitly 
> Examples:
> {noformat}
> //in each rack there can be max two replicas of A given shard
>  {rack:*,shard:*,replica:2-}
> //in each rack there can be max two replicas of ANY replica
>  {rack:*,shard:**,replica:2-}
>  {rack:*,replica:2-}
>  //in each node there should be a max one replica of EACH shard
>  {node:*,shard:*,replica:1-}
>  //in each node there should be a max one replica of ANY shard
>  {node:*,shard:**,replica:1-}
>  {node:*,replica:1-}
>  
> //In rack 738 and shard=shard1, there can be a max 0 replica
>  {rack:738,shard:shard1,replica:0-}
>  
>  //All replicas of shard1 should go to rack 730
>  {shard:shard1,replica:*,rack:730}
>  {shard:shard

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-20 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Attachment: SOLR-6220.patch

More tests . 

> Replica placement strategy for solrcloud
> 
>
> Key: SOLR-6220
> URL: https://issues.apache.org/jira/browse/SOLR-6220
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
>
>
> h1.Objective
> Most cloud based systems allow to specify rules on how the replicas/nodes of 
> a cluster are allocated . Solr should have a flexible mechanism through which 
> we should be able to control allocation of replicas or later change it to 
> suit the needs of the system
> All configurations are per collection basis. The rules are applied whenever a 
> replica is created in any of the shards in a given collection during
>  * collection creation
>  * shard splitting
>  * add replica
>  * createsshard
> There are two aspects to how replicas are placed: snitch and placement. 
> h2.snitch 
> How to identify the tags of nodes. Snitches are configured through collection 
> create command with the snitch prefix  . eg: snitch.type=EC2Snitch.
> The system provides the following implicit tag names which cannot be used by 
> other snitches
>  * node : The solr nodename
>  * host : The hostname
>  * ip : The ip address of the host
>  * cores : This is a dynamic varibale which gives the core count at any given 
> point 
>  * disk : This is a dynamic variable  which gives the available disk space at 
> any given point
> There will a few snitches provided by the system such as 
> h3.EC2Snitch
> Provides two tags called dc, rack from the region and zone values in EC2
> h3.IPSnitch 
> Use the IP to infer the “dc” and “rack” values
> h3.NodePropertySnitch 
> This lets users provide system properties to each node with tagname and value 
> .
> example : -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this 
> particular node will have two tags “tag-x” and “tag-y” .
>  
> h3.RestSnitch 
>  Which lets the user configure a url which the server can invoke and get all 
> the tags for a given node. 
> This takes extra parameters in create command
> example:  {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}
> The response of the  rest call   
> {{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}}
> must be in json format 
> eg: 
> {code:JavaScript}
> {
> “tag-x”:”x-val”,
> “tag-y”:”y-val”
> }
> {code}
> h3.ManagedSnitch
> This snitch keeps a list of nodes and their tag value pairs in Zookeeper. The 
> user should be able to manage the tags and values of each node through a 
> collection API 
> h2.Rules 
> This tells how many replicas for a given shard needs to be assigned to nodes 
> with the given key value pairs. These parameters will be passed on to the 
> collection CREATE api as a multivalued parameter  "rule" . The values will be 
> saved in the state of the collection as follows
> {code:Javascript}
> {
>  “mycollection”:{
>   “snitch”: {
>   type:“EC2Snitch”
> }
>   “rules”:[
>{“shard”: “value1”, “replica”: “value2”, "tag1":"val1"},
>{“shard”: “value1”, “replica”: “value2”, "tag2":"val2"}
>]
> }
> {code}
> A rule is specified as a pseudo JSON syntax . which is a map of keys and 
> values
> *Each collection can have any number of rules. As long as the rules do not 
> conflict with each other it should be OK. Or else an error is thrown
> * In each rule , shard and replica can be omitted
> ** default value of  replica is {{\*}} means ANY or you can specify a count 
> and an operand such as {{+}} or {{-}}
> ** and the value of shard can be a shard name or  {{\*}} means EACH  or 
> {{**}} means ANY.  default value is {{\*\*}} (ANY)
> * There should be exactly one extra condition in a rule other than {{shard}} 
> and {{replica}}.  
> * all keys other than {{shard}} and {{replica}} are called tags and the tags 
> are nothing but values provided by the snitch for each node
> * By default certain tags such as {{node}}, {{host}}, {{port}} are provided 
> by the system implicitly 
> Examples:
> {noformat}
> //in each rack there can be max two replicas of A given shard
>  {rack:*,shard:*,replica:2-}
> //in each rack there can be max two replicas of ANY replica
>  {rack:*,shard:**,replica:2-}
>  {rack:*,replica:2-}
>  //in each node there should be a max one replica of EACH shard
>  {node:*,shard:*,replica:1-}
>  //in each node there should be a max one replica of ANY shard
>  {node:*,shard:**,replica:1-}
>  {node:*,replica:1-}
>  
> //In rack 738 and shard=shard1, there can be a max 0 replica
>  {rack:738,shard:shard1,replica:0-}
>  
>  //All replicas of shard1 should go to rack 730
>  {shard:shard1,replica:*,rack:730}

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-15 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Attachment: SOLR-6220.patch

All planned features included. Tests will come next

> Replica placement strategy for solrcloud
> 
>
> Key: SOLR-6220
> URL: https://issues.apache.org/jira/browse/SOLR-6220
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-6220.patch, SOLR-6220.patch
>
>
> h1.Objective
> Most cloud based systems allow to specify rules on how the replicas/nodes of 
> a cluster are allocated . Solr should have a flexible mechanism through which 
> we should be able to control allocation of replicas or later change it to 
> suit the needs of the system
> All configurations are per collection basis. The rules are applied whenever a 
> replica is created in any of the shards in a given collection during
>  * collection creation
>  * shard splitting
>  * add replica
>  * createsshard
> There are two aspects to how replicas are placed: snitch and placement. 
> h2.snitch 
> How to identify the tags of nodes. Snitches are configured through collection 
> create command with the snitch prefix  . eg: snitch.type=EC2Snitch.
> The system provides the following implicit tag names which cannot be used by 
> other snitches
>  * node : The solr nodename
>  * host : The hostname
>  * ip : The ip address of the host
>  * cores : This is a dynamic varibale which gives the core count at any given 
> point 
>  * disk : This is a dynamic variable  which gives the available disk space at 
> any given point
> There will a few snitches provided by the system such as 
> h3.EC2Snitch
> Provides two tags called dc, rack from the region and zone values in EC2
> h3.IPSnitch 
> Use the IP to infer the “dc” and “rack” values
> h3.NodePropertySnitch 
> This lets users provide system properties to each node with tagname and value 
> .
> example : -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this 
> particular node will have two tags “tag-x” and “tag-y” .
>  
> h3.RestSnitch 
>  Which lets the user configure a url which the server can invoke and get all 
> the tags for a given node. 
> This takes extra parameters in create command
> example:  {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}
> The response of the  rest call   
> {{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}}
> must be in json format 
> eg: 
> {code:JavaScript}
> {
> “tag-x”:”x-val”,
> “tag-y”:”y-val”
> }
> {code}
> h3.ManagedSnitch
> This snitch keeps a list of nodes and their tag value pairs in Zookeeper. The 
> user should be able to manage the tags and values of each node through a 
> collection API 
> h2.Rules 
> This tells how many replicas for a given shard needs to be assigned to nodes 
> with the given key value pairs. These parameters will be passed on to the 
> collection CREATE api as a multivalued parameter  "rule" . The values will be 
> saved in the state of the collection as follows
> {code:Javascript}
> {
>  “mycollection”:{
>   “snitch”: {
>   type:“EC2Snitch”
> }
>   “rules”:[
>{“shard”: “value1”, “replica”: “value2”, "tag1":"val1"},
>{“shard”: “value1”, “replica”: “value2”, "tag2":"val2"}
>]
> }
> {code}
> A rule is specified as a pseudo JSON syntax . which is a map of keys and 
> values
> *Each collection can have any number of rules. As long as the rules do not 
> conflict with each other it should be OK. Or else an error is thrown
> * In each rule , shard and replica can be omitted
> ** default value of  replica is {{\*}} means ANY or you can specify a count 
> and an operand such as {{+}} or {{-}}
> ** and the value of shard can be a shard name or  {{\*}} means EACH  or 
> {{**}} means ANY.  default value is {{\*\*}} (ANY)
> * There should be exactly one extra condition in a rule other than {{shard}} 
> and {{replica}}.  
> * all keys other than {{shard}} and {{replica}} are called tags and the tags 
> are nothing but values provided by the snitch for each node
> * By default certain tags such as {{node}}, {{host}}, {{port}} are provided 
> by the system implicitly 
> Examples:
> {noformat}
> //in each rack there can be max two replicas of A given shard
>  {rack:*,shard:*,replica:2-}
> //in each rack there can be max two replicas of ANY replica
>  {rack:*,shard:**,replica:2-}
>  {rack:*,replica:2-}
>  //in each node there should be a max one replica of EACH shard
>  {node:*,shard:*,replica:1-}
>  //in each node there should be a max one replica of ANY shard
>  {node:*,shard:**,replica:1-}
>  {node:*,replica:1-}
>  
> //In rack 738 and shard=shard1, there can be a max 0 replica
>  {rack:738,shard:shard1,replica:0-}
>  
>  //All replicas of shard1 should go to rack 730
>  {shard:shard1

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-07 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Attachment: SOLR-6220.patch

First cut with some basic tests

> Replica placement strategy for solrcloud
> 
>
> Key: SOLR-6220
> URL: https://issues.apache.org/jira/browse/SOLR-6220
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-6220.patch
>
>
> h1.Objective
> Most cloud based systems allow to specify rules on how the replicas/nodes of 
> a cluster are allocated . Solr should have a flexible mechanism through which 
> we should be able to control allocation of replicas or later change it to 
> suit the needs of the system
> All configurations are per collection basis. The rules are applied whenever a 
> replica is created in any of the shards in a given collection during
>  * collection creation
>  * shard splitting
>  * add replica
>  * createsshard
> There are two aspects to how replicas are placed: snitch and placement. 
> h2.snitch 
> How to identify the tags of nodes. Snitches are configured through collection 
> create command with the snitch prefix  . eg: snitch.type=EC2Snitch.
> The system provides the following implicit tag names which cannot be used by 
> other snitches
>  * node : The solr nodename
>  * host : The hostname
>  * ip : The ip address of the host
>  * cores : This is a dynamic varibale which gives the core count at any given 
> point 
>  * disk : This is a dynamic variable  which gives the available disk space at 
> any given point
> There will a few snitches provided by the system such as 
> h3.EC2Snitch
> Provides two tags called dc, rack from the region and zone values in EC2
> h3.IPSnitch 
> Use the IP to infer the “dc” and “rack” values
> h3.NodePropertySnitch 
> This lets users provide system properties to each node with tagname and value 
> .
> example : -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this 
> particular node will have two tags “tag-x” and “tag-y” .
>  
> h3.RestSnitch 
>  Which lets the user configure a url which the server can invoke and get all 
> the tags for a given node. 
> This takes extra parameters in create command
> example:  {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}
> The response of the  rest call   
> {{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}}
> must be in json format 
> eg: 
> {code:JavaScript}
> {
> “tag-x”:”x-val”,
> “tag-y”:”y-val”
> }
> {code}
> h3.ManagedSnitch
> This snitch keeps a list of nodes and their tag value pairs in Zookeeper. The 
> user should be able to manage the tags and values of each node through a 
> collection API 
> h2.Rules 
> This tells how many replicas for a given shard needs to be assigned to nodes 
> with the given key value pairs. These parameters will be passed on to the 
> collection CREATE api as a multivalued parameter  "rule" . The values will be 
> saved in the state of the collection as follows
> {code:Javascript}
> {
>  “mycollection”:{
>   “snitch”: {
>   type:“EC2Snitch”
> }
>   “rules”:[
>{“shard”: “value1”, “replica”: “value2”, "tag1":"val1"},
>{“shard”: “value1”, “replica”: “value2”, "tag2":"val2"}
>]
> }
> {code}
> A rule is specified as a pseudo JSON syntax . which is a map of keys and 
> values
> *Each collection can have any number of rules. As long as the rules do not 
> conflict with each other it should be OK. Or else an error is thrown
> * In each rule , shard and replica can be omitted
> ** default value of  replica is {{\*}} means ANY or you can specify a count 
> and an operand such as {{+}} or {{-}}
> ** and the value of shard can be a shard name or  {{\*}} means EACH  or 
> {{**}} means ANY.  default value is {{\*\*}} (ANY)
> * There should be exactly one extra condition in a rule other than {{shard}} 
> and {{replica}}.  
> * all keys other than {{shard}} and {{replica}} are called tags and the tags 
> are nothing but values provided by the snitch for each node
> * By default certain tags such as {{node}}, {{host}}, {{port}} are provided 
> by the system implicitly 
> Examples:
> {noformat}
> //in each rack there can be max two replicas of A given shard
>  {rack:*,shard:*,replica:2-}
> //in each rack there can be max two replicas of ANY replica
>  {rack:*,shard:**,replica:2-}
>  {rack:*,replica:2-}
>  //in each node there should be a max one replica of EACH shard
>  {node:*,shard:*,replica:1-}
>  //in each node there should be a max one replica of ANY shard
>  {node:*,shard:**,replica:1-}
>  {node:*,replica:1-}
>  
> //In rack 738 and shard=shard1, there can be a max 0 replica
>  {rack:738,shard:shard1,replica:0-}
>  
>  //All replicas of shard1 should go to rack 730
>  {shard:shard1,replica:*,rack:730}
>  {shard:shard1

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-01 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Description: 
h1.Objective
Most cloud based systems allow to specify rules on how the replicas/nodes of a 
cluster are allocated . Solr should have a flexible mechanism through which we 
should be able to control allocation of replicas or later change it to suit the 
needs of the system

All configurations are per collection basis. The rules are applied whenever a 
replica is created in any of the shards in a given collection during

 * collection creation
 * shard splitting
 * add replica
 * createsshard

There are two aspects to how replicas are placed: snitch and placement. 

h2.snitch 
How to identify the tags of nodes. Snitches are configured through collection 
create command with the snitch prefix  . eg: snitch.type=EC2Snitch.

The system provides the following implicit tag names which cannot be used by 
other snitches
 * node : The solr nodename
 * host : The hostname
 * ip : The ip address of the host
 * cores : This is a dynamic variable which gives the core count at any given 
point 
 * disk : This is a dynamic variable  which gives the available disk space at 
any given point


There will be a few snitches provided by the system, such as 

h3.EC2Snitch
Provides two tags called dc, rack from the region and zone values in EC2

h3.IPSnitch 
Use the IP to infer the “dc” and “rack” values

h3.NodePropertySnitch 
This lets users provide system properties to each node with tagname and value .

example : -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this 
particular node will have two tags “tag-x” and “tag-y” .
 
h3.RestSnitch 
 Which lets the user configure a url which the server can invoke and get all 
the tags for a given node. 

This takes extra parameters in create command
example:  {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}
The response of the  rest call   
{{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}}

must be in json format 
eg: 
{code:JavaScript}
{
“tag-x”:”x-val”,
“tag-y”:”y-val”
}
{code}
h3.ManagedSnitch
This snitch keeps a list of nodes and their tag value pairs in Zookeeper. The 
user should be able to manage the tags and values of each node through a 
collection API 


h2.Rules 

This tells how many replicas for a given shard needs to be assigned to nodes 
with the given key value pairs. These parameters will be passed on to the 
collection CREATE api as a multivalued parameter  "rule" . The values will be 
saved in the state of the collection as follows
{code:Javascript}
{
 “mycollection”:{
  “snitch”: {
  type:“EC2Snitch”
}
  “rules”:[
   {“shard”: “value1”, “replica”: “value2”, "tag1":"val1"},
   {“shard”: “value1”, “replica”: “value2”, "tag2":"val2"}
   ]
}
{code}

A rule is specified as a pseudo JSON syntax . which is a map of keys and values
*Each collection can have any number of rules. As long as the rules do not 
conflict with each other it should be OK. Or else an error is thrown
* In each rule , shard and replica can be omitted
** default value of  replica is {{\*}} means ANY or you can specify a count and 
an operand such as {{+}} or {{-}}
** and the value of shard can be a shard name or  {{\*}} means EACH  or {{**}} 
means ANY.  default value is {{\*\*}} (ANY)
* There should be exactly one extra condition in a rule other than {{shard}} 
and {{replica}}.  
* all keys other than {{shard}} and {{replica}} are called tags and the tags 
are nothing but values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by 
the system implicitly 

Examples:
{noformat}
//in each rack there can be max two replicas of A given shard
 {rack:*,shard:*,replica:2-}
//in each rack there can be max two replicas of ANY replica
 {rack:*,shard:**,replica:2-}
 {rack:*,replica:2-}

 //in each node there should be a max one replica of EACH shard
 {node:*,shard:*,replica:1-}
 //in each node there should be a max one replica of ANY shard
 {node:*,shard:**,replica:1-}
 {node:*,replica:1-}
 
//In rack 738 and shard=shard1, there can be a max 0 replica
 {rack:738,shard:shard1,replica:0-}
 
 //All replicas of shard1 should go to rack 730
 {shard:shard1,replica:*,rack:730}
 {shard:shard1,rack:730}

 // all replicas must be created in a node with at least 20GB disk
 {replica:*,shard:*,disk:20+}
 {replica:*,disk:20+}
 {disk:20+}
 // All replicas should be created in nodes with less than 5 cores
//In this ANY AND each for shard have same meaning
 {replica:*,shard:**,cores:5-}
 {replica:*,cores:5-}
 {cores:5-}
//one replica of shard1 must go to node 192.168.1.2:8080_solr
{node:”192.168.1.2:8080_solr”, shard:shard1, replica:1} 
//No replica of shard1 should go to rack 738
{rack:!738,shard:shard1,replica:*}
{rack:!738,shard:shard1}
//No replica  of ANY shard should go to rack 738
{rack:!738,shard:**,replica:*}
{r

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-01 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Description: 
h1.Objective
Most cloud based systems allow to specify rules on how the replicas/nodes of a 
cluster are allocated . Solr should have a flexible mechanism through which we 
should be able to control allocation of replicas or later change it to suit the 
needs of the system

All configurations are per collection basis. The rules are applied whenever a 
replica is created in any of the shards in a given collection during

 * collection creation
 * shard splitting
 * add replica
 * createsshard

There are two aspects to how replicas are placed: snitch and placement. 

h2.snitch 
How to identify the tags of nodes. Snitches are configured through collection 
create command with the snitch prefix  . eg: snitch.type=EC2Snitch.

The system provides the following implicit tag names which cannot be used by 
other snitches
 * node : The solr nodename
 * host : The hostname
 * ip : The ip address of the host
 * cores : This is a dynamic variable which gives the core count at any given 
point 
 * disk : This is a dynamic variable  which gives the available disk space at 
any given point


There will be a few snitches provided by the system, such as 

h3.EC2Snitch
Provides two tags called dc, rack from the region and zone values in EC2

h3.IPSnitch 
Use the IP to infer the “dc” and “rack” values

h3.NodePropertySnitch 
This lets users provide system properties to each node with tagname and value .

example : -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this 
particular node will have two tags “tag-x” and “tag-y” .
 
h3.RestSnitch 
 Which lets the user configure a url which the server can invoke and get all 
the tags for a given node. 

This takes extra parameters in create command
example:  {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}
The response of the  rest call   
{{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}}

must be in json format 
eg: 
{code:JavaScript}
{
“tag-x”:”x-val”,
“tag-y”:”y-val”
}
{code}
h3.ManagedSnitch
This snitch keeps a list of nodes and their tag value pairs in Zookeeper. The 
user should be able to manage the tags and values of each node through a 
collection API 


h2.Rules 

This tells how many replicas for a given shard needs to be assigned to nodes 
with the given key value pairs. These parameters will be passed on to the 
collection CREATE api as a multivalued parameter  "rule" . The values will be 
saved in the state of the collection as follows
{code:Javascript}
{
 “mycollection”:{
  “snitch”: {
  type:“EC2Snitch”
}
  “rules”:[
   {“key1”: “value1”, “key2”: “value2”},
   {"key1" :"x", "a":"b"}
   ]
}
{code}

A rule is specified as a pseudo JSON syntax . which is a map of keys and values
*Each collection can have any number of rules. As long as the rules do not 
conflict with each other it should be OK. Or else an error is thrown
* In each rule , shard and replica are optional and have default values as 
{{\*}}
* There should be at least one non-wildcard {{\*}} condition in the rule, and 
there should be at least one condition (a tag) other than {{shard}} 
and {{replica}}
* all keys other than {{shard}} and {{replica}} are called tags and the tags 
are nothing but values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by 
the system implicitly 


Examples:
{noformat}
//in each rack there can be max two replica of each shard
 {rack:*,shard:*,replica:2-}
 {rack:*,replica:2-}
 //in each node there should be a max one replica of each shard
 {node:*,shard:*,replica:1-}
 {node:*,replica:1-}
 
//In rack 738 and shard=shard1, there can be a max 0 replica
 {rack:738,shard:shard1,replica:0-}
 
 //All replicas of shard1 should go to rack 730
 {shard:shard1,replica:*,rack:730}
 {shard:shard1,rack:730}

 // all replicas must be created in a node with at least 20GB disk
 {replica:*,shard:*,disk:20+}
 {replica:*,disk:20+}
 {disk:20+}
 // All replicas should be created in nodes with less than 5 cores
 {replica:*,shard:*,cores:5-}
 {replica:*,cores:5-}
 {cores:5-}
//one replica of shard1 must go to node 192.168.1.2:8080_solr
{node:”192.168.1.2:8080_solr”, shard:shard1, replica:1} 
//No replica of shard1 should go to rack 738
{rack:!738,shard:shard1, replica:*}
{rack:!738,shard:shard1}
//No replica should go to rack 738
{rack:!738,shard:*,replica:*}
{rack:!738,shard:*}
{rack:!738}

{noformat}



In the collection create API all the placement rules are provided as a 
parameter called placement and multiple rules are separated with "|" 
example:
{noformat}
snitch={type:EC2Snitch}&rule={shard:*,replica:1,dc:dc1}&rule={shard:*,replica:2-,dc:dc3}&rule={shard:shard1,replica:*,rack:!738}
 
{noformat}

  was:
h1.Objective
Most cloud based systems allow to specify rules on how the replicas

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-01 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Description: 
h1.Objective
Most cloud based systems allow to specify rules on how the replicas/nodes of a 
cluster are allocated . Solr should have a flexible mechanism through which we 
should be able to control allocation of replicas or later change it to suit the 
needs of the system

All configurations are per collection basis. The rules are applied whenever a 
replica is created in any of the shards in a given collection during

 * collection creation
 * shard splitting
 * add replica
 * createsshard

There are two aspects to how replicas are placed: snitch and placement. 

h2.snitch 
How to identify the tags of nodes. Snitches are configured through collection 
create command with the snitch prefix  . eg: snitch.type=EC2Snitch.

The system provides the following implicit tag names which cannot be used by 
other snitches
 * node : The solr nodename
 * host : The hostname
 * ip : The ip address of the host
 * cores : This is a dynamic variable which gives the core count at any given 
point 
 * disk : This is a dynamic variable  which gives the available disk space at 
any given point


There will be a few snitches provided by the system, such as 

h3.EC2Snitch
Provides two tags called dc, rack from the region and zone values in EC2

h3.IPSnitch 
Use the IP to infer the “dc” and “rack” values

h3.NodePropertySnitch 
This lets users provide system properties to each node with tagname and value .

example : -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this 
particular node will have two tags “tag-x” and “tag-y” .
 
h3.RestSnitch 
 Which lets the user configure a url which the server can invoke and get all 
the tags for a given node. 

This takes extra parameters in create command
example:  {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}
The response of the  rest call   
{{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}}

must be in json format 
eg: 
{code:JavaScript}
{
“tag-x”:”x-val”,
“tag-y”:”y-val”
}
{code}
h3.ManagedSnitch
This snitch keeps a list of nodes and their tag value pairs in Zookeeper. The 
user should be able to manage the tags and values of each node through a 
collection API 


h2.Rules 

This tells how many replicas for a given shard needs to be assigned to nodes 
with the given key value pairs. These parameters will be passed on to the 
collection CREATE api as a multivalued parameter  "rule" . The values will be 
saved in the state of the collection as follows
{code:Javascript}
{
 “mycollection”:{
  “snitch”: {
  type:“EC2Snitch”
}
  “rules”:[
   {“key1”: “value1”, “key2”: “value2”},
   {"key1" :"x", "a":"b"}
   ]
}
{code}

A rule is specified as a pseudo JSON syntax . which is a map of keys and values
*Each collection can have any number of rules. As long as the rules do not 
conflict with each other it should be OK. Or else an error is thrown
* In each rule , shard and replica are optional and have default values as 
{{\*}}
* There should be at least one non-wildcard {{\*}} condition in the rule, and 
there should be at least one condition (a tag) other than {{shard}} 
and {{replica}}
* all keys other than {{shard}} and {{replica}} are called tags and the tags 
are nothing but values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by 
the system implicitly 


Examples:
{noformat}
//in each rack there can be max two replica of each shard
 {rack:*,shard:*,replica:2-}
 {rack:*,replica:2-}
 //in each node there should be a max one replica of each shard
 {node:*,shard:*,replica:1-}
  
//In rack 738 and shard=shard1, there can be a max 0 replica
 {rack:738,shard:shard1,replica:0-}
 
 //All replicas of shard1 should go to rack 730
 {shard:shard1,replica:*,rack:730}
 {shard:shard1,rack:730}

 // all replicas must be created in a node with at least 20GB disk
 {replica:*,shard:*,disk:20+}
 {replica:*,disk:20+}
 {disk:20+}
 // All replicas should be created in nodes with less than 5 cores
 {replica:*,shard:*,cores:5-}
 {replica:*,cores:5-}
 {cores:5-}
//one replica of shard1 must go to node 192.168.1.2:8080_solr
{node:”192.168.1.2:8080_solr”, shard:shard1, replica:1} 
//No replica of shard1 should go to rack 738
{rack:!738,shard:shard1, replica:*}
{rack:!738,shard:shard1}
//No replica should go to rack 738
{rack:!738,shard:*,replica:*}
{rack:!738,shard:*}
{rack:!738}

{noformat}



In the collection create API all the placement rules are provided as a 
parameter called placement and multiple rules are separated with "|" 
example:
{noformat}
snitch={type:EC2Snitch}&rule={shard:*,replica:1,dc:dc1}&rule={shard:*,replica:2-,dc:dc3}&rule={shard:shard1,replica:*,rack:!738}
 
{noformat}

  was:
h1.Objective
Most cloud based systems allow to specify rules on how the replicas/nodes of a 
cluster

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2015-04-01 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6220:
-
Description: 
h1.Objective
Most cloud based systems allow to specify rules on how the replicas/nodes of a 
cluster are allocated . Solr should have a flexible mechanism through which we 
should be able to control allocation of replicas or later change it to suit the 
needs of the system

All configurations are per collection basis. The rules are applied whenever a 
replica is created in any of the shards in a given collection during

 * collection creation
 * shard splitting
 * add replica
 * createsshard

There are two aspects to how replicas are placed: snitch and placement. 

h2.snitch 
How to identify the tags of nodes. Snitches are configured through collection 
create command with the snitch prefix  . eg: snitch.type=EC2Snitch.

The system provides the following implicit tag names which cannot be used by 
other snitches
 * node : The solr nodename
 * host : The hostname
 * ip : The ip address of the host
 * cores : This is a dynamic variable which gives the core count at any given 
point 
 * disk : This is a dynamic variable  which gives the available disk space at 
any given point


There will be a few snitches provided by the system, such as 

h3.EC2Snitch
Provides two tags called dc, rack from the region and zone values in EC2

h3.IPSnitch 
Use the IP to infer the “dc” and “rack” values

h3.NodePropertySnitch 
This lets users provide system properties to each node with tagname and value .

example : -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this 
particular node will have two tags “tag-x” and “tag-y” .
 
h3.RestSnitch 
 Which lets the user configure a url which the server can invoke and get all 
the tags for a given node. 

This takes extra parameters in create command
example:  {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}
The response of the  rest call   
{{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}}

must be in json format 
eg: 
{code:JavaScript}
{
“tag-x”:”x-val”,
“tag-y”:”y-val”
}
{code}
h3.ManagedSnitch
This snitch keeps a list of nodes and their tag value pairs in Zookeeper. The 
user should be able to manage the tags and values of each node through a 
collection API 


h2.Rules 

This tells how many replicas for a given shard needs to be assigned to nodes 
with the given key value pairs. These parameters will be passed on to the 
collection CREATE api as a multivalued parameter  "rule" . The values will be 
saved in the state of the collection as follows
{code:Javascript}
{
 “mycollection”:{
  “snitch”: {
  type:“EC2Snitch”
}
  “rules”:[
   {“key1”: “value1”, “key2”: “value2”},
   {"key1" :"x", "a":"b"}
   ]
}
{code}

A rule is specified as a pseudo JSON syntax . which is a map of keys and values
*Each collection can have any number of rules. As long as the rules do not 
conflict with each other it should be OK. Or else an error is thrown
* In each rule , shard and replica are optional and have default values as 
{{\*}}
* There should be at least one non-wildcard {{\*}} condition in the rule, and 
there should be at least one condition (a tag) other than {{shard}} 
and {{replica}}
* all keys other than {{shard}} and {{replica}} are called tags and the tags 
are nothing but values provided by the snitch for each node
* By default certain tags such as {{node}}, {{host}}, {{port}} are provided by 
the system implicitly 


Examples:
{noformat}
//in each rack there can be a max of two replicas of each shard
 {rack:*,shard:*,replica:2-}
 {rack:*,replica:2-}

//in each node there should be a max of one replica of each shard
 {node:*,shard:*,replica:1-}

//in rack 738 and shard=shard1, there can be a max of 0 replicas
 {rack:738,shard:shard1,replica:0}

//all replicas of shard1 should go to rack 730
 {shard:shard1,replica:*,rack:730}
 {shard:shard1,rack:730}

//all replicas must be created on a node with at least 20GB of disk
 {replica:*,shard:*,disk:20+}
 {replica:*,disk:20+}
 {disk:20+}

//all replicas should be created on nodes with fewer than 5 cores
 {replica:*,shard:*,cores:5-}
 {replica:*,cores:5-}
 {cores:5-}

//one replica of shard1 must go to node 192.168.1.2:8080_solr
 {node:"192.168.1.2:8080_solr",shard:shard1,replica:1}

//no replica of shard1 should go to rack 738
 {rack:!738,shard:shard1,replica:*}
 {rack:!738,shard:shard1}

//no replica should go to rack 738
 {rack:!738,shard:*,replica:*}
 {rack:!738,shard:*}
 {rack:!738}

{noformat}
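The examples above use value suffixes ({{20+}}, {{5-}}) and negation ({{!738}}). A small, hedged sketch of how one such condition value could be matched against a node's tag value (the exact inclusive/exclusive semantics of {{+}}/{{-}} are part of the discussion on this issue, so treat this as illustrative):

{code:python}
# Sketch: interpret one condition value from the examples above against a node's tag value.
def matches(condition, actual):
    if condition == "*":                       # wildcard: anything matches
        return True
    if condition.startswith("!"):              # negation, e.g. rack:!738
        return str(actual) != condition[1:]
    if condition.endswith("+"):                # lower bound, e.g. disk:20+
        return float(actual) >= float(condition[:-1])
    if condition.endswith("-"):                # upper bound, e.g. cores:5-
        return float(actual) <= float(condition[:-1])
    return str(actual) == condition            # exact match, e.g. rack:738

print(matches("20+", 120))   # disk:20+ on a node with 120GB free -> True
print(matches("!738", 730))  # rack:!738 on rack 730 -> True
print(matches("5-", 4))      # cores:5- on a node with 4 cores -> True
{code}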



In the collection create API all the placement rules are provided as a 
multivalued parameter called "rule"; each rule is passed as a separate value.
example:
{noformat}
snitch={type:EC2Snitch}&rule={shard:*,replica:1,dc:dc1}&rule={shard:*,replica:2-,dc:dc3}&rule={shard:shard1,replica:*,rack:!738}
 
{noformat}
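As a usage sketch (the host, collection name, and shard count are made up; parameter spellings follow the example above), the same create request could be issued from a client like this:

{code:python}
import requests

# Sketch: create a collection with a snitch and three placement rules.
# Passing params as a list of tuples lets the multivalued "rule" parameter repeat.
params = [
    ("action", "CREATE"),
    ("name", "mycollection"),          # hypothetical collection name
    ("numShards", "2"),                # hypothetical shard count
    ("snitch", "{type:EC2Snitch}"),
    ("rule", "{shard:*,replica:1,dc:dc1}"),
    ("rule", "{shard:*,replica:2-,dc:dc3}"),
    ("rule", "{shard:shard1,replica:*,rack:!738}"),
]
resp = requests.get("http://localhost:8983/solr/admin/collections", params=params)
print(resp.status_code)
{code}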

  was:
h1.Objective
Most cloud based systems allow to specify rules on how the replicas/nodes of a 
cluster 

[jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud

2014-07-02 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-6220:
-

Summary: Replica placement strategy for solrcloud  (was: Replica placement 
startegy for solrcloud)

> Replica placement strategy for solrcloud
> 
>
> Key: SOLR-6220
> URL: https://issues.apache.org/jira/browse/SOLR-6220
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
>
> h1.Objective
> Most cloud based systems allow you to specify rules on how the replicas/nodes of 
> a cluster are allocated. Solr should have a flexible mechanism through which 
> we should be able to control the allocation of replicas, or change it later to 
> suit the needs of the system.
> All configuration is on a per-collection basis. The rules are applied whenever a 
> replica is created in any of the shards in a given collection during
>  * collection creation
>  * shard splitting
>  * add replica
>  * createshard
> There are two aspects to how replicas are placed: snitch and placement. 
> h2.snitch 
> How to identify the tags of nodes. Snitches are configured through the collection 
> create command with the snitch prefix, e.g. snitch.type=EC2Snitch.
> The system provides the following implicit tag names, which cannot be used by 
> other snitches
>  * node : the Solr node name
>  * host : the host name
>  * ip : the IP address of the host
>  * cores : a dynamic variable which gives the core count at any given point 
>  * disk : a dynamic variable which gives the available disk space at any given 
> point
> There will be a few snitches provided by the system, such as 
> h3.EC2Snitch
> Provides two tags, dc and rack, derived from the region and zone values in EC2
> h3.IPSnitch 
> Uses the IP address to infer the “dc” and “rack” values
> h3.NodePropertySnitch 
> This lets users provide tag names and values for each node via system 
> properties.
> example : -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This means this 
> particular node will have two tags, “tag-x” and “tag-y”.
>  
> h3.RestSnitch 
> This lets the user configure a URL which the server can invoke to get all 
> the tags for a given node. 
> It takes extra parameters in the create command,
> example:  
> {{snitch.type=RestSnitch&snitch.url=http://snitchserverhost:port?nodename={}}}
> The response of the REST call 
> {{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}}
> must be in either JSON format or properties format. 
> eg: 
> {code:JavaScript}
> {
>   "tag-x": "x-val",
>   "tag-y": "y-val"
> }
> {code}
> or
> {noformat}
> tag-x=x-val
> tag-y=y-val
> {noformat}
> h3.ManagedSnitch
> This snitch keeps a list of nodes and their tag/value pairs in ZooKeeper. The 
> user should be able to manage the tags and values of each node through a 
> collection API.
> h2.Placement 
> This tells how many replicas for a given shard need to be assigned to nodes 
> with the given key/value pairs. These parameters are passed to the collection 
> CREATE API as a parameter "placement". The values are saved 
> in the state of the collection as follows:
> {code:Javascript}
> {
>   "mycollection": {
>     "snitch": {
>       "type": "EC2Snitch"
>     },
>     "placement": {
>       "key1": "value1",
>       "key2": "value2"
>     }
>   }
> }
> {code}
> A rule consists of 2 parts
>  * LHS or the qualifier. The format is 
> \{shardname}.\{replicacount}\{quantifier}. Use the wild card “*” to qualify 
> all. The quantifiers are
>  ** no value means exactly equal, e.g: 2 means exactly 2
>  ** "+" means greater than or equal, e.g: 2+ means 2 or more
>  ** "\-" means less than, e.g: 2- means less than 2 
>  * RHS or conditions : The format is \{tagname}\{operand}\{value} . The tag 
> name and values are provided by the snitch. The supported operands are
>  ** -> : equals
>  ** > : greater than. Only applicable to numeric tags
>  ** < : less than. Only applicable to numeric tags
>  ** ! : NOT, or not equals
> Each collection can have any number of rules. As long as the rules do not 
> conflict with each other it should be OK. Or else an error is thrown
> Example rules:
>  * “shard1.1”:“dc->dc1,rack->168” : This would assign exactly 1 replica for 
> shard1 to nodes having tags “dc=dc1,rack=168”.
>  * “shard1.1+”:“dc->dc1,rack->168” : Same as above, but assigns at least one 
> replica to the tag/value combination
>  * “*.1”:“dc->dc1” : For all shards keep exactly one replica in dc:dc1
>  * “*.1+”:”dc->dc2” : At least one replica needs to be in dc:dc2
>  * “*.2-”:”dc->dc3” : Keep a maximum of 2 replicas in dc:dc3 for all shards
>  * “shard1.*”:”rack->730” : All replicas of shard1 will go to rack 730
>  * “shard1.1”:“node->192.167.1.2:8983_solr” : 1 replica of shard1 must go to 
> the node 192.167.1.2:8983_solr
>  * “shard1.*