Re: Solr Autoscaling multi-AZ rules

2018-03-22 Thread Noble Paul
The meaning of Replication Factor is getting mixed up here. Replication
Factor is a number: RF=3 means there are 3 replicas of each shard.

> I understand that {"replica": "<7", "node":"#ANY"} may result in two
> replicas of the same shard ending up on the same node. However, the
> other rule should prevent this: {"replica": "<2", "shard": "#EACH",
> "node": "#ANY"}
> So by using both rules, that should mean "no more than six replicas on
> a node, where all the replicas on that node represent distinct
> shards". Right?

Yes, you are right.
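
For reference, a minimal sketch of installing both rules together through the 
autoscaling write API (endpoint and command name as documented for Solr 7.x; 
host and port are placeholders):

  curl -X POST -H 'Content-Type: application/json' \
    http://localhost:8983/api/cluster/autoscaling -d '{
      "set-cluster-policy": [
        {"replica": "<2", "shard": "#EACH", "node": "#ANY"},
        {"replica": "<7", "node": "#ANY"}
      ]
    }'

With both rules in place, a node can hold at most six replicas, and no two of 
them may belong to the same shard.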


Re: Solr Autoscaling multi-AZ rules

2018-02-22 Thread Jeff Wartes

I managed to miss this reply earlier, but:

Shard: A logical segment of a collection
Replica: A physical core, representing a particular Shard
Replication Factor (RF): A set of Replicas, such that a single Replica exists 
for each Shard in a Collection. 
Availability Zone (AZ): A partitioned set of nodes such that a physical or 
hardware failure in one AZ should not affect another AZ. AZ could mean distinct 
racks in a data center, or distinct data centers, but I happen to specifically 
mean the AWS definition here: 
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-regions-availability-zones

So an RF2 collection with 2 shards means I have four Replicas in my collection, 
two shard1 and two shard2. If it's RF3, then I have six: three shard1 and three 
shard2.
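
(As a concrete sketch, a collection like the RF2 example could be created with 
the Collections API; the collection name here is a placeholder:

  /admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=2

which yields the four cores described above.)
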
I'm using "Distinct RF" as a shorthand for "a single replica for every shard in 
the collection". 
In the RF2 example above, if I have two Availability Zones, I would want a 
Distinct RF in each AZ. So, a replica for shard1 and shard2 in AZ1, and a 
replica for shard1 and shard2 in AZ2. I would *not* want, say, both shard1 
replicas in AZ1 because then a failure of AZ1 could leave me with no replicas 
for shard1 and an incomplete collection.
If I had RF6 and two AZs, I would want three Distinct RFs in each AZ. (three 
replicas for each shard, per AZ)
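
To make the arithmetic concrete with the numbers from the original mail below 
(42 shards, at most 6 per node):

  1 Distinct RF = 42 replicas / 6 per node   = 7 nodes
  RF6           = 6 Distinct RFs = 252 replicas = 42 nodes total
  two AZs       = 3 Distinct RFs per AZ      = 21 nodes per AZ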

I understand that {"replica": "<7", "node":"#ANY"} may result in two replicas 
of the same shard ending up on the same node. However, the other rule should 
prevent this: {"replica": "<2", "shard": "#EACH", "node": "#ANY"}
So by using both rules, that should mean "no more than six replicas on a node, 
where all the replicas on that node represent distinct shards". Right?



On 2/12/18, 12:18 PM, "Noble Paul" wrote:

>>Goal: No node should have more than 6 shards

This is not possible today

{"replica": "<7", "node":"#ANY"} means don't put 7 or more
replicas of the collection (irrespective of shard) on a given
node

What do you mean by distinct 'RF'? I think we are screwing up the
terminology a bit here


Re: Solr Autoscaling multi-AZ rules

2018-02-12 Thread Noble Paul
>>Goal: No node should have more than 6 shards

This is not possible today

{"replica": "<7", "node":"#ANY"} means don't put 7 or more
replicas of the collection (irrespective of shard) on a given
node

What do you mean by distinct 'RF'? I think we are screwing up the
terminology a bit here
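
A quick way to check how a rule like this plays out on a live cluster is the 
autoscaling read API (a sketch, assuming a local node; the diagnostics 
endpoint is part of the 7.x autoscaling API):

  curl http://localhost:8983/api/cluster/autoscaling/diagnostics

The response lists the sorted nodes and any current violations of the cluster 
policy, which makes questions like "does '<7' allow a seventh replica" easy 
to verify empirically.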

On Wed, Feb 7, 2018 at 1:38 PM, Jeff Wartes wrote:
> I’ve been messing around with the Solr 7.2 autoscaling framework this week. 
> Some things seem trivial, but I’m also running into questions and issues. If 
> anyone else has experience with this stuff, I’d be glad to hear it. 
> Specifically:
>
>
> Context:
> -One collection, consisting of 42 shards, where up to 6 shards can fit on a 
> single node. (which means 7 nodes per Replication Factor)
> -Three AZs, each with its own ip_2 value.
>
> Goals:
>
> Goal: Fully utilize available nodes.
> Cluster Preference: {"maximize": "cores"}
>
> Goal: No node should have more than one replica of a given shard
> Rule: {"replica": "<2", "shard": "#EACH", "node": "#ANY"}
>
> Goal: No node should have more than 6 shards
> Rule: {"replica": "<7", "node":"#ANY"}
>
> Goal: Where possible, distinct RFs should each exist in an AZ.
> (Example1: I’d like 7 nodes with a complete RF in AZ 1 and 7 nodes with a 
> complete RF in AZ 2, and not end up with, say, both shard2 replicas in AZ 1)
> (Example2: If I have 14 nodes in AZ 1 and 7 in AZ 2, I should have two full 
> RFs in AZ 1 and one in AZ 2)
> Rule: ???
>
> I could have multiple non-strict rules perhaps? Like:
> {"replica": "<2", "shard": "#EACH", "ip_2": "1", "strict":false}
> {"replica": "<3", "shard": "#EACH", "ip_2": "1", "strict":false}
> {"replica": "<4", "shard": "#EACH", "ip_2": "1", "strict":false}
> {"replica": "<2", "shard": "#EACH", "ip_2": "2", "strict":false}
> {"replica": "<3", "shard": "#EACH", "ip_2": "2", "strict":false}
> {"replica": "<4", "shard": "#EACH", "ip_2": "2", "strict":false}
> etc
> So having more than one RF in an AZ is a technical “violation”, but if 
> placement minimizes non-strict violations, replicas would tend to get placed 
> correctly.
>
>
> Given a working set of rules, I’m still having trouble with two things:
>
>   1.  I’ve manually created the “.system” collection, as it didn’t seem to 
> get created automatically. However, autoscaling activity is not getting 
> logged to it.
>   2.  I can’t seem to figure out how to scale up.
>  *   I’d presumed editing the collection’s “replicationFactor” would do 
> the trick, but it does not.
>  *   The “node-up” trigger will serve to replace lost replicas, but won’t 
> otherwise take advantage of additional capacity.
>
>  i.  There’s a UTILIZENODE command in 7.2, but it appears that’s still 
> something you need to trigger manually.
>
> Anyone played with this stuff?



-- 
-
Noble Paul
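
(On the scale-up question in the quoted mail above: UTILIZENODE is indeed a 
manual Collections API call in 7.2. A sketch, with host and node name as 
placeholders:

  /admin/collections?action=UTILIZENODE&node=10.0.0.5:8983_solr

It moves replicas onto the named node in accordance with the cluster policy 
and preferences.)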