[jira] [Updated] (CASSANDRA-7032) Improve vnode allocation

2018-05-11 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-7032:

Labels: LWT performance vnodes  (was: performance vnodes)

> Improve vnode allocation
> 
>
> Key: CASSANDRA-7032
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Branimir Lambov
>Priority: Major
>  Labels: LWT, performance, vnodes
> Fix For: 3.0 alpha 1
>
> Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, 
> TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java, 
> TestVNodeAllocation.java
>
>
> It's been known for a little while that random vnode allocation causes 
> hotspots of ownership. It should be possible to improve dramatically on this 
> with deterministic allocation. I have quickly thrown together a simple greedy 
> algorithm that allocates vnodes efficiently, and will repair hotspots in a 
> randomly allocated cluster gradually as more nodes are added, and also 
> ensures that token ranges are fairly evenly spread between nodes (somewhat 
> tunably so). The allocation still permits slight discrepancies in ownership, 
> but it is bound by the inverse of the size of the cluster (as opposed to 
> random allocation, which strangely gets worse as the cluster size increases). 
> I'm sure there is a decent dynamic programming solution to this that would be 
> even better.
> If on joining the ring a new node were to CAS a shared table where a 
> canonical allocation of token ranges lives after running this (or a similar) 
> algorithm, we could then get guaranteed bounds on the ownership distribution 
> in a cluster. This will also help for CASSANDRA-6696.
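
To make the greedy idea above concrete, here is a minimal hypothetical sketch (it is not the attached TestVNodeAllocation.java): place each new token at the midpoint of the currently widest range, so the largest ownership hotspot shrinks at every step. Tokens are modelled as doubles in [0, 1) for simplicity.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only: greedy midpoint allocation on a unit ring.
public class GreedyMidpointAllocator
{
    // Picks numTokens new tokens, each splitting the widest current range.
    // Assumes a non-empty ring of existing tokens in [0, 1).
    public static List<Double> allocate(TreeSet<Double> ring, int numTokens)
    {
        List<Double> chosen = new ArrayList<>();
        for (int i = 0; i < numTokens; i++)
        {
            double bestStart = 0, bestWidth = -1;
            double prev = ring.last() - 1.0; // the range wrapping around the ring
            for (double t : ring)
            {
                double width = t - prev;
                if (width > bestWidth)
                {
                    bestWidth = width;
                    bestStart = prev;
                }
                prev = t;
            }
            double mid = bestStart + bestWidth / 2; // midpoint of the widest range
            if (mid < 0)
                mid += 1.0; // normalise back into [0, 1)
            ring.add(mid);
            chosen.add(mid);
        }
        return chosen;
    }
}
{code}

The attachment explores the tunable-spread and (later in this thread) replication-aware refinements; the snippet above only illustrates the basic greedy step.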





[jira] [Updated] (CASSANDRA-7032) Improve vnode allocation

2015-07-18 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-7032:
-
Fix Version/s: (was: 3.x)
   3.0 beta 1

 Improve vnode allocation
 

 Key: CASSANDRA-7032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Branimir Lambov
  Labels: performance, vnodes
 Fix For: 3.0 beta 1

 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, 
 TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java, 
 TestVNodeAllocation.java







[jira] [Updated] (CASSANDRA-7032) Improve vnode allocation

2015-02-06 Thread Branimir Lambov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Branimir Lambov updated CASSANDRA-7032:
---
Attachment: TestVNodeAllocation.java

Attached a new version that does not depend on a fixed number of vnodes per node. It distributes the load proportionally to the number of tokens allocated to each node and aims to minimize over/underutilization as a percentage of the node's load.

The scheme is the same as above; to take varying loads into account, the loads of nodes and tokens in the improvement calculation are normalized by the inverse of the node's size.
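
As a sketch of what that normalisation might look like (hypothetical names, not code from the attachment):

{code}
// Hypothetical sketch: utilization measured relative to each node's own
// target share, so nodes with different token counts compare fairly.
public class LoadNormalization
{
    public static double relativeUtilization(double replicatedLoad, int nodeTokens, int totalTokens)
    {
        double targetShare = (double) nodeTokens / totalTokens; // node's fair share of ownership
        return replicatedLoad / targetShare;                    // 1.0 == perfectly balanced
    }
}
{code}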

The results achieved in asymmetric node configurations are very similar to those in the homogeneous scenario. The behaviour for fixed vnode counts is exactly the same as in the previous version.

I am done tinkering with the algorithm and have started work on attaching it to the code base.

 Improve vnode allocation
 

 Key: CASSANDRA-7032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Branimir Lambov
  Labels: performance, vnodes
 Fix For: 3.0

 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, 
 TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java, 
 TestVNodeAllocation.java







[jira] [Updated] (CASSANDRA-7032) Improve vnode allocation

2015-01-20 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-7032:
--
Reviewer: Benedict

 Improve vnode allocation
 

 Key: CASSANDRA-7032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Branimir Lambov
  Labels: performance, vnodes
 Fix For: 3.0

 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, 
 TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java







[jira] [Updated] (CASSANDRA-7032) Improve vnode allocation

2015-01-16 Thread Branimir Lambov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Branimir Lambov updated CASSANDRA-7032:
---
Attachment: TestVNodeAllocation.java

New version with acceptable performance (adding a node to a 1000-node, 256-vnode cluster takes ~0.1s).

No changes to the algorithm in this one; it just restructures how things are stored, replacing many previously complex operations with O(1) steps.

 Improve vnode allocation
 

 Key: CASSANDRA-7032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Branimir Lambov
  Labels: performance, vnodes
 Fix For: 3.0

 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, 
 TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java







[jira] [Updated] (CASSANDRA-7032) Improve vnode allocation

2015-01-09 Thread Branimir Lambov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Branimir Lambov updated CASSANDRA-7032:
---
Attachment: TestVNodeAllocation.java

A work-in-progress algorithm for selecting vnodes in the replicated case is attached. The main idea is to select token positions for each new vnode so as to get the best improvement in replicated load variance (i.e. standard deviation) across nodes and vnodes *1. More specifically, it prepares a selection of candidate token positions (the middle positions between existing vnodes *2), evaluates the expected improvement in variance for each candidate, and chooses the best *3, continuing until all the vnodes of the new node have been assigned tokens. To improve average performance, the expected improvement for all choices is calculated once; for the second and later vnodes we only recalculate it for the best candidate, until we find one that does not deteriorate to worse than the next option in the list *4.
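
The *4 step amounts to a lazy max-heap of candidates; here is a hypothetical sketch of that pattern (names invented, not code from the attachment):

{code}
import java.util.PriorityQueue;
import java.util.function.DoubleUnaryOperator;

// Lazy re-evaluation: candidates are ordered by their last computed
// improvement; only the current top is recomputed, and it is accepted once
// its fresh value still beats the runner-up's (possibly stale) value.
public class LazyBestCandidate
{
    public static final class Candidate
    {
        final double token;  // candidate position (a midpoint between existing vnodes)
        double improvement;  // expected variance improvement; may be stale

        Candidate(double token, double improvement)
        {
            this.token = token;
            this.improvement = improvement;
        }
    }

    // The heap must be a non-empty max-heap over improvement, e.g.
    // new PriorityQueue<>((a, b) -> Double.compare(b.improvement, a.improvement)).
    public static Candidate pickBest(PriorityQueue<Candidate> heap, DoubleUnaryOperator evaluate)
    {
        while (true)
        {
            Candidate top = heap.poll();                         // best by stale value
            top.improvement = evaluate.applyAsDouble(top.token); // refresh it
            Candidate next = heap.peek();
            if (next == null || top.improvement >= next.improvement)
                return top;  // still at least as good as the runner-up: accept
            heap.add(top);   // deteriorated below the next option: reinsert, retry
        }
    }
}
{code}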

Tested with simple factor-3 replication, it maintains the following utilization ranges:
- 1 vnode: 70% - 135%
- 4 vnodes: 80% - 115%
- 16 vnodes: 83% - 106%
- 64 vnodes: 86% - 103%
- 256 vnodes: 87% - 102%

Unlike random allocation, the overutilization does not grow with the number of nodes, and a much smaller number of vnodes suffices (4 or 8 would probably be enough for most use cases).

The underutilization for this algorithm is affected less by the number of vnodes; this is due to the effect of replication on newly added vnodes: they necessarily take the share of one fewer vnode replica than the vnode they split. Regardless of the algorithm we use, if we add a new node to a large enough, perfectly balanced cluster where all vnodes are responsible for the same share of tokens, the new node will necessarily have at most 2/3 (for RF=3) of the average load. This could possibly be improved if we manage to keep enough individual tokens with load closer to RF / (RF - 1) times the average, which I've yet to try.
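
One way to spell out the arithmetic behind that bound (my reading of the balanced case above): if each of the V vnodes owns an equal share s = 1/V and stores RF consecutive ranges, its replicated load is about RF * s. A newly inserted vnode takes over at most RF - 1 replica shares of the range it splits, so its load is at most (RF - 1) * s, i.e. (RF - 1) / RF = 2/3 of the RF * s average for RF = 3. Conversely, a split token carrying RF / (RF - 1) times the average load would hand the new vnode exactly an average load, which is what keeping such tokens around would exploit.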

The algorithm is implemented in the {{ReplicationAwareTokenDistributor}} in the 
attached file. Running the file simulates the effect of adding nodes using this 
algorithm on a randomly-generated cluster and prints out the minimum and 
maximum per-node and per-token replicated load after each step, as well as the 
standard deviation of the load. Sample results:
{code}
Random generation of 500 nodes with 8 tokens each
Size 500    node max 1.88 min 0.51 stddev 0.22193
Adding 1 node(s) using ReplicationAwareTokenDistributor
Size 501    node max 1.90 min 0.51 stddev 0.21922   Simple 3 replicas
Adding 4 node(s) using ReplicationAwareTokenDistributor
Size 505    node max 1.63 min 0.51 stddev 0.20580   token max 3.72 min 0.01 stddev 0.58768   Simple 3 replicas
Adding 15 node(s) using ReplicationAwareTokenDistributor
Size 520    node max 1.51 min 0.53 stddev 0.17369   token max 3.83 min 0.01 stddev 0.54526   Simple 3 replicas
Adding 105 node(s) using ReplicationAwareTokenDistributor
Size 625    node max 1.15 min 0.63 stddev 0.08069   token max 2.73 min 0.00 stddev 0.40190   Simple 3 replicas
Adding 375 node(s) using ReplicationAwareTokenDistributor
Size 1000   node max 1.08 min 0.84 stddev 0.03041   token max 1.99 min 0.00 stddev 0.22341   Simple 3 replicas
Losing 1 nodes
Size 999    node max 1.09 min 0.84 stddev 0.03081   token max 1.98 min 0.00 stddev 0.22429   Simple 3 replicas
Adding 1 node(s) using ReplicationAwareTokenDistributor
Size 1000   node max 1.08 min 0.84 stddev 0.03019   token max 1.99 min 0.00 stddev 0.22335   Simple 3 replicas
Losing 5 nodes
Size 995    node max 1.17 min 0.83 stddev 0.03380   token max 2.01 min 0.00 stddev 0.22565   Simple 3 replicas
Adding 5 node(s) using ReplicationAwareTokenDistributor
Size 1000   node max 1.08 min 0.84 stddev 0.03000   token max 1.99 min 0.00 stddev 0.22181   Simple 3 replicas
Losing 20 nodes
Size 980    node max 1.19 min 0.88 stddev 0.04362   token max 2.44 min 0.00 stddev 0.23370   Simple 3 replicas
Adding 20 node(s) using ReplicationAwareTokenDistributor
Size 1000   node max 1.08 min 0.89 stddev 0.02962   token max 1.99 min 0.00 stddev 0.21681   Simple 3 replicas
Losing 125 nodes
Size 875    node max 1.31 min 0.79 stddev 0.08499   token max 2.81 min 0.00 stddev 0.28763   Simple 3 replicas
Adding 125 node(s) using ReplicationAwareTokenDistributor
Size 1000   node max 1.08 min 0.90 stddev 0.02805   token max 1.85 min 0.00 stddev 0.19258   Simple 3 replicas
{code}

This is far from finished as it is much slower than I'd like it to be.



Notes / other things I've tried:
 - *1 Only controlling individual vnode load: because of the replication effect mentioned above, the ratio between the largest and smallest node can at best be 3:2 (for RF=3). If we don't control overall node size, about 30% over/underutilization is the best we can achieve.

[jira] [Updated] (CASSANDRA-7032) Improve vnode allocation

2014-12-10 Thread Branimir Lambov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Branimir Lambov updated CASSANDRA-7032:
---
Attachment: TestVNodeAllocation.java

 Improve vnode allocation
 

 Key: CASSANDRA-7032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Branimir Lambov
  Labels: performance, vnodes
 Fix For: 3.0

 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, 
 TestVNodeAllocation.java







[jira] [Updated] (CASSANDRA-7032) Improve vnode allocation

2014-09-17 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7032:

Assignee: Branimir Lambov

 Improve vnode allocation
 

 Key: CASSANDRA-7032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Branimir Lambov
  Labels: performance, vnodes
 Fix For: 3.0

 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java







[jira] [Updated] (CASSANDRA-7032) Improve vnode allocation

2014-04-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7032:


Attachment: TestVNodeAllocation.java

Attached a very simple algorithm implementation, and some debug code for 
viewing the resulting cluster token range distribution for various sizes of 
cluster.

 Improve vnode allocation
 

 Key: CASSANDRA-7032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance, vnodes
 Fix For: 3.0

 Attachments: TestVNodeAllocation.java







[jira] [Updated] (CASSANDRA-7032) Improve vnode allocation

2014-04-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7032:


Attachment: (was: TestVNodeAllocation.java)

 Improve vnode allocation
 

 Key: CASSANDRA-7032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance, vnodes
 Fix For: 3.0

 Attachments: TestVNodeAllocation.java







[jira] [Updated] (CASSANDRA-7032) Improve vnode allocation

2014-04-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7032:


Attachment: TestVNodeAllocation.java

 Improve vnode allocation
 

 Key: CASSANDRA-7032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance, vnodes
 Fix For: 3.0

 Attachments: TestVNodeAllocation.java







[jira] [Updated] (CASSANDRA-7032) Improve vnode allocation

2014-04-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7032:


Attachment: TestVNodeAllocation.java

Another updated version with improved statistics on disk distribution. Not up to scratch for CASSANDRA-6696, unfortunately.

However, there's a still simpler algorithm that's much better and trivial to implement: when we randomise our token selection, instead of randomising each token over the global token range, simply split the token range into vnode regions and randomise a token uniformly within each region. This gives near-optimal behaviour across all cluster sizes. Disk distributions are still not perfect: on average there can be as much as a 30% max-to-min gap in the range allocated to each disk when there are many disks, but this can be improved by using more vnodes.
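
A hypothetical sketch of that per-region randomisation (again on a unit ring; this is not the attached file):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Split the ring into numTokens equal regions and draw one token uniformly
// within each region, instead of drawing all tokens over the whole ring.
public class RegionRandomAllocator
{
    public static List<Double> allocate(int numTokens, Random random)
    {
        List<Double> tokens = new ArrayList<>(numTokens);
        double regionWidth = 1.0 / numTokens;
        for (int i = 0; i < numTokens; i++)
            tokens.add((i + random.nextDouble()) * regionWidth); // uniform within region i
        return tokens;
    }
}
{code}

The appeal is that each node can presumably pick its tokens this way without coordinating with the rest of the cluster, which is what makes the scheme trivial to implement.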

 Improve vnode allocation
 

 Key: CASSANDRA-7032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance, vnodes
 Fix For: 3.0

 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java




