[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2023-05-23 Thread Stefan Miklosovic (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725264#comment-17725264 ]

Stefan Miklosovic commented on CASSANDRA-18084:
---

I am not sure what is so hard about looking into this for 10 minutes to tell me 
whether I am on the right path and all I need to do is finish the tests. I am 
working on easy patches on purpose, as it is almost impossible to merge anything 
bigger.

> Introduce tags to snitch for better decision making for replica placement in 
> topology strategies
> 
>
> Key: CASSANDRA-18084
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18084
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Cluster/Gossip, Cluster/Schema, Legacy/Distributed 
> Metadata
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We would like to have extra meta-information in cassandra-rackdc.properties 
> which would further differentiate nodes within a dc / rack.
> The main motivation behind this is that we have special requirements around a 
> node's characteristics, based on which we want to make further decisions when 
> it comes to replica placement in topology strategies. (The new topology 
> strategy would be mostly derived from NTS / would extend it.)
> The most reasonable way to do that is to introduce a new property in 
> cassandra-rackdc.properties called "tags":
> {code:java}
> # Arbitrary tags to assign to this node; they serve as an additional
> # identifier of a node based on which operators might act.
> # The value of this property is meant to be a comma-separated list of strings.
> #tags=tag1,tag2
> {code}
> We also want to introduce a new application state called TAGS. On startup, a 
> node would advertise its tags to the cluster and vice versa: all nodes would 
> tell that respective node what tags they have, so everybody would see the same 
> state of tags across the cluster, based on which topology strategies would 
> make the same decisions.
> These tags are not meant to be changed during the whole runtime of a node, 
> similarly to how datacenter and rack are not.
> For performance reasons, we might limit the maximum size of tags (the sum of 
> their lengths) to, for example, 64 characters; anything bigger would either be 
> shortened or startup would fail.
> Once we have tags for all nodes, we can access them, cluster-wide, from 
> TokenMetadata, which is used quite heavily in topology strategies and exposes 
> other relevant topology information (dcs, racks ...). We would just add a new 
> way to look at nodes.
> Tags would be a set.
> They would be persisted to system.local to see what tags the local node has, 
> and to system.peers_v2 to see what tags all other nodes have. The column would 
> be called "tags".
> {code:java}
> admin@cqlsh> select * from system.local ;
> @ Row 1
>  key | local
>  bootstrapped| COMPLETED
>  broadcast_address   | 172.19.0.5
>  broadcast_port  | 7000
>  cluster_name| Test Cluster
>  cql_version | 3.4.6
>  data_center | dc1
>  gossip_generation   | 1669739177
>  host_id | 54f8c6ea-a6ba-40c5-8fa5-484b2b4184c9
>  listen_address  | 172.19.0.5
>  listen_port | 7000
>  native_protocol_version | 5
>  partitioner | org.apache.cassandra.dht.Murmur3Partitioner
>  rack| rack1
>  release_version | 4.2-SNAPSHOT
>  rpc_address | 172.19.0.5
>  rpc_port| 9042
>  schema_version  | ef865449-2491-33b8-95b0-47c09cb14ea9
>  tags| {'tag1', 'tag2'}
>  tokens  | {'6504358681601109713'} 
> {code}
> for system.peers_v2:
> {code:java}
> admin@cqlsh> select peer,tags from system.peers_v2 ;
> @ Row 1
> --+-
>  peer | 172.19.0.15
>  tags | {'tag2', 'tag3'}
> @ Row 2
> --+-
>  peer | 172.19.0.11
>  tags | null
> {code}
> The POC implementation doing exactly that is here:
> [https://github.com/instaclustr/cassandra/commit/eddd4681d8678515454dabb926d5f56b0c225eea]
> We do not want to provide our custom topology strategies which use this 
> feature, as that will most probably be a proprietary solution. This might 
> indeed change in the future. For now, we just want to implement hooks we can 
> base our in-house implementation on. All other people can benefit from this 
> as well if they choose to, as this feature enables them to do that.
> Adding tags is not only about custom topology strategies. Operators could tag 
> their nodes if they wish to make further distinctions at the topology level 
> for their operational needs.
> 
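The tag parsing and the proposed size limit from the description above could be sketched roughly as follows. This is a hypothetical sketch; the class, method, and constant names are illustrative, not taken from the actual patch.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch: parse the proposed comma-separated "tags" property
// and enforce the suggested 64-character total-size limit by failing startup
// (the description leaves shortening vs. failing open).
class TagsParser
{
    static final int MAX_TOTAL_LENGTH = 64;

    // Parse a comma-separated list into a set, as the description proposes.
    static Set<String> parse(String raw)
    {
        if (raw == null || raw.isBlank())
            return Set.of();

        Set<String> tags = Arrays.stream(raw.split(","))
                                 .map(String::trim)
                                 .filter(s -> !s.isEmpty())
                                 .collect(Collectors.toCollection(LinkedHashSet::new));

        // Sum of tag lengths across the whole set; reject anything bigger.
        int total = tags.stream().mapToInt(String::length).sum();
        if (total > MAX_TOTAL_LENGTH)
            throw new IllegalArgumentException("tags exceed " + MAX_TOTAL_LENGTH + " characters in total");
        return tags;
    }
}
```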

[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2023-05-19 Thread Josh McKenzie (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724458#comment-17724458 ]

Josh McKenzie commented on CASSANDRA-18084:
---

Missed the ping earlier Stefan; I'm up to my eyeballs in various things right 
now. Any chance you have cycles [~paulo] ?




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2023-05-03 Thread Stefan Miklosovic (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718881#comment-17718881 ]

Stefan Miklosovic commented on CASSANDRA-18084:
---

Thanks [~samt] for letting me know. [~jmckenzie], [~paulo], any chance you would 
take a quick look to evaluate this?






[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2023-05-03 Thread Sam Tunnicliffe (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718871#comment-17718871 ]

Sam Tunnicliffe commented on CASSANDRA-18084:
-

I can put it on my todo list [~smiklosovic], but I'm afraid it's unlikely that 
I'll get any spare time any time soon, so if you can find an alternative 
reviewer I would go ahead and do that. 





[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2023-05-03 Thread Stefan Miklosovic (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718865#comment-17718865 ]

Stefan Miklosovic commented on CASSANDRA-18084:
---

https://github.com/apache/cassandra/pull/2305

I also implemented a pluggable mechanism for where to fetch the metadata from. 
There are three sources out of the box: system properties, environment 
variables, and a file. A user switches between them in cassandra.yaml by 
choosing the correct implementation class, the same mechanism we use across the 
whole yaml.

Any chance you could take a look at this? [~samt]
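The pluggable metadata-source mechanism described above could look roughly like this: one interface with an out-of-the-box implementation per source, with the concrete class named in cassandra.yaml. This is a hypothetical sketch; all interface and class names are illustrative, and the actual PR may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Illustrative sketch: a common interface for node-metadata lookup, with one
// implementation per source (system properties, environment, properties file).
interface MetadataSource
{
    String get(String key);
}

// Reads metadata from JVM system properties (-Dkey=value).
class SystemPropertySource implements MetadataSource
{
    public String get(String key) { return System.getProperty(key); }
}

// Reads metadata from process environment variables.
class EnvironmentSource implements MetadataSource
{
    public String get(String key) { return System.getenv(key); }
}

// Reads metadata from a Java properties file loaded once at construction.
class FileSource implements MetadataSource
{
    private final Properties props = new Properties();

    FileSource(Path file) throws IOException
    {
        try (InputStream in = Files.newInputStream(file))
        {
            props.load(in);
        }
    }

    public String get(String key) { return props.getProperty(key); }
}
```

Selecting the implementation by fully qualified class name in cassandra.yaml would then follow the same reflective-instantiation pattern used for other pluggable yaml options.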


[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2023-02-24 Thread Stefan Miklosovic (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693101#comment-17693101 ]

Stefan Miklosovic commented on CASSANDRA-18084:
---

[~samt] I implemented it here:

[https://github.com/instaclustr/cassandra/commit/213c7ec17aa575e407b9d05fa73f02eaea32173c]

I intentionally made it simple. It is propagated via Gossip, but the interface 
to that is via MetadataService. If you like this approach I will add tests 
etc.; I don't want to do that prematurely.


[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2022-12-06 Thread Sam Tunnicliffe (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644003#comment-17644003 ]

Sam Tunnicliffe commented on CASSANDRA-18084:
-

Josh summed up my thoughts pretty well; I was just keen to make this new 
metadata a bit generic so that it can be reused by other plugins/tools/etc. 
That seems a pretty reasonable request, especially as you're only planning to 
contribute the plumbing and not the actual feature which uses it. The other 
point is that once things are added to public interfaces (like the rackdc 
config or {{IEndpointSnitch}}), the project is committed to maintaining them 
for the foreseeable future. Of course, deprecation is possible, but it isn't 
easy, hence the discussion going on on the 
[dev-ml|https://lists.apache.org/thread/3n8lb1syrsmx89d10kyqz94zzdqhz3o5].

I haven't had a chance to look too closely at the PoC implementation you 
posted, but I noticed that tags were being exposed via new methods on 
{{IEndpointSnitch}}, which was what I was getting at regarding coupling this 
to topology. If this is application-neutral metadata, being disseminated by 
gossip, then I would just access it in-JVM via Gossiper. If external tools 
need access then, like you suggested, {{IFailureDetector}} is what exposes 
endpoint states for things like {{nodetool gossipinfo}}.

> Introduce tags to snitch for better decision making for replica placement in 
> topology strategies
> 
>
> Key: CASSANDRA-18084
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18084
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Cluster/Gossip, Cluster/Schema, Legacy/Distributed 
> Metadata
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
>
> We would like to have extra meta-information in cassandra-rackdc.properties 
> which would further differentiate nodes in dc / racks.
> The main motivation behind this is that we have special requirements around 
> node's characteristics based on which we want to make further decisions when 
> it comes to replica placement in topology strategies. (New topology strategy 
> would be mostly derived from NTS / would be extended)
> The most reasonable way to do that is to introduce a new property into 
> cassandra-rackdc.properties called "tags"
> {code:java}
> # Arbitrary tag to assign to this node, they serve as additional 
> identificator of a node based on which operators might act. 
> # Value of this property is meant to be a comma-separated list of strings.
> #tags=tag1,tag2 
> {code}
> We also want to introduce new application state called TAGS. On startup of a 
> node, this node would advertise its tags to cluster and vice versa, all nodes 
> would tell to that respective node what tags they have so everybody would see 
> the same state of tags across the cluster based on which topology strategies 
> would do same decisions.
> These tags are not meant to be changed during whole runtime of a node, 
> similarly as datacenter and rack is not.
> For performance reasons, we might limit the maximum size of tags (sum of 
> their lenght), to be, for example, 64 characters and anything bigger would be 
> either shortened or the start would fail.
> Once we have tags for all nodes, we can have access to them, cluster-wide, 
> from TokenMetadata which is used quite heavily in topology strategies and it 
> exposes other relevant topology information (dc's, racks ...). We would just 
> add a new way to look at nodes.
> Tags would be a set.
> This would be persisted to the system.local to see what tags a local node has 
> and it would be persisted to system.peers_v2 to see what tags all other nodes 
> have. Column would be called "tags".
> {code:java}
> admin@cqlsh> select * from system.local ;
> @ Row 1
>  key | local
>  bootstrapped| COMPLETED
>  broadcast_address   | 172.19.0.5
>  broadcast_port  | 7000
>  cluster_name| Test Cluster
>  cql_version | 3.4.6
>  data_center | dc1
>  gossip_generation   | 1669739177
>  host_id | 54f8c6ea-a6ba-40c5-8fa5-484b2b4184c9
>  listen_address  | 172.19.0.5
>  listen_port | 7000
>  native_protocol_version | 5
>  partitioner | org.apache.cassandra.dht.Murmur3Partitioner
>  rack| rack1
>  release_version | 4.2-SNAPSHOT
>  rpc_address | 172.19.0.5
>  rpc_port| 9042
>  schema_version  | ef865449-2491-33b8-95b0-47c09cb14ea9
>  tags| {'tag1', 'tag2'}
>  tokens  | {'6504358681601109713'} 
> {code}
> for system.peers_v2:
> {code:java}
> admin@cqlsh> select peer,tags from system.peers_v2 ;
> @ Row 1
> --+-
>  peer | 172.19.0.15
>  tags | {'tag2', 'tag3'}
> @ Row 2
> --+-
>  peer | 172.19.0.11
>  tags | null
> {code}
> the POC implementation doing exactly that is here:
> [https://github.com/instaclustr/cassandra/commit/eddd4681d8678515454dabb926d5f56b0c225eea]
> We do not want to provide our custom topology strategies which are using this
> feature, as that will most probably be a proprietary solution. This might
> indeed change in the future. For now, we just want to implement hooks we can
> base our in-house implementation on. All other people can benefit from this
> as well if they choose to, as this feature enables them to do that.
> Adding tags is not only about custom topology strategies. Operators could tag
> their nodes if they wish to make further distinctions on the topology level
> for their operational needs.
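To make the quoted proposal concrete, here is a minimal sketch of how the comma-separated `tags` property and the 64-character cap could be handled. It is illustrative only: the class name `TagsParser` and the fail-fast behaviour on oversized tags are assumptions, not code from the ticket's PoC.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public final class TagsParser
{
    // Hypothetical cap on the summed tag length, mirroring the 64-character
    // limit floated in the description above.
    static final int MAX_TOTAL_LENGTH = 64;

    /**
     * Parses a comma-separated "tags" value (as it might appear in
     * cassandra-rackdc.properties) into a de-duplicated, trimmed set.
     * Fails fast when the summed length exceeds the cap, i.e. the
     * "start would fail" option from the proposal.
     */
    public static Set<String> parse(String raw)
    {
        if (raw == null || raw.isBlank())
            return Set.of();

        Set<String> tags = Arrays.stream(raw.split(","))
                                 .map(String::trim)
                                 .filter(s -> !s.isEmpty())
                                 .collect(Collectors.toCollection(LinkedHashSet::new));

        int total = tags.stream().mapToInt(String::length).sum();
        if (total > MAX_TOTAL_LENGTH)
            throw new IllegalArgumentException("Sum of tag lengths " + total
                                               + " exceeds limit " + MAX_TOTAL_LENGTH);
        return tags;
    }

    public static void main(String[] args)
    {
        // Duplicates and surrounding whitespace are normalized away.
        System.out.println(parse("tag1, tag2,tag1"));
    }
}
```

Whether oversized input should be shortened instead of rejected is exactly the open question in the description; the sketch picks the stricter option.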

[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2022-12-03 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642845#comment-17642845
 ] 

Stefan Miklosovic commented on CASSANDRA-18084:
---

Just to comment on the implementation we may do here (not saying right now 
whether we eventually do it or not): in order to make this change minimally 
disruptive, we do not need to expose this in any CQL tables. One problem less. 
It does not need to be queryable via CQL. It might be exposed via 
IFailureDetectorMBean, which has a bunch of getAllEndpointStates methods. This 
is called in nodetool gossipinfo too, so it would be found nowhere but there. 

To load it, as we do not want to put it into rackdc.properties, it might be set 
in a system property. Ideally the loader of these properties would be an 
interface so it could be loaded from anywhere, but I don't consider that too 
important in light of the more mature implementation coming (the CEP).
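As a rough illustration of the pluggable-loader idea above, the following sketch reads tags from a system property behind a small interface. `TagsProvider`, `SystemPropertyTagsProvider`, and the `cassandra.tags` property name are all hypothetical names for this example; nothing here is actual Cassandra API.

```java
import java.util.Set;

/**
 * Illustrative only: a pluggable source for the node's tags, as mused
 * about in the comment above. A different implementation could read from
 * a file, an environment variable, etc.
 */
interface TagsProvider
{
    Set<String> tags();
}

/** Reads a comma-separated list from the (assumed) -Dcassandra.tags property. */
final class SystemPropertyTagsProvider implements TagsProvider
{
    @Override
    public Set<String> tags()
    {
        String raw = System.getProperty("cassandra.tags", "");
        if (raw.isBlank())
            return Set.of();
        // Minimal parsing; note Set.of rejects duplicate elements.
        return Set.of(raw.split(","));
    }
}

public class TagsProviderDemo
{
    public static void main(String[] args)
    {
        System.setProperty("cassandra.tags", "tag1,tag2");
        System.out.println(new SystemPropertyTagsProvider().tags());
    }
}
```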

To hold the state we would need to be smart about it: currently Gossip reacts 
primarily by persisting states to system.local/peers_v2; we would either save 
it directly in the failure detector or in something custom the failure detector 
would call.

I think that FailureDetector is a somewhat unnatural place to put that in, and 
not a super appropriate name for the bean to retrieve that information from, 
but that is just the consequence of having all gossip-related information 
retrievable from there. 

A new application state would still be added though; that one would eventually 
be deprecated. If the CEP implementing this more robustly is in place, I think 
we might deprecate it quite easily. I do not think we will ever remove it from 
the ApplicationState enum, probably not. But what I got from Sam's response is 
that they plan to move some things out of Gossip, so in that case what would 
the ApplicationState enum look like? The properties which are currently there 
would eventually be removed anyway, no? Or would they stay there?

Anyway, I still feel like hiding that under IFailureDetector's 
getAllEndpointStates method is not something Sam will be a particular fan of. I 
am curious what he thinks. I am trying to do it in such a way that it will be 
easy to deprecate, so I am avoiding completely new MBeans and putting stuff 
into dedicated packages / classes.



[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2022-12-03 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642823#comment-17642823
 ] 

Josh McKenzie commented on CASSANDRA-18084:
---

bq. So factors related to topology/ownership/placement of data would move out 
of gossip, but I don't see any real impediment to building this feature in the 
proposed new system. 
So my take here is that Sam's saying "Go for it, consider making it more 
generic so we can use it for more things, and we'll forklift this into CEP-21 
as we work on it". I agree with him, for the record - attaching arbitrary 
metadata to nodes seems like a really useful general-purpose feature that could 
lead to some interesting experimentation and innovation in the community. Sorry 
if my comment in any way came across as "wait for CEP-21"; my intention was to 
communicate "This sounds really interesting and powerful, and like there's 
perhaps a bigger set of things a generalized version of this enables that we 
should consider rather than coupling it with a specific subsystem".

bq.  too bad major releases are so far apart, time-wise.
While 4.0 -> 4.1 is going to be about 18 months, we're moving in the right 
direction and collectively decided a "fastest path floor" of 12 months was 
about all most users could stomach since certifying major releases is a big 
lift. Curious to hear if you have a different opinion on that front or are more 
thinking of the delay leading up to 4.0.


[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2022-12-02 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642639#comment-17642639
 ] 

Stefan Miklosovic commented on CASSANDRA-18084:
---

You are 100% correct that if we are ever going to do this, let's do it 
"properly". But on the other hand, if there is already an ongoing effort to do 
that more robustly (whatever that means), anything I do covering what you just 
wrote feels like duplication of effort if it is eventually going to be replaced 
by something else. 

I think that it is, strategically, better if I politely ask you to think about 
this while doing that CEP rather than trying to implement some solution which 
might eventually be deprecated.

I agree that this just scratches our itch; too bad major releases are so far 
apart, time-wise. It is always better to have something we depend on in 
upstream rather than to plumb something in internally. 


[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2022-12-02 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642618#comment-17642618
 ] 

Sam Tunnicliffe commented on CASSANDRA-18084:
-

bq. We are not in favor of waiting for some theoretical 
sometimes-maybe-will-be-done...So yeah, if you are not interested in this I 
guess we need to just move along

Easy tiger, I wasn't suggesting you should wait on the whole transactional 
cluster metadata implementation to be approved/finalised/delivered, just giving 
you a heads up that we're actively working towards deprecating gossip for a 
bunch of stuff.

My main point was that if you're going to specify "tags" so loosely, then 
coupling them directly to a single specific feature or subsystem seems 
unnecessarily limiting. If someone comes up with another, unrelated feature 
(proprietary or OSS) which could make use of something similar, they'll have to 
duplicate much of your effort or unpick that coupling. For instance, if someone 
wants to use "tags" to attach metadata to nodes for some purpose other than 
topology, will they still do that through rackdc.properties? Will you be able 
to differentiate between values your feature stores in tags and metadata 
belonging to some other code which uses them for a different purpose?

Nothing about what you described is that controversial. But you are talking 
about modifying core public classes & config files, so I think it's reasonable 
to spend some time/effort thinking through the potential implications of that.  
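One way to address the collision concern raised here, sketched under the assumption that tags remain free-form strings, is to namespace tag values per feature (e.g. "topology:tag1") so unrelated components can share the same channel without stepping on each other. This is purely illustrative and not part of the ticket's PoC; the `NamespacedTags` class and the prefix convention are invented for this example.

```java
import java.util.Set;
import java.util.stream.Collectors;

public final class NamespacedTags
{
    /**
     * Filters a raw tag set down to the values owned by one namespace,
     * stripping the prefix: e.g. "topology:tag1" -> "tag1" for the
     * "topology" namespace. Tags from other namespaces are ignored.
     */
    public static Set<String> forNamespace(Set<String> raw, String namespace)
    {
        String prefix = namespace + ":";
        return raw.stream()
                  .filter(t -> t.startsWith(prefix))
                  .map(t -> t.substring(prefix.length()))
                  .collect(Collectors.toSet());
    }

    public static void main(String[] args)
    {
        Set<String> raw = Set.of("topology:tag1", "backup:nightly", "topology:tag2");
        // Only the topology-owned values survive; iteration order is unspecified.
        System.out.println(forNamespace(raw, "topology"));
    }
}
```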



[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2022-12-02 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642597#comment-17642597
 ] 

Brandon Williams commented on CASSANDRA-18084:
--

bq. We will most probably port this to 3.11 as well and I do not think that 
will ever end up in upstream.

I would offer a word of caution against making the ApplicationState enum 
diverge from OSS, as it may not be able to be reconciled later.


[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2022-12-02 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642595#comment-17642595
 ] 

Stefan Miklosovic commented on CASSANDRA-18084:
---

We wanted to solve "how to differentiate nodes in a topology strategy even 
more", and putting that into TokenMetadata was pretty obvious. So we came up 
with the idea to put it into rackdc.properties, which seems pretty reasonable 
to me. Sure, we can do that more generically, but I don't think that is really 
what we are after in this particular ticket.

We are not in favor of waiting for something theoretical that may sometime 
maybe be done. We will most probably port this to 3.11 as well, and I do not 
think that will ever end up in upstream. I humbly guess that 5.0 will be out 
sometime in 2024, and that is just too long to wait from a business 
perspective.

So yeah, if you are not interested in this I guess we need to just move along.


[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2022-12-02 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642561#comment-17642561
 ] 

Sam Tunnicliffe commented on CASSANDRA-18084:
-

Gossip is probably the natural place for this kind of thing at the moment, but 
I am hopeful that that will change before the next major release. The proposal 
in CEP-21 is to limit gossip state to only transient metadata, particularly 
that which we don't depend on for correctness. So factors related to 
topology/ownership/placement of data would move out of gossip, but I don't see 
any real impediment to building this feature in the proposed new system. 

The caveat to that of course, is that CEP-21 hasn't yet been accepted, or even 
voted on, and so any talk of implementation in that context is theoretical at 
the moment. That's my fault, as I've been holding off re-awakening the 
[DISCUSS] thread and calling the vote while we iron out the few details in the 
prototype implementation, but I hope to get back to this by the end of the year.

I've only skimmed the PoC commit, but one question I would have is: why couple 
the tags so tightly to topology? These are essentially free-form text 
attributes associated with a node, so they could be used to express all sorts 
of other metadata relevant to any pluggable component. 


[jira] [Commented] (CASSANDRA-18084) Introduce tags to snitch for better decision making for replica placement in topology strategies

2022-12-01 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642080#comment-17642080
 ] 

Josh McKenzie commented on CASSANDRA-18084:
---

{quote}The main motivation behind this is that we have special requirements 
around node's characteristics based on which we want to make further decisions 
when it comes to replica placement in topology strategies. (New topology 
strategy would be mostly derived from NTS / would be extended)
{quote}
ISTM that the intersection of this and Transactional Metadata in the future 
w/range ownership rather than token ownership could become pretty interesting. 
If we tagged topology data with hardware resources in a heterogeneous 
environment, for instance, we could preferentially serve hot ranges from 
beefier hardware etc.

I don't have any major concerns w/the idea from the perspective of the old 
arch, but curious what [~ifesdjeen] and/or [~samt] think as they're working on 
CEP-21.
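That preference could be sketched roughly as follows (TagPreference and the 
"beefy" tag are made-up illustrations, not part of any patch): given a 
cluster-wide map of node tags, rank candidate nodes so those carrying a wanted 
tag come first, with untagged nodes kept as fallback.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: given each node's tag set, prefer nodes that carry a
// wanted tag (e.g. "beefy" hardware) when ranking candidates for a hot range.
public final class TagPreference
{
    public static List<String> preferTagged(Map<String, Set<String>> nodeTags, String wantedTag)
    {
        // Stable sort: nodes with the tag sort first (false < true),
        // untagged nodes follow in their original order as fallback.
        return nodeTags.entrySet().stream()
                       .sorted(Comparator.comparing((Map.Entry<String, Set<String>> e) ->
                                                    !e.getValue().contains(wantedTag)))
                       .map(Map.Entry::getKey)
                       .collect(Collectors.toList());
    }
}
```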

> Introduce tags to snitch for better decision making for replica placement in 
> topology strategies
> 
>
> Key: CASSANDRA-18084
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18084
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Cluster/Gossip, Cluster/Schema, Legacy/Distributed 
> Metadata
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
>
> We would like to have extra meta-information in cassandra-rackdc.properties 
> which would further differentiate nodes within a dc / rack.
> The main motivation behind this is that we have special requirements around a 
> node's characteristics, based on which we want to make further decisions when 
> it comes to replica placement in topology strategies. (A new topology 
> strategy would most probably be derived from / extend NTS.)
> The most reasonable way to do that is to introduce a new property in 
> cassandra-rackdc.properties called "tags":
> {code:java}
> # Arbitrary tags to assign to this node; they serve as additional 
> identifiers of a node based on which operators might act. 
> # Value of this property is meant to be a comma-separated list of strings.
> #tags=tag1,tag2 
> {code}
> We also want to introduce a new application state called TAGS. On startup, a 
> node would advertise its tags to the cluster and, vice versa, all other nodes 
> would tell that node what tags they have, so everybody would see the same 
> state of tags across the cluster, based on which topology strategies would 
> make the same decisions.
> These tags are not meant to change during the whole runtime of a node, just 
> as datacenter and rack do not.
> For performance reasons, we might limit the maximum size of tags (the sum of 
> their lengths) to, for example, 64 characters; anything bigger would either 
> be shortened or the startup would fail.
> Once we have tags for all nodes, we can access them, cluster-wide, from 
> TokenMetadata, which is used quite heavily in topology strategies and exposes 
> other relevant topology information (dcs, racks ...). We would just add a new 
> way to look at nodes.
> Tags would be a set.
> This would be persisted to system.local to see what tags the local node has, 
> and to system.peers_v2 to see what tags all other nodes have. The column 
> would be called "tags".
> {code:java}
> admin@cqlsh> select * from system.local ;
> @ Row 1
>  key | local
>  bootstrapped| COMPLETED
>  broadcast_address   | 172.19.0.5
>  broadcast_port  | 7000
>  cluster_name| Test Cluster
>  cql_version | 3.4.6
>  data_center | dc1
>  gossip_generation   | 1669739177
>  host_id | 54f8c6ea-a6ba-40c5-8fa5-484b2b4184c9
>  listen_address  | 172.19.0.5
>  listen_port | 7000
>  native_protocol_version | 5
>  partitioner | org.apache.cassandra.dht.Murmur3Partitioner
>  rack| rack1
>  release_version | 4.2-SNAPSHOT
>  rpc_address | 172.19.0.5
>  rpc_port| 9042
>  schema_version  | ef865449-2491-33b8-95b0-47c09cb14ea9
>  tags| {'tag1', 'tag2'}
>  tokens  | {'6504358681601109713'} 
> {code}
> for system.peers_v2:
> {code:java}
> admin@cqlsh> select peer,tags from system.peers_v2 ;
> @ Row 1
> --+-
>  peer | 172.19.0.15
>  tags | {'tag2', 'tag3'}
> @ Row 2
> --+-
>  peer | 172.19.0.11
>  tags | null
> {code}
> the POC implementation doing exactly that is here:
> We do not want to provide our custom topology strategies which use this 
> feature, as that will most probably be a proprietary solution. This might