azotcsit commented on a change in pull request #1213:
URL: https://github.com/apache/cassandra/pull/1213#discussion_r732168493



##########
File path: doc/source/operating/denylisting_partitions.rst
##########
@@ -0,0 +1,100 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Denylisting Partitions
+----------------------
+
+Due to access patterns and data modeling, sometimes there are specific 
partitions that are "hot" and can cause instability in a Cassandra cluster. 
This often occurs when your data model includes many update or insert 
operations on a single partition, causing the partition to grow very large over 
time and in turn making it very expensive to read and maintain.
+
+Cassandra supports "denylisting" these problematic partitions so that when 
clients issue point reads (`SELECT` statements with the partition key 
specified) or range reads (`SELECT *`, etc that pull a range of data) that 
intersect with a blocked partition key, the query will be immediately rejected 
with an `InvalidQueryException`.
+
+How to denylist a partition key
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The ``system_distributed.denylisted_partitions`` table can be used to denylist 
partitions. There are a couple of ways to interact with and mutate this data. 
First: diractly via CQL by inserting a record with the following details:

Review comment:
       typo: diractly -> directly

##########
File path: doc/source/operating/denylisting_partitions.rst
##########
@@ -0,0 +1,100 @@
+
+Denylisting Partitions

Review comment:
       Great documentation!
   
   I feel it additionally needs to be referred from `operating/index.rst`.

##########
File path: doc/source/operating/denylisting_partitions.rst
##########
@@ -0,0 +1,100 @@
+
+Denylisting Partitions
+----------------------
+
+Due to access patterns and data modeling, sometimes there are specific 
partitions that are "hot" and can cause instability in a Cassandra cluster. 
This often occurs when your data model includes many update or insert 
operations on a single partition, causing the partition to grow very large over 
time and in turn making it very expensive to read and maintain.
+
+Cassandra supports "denylisting" these problematic partitions so that when 
clients issue point reads (`SELECT` statements with the partition key 
specified) or range reads (`SELECT *`, etc that pull a range of data) that 
intersect with a blocked partition key, the query will be immediately rejected 
with an `InvalidQueryException`.
+
+How to denylist a partition key
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The ``system_distributed.denylisted_partitions`` table can be used to denylist 
partitions. There are a couple of ways to interact with and mutate this data. 
First: directly via CQL by inserting a record with the following details:
+
+- Keyspace name (ks_name)
+- Table name (table_name)
+- Partition Key (partition_key)
+
+The partition key format needs to be in the same form required by ``nodetool 
getendpoints``.
+
+Following are several examples for denylisting partition keys in keyspace `ks` 
and table `table1` for different data types on the primary key `Id`:
+
+ - Id is a simple type - INSERT INTO system_distributed.denylisted_partitions 
(ks_name, table_name, partition_key) VALUES ('ks','table1','1');
+ - Id is a blob        - INSERT INTO system_distributed.denylisted_partitions 
(ks_name, table_name, partition_key) VALUES ('ks','table1','12345f');
+ - Id has a colon      - INSERT INTO system_distributed.denylisted_partitions 
(ks_name, table_name, partition_key) VALUES ('ks','table1','1\:2');
+
+In the case of composite column partition keys (Key1, Key2):
+
+ - INSERT INTO system_distributed.denylisted_partitions (ks_name, table_name, 
partition_key) VALUES ('ks', 'table1', 'k11:k21');
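
The key format in the examples above (components joined with `:`, a literal colon written as `\:`) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `format_partition_key` and `denylist_insert_statement` are hypothetical helper names, not part of Cassandra or of this patch, and applying the `\:` escape inside each component of a composite key is assumed from the single-column example rather than confirmed by the doc.

```python
def format_partition_key(components):
    # Hypothetical helper: escape any literal ':' inside a component as '\:'
    # and join the components of a composite key with ':'.
    return ":".join(str(c).replace(":", "\\:") for c in components)


def denylist_insert_statement(ks_name, table_name, components):
    # Hypothetical helper: render the INSERT shown in the doc as a string.
    # Single quotes are doubled, which is how CQL escapes them in literals.
    def quote(text):
        return "'" + text.replace("'", "''") + "'"

    key = format_partition_key(components)
    return (
        "INSERT INTO system_distributed.denylisted_partitions "
        "(ks_name, table_name, partition_key) "
        f"VALUES ({quote(ks_name)}, {quote(table_name)}, {quote(key)});"
    )


if __name__ == "__main__":
    print(denylist_insert_statement("ks", "table1", ["k11", "k21"]))
    print(denylist_insert_statement("ks", "table1", ["1:2"]))
```

When a driver is available, binding the three values as parameters is probably preferable to building the statement text by hand; the string form is used here only to mirror the examples in the doc.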
+
+Special considerations
+^^^^^^^^^^^^^^^^^^^^^^
+The denylist cache (see below) and the CQL data on a replica set must match. 
To enforce this, the workflow for a denylist change (addition or deletion) 
should `always be as follows`:
+
+1. Mutate the denylisted partition list in CQL
+2. Trigger a reload of the denylist cache on each node (see below)
+
+A denylist cache reload `requires all nodes in the replica set to be up`, 
effectively making this a `CP <https://en.wikipedia.org/wiki/CAP_theorem>`_ 
operation. We require this to ensure the cache and CQL table data remain in 
sync for all nodes participating in denial of access.
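
The two-step workflow and the all-replicas-up requirement can be illustrated with a small Python sketch. Everything here is hypothetical: `apply_denylist_change` and the three callables it takes are placeholders for whatever CQL client and reload mechanism an operator actually uses, and checking replica availability before mutating is a design choice of this sketch, not something the doc prescribes.

```python
def apply_denylist_change(mutate_cql, replica_nodes, is_node_up, reload_cache):
    # Hypothetical orchestration of the workflow described above.
    #   mutate_cql:    callable that inserts into or deletes from
    #                  system_distributed.denylisted_partitions
    #   replica_nodes: the nodes in the replica set for the partition
    #   is_node_up:    callable(node) -> bool
    #   reload_cache:  callable(node) that triggers a denylist cache reload
    down = [node for node in replica_nodes if not is_node_up(node)]
    if down:
        # Assumption of this sketch: refuse to start when a replica is down,
        # because the reload step needs every replica to be up.
        raise RuntimeError(f"replicas down, not changing denylist: {down}")

    # Step 1: mutate the denylisted partition list in CQL.
    mutate_cql()

    # Step 2: trigger a reload of the denylist cache on each node.
    for node in replica_nodes:
        reload_cache(node)
```

Failing fast before the mutation keeps the table and the caches from diverging when a reload could not succeed anyway; an operator could equally well mutate first and retry the reload once all replicas are back.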
+
+Denylisted Partitions Cache
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Cassandra internally maintains an on-heap cache of denylisted partitions 
loaded from ``system_distributed.denylisted_partitions``. The values for a 
table will be automatically repopulated every ``denylist_refresh_seconds`` as 
specified in the `conf/cassandra.yaml` file, defaulting to 86,400 seconds, or 1 
day. Invalid records (unknown keyspaces, tables, or keys) will be ignored and 
not cached on load.

Review comment:
       Please share your thoughts on 
https://github.com/apache/cassandra/pull/1213#discussion_r730426675.



