RE: [EXTERNAL] Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

Durity, Sean R Tue, 28 May 2019 07:16:57 -0700

This may sound a bit harsh, but I teach my developers that if they are trying 
to use ALLOW FILTERING – they are doing it wrong! We often choose Cassandra for 
its high availability and scalability characteristics. We love no downtime. 
ALLOW FILTERING is breaking the rules of availability and scalability.


Look at the full text of the error (not just the ending):

    Bad Request: Cannot execute this query as it might involve data filtering
    and thus may have unpredictable performance. If you want to execute this
    query despite the performance unpredictability, use ALLOW FILTERING.

It is being polite, but it does warn you that performance is unpredictable. I 
can predict this: ALLOW FILTERING will not scale. It won’t scale to large 
numbers of nodes (with small tables) or to large numbers of rows (regardless of 
node count). If you ignore the admittedly too polite warning, Cassandra will 
try to answer your query. It does so with a brute-force, scan-everything 
approach on all nodes (because you didn’t give it any partitions to target 
directly). That gets expensive and dangerous quickly. And, yes, it can endanger 
the whole cluster.
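
To make the "all nodes" point concrete, here is a toy model of replica targeting. It is only a sketch: the hashing and placement below are simplified stand-ins, not Cassandra's actual partitioner or replication strategy, and the function names are made up for illustration. The 8 nodes and replication factor of 3 come from the original report further down this thread.

```python
import hashlib

NODES = 8               # cluster size from the original report
REPLICATION_FACTOR = 3  # replication level from the original report


def replicas_for(partition_key: str) -> set[int]:
    """Toy placement: hash the key to a node, then add the next RF-1 nodes."""
    digest = hashlib.md5(partition_key.encode()).digest()
    first = int.from_bytes(digest[:8], "big") % NODES
    return {(first + i) % NODES for i in range(REPLICATION_FACTOR)}


def nodes_for_partition_query(partition_key: str) -> set[int]:
    # With a partition key, the owning replicas can be computed directly.
    return replicas_for(partition_key)


def nodes_for_filtering_scan() -> set[int]:
    # Without a partition key there is nothing to hash, so in this
    # simplified model every node has to take part in the scan.
    return set(range(NODES))


print(len(nodes_for_partition_query("user-42")), "nodes for a keyed read")
print(len(nodes_for_filtering_scan()), "nodes for a filtering scan")
```

In this model a keyed read touches 3 of the 8 nodes, while the scan touches all 8, and each of those nodes reads far more data than it returns.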

As an administrator, I do think that Cassandra should be able to protect itself 
better, perhaps by letting the administrator disallow those queries altogether. 
It does at least warn you.
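
Until the server can refuse such queries, one stopgap is a check in the application's own data-access layer. The sketch below is an assumption-laden illustration, not a Cassandra or driver feature: `check_statement` and `FilteringNotAllowed` are hypothetical names, and the check is a plain text match, so it would also trip on the phrase appearing inside a string literal.

```python
import re

# Matches the ALLOW FILTERING clause regardless of case or spacing.
_ALLOW_FILTERING = re.compile(r"\ballow\s+filtering\b", re.IGNORECASE)


class FilteringNotAllowed(ValueError):
    """Raised when a statement asks for ALLOW FILTERING."""


def check_statement(cql: str) -> str:
    """Return the statement unchanged, or raise if it uses ALLOW FILTERING."""
    if _ALLOW_FILTERING.search(cql):
        raise FilteringNotAllowed(
            "ALLOW FILTERING is not permitted against this cluster"
        )
    return cql
```

The application would call `check_statement(query)` before handing the query to whatever driver it uses. This only protects code paths that go through the wrapper; someone with direct cqlsh access is not affected.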


From: Attila Wind <attilaw@swf.technology>
Sent: Tuesday, May 28, 2019 4:47 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Select in allow filtering stalls whole cluster. How to 
prevent such behavior?


Hi Shalom,

Thanks for your notes! So you also experienced this thing... fine

Then maybe the best rules to follow are these:
a) never(!) run an "ALLOW FILTERING" query on a Production cluster
b) if you need these queries build a test cluster (somehow) and mirror the data 
(somehow) OR add denormalized tables (write + code complexity overhead) to 
fulfill those queries

Can we agree on this one maybe as a "good to follow" policy?

In our case, luckily, users = developers always. So I can expect them to be 
aware of the consequences of a particular query.
We also have test data fully mirrored into a test cluster. So running those 
queries on the test system is possible.
Plus, if for whatever reason we really need to run such a query in Prod, I 
can simply instruct them to test the query on the test system first.

cheers
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 28. 8:59, shalom sagges wrote:
Hi Attila,

I'm definitely no guru, but I've experienced several cases where people at my 
company used allow filtering and caused major performance issues.
As data size increases, the impact will be stronger. If you have large 
partitions, performance will decrease.
GC can be affected. And if GC stops the world for too long, too many times, you 
will feel it.

I sincerely believe the best way would be to educate the users and remodel the 
data. Perhaps you need to denormalize your tables or at least use secondary 
indices (I prefer to keep it as simple as possible and denormalize).
If it's a cluster for analytics, perhaps you need to build a designated cluster 
only for that so if something does break or get too pressured, normal 
activities wouldn't be affected, but there are pros and cons for that idea too.
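
As a rough illustration of the denormalization idea, the sketch below models two tables as plain Python dictionaries: one keyed by the primary identifier and one keyed by the column people want to search on. The table names, columns, and helper functions are all hypothetical, and a real Cassandra data model would express this as two CQL tables rather than in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical "base table": one row per user, keyed by user_id.
users_by_id: dict[str, dict] = {}
# Hypothetical "query table": the same rows, keyed by the searched column.
users_by_country: dict[str, list[dict]] = defaultdict(list)


def insert_user(user_id: str, name: str, country: str) -> None:
    """Write the row to both tables; this is the extra write cost."""
    row = {"user_id": user_id, "name": name, "country": country}
    users_by_id[user_id] = row
    users_by_country[country].append(row)


def find_by_country_scan(country: str) -> list[dict]:
    # Filtering approach: inspect every row in the base table.
    return [r for r in users_by_id.values() if r["country"] == country]


def find_by_country_lookup(country: str) -> list[dict]:
    # Denormalized approach: a single key lookup, no scan.
    return list(users_by_country.get(country, []))
```

Both functions return the same rows, but the scan's cost grows with the size of the whole table, while the lookup's cost depends only on the rows for that one key. The price is the second write in `insert_user` and the extra code needed to keep both tables consistent on updates and deletes, which this sketch does not handle.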

Hope this helps.

Regards,


On Tue, May 28, 2019 at 9:43 AM Attila Wind <attilaw@swf.technology> wrote:

Hi Gurus,

Looks like this thread has stalled. However, I would be very curious to hear 
answers regarding b) ...

Does anyone have any comments on that?
I do see this as a potential production outage risk now... Especially as we are 
planning to run analysis queries by hand exactly like that over the cluster...

thanks!
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 23. 11:42, shalom sagges wrote:
> a) Interesting... But only in case you do not provide partitioning key right? 
> (so IN() is for partitioning key?)

I think you should ask yourself a different question. Why am I using ALLOW 
FILTERING in the first place? What happens if I remove it from the query?
I prefer to denormalize the data to multiple tables or at least create an index 
on the requested column (preferably queried together with a known partition 
key).


> b) Still does not explain or justify "all 8 nodes to halt and unresponsiveness 
> to external requests" behavior... Even if servers are busy with the request 
> seriously becoming non-responsive...?

I think it can justify the unresponsiveness. When using ALLOW FILTERING, you 
are doing something like a full table scan in a relational database.

There is a lot of information on the internet regarding this subject such as 
https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/

Hope this helps.

Regards,

On Thu, May 23, 2019 at 7:33 AM Attila Wind <attilaw@swf.technology> wrote:

Hi,

"When you run a query with allow filtering, Cassandra doesn't know where the 
data is located, so it has to go node by node, searching for the requested 
data."

a) Interesting... But only in case you do not provide partitioning key right? 
(so IN() is for partitioning key?)

b) Still does not explain or justify "all 8 nodes to halt and unresponsiveness 
to external requests" behavior... Even if servers are busy with the request 
seriously becoming non-responsive...?

cheers
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 23. 0:37, shalom sagges wrote:
Hi Vsevolod,

> 1) Why such behavior? I thought any given SELECT request is handled by a 
> limited subset of C* nodes and not by all of them, as per connection 
> consistency/table replication settings, in case.
When you run a query with allow filtering, Cassandra doesn't know where the 
data is located, so it has to go node by node, searching for the requested data.

> 2) Is it possible to forbid ALLOW FILTERING flag for given users/groups?
I'm not familiar with such a flag. In my case, I just try to educate the R&D 
teams.

Regards,

On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov <vsfilare...@gmail.com> 
wrote:
Hello everyone,

We have an 8-node C* cluster with a large volume of unbalanced data. Usual 
per-partition selects work reasonably well and are processed by a limited 
number of nodes, but if a user issues SELECT WHERE IN () ALLOW FILTERING, the 
command brings all 8 nodes to a halt and makes them unresponsive to external 
requests, while disk IO jumps to 100% across the whole cluster. After several 
minutes all nodes seem to finish processing the request and the cluster becomes 
responsive again. The replication factor across all data is 3.

1) Why such behavior? I thought any given SELECT request is handled by a 
limited subset of C* nodes and not by all of them, as per connection 
consistency/table replication settings, in case.

2) Is it possible to forbid ALLOW FILTERING flag for given users/groups?

Thank you all very much in advance,
Vsevolod Filaretov.

