RE: [EXTERNAL] Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-28 Thread Durity, Sean R
This may sound a bit harsh, but I teach my developers that if they are trying 
to use ALLOW FILTERING – they are doing it wrong! We often choose Cassandra for 
its high availability and scalability characteristics. We love no downtime. 
ALLOW FILTERING is breaking the rules of availability and scalability.

Look at the full text of the error (not just the ending):
Bad Request: Cannot execute this query as it might involve data filtering and 
thus may have unpredictable performance. If you want to execute this query 
despite the performance unpredictability, use ALLOW FILTERING.
It is being polite, but it does warn you that performance is unpredictable. I 
can predict this: ALLOW FILTERING will not scale. It won’t scale to large 
numbers of nodes (with small tables) or to large numbers of rows (regardless of 
node count). If you ignore the admittedly too-polite warning, Cassandra will 
try to answer your query. It does so with a brute-force, scan-everything 
approach on all nodes (because you didn’t give it any partitions to target 
directly). That gets expensive and dangerous quickly. And, yes, it can endanger 
the whole cluster.
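
As a rough illustration of why the number of nodes matters, here is a toy Python model. It is only a sketch: the hashing scheme and the function names are made up for the example and are not Cassandra's real partitioner or read path. The cluster size (8) and replication factor (3) are taken from the original question in this thread.

```python
import hashlib

NODES = 8               # cluster size from the original question
REPLICATION_FACTOR = 3  # replication factor from the original question


def replicas_for(partition_key: str) -> list[int]:
    """Toy placement: hash the key to a node, then take the next RF-1 nodes."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    first = int.from_bytes(digest[:8], "big") % NODES
    return [(first + i) % NODES for i in range(REPLICATION_FACTOR)]


def nodes_for_partition_read(partition_key: str) -> set[int]:
    """A read that names its partition only needs that partition's replicas."""
    return set(replicas_for(partition_key))


def nodes_for_unrestricted_scan() -> set[int]:
    """With no partition to target, every node owns rows that might match."""
    return set(range(NODES))


targeted = nodes_for_partition_read("user-42")  # hypothetical key
scan = nodes_for_unrestricted_scan()
print(f"partition read can involve at most {len(targeted)} of {NODES} nodes")
print(f"unrestricted scan can involve all {len(scan)} of {NODES} nodes")
```

In this model the targeted read never grows beyond the replica set, while the scan grows with the cluster.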

As an administrator, I do think that Cassandra should be able to protect itself 
better, perhaps by letting the administrator disallow those queries altogether. 
It does at least warn you.



Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-28 Thread Attila Wind

Hi Shalom,

Thanks for your notes! So you also experienced this thing... fine

Then maybe the best rules to follow are these:
a) never(!) run an "ALLOW FILTERING" query on a Production cluster
b) if you need these queries, build a test cluster (somehow) and mirror 
the data (somehow) OR add denormalized tables (write + code complexity 
overhead) to fulfill those queries
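
To make the "denormalized tables" option in b) concrete, here is a minimal Python sketch. Plain dictionaries stand in for tables, and the table names, columns, and values are all hypothetical. The idea is that each query path gets its own table, partitioned by the column that query filters on, at the cost of writing every record more than once.

```python
from collections import defaultdict

# Two hypothetical "tables", each partitioned by the column its query uses.
events_by_user = defaultdict(list)     # partition key: user_id
events_by_country = defaultdict(list)  # partition key: country


def record_event(user_id: str, country: str, action: str) -> None:
    """Write the same event once per query path (this is the write overhead)."""
    event = {"user_id": user_id, "country": country, "action": action}
    events_by_user[user_id].append(event)
    events_by_country[country].append(event)


record_event("u1", "HU", "login")
record_event("u2", "HU", "purchase")
record_event("u1", "IL", "logout")

# Both questions are now direct partition lookups; nothing has to be filtered.
print([e["action"] for e in events_by_user["u1"]])
print([e["user_id"] for e in events_by_country["HU"]])
```

The extra code in `record_event` is the "code complexity overhead" mentioned above: every write path has to keep all copies in step.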


Can we maybe agree on this as a "good to follow" policy?

In our case, luckily, users = developers, always. So I can expect them 
to be aware of the consequences of a particular query.
We also have test data fully mirrored into a test cluster, so running 
those queries on the test system is possible.
Plus, if for whatever reason we really, really need to run such a query in 
Prod, I can simply instruct them to test a query like this in the test 
system first, for sure.


cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355



Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-28 Thread shalom sagges
Hi Attila,

I'm definitely no guru, but I've experienced several cases where people at
my company used allow filtering and caused major performance issues.
As data size increases, the impact will be stronger. If you have large
partitions, performance will decrease.
GC can be affected. And if GC stops the world for too long, too many times,
you will feel it.

I sincerely believe the best way would be to educate the users and remodel
the data. Perhaps you need to denormalize your tables or at least use
secondary indices (I prefer to keep it as simple as possible and
denormalize).
If it's a cluster for analytics, perhaps you need to build a dedicated
cluster just for that, so that if something does break or come under too much
pressure, normal activity isn't affected. There are pros and cons to that
idea too.

Hope this helps.

Regards,




Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-28 Thread Attila Wind

Hi Gurus,

Looks like we stopped this thread. However, I would be very curious about 
answers regarding b) ...


Does anyone have any comments on that?
I do see this as a potential production outage risk now... Especially as 
we are planning to run analysis queries by hand exactly like that over 
the cluster...


thanks!

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355





Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-23 Thread Attila Wind

Hi again,

so staying with a) for a second...
"Why am I using ALLOW FILTERING in the first place?"
Fully agreed! To put it this way: as a reviewer I never want to see 
the string "allow filtering" in any select issued by production 
code. I clearly consider it an indicator of a wrong db design.
Still! There are use cases - and if I am not mistaken the original 
question was about that - where for whatever reason PERSONS are running 
such selects manually. E.g. for us, where we use Cassandra, we have things 
like this:  for analysis purposes. So I think this is a valid use case. 
And once we have found a valid use case, the question stands. Right? So back 
to the question: "But only in case you do not provide partitioning key 
right?" - I assume the answer is yes, right? :-)


b) "I think it can justify the unresponsiveness. When using ALLOW 
FILTERING, you are doing something like a full table scan in a 
relational database"
I get it. Sure. But is Cassandra so "single threaded" that 
if a node is running one(!) big, expensive query it becomes fully 
unresponsive? I doubt it...
That's what I meant by saying "does not explain or justify". From my 
perspective I definitely consider this kind of unresponsiveness 
an abnormal state ...


cheers

Attila





Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-23 Thread shalom sagges
> a) Interesting... But only in case you do not provide partitioning key
> right? (so IN() is for partitioning key?)

I think you should ask yourself a different question. Why am I using ALLOW
FILTERING in the first place? What happens if I remove it from the query?
I prefer to denormalize the data to multiple tables or at least create an
index on the requested column (preferably queried together with a known
partition key).

> b) Still does not explain or justify "all 8 nodes to halt and
> unresponsiveness to external requests" behavior... Even if servers are busy
> with the request seriously becoming non-responsive...?

I think it can justify the unresponsiveness. When using ALLOW FILTERING,
you are doing something like a full table scan in a relational database.
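
The full-table-scan comparison can be sketched with a small Python model. This is an illustration only, not how Cassandra stores or reads data: the table layout, the `status` column, and the function names are all hypothetical. It counts how many rows each kind of read has to touch as the table grows.

```python
def build_table(partitions: int, rows_per_partition: int) -> dict[str, list[dict]]:
    """Hypothetical table: partition key -> rows with a non-key 'status' column."""
    return {
        f"p{p}": [
            {"status": "open" if r % 2 == 0 else "closed"}
            for r in range(rows_per_partition)
        ]
        for p in range(partitions)
    }


def rows_read_by_key(table: dict[str, list[dict]], partition_key: str) -> int:
    """A lookup by partition key reads only that partition's rows."""
    return len(table.get(partition_key, []))


def rows_read_by_filter(
    table: dict[str, list[dict]], column: str, value: str
) -> tuple[int, int]:
    """Filtering on a non-key column reads every row; returns (read, matched)."""
    read = matched = 0
    for rows in table.values():
        for row in rows:
            read += 1
            if row[column] == value:
                matched += 1
    return read, matched


for partitions in (10, 100, 1000):
    table = build_table(partitions, rows_per_partition=10)
    by_key = rows_read_by_key(table, "p0")
    scanned, matched = rows_read_by_filter(table, "status", "open")
    print(f"{partitions:>5} partitions: key lookup reads {by_key} rows, "
          f"filter reads {scanned} rows to find {matched}")
```

The key lookup stays flat while the filtered read grows in step with the table, which is the pattern the relational full-table-scan analogy describes.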

There is a lot of information on the internet regarding this subject such
as
https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/

Hope this helps.

Regards,



Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-22 Thread Attila Wind

Hi,

"When you run a query with allow filtering, Cassandra doesn't know where 
the data is located, so it has to go node by node, searching for the 
requested data."


a) Interesting... But only in case you do not provide partitioning key 
right? (so IN() is for partitioning key?)


b) Still does not explain or justify "all 8 nodes to halt and 
unresponsiveness to external requests" behavior... Even if servers are 
busy with the request seriously becoming non-responsive...?


cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355





Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-22 Thread shalom sagges
Hi Vsevolod,

> 1) Why such behavior? I thought any given SELECT request is handled by a
> limited subset of C* nodes and not by all of them, as per connection
> consistency/table replication settings, in case.

When you run a query with allow filtering, Cassandra doesn't know where the
data is located, so it has to go node by node, searching for the requested
data.

> 2) Is it possible to forbid ALLOW FILTERING flag for given users/groups?

I'm not familiar with such a flag. In my case, I just try to educate the
R&D teams.
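
Since no server-side switch is identified in this thread, one possible stopgap is a check in the application or tooling layer before a statement is sent. The following Python sketch is only an assumption about how such a guard could look: `check_statement` and `FilteringNotAllowed` are hypothetical names, not part of Cassandra or any driver, and the table and column names are made up. The match is a plain text search, so it would also reject a statement that merely contains the phrase inside a string literal.

```python
import re

# Matches the two keywords with any whitespace between them, in any letter case.
_ALLOW_FILTERING = re.compile(r"\bALLOW\s+FILTERING\b", re.IGNORECASE)


class FilteringNotAllowed(ValueError):
    """Raised by this hypothetical guard when a statement asks for filtering."""


def check_statement(cql: str) -> str:
    """Return the statement unchanged, or raise if it contains ALLOW FILTERING."""
    if _ALLOW_FILTERING.search(cql):
        raise FilteringNotAllowed("ALLOW FILTERING is not permitted here: " + cql)
    return cql


# A statement that targets a partition passes through untouched.
check_statement("SELECT * FROM events WHERE user_id = 'u1'")

# A filtering statement is rejected before it reaches the cluster.
try:
    check_statement("SELECT * FROM events WHERE country = 'HU' allow  filtering")
except FilteringNotAllowed as err:
    print(err)
```

A guard like this only covers clients that go through it; someone typing queries by hand into a shell would bypass it, so it complements education rather than replacing it.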

Regards,



Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-22 Thread Vsevolod Filaretov
Hello everyone,

We have an 8 node C* cluster with a large volume of unbalanced data. Usual
per-partition selects work somewhat fine, and are processed by a limited
number of nodes, but if a user issues SELECT WHERE IN () ALLOW FILTERING,
such a command stalls all 8 nodes to halt and unresponsiveness to external
requests while disk IO jumps to 100% across the whole cluster. After several
minutes all nodes seem to finish processing the request and the cluster goes
back to being responsive. Replication level across all data is 3.

1) Why such behavior? I thought any given SELECT request is handled by a
limited subset of C* nodes and not by all of them, as per connection
consistency/table replication settings, in case.

2) Is it possible to forbid ALLOW FILTERING flag for given users/groups?

Thank you all very much in advance,
Vsevolod Filaretov.