Hi again,
so staying with a) for a second...
"Why am I using ALLOW FILTERING in the first place?"
Fully agreed! To put it this way: as a reviewer I never want to see the
string "allow filtering" in any SELECT issued by production code. I
clearly consider it an indicator of a wrong DB design.
Still! There are use cases - and if I am not mistaken the original
question was about exactly that - where, for whatever reason, PEOPLE run
such selects manually. E.g. where we use Cassandra, we do this for
analysis purposes. So I think this is a valid use case.
And once we have found a valid use case, the question stands. Right? So
back to the question: "But only in case you do not provide partitioning
key right?" - I assume the answer is yes, right? :-)
b) "I think it can justify the unresponsiveness. When using ALLOW
FILTERING, you are doing something like a full table scan in a
relational database"
I get it. Sure. But is Cassandra so "single threaded" that a node
running one(!) very big, extensive query becomes fully unresponsive? I
doubt it...
That's what I meant by saying "does not explain or justify". From my
perspective I definitely consider this kind of unresponsiveness an
abnormal state...
cheers
Attila
On 23.05.2019 11:42 AM, shalom sagges wrote:
a) Interesting... But only in case you do not provide partitioning key
right? (so IN() is for partitioning key?)
I think you should ask yourself a different question. Why am I using
ALLOW FILTERING in the first place? What happens if I remove it from
the query?
I prefer to denormalize the data to multiple tables or at least create
an index on the requested column (preferably queried together with a
known partition key).
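To illustrate the denormalization idea, here is a minimal in-memory Python sketch. It is not Cassandra code; the rows, the column names (user_id, country, name) and both "tables" are hypothetical and only model the access pattern: filtering reads every partition, while a second table partitioned by the queried column reads only the matching one.

```python
from collections import defaultdict

# Hypothetical rows: (user_id, country, name).
# user_id plays the role of the partition key of the base table.
ROWS = [
    (1, "HU", "Anna"),
    (2, "IL", "Dan"),
    (3, "HU", "Bela"),
]

# Base "table": partition key -> row.
users_by_id = {row[0]: row for row in ROWS}

# Denormalized "table": the same rows written a second time,
# this time partitioned by the column we want to query on.
users_by_country = defaultdict(list)
for row in ROWS:
    users_by_country[row[1]].append(row)


def find_by_country_filtering(country):
    """Rough analogue of ALLOW FILTERING: read every partition, then filter."""
    rows_read = 0
    hits = []
    for row in users_by_id.values():
        rows_read += 1
        if row[1] == country:
            hits.append(row)
    return hits, rows_read


def find_by_country_denormalized(country):
    """Single-partition lookup in the table keyed by country."""
    hits = users_by_country.get(country, [])
    return hits, len(hits)
```

Both functions return the same rows, but the second value (rows read) grows with the whole data set in the filtering variant and only with the size of the matching partition in the denormalized one. The price is writing every row twice.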
b) Still does not explain or justify "all 8 nodes to halt and
unresponsiveness to external requests" behavior... Even if servers are
busy with the request seriously becoming non-responsive...?
I think it can justify the unresponsiveness. When using ALLOW
FILTERING, you are doing something like a full table scan in a
relational database.
There is a lot of information on the internet regarding this subject
such as
https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/
Hope this helps.
Regards,
On Thu, May 23, 2019 at 7:33 AM Attila Wind <attilaw@swf.technology>
wrote:
Hi,
"When you run a query with allow filtering, Cassandra doesn't know
where the data is located, so it has to go node by node, searching
for the requested data."
a) Interesting... But only in case you do not provide partitioning
key right? (so IN() is for partitioning key?)
b) Still does not explain or justify "all 8 nodes to halt and
unresponsiveness to external requests" behavior... Even if servers
are busy with the request seriously becoming non-responsive...?
cheers
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355
On 2019. 05. 23. 0:37, shalom sagges wrote:
Hi Vsevolod,
1) Why such behavior? I thought any given SELECT request is
handled by a limited subset of C* nodes and not by all of them,
as per connection consistency/table replication settings, in case.
When you run a query with allow filtering, Cassandra doesn't know
where the data is located, so it has to go node by node,
searching for the requested data.
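A rough way to picture this is the following Python sketch. It is a toy model under stated assumptions, not how Cassandra is implemented: MD5 stands in for the real partitioner (Murmur3 by default), replica placement is a simplified "owner plus the next nodes on the ring" scheme, and a query without a partition key is modelled as touching every node, as described above. The node names are made up; the node count and replication factor are taken from the original post.

```python
import hashlib

NODES = ["node%d" % i for i in range(1, 9)]  # 8 nodes, as in the original post
RF = 3                                       # replication factor from the post


def token(partition_key):
    # Stand-in hash for illustration only; not Cassandra's partitioner.
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


def replicas(partition_key):
    """Nodes holding a copy of this partition (simplified ring placement)."""
    start = token(partition_key) % len(NODES)
    return {NODES[(start + i) % len(NODES)] for i in range(RF)}


def nodes_for_query(partition_keys=None):
    """Nodes that may have to take part in answering a SELECT.

    With partition keys (for example pk IN (...)), only the replicas of
    those keys are candidates. Without any partition key restriction the
    data location is unknown, so every node is a candidate.
    """
    if not partition_keys:
        return set(NODES)
    touched = set()
    for key in partition_keys:
        touched |= replicas(key)
    return touched
```

In this model `nodes_for_query(["some_key"])` yields 3 of the 8 nodes, a long IN() list on the partition key drifts towards all 8, and `nodes_for_query()` is always all 8. In a real cluster the coordinator contacts only as many replicas per key as the consistency level requires, so treat the sets as upper bounds.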
2) Is it possible to forbid ALLOW FILTERING flag for given
users/groups?
I'm not familiar with such a flag. In my case, I just try to
educate the R&D teams.
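Short of a server-side permission, one possible stopgap is a client-side check in the application's data access layer or in code review tooling. The Python sketch below is only an assumption about how such a guard could look; the function names are hypothetical, it does not use any Cassandra driver API, and it cannot stop anyone from running the query manually in cqlsh.

```python
import re

# Matches the keyword pair regardless of case or the amount of whitespace.
_ALLOW_FILTERING = re.compile(r"\bALLOW\s+FILTERING\b", re.IGNORECASE)


def uses_allow_filtering(cql):
    """Return True if the statement text contains ALLOW FILTERING.

    Plain text match: a string literal that happens to contain the
    words would be flagged as well.
    """
    return bool(_ALLOW_FILTERING.search(cql))


def check_statement(cql):
    """Hypothetical guard to call before a statement is sent to the cluster."""
    if uses_allow_filtering(cql):
        raise ValueError("ALLOW FILTERING is not permitted in application queries")
    return cql
```

A wrapper around the session object, or a unit test that scans all query strings in the code base, could call `check_statement` so that such queries fail before they ever reach the cluster.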
Regards,
On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov
<vsfilare...@gmail.com> wrote:
Hello everyone,
We have an 8-node C* cluster with a large volume of unbalanced
data. Usual per-partition selects work somewhat fine and are
processed by a limited number of nodes, but if a user issues
SELECT WHERE IN () ALLOW FILTERING, such a command stalls all 8
nodes to halt and unresponsiveness to external requests while
disk IO jumps to 100% across the whole cluster. After several
minutes all nodes seem to finish processing the request and the
cluster goes back to being responsive. The replication factor
across all data is 3.
1) Why such behavior? I thought any given SELECT request is
handled by a limited subset of C* nodes and not by all of
them, as per connection consistency/table replication
settings, in case.
2) Is it possible to forbid ALLOW FILTERING flag for given
users/groups?
Thank you all very much in advance,
Vsevolod Filaretov.