Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

Attila Wind Mon, 27 May 2019 23:43:30 -0700

Hi Gurus,

It looks like this thread has stalled. However, I would be very curious to hear answers regarding b) ...


Does anyone have any comments on that?

I do see this as a potential production outage risk now... Especially as we are planning to run analysis queries by hand, exactly like that, over the cluster...


thanks!

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 23. 11:42, shalom sagges wrote:

a) Interesting... But only in case you do not provide partitioning key, right? (so IN() is for partitioning key?)

I think you should ask yourself a different question. Why am I using ALLOW FILTERING in the first place? What happens if I remove it from the query? I prefer to denormalize the data to multiple tables or at least create an index on the requested column (preferably queried together with a known partition key).
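
As a rough illustration of the denormalization idea, here is a toy Python model (not CQL, and not how Cassandra stores data). The record fields and "table" names are made up for the example. The same rows are written to two structures, each keyed by the column one query needs, so neither read has to scan and filter:

```python
from collections import defaultdict

# Hypothetical rows; the field names are illustrative only.
ROWS = [
    {"user_id": "u1", "country": "HU", "name": "Anna"},
    {"user_id": "u2", "country": "RU", "name": "Boris"},
    {"user_id": "u3", "country": "HU", "name": "Csaba"},
]

users_by_id = {}                      # "table" keyed by user_id
users_by_country = defaultdict(list)  # second "table" keyed by country


def insert(row):
    """Write the row to both tables: writes cost more so reads stay cheap."""
    users_by_id[row["user_id"]] = row
    users_by_country[row["country"]].append(row)


def find_by_country_filtering(country):
    """What filtering amounts to: read every row, keep only the matches."""
    return [r for r in users_by_id.values() if r["country"] == country]


def find_by_country_denormalized(country):
    """Direct lookup in the table keyed by the queried column."""
    return list(users_by_country.get(country, []))


for row in ROWS:
    insert(row)
```

Both functions return the same rows, but the first one has to touch every record while the second one reads only the matching entry.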

b) Still does not explain or justify "all 8 nodes to halt and unresponsiveness to external requests" behavior... Even if servers are busy with the request seriously becoming non-responsive...?

I think it can justify the unresponsiveness. When using ALLOW FILTERING, you are doing something like a full table scan in a relational database.
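
To make the difference concrete, here is a toy Python model (not Cassandra code) of an 8-node ring with replication factor 3, as in the cluster described in this thread. The ring layout, the hash function and all function names are assumptions for illustration only; real token assignment and coordinator behavior are more involved.

```python
import hashlib

NODES = [f"node{i}" for i in range(1, 9)]  # 8 nodes, as in the thread
RF = 3                                      # replication factor from the thread


def replicas_for(partition_key):
    """Toy placement: hash the key to a ring slot, take the next RF nodes."""
    digest = hashlib.md5(partition_key.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(RF)]


def nodes_for_key_lookup(partition_keys):
    """Nodes that may hold data for a query restricted by partition key(s)."""
    touched = set()
    for key in partition_keys:
        touched.update(replicas_for(key))
    return touched


def nodes_for_filtering_scan():
    """With no partition key restriction every token range has to be read,
    so in this model every node ends up doing disk work."""
    return set(NODES)
```

In this model a single-key lookup touches 3 of the 8 nodes, while the scan touches all 8. An IN() list with many partition keys spread over the ring drifts toward the second case.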

There is a lot of information on the internet regarding this subject, such as
https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/


Hope this helps.

Regards,

On Thu, May 23, 2019 at 7:33 AM Attila Wind <attilaw@swf.technology> wrote:


    Hi,

    "When you run a query with allow filtering, Cassandra doesn't know
    where the data is located, so it has to go node by node, searching
    for the requested data."

    a) Interesting... But only in case you do not provide partitioning
    key right? (so IN() is for partitioning key?)

    b) Still does not explain or justify "all 8 nodes to halt and
    unresponsiveness to external requests" behavior... Even if servers
    are busy with the request seriously becoming non-responsive...?

    cheers

    Attila Wind

    http://www.linkedin.com/in/attilaw
    Mobile: +36 31 7811355


    On 2019. 05. 23. 0:37, shalom sagges wrote:

    Hi Vsevolod,

    1) Why such behavior? I thought any given SELECT request is
    handled by a limited subset of C* nodes and not by all of them,
    as per connection consistency/table replication settings, in case.

    When you run a query with allow filtering, Cassandra doesn't know
    where the data is located, so it has to go node by node,
    searching for the requested data.

    2) Is it possible to forbid ALLOW FILTERING flag for given
    users/groups?

    I'm not familiar with such a flag. In my case, I just try to
    educate the R&D teams.

    Regards,

    On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov
    <vsfilare...@gmail.com <mailto:vsfilare...@gmail.com>> wrote:

        Hello everyone,

        We have an 8 node C* cluster with a large volume of unbalanced
        data. Usual per-partition selects work somewhat fine, and are
        processed by a limited number of nodes, but if a user issues
        SELECT WHERE IN () ALLOW FILTERING, such command stalls all 8
        nodes to halt and unresponsiveness to external requests while
        disk IO jumps to 100% across the whole cluster. In several
        minutes all nodes seem to finish processing the request and
        the cluster goes back to being responsive. Replication level
        across all data is 3.

        1) Why such behavior? I thought any given SELECT request is
        handled by a limited subset of C* nodes and not by all of
        them, as per connection consistency/table replication
        settings, in case.

        2) Is it possible to forbid ALLOW FILTERING flag for given
        users/groups?

        Thank you all very much in advance,
        Vsevolod Filaretov.
