Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

Attila Wind Tue, 28 May 2019 01:47:11 -0700

Hi Shalom,

Thanks for your notes! So you also experienced this thing... fine


Then maybe the best rules to follow are these:
a) never(!) run a query with "ALLOW FILTERING" on a Production cluster

b) if you need these queries, build a test cluster (somehow) and mirror
the data (somehow) OR add denormalized tables (write + code complexity
overhead) to fulfill those queries


Can we agree on this one maybe as a "good to follow" policy?

In our case, luckily, users = developers always. So I can expect them
to be aware of the consequences of a particular query.
We also have test data fully mirrored into a test cluster. So running
those queries on the test system is possible.
Plus, if for whatever reason we really, really need to run such a query in
Prod, I can simply instruct them to test a query like this in the test
system first, for sure.
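
A minimal sketch of how rule a) could be checked in client code before a
query is sent. This is only an illustration: check_query,
FilteringNotAllowed and the "production" environment label are
hypothetical names, not part of Cassandra or any driver, and the check is
a naive text match on the query string.

```python
import re

# Hypothetical helper, not part of any Cassandra driver: it only inspects
# the query text before the application hands it to its client library.
_ALLOW_FILTERING = re.compile(r"\bALLOW\s+FILTERING\b", re.IGNORECASE)


class FilteringNotAllowed(Exception):
    """Raised when an ALLOW FILTERING query targets a protected environment."""


def check_query(cql: str, environment: str) -> str:
    """Return the query unchanged, or raise if it uses ALLOW FILTERING in production."""
    # Assumption: the application knows which environment it is talking to
    # and labels the protected one "production".
    if environment == "production" and _ALLOW_FILTERING.search(cql):
        raise FilteringNotAllowed(
            "ALLOW FILTERING is not permitted on production: " + cql
        )
    return cql
```

Because it is a plain text match, it would also reject a query that merely
contains the words inside a string literal, and it does nothing for
queries typed by hand into a shell. It is a safety net for application
code, not a permission system.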


cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 28. 8:59, shalom sagges wrote:

Hi Attila,

I'm definitely no guru, but I've experienced several cases where
people at my company used allow filtering and caused major performance
issues.
As data size increases, the impact will be stronger. If you have large
partitions, performance will decrease.
GC can be affected. And if GC stops the world for too long, too many
times, you will feel it.
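
To illustrate why the impact spreads over the whole cluster, here is a toy
model, assuming the 8 nodes and replication level 3 mentioned in the
original question. It is not Cassandra's real placement logic (no
Murmur3 partitioner, no vnodes, no consistency levels); the function names
and the md5-based placement are made up for the illustration.

```python
import hashlib
from typing import List, Set

NODES = 8               # cluster size taken from the thread
REPLICATION_FACTOR = 3  # replication level taken from the thread


def replicas_for(partition_key: str) -> List[int]:
    """Toy placement: hash the key to a node, then take the next nodes on the ring."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    first = int(digest, 16) % NODES
    return [(first + i) % NODES for i in range(REPLICATION_FACTOR)]


def nodes_for_key_lookup(partition_key: str) -> Set[int]:
    """A query restricted by partition key only concerns that key's replicas."""
    return set(replicas_for(partition_key))


def nodes_for_filtered_scan(all_partition_keys: List[str]) -> Set[int]:
    """A filter on a non-key column cannot be routed, so every partition is a candidate."""
    touched: Set[int] = set()
    for key in all_partition_keys:
        touched.update(replicas_for(key))
    return touched
```

In this model, nodes_for_key_lookup("user-42") returns a set of 3 nodes,
while nodes_for_filtered_scan over a large list of keys ends up covering
all 8 nodes, and the work grows with the number of partitions.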

I sincerely believe the best way would be to educate the users and
remodel the data. Perhaps you need to denormalize your tables or at
least use secondary indices (I prefer to keep it as simple as possible
and denormalize).
If it's a cluster for analytics, perhaps you need to build a
designated cluster only for that, so if something does break or get too
pressured, normal activities wouldn't be affected, but there are pros
and cons for that idea too.


Hope this helps.

Regards,

On Tue, May 28, 2019 at 9:43 AM Attila Wind <[email protected]> wrote:


    Hi Gurus,

    Looks like we stopped this thread. However, I would be very curious
    about answers regarding b) ...

    Anyone any comments on that?
    I do see this as a potential production outage risk now...
    Especially as we are planning to run analysis queries by hand
    exactly like that over the cluster...

    thanks!

    Attila Wind

    http://www.linkedin.com/in/attilaw
    Mobile: +36 31 7811355


    On 2019. 05. 23. 11:42, shalom sagges wrote:

    a) Interesting... But only in case you do not provide
    partitioning key right? (so IN() is for partitioning key?)

    I think you should ask yourself a different question. Why am I
    using ALLOW FILTERING in the first place? What happens if I
    remove it from the query?
    I prefer to denormalize the data to multiple tables or at least
    create an index on the requested column (preferably queried
    together with a known partition key).

    b) Still does not explain or justify "all 8 nodes to halt and
    unresponsiveness to external requests" behavior... Even if
    servers are busy with the request seriously becoming
    non-responsive...?

    I think it can justify the unresponsiveness. When using ALLOW
    FILTERING, you are doing something like a full table scan in a
    relational database.

    There is a lot of information on the internet regarding this
    subject, such as
    https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/

    Hope this helps.

    Regards,


    On Thu, May 23, 2019 at 7:33 AM Attila Wind
    <[email protected]> wrote:

        Hi,

        "When you run a query with allow filtering, Cassandra doesn't
        know where the data is located, so it has to go node by node,
        searching for the requested data."

        a) Interesting... But only in case you do not provide
        partitioning key right? (so IN() is for partitioning key?)

        b) Still does not explain or justify "all 8 nodes to halt and
        unresponsiveness to external requests" behavior... Even if
        servers are busy with the request seriously becoming
        non-responsive...?

        cheers

        Attila Wind

        http://www.linkedin.com/in/attilaw
        Mobile: +36 31 7811355


        On 2019. 05. 23. 0:37, shalom sagges wrote:

        Hi Vsevolod,

        1) Why such behavior? I thought any given SELECT request is
        handled by a limited subset of C* nodes and not by all of
        them, as per connection consistency/table replication
        settings, in case.
        When you run a query with allow filtering, Cassandra doesn't
        know where the data is located, so it has to go node by
        node, searching for the requested data.

        2) Is it possible to forbid ALLOW FILTERING flag for given
        users/groups?
        I'm not familiar with such a flag. In my case, I just try to
        educate the R&D teams.

        Regards,

        On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov
        <[email protected]> wrote:

            Hello everyone,

            We have an 8 node C* cluster with large volume of
            unbalanced data. Usual per-partition selects work
            somewhat fine, and are processed by limited number of
            nodes, but if user issues SELECT WHERE IN () ALLOW
            FILTERING, such command stalls all 8 nodes to halt and
            unresponsiveness to external requests while disk IO
            jumps to 100% across whole cluster. In several minutes
            all nodes seem to finish processing the request and
            cluster goes back to being responsive. Replication level
            across whole data is 3.

            1) Why such behavior? I thought any given SELECT request
            is handled by a limited subset of C* nodes and not by
            all of them, as per connection consistency/table
            replication settings, in case.

            2) Is it possible to forbid ALLOW FILTERING flag for
            given users/groups?

            Thank you all very much in advance,
            Vsevolod Filaretov.
