"unpredictable" is such a loaded word. It's quite predictable, but it's
often mispredicted by users.

"ALLOW FILTERING" basically tells the database you're going to run a query
that requires scanning a bunch of data to return some subset of it, because
you can't provide a WHERE clause that's sufficiently fine-grained to avoid
the scan. It's the loose equivalent of a full table scan in SQL databases -
sometimes it's a valid use case, but it's expensive: you're ignoring all of
the indexes, and you're going to do a lot more work.
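
To make that concrete, here's a sketch against a made-up table (the table
and column names are invented for illustration):

```sql
-- Hypothetical table: user_id is the partition key.
CREATE TABLE users (
    user_id uuid PRIMARY KEY,
    country text,
    signup_date date
);

-- Fast: the WHERE clause pins down a single partition.
SELECT * FROM users WHERE user_id = ?;

-- Rejected by Cassandra: country isn't part of the primary key, so
-- answering this would mean scanning every partition.
SELECT * FROM users WHERE country = 'US';

-- Accepted, but it forces exactly the scan described above.
SELECT * FROM users WHERE country = 'US' ALLOW FILTERING;
```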

It's predictable, though - you're probably going to walk over some range of
data. Spark is grabbing all of the data to load into RDDs, and it probably
does that by slicing up the token range and doing a bunch of range scans.
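
Roughly, each of those slices looks something like this (again using the
hypothetical users table, with example token boundaries):

```sql
-- One range scan per slice of the token ring; the connector picks
-- the boundaries, something like:
SELECT * FROM users
WHERE token(user_id) > -9223372036854775808
  AND token(user_id) <= -4611686018427387904;
```

Because each slice is a contiguous token range on specific replicas, the
work is bounded and parallelizable - which is why it's predictable even
though it touches everything.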

It's doing that so it can get ALL of the data and do the filtering /
joining / searching in memory in Spark, rather than relying on Cassandra to
do the scanning/searching on disk.

On Thu, Jul 25, 2019 at 6:49 AM ZAIDI, ASAD A <az1...@att.com> wrote:

> Hello Folks,
>
>
>
> I was going through the documentation and saw in many places that ALLOW
> FILTERING causes performance unpredictability. Our developers say the ALLOW
> FILTERING clause is implicitly added to a bunch of queries by the spark-Cassandra
> connector and they cannot control it; at the same time we see
> unpredictability in application performance – just as the documentation says.
>
>
>
> I’m trying to understand why a connector would add a clause to a query when
> it can negatively impact database/application performance. Is it the data
> model that drives the connector’s decision to add ALLOW FILTERING to the
> query automatically, or are there other reasons this clause is added? I’m
> not a developer, but I want to know why developers don’t have any control
> over this.
>
>
>
> I’ll appreciate your guidance here.
>
>
>
> Thanks
>
> Asad
>
