Thanks Bart. I'll give it a try. Presto has done something very similar
(thanks DB for finding this!). They published an article ([1]) last year
with a very thorough analysis of all the cases, which I think can be used
as a reference for the implementation in Spark.
[1]:
IMO it's worth an attempt. The previous attempts seem to be closed because
of a general sense that this gets messy and leads to lots of special cases,
but that's just how it is. This optimization would make the difference
between getting sub-par performance for using some of these datatypes to
Hi,
I just realized there were already multiple attempts on this issue in the
past. From the discussion it seems the preferred approach is to eliminate
the casts before the predicates get pushed to data sources, at least for a
few common cases such as numeric types. However, a few PRs following this
> Currently we can't. This is something we should improve, by either
> pushing down the cast to the data source, or simplifying the predicates to
> eliminate the cast.
Hi all, I've created https://issues.apache.org/jira/browse/SPARK-32694 to
track this. Welcome to comment on the JIRA.
On Wed, Aug
Currently we can't. This is something we should improve, by either pushing
down the cast to the data source, or simplifying the predicates to
eliminate the cast.
On Wed, Aug 19, 2020 at 5:09 PM Bart Samwel
wrote:
And how are we doing here on integer pushdowns? If someone does e.g.
CAST(short_col AS LONG) < 1000, can we still push down "short_col < 1000"
without the cast?
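
To make the question concrete, here is a minimal sketch of the rewrite being asked about. It is not Spark's optimizer rule and not the Presto implementation; the function name `unwrap_widening_cast` and the string-based predicate output are hypothetical, chosen only for illustration. It assumes a SMALLINT column named short_col compared against an integer literal, and it assumes the predicate is used as a filter, so rows where short_col is NULL are dropped either way.

```python
# Sketch only: a toy model of "unwrap the cast", not Spark's actual rule.
# All names here are hypothetical.
SHORT_MIN, SHORT_MAX = -2**15, 2**15 - 1  # value range of a 16-bit SMALLINT


def unwrap_widening_cast(op: str, literal: int) -> str:
    """Rewrite CAST(short_col AS LONG) <op> literal as a cast-free predicate.

    Assumes filter semantics: a NULL result drops the row, same as FALSE.
    """
    if op not in ("<", "<=", ">", ">=", "="):
        raise ValueError(f"unsupported operator: {op}")

    # Literal fits in SMALLINT: the widening cast does not change the
    # comparison, so it can simply be dropped.
    if SHORT_MIN <= literal <= SHORT_MAX:
        return f"short_col {op} {literal}"

    # Literal is outside the SMALLINT range: the outcome is the same for
    # every non-NULL value, so no comparison is needed at all.
    above = literal > SHORT_MAX
    if op == "=":
        always_true = False
    elif op in ("<", "<="):
        always_true = above
    else:  # ">" or ">="
        always_true = not above
    return "short_col IS NOT NULL" if always_true else "FALSE"


print(unwrap_widening_cast("<", 1000))
print(unwrap_widening_cast("<", 100000))
print(unwrap_widening_cast(">", 100000))
```

The in-range case is the one from the question: the predicate becomes "short_col < 1000" and can be handed to the data source as-is. The out-of-range branches are where the special cases start to pile up, and narrowing or lossy casts would need separate treatment that this sketch does not attempt.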
On Tue, Aug 4, 2020 at 6:55 PM Russell Spitzer
wrote:
Thanks! That's exactly what I was hoping for! Thanks for finding the Jira
for me!
On Tue, Aug 4, 2020 at 11:46 AM Wenchen Fan wrote:
> I think this is not a problem in 3.0 anymore, see
> https://issues.apache.org/jira/browse/SPARK-27638
>
> On Wed, Aug 5, 2020 at 12:08 AM Russell Spitzer
>
Hi, Russell,
You might be hitting one of the other cases in which a CAST blocks
predicate pushdown. If the cast was added by the user and it changes the
actual type, we are unable to optimize it away automatically because that
could change the query's correctness. If it was added by our type coercion rules
I think this is not a problem in 3.0 anymore, see
https://issues.apache.org/jira/browse/SPARK-27638
On Wed, Aug 5, 2020 at 12:08 AM Russell Spitzer
wrote:
> I've just run into this issue again with another user and I feel like most
> folks here have seen some flavor of this at some point.
>
>