Hi Mich,

Did you check the URL Josh referred to?
The cast for string comparisons is needed to accept predicates like `c_date >= "2016"`.
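
For example, a minimal sketch of how the cast shows up (assuming a
SQLContext named `sqlContext` and a table `events` with a timestamp
column `c_date`; both names are illustrative):

    // Comparing a timestamp column with a string literal makes the
    // analyzer wrap the column in a cast to string:
    sqlContext.sql("""SELECT * FROM events WHERE c_date >= "2016" """).explain(true)
    // the plan then contains something like:
    //   Filter (cast(c_date#0 as string) >= 2016)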

// maropu


On Fri, Apr 15, 2016 at 10:30 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Hi,
>
>
> String comparison itself is pushed down fine, but the problem is how to
> deal with the Cast.
>
>
> It was pushed down before, but that was reverted (
> https://github.com/apache/spark/pull/8049).
>
> Several fixes were tried, e.g. https://github.com/apache/spark/pull/11005,
> but none of them made it in.
>
>
> To cut it short, it is not being pushed down because it is unsafe to
> resolve casts (e.g. long to integer).
>
> As a workaround, the Solr data source implementation should be changed to
> one based on CatalystScan, which takes all the filter expressions.
>
> But CatalystScan is not designed to be binary compatible across releases,
> although some think it is stable now, as mentioned here:
> https://github.com/apache/spark/pull/10750#issuecomment-175400704.
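>
> A rough sketch of what that change could look like (the class name and
> the println placeholders are hypothetical, not the actual spark-solr
> code; only the CatalystScan trait itself is stock Spark):
>
>     import org.apache.spark.rdd.RDD
>     import org.apache.spark.sql.{Row, SQLContext}
>     import org.apache.spark.sql.catalyst.expressions._
>     import org.apache.spark.sql.sources.{BaseRelation, CatalystScan}
>     import org.apache.spark.sql.types.{StringType, StructType}
>
>     // Unlike PrunedFilteredScan, CatalystScan hands buildScan the raw
>     // Catalyst expressions, so cast-wrapped predicates arrive intact.
>     class SolrCatalystRelation(val sqlContext: SQLContext,
>                                val schema: StructType)
>       extends BaseRelation with CatalystScan {
>
>       override def buildScan(requiredColumns: Seq[Attribute],
>                              filters: Seq[Expression]): RDD[Row] = {
>         filters.foreach {
>           // e.g. cast(registration as string) >= "2015-05-28"
>           case GreaterThanOrEqual(Cast(a: AttributeReference, StringType),
>                                   Literal(v, _)) =>
>             println(s"pushable to Solr: ${a.name} >= $v")
>           case other =>
>             println(s"left for Spark to evaluate: $other")
>         }
>         ??? // build the RDD from the resulting Solr query
>       }
>     }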
>
>
> Thanks!
>
>
> 2016-04-15 3:30 GMT+09:00 Mich Talebzadeh <mich.talebza...@gmail.com>:
>
>> Hi Josh,
>>
>> Can you please clarify whether comparing dates as two strings works at
>> all?
>>
>> I was under the impression that with string comparison only the first
>> characters are compared?
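>>
>> For reference, lexicographic comparison proceeds character by character
>> across the whole string, so zero-padded yyyy-MM-dd dates sort the same
>> way as the dates they encode; a quick plain-Scala check (nothing
>> Spark-specific assumed):
>>
>>     "2015-05-28" < "2015-05-29"  // true: first difference is '8' < '9'
>>     "2015-05-28" < "2015-10-01"  // true: first difference is '0' < '1'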
>>
>> Thanks
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 14 April 2016 at 19:26, Josh Rosen <joshro...@databricks.com> wrote:
>>
>>> AFAIK this is not being pushed down because it involves an implicit cast
>>> and we currently don't push casts into data sources or scans; see
>>> https://github.com/databricks/spark-redshift/issues/155 for a
>>> possibly-related discussion.
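>>>
>>> One sketch of a query-side workaround (reusing the `events` table from
>>> the original post; illustrative, not a tested guarantee): casting the
>>> literal rather than the column leaves the attribute bare, so the
>>> comparison can be translated into a data source Filter:
>>>
>>>     sqlContext.sql(
>>>       "SELECT * FROM events WHERE `registration` >= " +
>>>         "cast('2015-05-28' as timestamp)").explain(true)
>>>     // when this works, the predicate appears under PushedFilters
>>>     // in the physical plan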
>>>
>>> On Thu, Apr 14, 2016 at 10:27 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Are you comparing strings here or timestamps?
>>>>
>>>> Filter ((cast(registration#37 as string) >= 2015-05-28) &&
>>>> (cast(registration#37 as string) <= 2015-05-29))
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 14 April 2016 at 18:04, Kiran Chitturi <
>>>> kiran.chitt...@lucidworks.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Timestamp range filter queries in SQL are not getting pushed down to
>>>>> the PrunedFilteredScan instances. The filtering is happening at the Spark
>>>>> layer.
>>>>>
>>>>> The physical plan for timestamp range queries does not show the pushed
>>>>> filters, whereas range queries on other types work fine and their
>>>>> physical plans do show the pushed filters.
>>>>>
>>>>> Please see below for code and examples.
>>>>>
>>>>> *Example:*
>>>>>
>>>>> *1.* Range filter queries on Timestamp types
>>>>>
>>>>>    *code*:
>>>>>
>>>>>> sqlContext.sql("SELECT * from events WHERE `registration` >=
>>>>>> '2015-05-28' AND `registration` <= '2015-05-29' ")
>>>>>
>>>>>    *Full example*:
>>>>> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>>>>     *plan*:
>>>>> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql
>>>>>
>>>>> *2. * Range filter queries on Long types
>>>>>
>>>>>     *code*:
>>>>>
>>>>>> sqlContext.sql("SELECT * from events WHERE `length` >= '700' and
>>>>>> `length` <= '1000'")
>>>>>
>>>>>     *Full example*:
>>>>> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>>>>     *plan*:
>>>>> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql
>>>>>
>>>>> The SolrRelation class we use extends PrunedFilteredScan:
>>>>> https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/SolrRelation.scala#L37
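>>>>>
>>>>> For context, PrunedFilteredScan only ever sees predicates that Spark
>>>>> could first translate into org.apache.spark.sql.sources.Filter values;
>>>>> its stock definition is roughly:
>>>>>
>>>>>     trait PrunedFilteredScan {
>>>>>       def buildScan(requiredColumns: Array[String],
>>>>>                     filters: Array[Filter]): RDD[Row]
>>>>>     }
>>>>>
>>>>> Anything Spark cannot convert to a Filter is evaluated at the Spark
>>>>> layer instead of reaching buildScan.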
>>>>>
>>>>> Since Solr supports date ranges, I would like for the timestamp
>>>>> filters to be pushed down to the Solr query.
>>>>>
>>>>> Are there limitations on the types of filters that are passed down with
>>>>> Timestamp types?
>>>>> Is there something I should do in my code to fix this?
>>>>>
>>>>> Thanks,
>>>>> --
>>>>> Kiran Chitturi
>>>>>
>>>>>
>>>>
>>
>


-- 
---
Takeshi Yamamuro
