Hi Josh,

Can you please clarify whether comparing dates as two strings works at all?

I was under the impression that with string comparison only the first
characters are compared?
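
For reference, a quick plain-Scala sketch of how lexicographic comparison
treats ISO-8601 date strings:

    // Strings compare character by character from the left, not only on
    // the first character. Because "yyyy-MM-dd" puts the most significant
    // fields first, string order matches date order.
    val a = "2015-05-28"
    val b = "2015-05-29"
    println(a < b)  // true: the first nine characters tie, then '8' < '9'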

Thanks

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 14 April 2016 at 19:26, Josh Rosen <joshro...@databricks.com> wrote:

> AFAIK this is not being pushed down because it involves an implicit cast
> and we currently don't push casts into data sources or scans; see
> https://github.com/databricks/spark-redshift/issues/155 for a
> possibly-related discussion.
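>
> A possible workaround (an untested sketch, assuming the literals can be
> cast instead of the column) is to put the cast on the literal side, so
> the column stays a timestamp and the comparison can still be expressed
> as a data source Filter:
>
>   sqlContext.sql("SELECT * from events WHERE `registration` >= " +
>     "CAST('2015-05-28' AS TIMESTAMP) AND `registration` <= " +
>     "CAST('2015-05-29' AS TIMESTAMP)")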
>
> On Thu, Apr 14, 2016 at 10:27 AM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Are you comparing strings in here or timestamp?
>>
>> Filter ((cast(registration#37 as string) >= 2015-05-28) &&
>> (cast(registration#37 as string) <= 2015-05-29))
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 14 April 2016 at 18:04, Kiran Chitturi <kiran.chitt...@lucidworks.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Timestamp range filter queries in SQL are not getting pushed down to the
>>> PrunedFilteredScan instances. The filtering is happening at the Spark layer.
>>>
>>> The physical plan for timestamp range queries does not show the pushed
>>> filters, whereas range queries on other types work fine and their
>>> physical plans do show the pushed filters.
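>>>
>>> (For reference, the plans can be printed with DataFrame.explain, e.g.
>>>
>>>   sqlContext.sql("SELECT * from events WHERE ...").explain(true)
>>>
>>> where passing true also prints the analyzed and optimized plans.)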
>>>
>>> Please see below for code and examples.
>>>
>>> *Example:*
>>>
>>> *1.* Range filter queries on Timestamp types
>>>
>>>    *code: *
>>>
>>>> sqlContext.sql("SELECT * from events WHERE `registration` >=
>>>> '2015-05-28' AND `registration` <= '2015-05-29' ")
>>>
>>>    *Full example*:
>>> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>>     *plan*:
>>> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql
>>>
>>> *2. * Range filter queries on Long types
>>>
>>>     *code*:
>>>
>>>> sqlContext.sql("SELECT * from events WHERE `length` >= '700' and
>>>> `length` <= '1000'")
>>>
>>>     *Full example*:
>>> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>>     *plan*:
>>> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql
>>>
>>> The SolrRelation class we use
>>> <https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/SolrRelation.scala#L37>
>>> extends the PrunedFilteredScan trait.
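>>>
>>> For context, the trait boils down to one callback (a minimal skeleton,
>>> with ExampleRelation being a made-up name):
>>>
>>>   import org.apache.spark.rdd.RDD
>>>   import org.apache.spark.sql.{Row, SQLContext}
>>>   import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
>>>   import org.apache.spark.sql.types.StructType
>>>
>>>   // Spark hands over only the predicates it can express as simple
>>>   // source Filters; a comparison wrapped in a Cast never reaches the
>>>   // filters array, so the relation cannot translate it to Solr.
>>>   class ExampleRelation(val sqlContext: SQLContext, val schema: StructType)
>>>       extends BaseRelation with PrunedFilteredScan {
>>>     override def buildScan(requiredColumns: Array[String],
>>>                            filters: Array[Filter]): RDD[Row] = ???
>>>   }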
>>>
>>> Since Solr supports date ranges, I would like the timestamp filters to
>>> be pushed down to the Solr query.
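>>>
>>> For illustration, a pushed-down timestamp range could translate into a
>>> standard Solr range clause along these lines (assuming UTC):
>>>
>>>   registration:[2015-05-28T00:00:00Z TO 2015-05-29T00:00:00Z]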
>>>
>>> Are there limitations on the types of filters that are pushed down with
>>> Timestamp types?
>>> Is there something I should do in my code to fix this?
>>>
>>> Thanks,
>>> --
>>> Kiran Chitturi
>>>
>>>
>>
