Thanks Takeshi, I did check it. I believe you are referring to this statement:

"This is likely because we cast this expression weirdly to be compatible with Hive. Specifically, I think this turns into CAST(c_date AS STRING) >= "2016-01-01", and we don't push casts down into data sources. The reason for casting this way is that users currently expect the following to work: c_date >= "2016"."

There are two issues here:

1. The CAST expression is not pushed down.

2. I still do not trust string comparison of dates. It may or may not work; I recall this being an issue in Hive. '2012-11-23' is not a DATE; it is a string. In general, from my experience, one should not compare a DATE with a string: the results depend on several factors, some related to the tool and some related to the session. For example, the following converts a date held as a dd/MM/yyyy string into a DATE type in Hive:

    TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(TransactionDate,'dd/MM/yyyy'),'yyyy-MM-dd'))
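For illustration, a minimal plain-Scala sketch (the values here are made up) of why the outcome depends on the string format: lexicographic order happens to agree with chronological order for zero-padded yyyy-MM-dd strings, which is also why c_date >= "2016" gives the expected answer after the cast, but the agreement breaks down for dd/MM/yyyy strings:

    // Zero-padded ISO-8601 strings: lexicographic order == chronological order.
    val iso = Seq("2016-01-01", "2015-05-28", "2015-05-29")
    assert(iso.sorted == Seq("2015-05-28", "2015-05-29", "2016-01-01"))
    assert("2016-01-01" >= "2016") // why c_date >= "2016" works after the cast

    // dd/MM/yyyy strings: lexicographic order is wrong. "01/01/2016" sorts
    // before "28/05/2015" even though it is the later date.
    val dmy = Seq("28/05/2015", "01/01/2016")
    assert(dmy.min == "01/01/2016")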
"This is likely because we cast this expression weirdly to be compatible with Hive. Specifically I think this turns into, CAST(c_date AS STRING) >= "2016-01-01", and we don't push down casts down into data sources. The reason for casting this way is because users currently expect the following to work c_date >= "2016". " There are two issues here: 1. The CAST expression is not pushed down 2. I still don't trust the string comparison of dates. It may or may not work. I recall this as an issue in Hive '2012-11-23' is not a DATE; It is a string. In general from my experience one should not try to compare a DATE with a string. The results will depend on several factors, some related to the tool, some related to the session. For example the following will convert a date as string format into a DATE type in Hive TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(TransactionDate,'dd/MM/yyyy'),'yyyy-MM-dd')) HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 15 April 2016 at 06:58, Takeshi Yamamuro <linguin....@gmail.com> wrote: > Hi, Mich > > Did you check the URL Josh referred to?; > the cast for string comparisons is needed for accepting `c_date >= "2016"`. > > // maropu > > > On Fri, Apr 15, 2016 at 10:30 AM, Hyukjin Kwon <gurwls...@gmail.com> > wrote: > >> Hi, >> >> >> String comparison itself is pushed down fine but the problem is to deal >> with Cast. >> >> >> It was pushed down before but is was reverted, ( >> https://github.com/apache/spark/pull/8049). >> >> Several fixes were tried here, https://github.com/apache/spark/pull/11005 >> and etc. but there were no changes to make it. >> >> >> To cut it short, it is not being pushed down because it is unsafe to >> resolve cast (eg. long to integer) >> >> For an workaround, the implementation of Solr data source should be >> changed to one with CatalystScan, which take all the filters. >> >> But CatalystScan is not designed to be binary compatible across releases, >> however it looks some think it is stable now, as mentioned here, >> https://github.com/apache/spark/pull/10750#issuecomment-175400704. >> >> >> Thanks! >> >> >> 2016-04-15 3:30 GMT+09:00 Mich Talebzadeh <mich.talebza...@gmail.com>: >> >>> Hi Josh, >>> >>> Can you please clarify whether date comparisons as two strings work at >>> all? >>> >>> I was under the impression is that with string comparison only first >>> characters are compared? >>> >>> Thanks >>> >>> Dr Mich Talebzadeh >>> >>> >>> >>> LinkedIn * >>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >>> >>> >>> >>> http://talebzadehmich.wordpress.com >>> >>> >>> >>> On 14 April 2016 at 19:26, Josh Rosen <joshro...@databricks.com> wrote: >>> >>>> AFAIK this is not being pushed down because it involves an implicit >>>> cast and we currently don't push casts into data sources or scans; see >>>> https://github.com/databricks/spark-redshift/issues/155 for a >>>> possibly-related discussion. >>>> >>>> On Thu, Apr 14, 2016 at 10:27 AM Mich Talebzadeh < >>>> mich.talebza...@gmail.com> wrote: >>>> >>>>> Are you comparing strings in here or timestamp? 
>>
>> Thanks!
>>
>>
>> 2016-04-15 3:30 GMT+09:00 Mich Talebzadeh <mich.talebza...@gmail.com>:
>>
>>> Hi Josh,
>>>
>>> Can you please clarify whether date comparisons as two strings work at
>>> all?
>>>
>>> I was under the impression that with string comparison only the first
>>> characters are compared?
>>>
>>> Thanks
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> On 14 April 2016 at 19:26, Josh Rosen <joshro...@databricks.com> wrote:
>>>
>>>> AFAIK this is not being pushed down because it involves an implicit
>>>> cast and we currently don't push casts into data sources or scans; see
>>>> https://github.com/databricks/spark-redshift/issues/155 for a
>>>> possibly-related discussion.
>>>>
>>>> On Thu, Apr 14, 2016 at 10:27 AM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Are you comparing strings here or timestamps?
>>>>>
>>>>> Filter ((cast(registration#37 as string) >= 2015-05-28) &&
>>>>> (cast(registration#37 as string) <= 2015-05-29))
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>> On 14 April 2016 at 18:04, Kiran Chitturi <kiran.chitt...@lucidworks.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Timestamp range filter queries in SQL are not getting pushed down to
>>>>>> the PrunedFilteredScan instances; the filtering is happening at the
>>>>>> Spark layer instead.
>>>>>>
>>>>>> The physical plan for timestamp range queries does not show the pushed
>>>>>> filters, whereas range queries on other types work fine and their
>>>>>> physical plans do show the pushed filters.
>>>>>>
>>>>>> Please see below for code and examples.
>>>>>>
>>>>>> Examples:
>>>>>>
>>>>>> 1. Range filter queries on Timestamp types
>>>>>>
>>>>>> code:
>>>>>>
>>>>>>> sqlContext.sql("SELECT * from events WHERE `registration` >= '2015-05-28' AND `registration` <= '2015-05-29' ")
>>>>>>
>>>>>> Full example: https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>>>>> plan: https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql
>>>>>>
>>>>>> 2. Range filter queries on Long types
>>>>>>
>>>>>> code:
>>>>>>
>>>>>>> sqlContext.sql("SELECT * from events WHERE `length` >= '700' and `length` <= '1000'")
>>>>>>
>>>>>> Full example: https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>>>>> plan: https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql
>>>>>>
>>>>>> The SolrRelation class we use extends PrunedFilteredScan
>>>>>> (https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/SolrRelation.scala#L37).
>>>>>>
>>>>>> Since Solr supports date ranges, I would like the timestamp filters
>>>>>> to be pushed down to the Solr query.
>>>>>>
>>>>>> Are there limitations on the types of filters that are passed down
>>>>>> with Timestamp types?
>>>>>> Is there something I should do in my code to fix this?
>>>>>>
>>>>>> Thanks,
>>>>>> --
>>>>>> Kiran Chitturi
>
>
> --
> ---
> Takeshi Yamamuro
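To close the loop on Kiran's original query: one more possible workaround, a sketch only and untested against spark-solr, is to cast the literals rather than letting the analyzer cast the column. The data source then sees a plain timestamp comparison, which can be translated into a source-level filter:

    // Hedged sketch: with explicit TIMESTAMP literals there is no CAST
    // wrapped around the `registration` column, so the comparisons can be
    // offered to PrunedFilteredScan as GreaterThanOrEqual/LessThanOrEqual
    // filters. Whether they are actually pushed still depends on the Spark
    // version in use.
    sqlContext.sql(
      """SELECT * FROM events
        |WHERE `registration` >= CAST('2015-05-28' AS TIMESTAMP)
        |  AND `registration` <= CAST('2015-05-29' AS TIMESTAMP)""".stripMargin)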