tomasbartalos commented on issue #23749: [SPARK-26841][SQL] Kafka timestamp 
pushdown
URL: https://github.com/apache/spark/pull/23749#issuecomment-473835581
 
 
   > IMO, we could adopt the more generic solution. If its just timestamp based 
filtering of start and end offsets maybe the timestamp offset approach proposed 
in #23747 looks straightforward. However if there are more cases that the 
filter pushdown might be able to handle we should go with that. I am assuming 
the filter condition can appear anywhere in the query and get pushed down to 
filter the rows and if so looks more generic and avoids having to add extra 
options to the kafka source.
   > 
   > I am not sure providing two different options for timestamp based 
filtering is necessary. If we support both, the user can provide different 
values via the options and the filter and it gets very confusing.
   
   @arunmahadevan, IMHO, I would divide this into 2 use cases:
   - **streaming** - specifying the offsets with a DS option as proposed in 
#23747 makes sense, because the start & end ranges are static and they don't 
change during the lifetime of the app
   - **sql query** - it's more straightforward and flexible to put the 
conditions in the "where" part of the query. If the ts ranges are static, it's 
possible to create a view with a static timestamp filter and fully cover the 
#23747 functionality. However, for use cases with dynamic ts ranges the filter 
pushdown is more suitable: it's not necessary to recreate a DF or a table from 
scratch just because we need to filter on different TS ranges.
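
   As a rough sketch of the static case, assuming the same `kafka_source` 
table as in the example further down (the view name and the timestamp bounds 
here are made up for illustration):
   ```
   -- Hypothetical view with a fixed timestamp range; with pushdown the
   -- bounds in the WHERE clause would be translated to Kafka offsets.
   CREATE OR REPLACE VIEW kafka_static_range AS
       SELECT value, timestamp FROM kafka_source
       WHERE timestamp >= CAST('2019-03-18 10:00:00' AS TIMESTAMP)
         AND timestamp <  CAST('2019-03-18 11:00:00' AS TIMESTAMP);
   ```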
   
   Our specific use case (from which this PR emerged) is to see the last 30 
minutes of Kafka. With timestamp pushdown and a dynamic view:
   ```
   -- 30 * 60 seconds = the last 30 minutes, relative to query time
   CREATE OR REPLACE VIEW kafka_30_min AS
       SELECT value, timestamp FROM kafka_source
       WHERE timestamp > CAST(from_unixtime(unix_timestamp() - 30 * 60,
                                            "yyyy-MM-dd HH:mm:ss") AS TIMESTAMP);
   ```
   I can repeatedly query the same view and always get the up-to-date last 30 
minutes of Kafka.
