Re: Structured streaming from Kafka by timestamp

2019-02-05 Thread Cody Koeninger
To be more explicit, the easiest thing to do in the short term is to use your own instance of KafkaConsumer to get the offsets for the timestamps you're interested in, using offsetsForTimes, and use those as the start / end offsets. See
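A minimal sketch of that approach in Scala, assuming a single topic (here called "events", a made-up name) and the Kafka Java client on the classpath; offsetsForTimes maps each partition to the earliest offset whose record timestamp is at or after the requested time:

    import java.util.Properties
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition
    import scala.collection.JavaConverters._

    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)

    // One entry per partition of the topic, all asking for the same timestamp.
    val startMillis = java.sql.Timestamp.valueOf("2019-01-23 01:00:00").getTime
    val query = consumer.partitionsFor("events").asScala
      .map(pi => new TopicPartition(pi.topic, pi.partition) -> java.lang.Long.valueOf(startMillis))
      .toMap.asJava

    // offsetsForTimes returns null for partitions with no matching record; skip those.
    val startingOffsets = consumer.offsetsForTimes(query).asScala
      .collect { case (tp, oat) if oat != null => s""""${tp.partition}":${oat.offset}""" }
      .mkString("""{"events":{""", ",", "}}")
    consumer.close()
    // startingOffsets is now JSON like {"events":{"0":42,"1":17}}, ready for the
    // "startingOffsets" option of Spark's Kafka source; the same lookup with the
    // end timestamp yields "endingOffsets".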

Re: Structured streaming from Kafka by timestamp

2019-02-01 Thread Tomas Bartalos
Hello, sorry for my late answer. You're right, what I'm doing is a one-time query, not structured streaming. It's probably best to describe my use case: I'd like to expose live data (via jdbc/odbc) residing in Kafka with the power of Spark's distributed SQL engine. As jdbc server I use

Re: Structured streaming from Kafka by timestamp

2019-01-24 Thread Shixiong(Ryan) Zhu
Hey Tomas, from your description, you just ran a batch query rather than a Structured Streaming query. The Kafka data source doesn't support filter push-down right now, but that's definitely doable. One workaround here is setting proper "startingOffsets" and "endingOffsets" options when loading
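A sketch of that workaround as a batch read (the topic name and offsets are made up; in practice the per-partition offsets would come from an offsetsForTimes lookup like the one discussed elsewhere in the thread):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-range").getOrCreate()

    // Bounded read: only offsets in [startingOffsets, endingOffsets) are fetched,
    // so the range is pruned at the source instead of by a post-scan filter.
    val df = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .option("startingOffsets", """{"events":{"0":100,"1":100}}""")
      .option("endingOffsets", """{"events":{"0":200,"1":200}}""")
      .load()

    df.createOrReplaceTempView("kafka_table")
    spark.sql("select count(*) from kafka_table").show()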

Re: Structured streaming from Kafka by timestamp

2019-01-24 Thread Gabor Somogyi
Hi Tomas, As a general note, I don't fully understand your use-case. You've mentioned structured streaming, but your query is more like a one-time SQL statement. Kafka doesn't support predicates in the way it's integrated with Spark. What can be done from Spark's perspective is to look for an offset for a
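The batch-versus-streaming distinction in code, for what it's worth (topic name is an assumption); the query in question ran against the first form:

    // One-time SQL statement: spark.read produces a bounded DataFrame that is
    // scanned once, over the offsets present at the time of the query.
    val batchDf = spark.read.format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Structured streaming: spark.readStream produces an unbounded DataFrame
    // whose query runs continuously as new records arrive.
    val streamDf = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()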

Structured streaming from Kafka by timestamp

2019-01-24 Thread Tomas Bartalos
Hello, I'm trying to read Kafka via Spark structured streaming, restricting the data to a specific time range: select count(*) from kafka_table where timestamp > cast('2019-01-23 1:00' as TIMESTAMP) and timestamp < cast('2019-01-23 1:01' as TIMESTAMP); The problem is that the timestamp query
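For concreteness, the setup behind a query like that might look roughly as follows (topic name assumed); since the Kafka source does no filter push-down, the timestamp predicate is evaluated by Spark only after the records have been read:

    // Register Kafka as a table; the source exposes key, value, topic,
    // partition, offset, timestamp and timestampType columns.
    val df = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()
    df.createOrReplaceTempView("kafka_table")

    // The predicate is applied after the scan, not pushed down to Kafka.
    spark.sql(
      """select count(*) from kafka_table
        |where timestamp > cast('2019-01-23 1:00' as TIMESTAMP)
        |  and timestamp < cast('2019-01-23 1:01' as TIMESTAMP)""".stripMargin
    ).show()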