To find the offset ranges covered by a microbatch in Structured Streaming,
look at StreamingQuery.lastProgress or register a StreamingQueryListener
<https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/streaming/StreamingQueryListener.html>.
Both approaches give you access to SourceProgress
<https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/streaming/SourceProgress.html>,
which exposes the Kafka start and end offsets of the batch as JSON strings.

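As a rough illustration of what you would do with that JSON string before
writing it to an external checkpoint store, here is a minimal sketch in
plain Python. It assumes the Kafka source reports offsets in the shape
{"topic": {"partition": offset}}; the function name parse_kafka_offsets,
the topic name "events", and the sample values are made up for the example.
In a real job the input would be the endOffset field of a SourceProgress
entry rather than a literal string.

```python
import json


def parse_kafka_offsets(offset_json):
    """Flatten a Kafka offset JSON string into {(topic, partition): offset}.

    Assumes the shape {"topic": {"partition": offset}}; this helper is
    hypothetical and not part of any Spark API.
    """
    offsets = {}
    for topic, partitions in json.loads(offset_json).items():
        for partition, offset in partitions.items():
            # Partition ids arrive as JSON object keys, i.e. strings.
            offsets[(topic, int(partition))] = int(offset)
    return offsets


# Hypothetical sample standing in for SourceProgress.endOffset.
sample_end_offset = '{"events": {"0": 1500, "1": 1498}}'

end_offsets = parse_kafka_offsets(sample_end_offset)
for (topic, partition), offset in sorted(end_offsets.items()):
    # Replace this print with a write to your own checkpoint store.
    print(f"{topic}-{partition}: {offset}")
```

Keying on (topic, partition) keeps the result close to what OffsetRange
gave you on the DStream side, so the code that writes to the checkpoint
store should need little change.
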
Hope this helps!

On Wed, May 22, 2024 at 10:04 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> OK, to understand better: your current model relies on streaming data
> input through a Kafka topic, Spark does some ETL, and you send the result
> to a sink such as a database or file storage like HDFS?
>
> Your current architecture relies on Direct Streams (DStream) and RDDs and
> you want to move to Spark Structured Streaming based on DataFrames and
> Datasets?
>
> You have not specified your sink.
>
> With regard to your question:
>
> "Is there an equivalent of Dstream HasOffsetRanges in structure streaming
> to get the microbatch end offsets to the checkpoint in our external
> checkpoint store ?"
>
> There is not a direct equivalent of DStream HasOffsetRanges in Spark
> Structured Streaming. However, Structured Streaming provides mechanisms to
> achieve similar functionality:
>
> HTH
>
> Mich Talebzadeh,
> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
> London
> United Kingdom
>
> On Wed, 22 May 2024 at 10:32, ashok34...@yahoo.com.INVALID
> <ashok34...@yahoo.com.invalid> wrote:
>
>> Hello,
>>
>> What options are you considering yourself?
>>
>> On Wednesday 22 May 2024 at 07:37:30 BST, Anil Dasari <
>> adas...@guidewire.com> wrote:
>>
>>
>> Hello,
>>
>> We are on Spark 3.x using Spark DStream + Kafka and are planning to move
>> to Structured Streaming + Kafka.
>> Is there an equivalent of DStream HasOffsetRanges in Structured Streaming
>> to get the microbatch end offsets to the checkpoint in our external
>> checkpoint store? Thanks in advance.
>>
>> Regards
>>
>>
