Hi,

I guess whether Structured Streaming (SS) inherited anything from Spark
Streaming is a moot point now, although SS builds conceptually on Spark
Streaming, which will soon be defunct.

Going forward, it all depends on what problem you are trying to address.

These are explained in the following doc
<https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html>

However, within SS micro-batching you have the concept of working out
running aggregates within a given timeframe, akin to Spark Streaming's
sliding window with a window length and slide interval.
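
As a rough illustration (not from the thread; the rate source and column
names below are just placeholders), a windowed running aggregate in PySpark
looks something like this:

from pyspark.sql import SparkSession
from pyspark.sql.functions import window, avg, col

spark = SparkSession.builder.appName("windowedAverages").getOrCreate()

# Hypothetical streaming source; swap in your Kafka or file source.
prices = (spark.readStream
          .format("rate")                       # built-in test source
          .option("rowsPerSecond", 10)
          .load()
          .withColumnRenamed("value", "price"))

# 10-minute window sliding every 5 minutes, akin to a DStream
# window length and slide interval.
running_avg = (prices
               .withWatermark("timestamp", "15 minutes")
               .groupBy(window(col("timestamp"), "10 minutes", "5 minutes"))
               .agg(avg("price").alias("avg_price")))

query = (running_avg.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination()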

The other approach relies on a fixed trigger interval to invoke a function
that performs specific tasks (processing CDC, writing the result set to a
database, working out average prices per security, etc.) on the streaming
data arriving within that trigger period.
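
A minimal sketch of that fixed-trigger approach uses foreachBatch with a
processing-time trigger; the JDBC sink details here are placeholders, not
taken from the thread or the article:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fixedTrigger").getOrCreate()

events = (spark.readStream
          .format("rate")
          .option("rowsPerSecond", 5)
          .load())

def write_to_db(batch_df, batch_id):
    # Invoked once per triggered micro-batch; batch_df is a static DataFrame,
    # so ordinary batch processing (CDC merge, averages, JDBC write) works here.
    (batch_df.write
     .format("jdbc")
     .option("url", "jdbc:postgresql://myhost:5432/mydb")   # placeholder
     .option("dbtable", "public.events")                    # placeholder
     .option("user", "myuser")
     .option("password", "mypassword")
     .mode("append")
     .save())

query = (events.writeStream
         .foreachBatch(write_to_db)
         .trigger(processingTime="60 seconds")   # fixed trigger interval
         .start())
query.awaitTermination()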

To see a discussion on running aggregates, please look for the following
thread in this forum:

"Calculate average from Spark stream"

And for the triggering mechanism, you can see an example in my LinkedIn article below:

https://www.linkedin.com/pulse/processing-change-data-capture-spark-structured-talebzadeh-ph-d-/


HTH



   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 31 May 2021 at 19:32, S <sheelst...@gmail.com> wrote:

> Hi,
>
> I am using Structured Streaming on Azure HdInsight. The version is 2.4.6.
>
> I am trying to understand the microbatch mode - default and fixed
> intervals. Does the fixed interval microbatch follow something similar to
> receiver based model where records keep getting pulled and stored into
> blocks for the duration of the interval at the end of which a job is kicked
> off? Or, does the job just process the current microbatch and sleep for the
> rest of the interval and pulls records only at the end of the interval?
>
> I am fully aware of the two dstreams models - receiver and direct based
> dstreams. I am just trying to figure out if either of these two models were
> reused in Structured Streaming.
>
> Regards,
> Sheel
>
