Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Shixiong Zhu
+1 (binding)

Best Regards,
Shixiong Zhu


On Tue, Jan 9, 2024 at 6:47 PM 刘唯  wrote:

> This is a good addition! +1
>
> On Tue, Jan 9, 2024 at 13:17, Raghu Angadi wrote:
>
>> +1. This is a major improvement to the state API.
>>
>> Raghu.
>>
>> On Tue, Jan 9, 2024 at 1:42 AM Mich Talebzadeh 
>> wrote:
>>
>>> +1 for me as well
>>>
>>>
>>> Mich Talebzadeh,
>>> Dad | Technologist | Solutions Architect | Engineer
>>> London
>>> United Kingdom
>>>
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Tue, 9 Jan 2024 at 03:24, Anish Shrigondekar
>>>  wrote:
>>>
>>>> Thanks Jungtaek for creating the Vote thread.
>>>>
>>>> +1 (non-binding) from my side too.
>>>>
>>>> Thanks,
>>>> Anish
>>>>
>>>> On Tue, Jan 9, 2024 at 6:09 AM Jungtaek Lim <
>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Starting with my +1 (non-binding). Thanks!
>>>>>
>>>>> On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim <
>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'd like to start the vote for SPIP: Structured Streaming - Arbitrary
>>>>>> State API v2.
>>>>>>
>>>>>> References:
>>>>>>
>>>>>>    - JIRA ticket
>>>>>>    - SPIP doc
>>>>>>    - Discussion thread
>>>>>>
>>>>>> Please vote on the SPIP for the next 72 hours:
>>>>>>
>>>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>>> [ ] +0
>>>>>> [ ] -1: I don’t think this is a good idea because …
>>>>>>
>>>>>> Thanks!
>>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>>
>>>>>


Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Jungtaek Lim
Friendly reminder, VOTE thread is now live!
https://lists.apache.org/thread/16ryx828bwoth31hobknxnjfxjxj07mf
Votes cast in this DISCUSS thread are not counted, so please make sure you
vote in the VOTE thread. Thanks!

On Tue, Jan 9, 2024 at 9:33 AM Jungtaek Lim 
wrote:

> Thanks everyone for the feedback!
>
> Given that we have received positive feedback without major concerns, I will
> initiate the vote thread soon. Please cast your vote in that thread as well.
>
> Thanks again!
>
> On Tue, Jan 9, 2024 at 7:44 AM Bhuwan Sahni
>  wrote:
>
>> +1 on the newer APIs. I believe these APIs provide a much more powerful
>> mechanism for the user to perform arbitrary state management in Structured
>> Streaming queries.
>>
>> Thanks
>> Bhuwan Sahni
>>
>> On Mon, Jan 8, 2024 at 10:07 AM L. C. Hsieh  wrote:
>>
>>> +1
>>>
>>> I left some comments in the SPIP doc and got replies quickly. The new
>>> API looks good and more comprehensive. I think it will help Spark
>>> Structured Streaming to be more useful in more complicated streaming
>>> use cases.
>>>
>>> On Fri, Jan 5, 2024 at 8:15 PM Burak Yavuz  wrote:
>>> >
>>> > I'm also a +1 on the newer APIs. We had a lot of learnings from using
>>> flatMapGroupsWithState and I believe that we can make the APIs a lot easier
>>> to use.
>>> >
>>> > On Wed, Nov 29, 2023 at 6:43 PM Anish Shrigondekar
>>>  wrote:
>>> >>
>>> >> Hi dev,
>>> >>
>>> >> Addressed the comments that Jungtaek had on the doc. Bumping the
>>> thread once again to see if other folks have any feedback on the proposal.
>>> >>
>>> >> Thanks,
>>> >> Anish
>>> >>
>>> >> On Mon, Nov 27, 2023 at 8:15 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>> >>>
>>> >>> Kindly bumping for better reach after the long holiday. Please
>>> review the proposal, which opens up the chance to address complex streaming
>>> use cases. Thanks!
>>> >>>
>>> >>> On Thu, Nov 23, 2023 at 8:19 AM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>> 
>>>  Thanks Anish for proposing SPIP and initiating this thread! I
>>> believe this SPIP will help a bunch of complex use cases on streaming.
>>> 
>>>  dev@: We are coincidentally initiating this discussion during the
>>> Thanksgiving holidays. We understand people in the US may not have time to
>>> review the SPIP, and we plan to bump this thread early next week. We are
>>> open to any feedback from outside the US during the holiday. We can either
>>> address all feedback together after the holiday (Anish is in the US) or I can
>>> answer if the feedback is more of a question. Thanks!
>>> 
>>>  On Thu, Nov 23, 2023 at 5:27 AM Anish Shrigondekar <
>>> anish.shrigonde...@databricks.com> wrote:
>>> >
>>> > Hi dev,
>>> >
>>> > I would like to start a discussion on "Structured Streaming -
>>> Arbitrary State API v2". This proposal aims to address a number of
>>> limitations we see today with the mapGroupsWithState/flatMapGroupsWithState
>>> operators. The detailed set of limitations is described in the SPIP doc.
>>> >
>>> > We propose to support various features such as multiple state
>>> variables (flexible data modeling), composite types, enhanced timer
>>> functionality, support for chaining operators after the new operator, handling
>>> initial state along with state data source, schema evolution, etc. This will
>>> allow users to write more powerful streaming state management logic,
>>> primarily for operational use cases. Other built-in stateful operators
>>> could also benefit from such changes in the future.
>>> >
>>> > JIRA: https://issues.apache.org/jira/browse/SPARK-45939
>>> > SPIP:
>>> https://docs.google.com/document/d/1QtC5qd4WQEia9kl1Qv74WE0TiXYy3x6zeTykygwPWig/edit?usp=sharing
>>> > Design Doc:
>>> https://docs.google.com/document/d/1QjZmNZ-fHBeeCYKninySDIoOEWfX6EmqXs2lK097u9o/edit?usp=sharing
>>> >
>>> > cc - @Jungtaek Lim  who has graciously agreed to be the shepherd
>>> for this project
>>> >
>>> > Looking forward to your feedback!
>>> >
>>> > Thanks,
>>> > Anish
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>>
>> --
>> 
>> *Bhuwan Sahni*
>> Staff Software Engineer
>>
>> bhuwan.sa...@databricks.com
>> 500 108th Ave. NE
>> Bellevue, WA 98004
>> USA
>>
>


Re: Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-09 Thread Mich Talebzadeh
Hi Ashok,

Thanks for pointing out the Databricks blog article "Scalable Spark
Structured Streaming for REST API Destinations".


I browsed it, and the approach is basically similar to what many of us do
with Spark Structured Streaming and *foreachBatch*. This article and mine
both mention a REST API as part of the architecture. However, I believe
there are notable differences.

In my proposed approach:

   1. Event-Driven Model:


   - Spark Streaming waits until Flask REST API makes a request for events
   to be generated within PySpark.
   - Messages are generated and then fed into any sink based on the Flask
   REST API's request.
   - This creates a more event-driven model where Spark generates data when
   prompted by external requests.
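
A minimal, framework-free sketch of this on-demand pattern follows. It only illustrates the control flow: nothing is generated until a request arrives, and the generated messages then go to a sink. In the proposed design the trigger would be a Flask route and the generator would be PySpark; the names `generate_events` and `handle_request`, and the request fields `count` and `seed`, are assumptions made up for this illustration.

```python
import json
import random
import time
from typing import Callable, Dict, List


def generate_events(count: int, seed: int = 0) -> List[Dict]:
    """Build `count` synthetic events (hypothetical stand-in for the PySpark job)."""
    rng = random.Random(seed)
    return [
        {"id": i, "value": rng.randint(0, 100), "ts": time.time()}
        for i in range(count)
    ]


def handle_request(payload: Dict, sink: Callable[[Dict], None]) -> Dict:
    """Generate events only when a request arrives, then feed each one to the sink."""
    count = int(payload.get("count", 1))
    if count < 1:
        # Nothing is produced unless the caller asks for at least one event.
        return {"status": "rejected", "generated": 0}
    events = generate_events(count, seed=int(payload.get("seed", 0)))
    for event in events:
        sink(event)
    return {"status": "ok", "generated": len(events)}


if __name__ == "__main__":
    received: List[Dict] = []
    # A list's append method plays the role of the sink here.
    print(handle_request({"count": 3}, received.append))
    print(json.dumps(received[0]))
```

Passing the sink in as a callable keeps the trigger logic independent of where the messages end up, which matches the "any sink" wording above.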





In the Databricks article scenario:

Continuous Data Stream:

   - There is an incoming stream of data from sources like Kafka, AWS
   Kinesis, or Azure Event Hub, handled by foreachBatch.
   - As messages flow off this stream, calls are made to a REST API with
   some or all of the message data.
   - This suggests a continuous flow of data where messages are sent to a
   REST API as soon as they are available in the streaming source.
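
The continuous flow described above could be sketched roughly as below. This is not the Databricks article's implementation, only an illustration of the shape of a foreachBatch handler. The helper `make_batch_handler`, the `post` callable, and the `max_rows` chunk size are hypothetical; `toLocalIterator()` and `asDict()` are the only DataFrame/Row methods the handler relies on, and pulling rows to the driver this way is an assumption that suits small batches only.

```python
import json
from typing import Any, Callable, Dict, List


def make_batch_handler(post: Callable[[str], None], max_rows: int = 100):
    """Return a function with the (batch_df, batch_id) signature foreachBatch expects."""

    def handle_batch(batch_df: Any, batch_id: int) -> None:
        chunk: List[Dict] = []
        # Rows are streamed to the driver and sent to the REST API in chunks.
        for row in batch_df.toLocalIterator():
            chunk.append(row.asDict())
            if len(chunk) == max_rows:
                post(json.dumps({"batch_id": batch_id, "rows": chunk}))
                chunk = []
        if chunk:
            # Send whatever is left over from the last partial chunk.
            post(json.dumps({"batch_id": batch_id, "rows": chunk}))

    return handle_batch


# Wiring it up would look something like this (stream_df and post_to_api are
# placeholders for a real streaming DataFrame and a real HTTP client call):
#   stream_df.writeStream.foreachBatch(make_batch_handler(post_to_api)).start()
```

Injecting `post` keeps the HTTP client out of the handler, so the batching logic can be exercised without a running Spark session or a live endpoint.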


*Benefits of Event-Driven Model:*


   1. Responsiveness: Ideal for scenarios where data generation needs to be
   aligned with specific events or user actions.
   2. Resource Optimization: Can reduce resource consumption by processing
   data only when needed.
   3. Flexibility: Allows for dynamic control over data generation based on
   external triggers.

*Benefits of Continuous Data Stream Mode with foreachBatch:*

   1. Real-Time Processing: Facilitates immediate analysis and action on
   incoming data.
   2. Handling High Volumes: Well-suited for scenarios with
   continuous, high-volume data streams.
   3. Low-Latency Applications: Essential for applications requiring near
   real-time responses.

*Potential Use Cases for my approach:*

   - On-Demand Data Generation: Generating data for
   simulations, reports, or visualizations based on user requests.
   - Triggered Analytics: Executing specific analytics tasks only when
   certain events occur, such as detecting anomalies or reaching thresholds,
   for example in fraud detection.
   - Custom ETL Processes: Facilitating data
   extraction, transformation, and loading workflows based on external events
   or triggers.


Something to note on latency. Event-driven models like mine can potentially
introduce slight latency compared to continuous processing, as data
generation depends on API calls.

So my approach is more event-triggered and responsive to external requests,
while the foreachBatch scenario is more continuous and real-time, processing
and sending data as it becomes available.

In summary, both approaches have their merits and are suited to different
use cases depending on the nature of the data flow and processing
requirements.

Cheers

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom






On Tue, 9 Jan 2024 at 19:11, ashok34...@yahoo.com 
wrote:

> Hey Mich,
>
> Thanks for this introduction to your forthcoming proposal "Spark
> Structured Streaming and Flask REST API for Real-Time Data Ingestion and
> Analytics". I recently came across a Databricks article titled "Scalable
> Spark Structured Streaming for REST API Destinations". Their use case is
> similar to your suggestion, but what they are saying is that they have an
> incoming stream of data from sources like Kafka, AWS Kinesis, or Azure
> Event Hub. In other words, a continuous flow of data where messages are
> sent to a REST API as soon as they are available in the streaming source.
> Their approach is practical, but I wanted to get your thoughts on their
> article, along with a better understanding of your proposal and the
> differences.
> Thanks
>
>
> On Tuesday, 9 January 2024 at 00:24:19 GMT, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>
> Please also note that Flask's built-in development server is not designed
> for production use. While it is suitable for development and small-scale
> applications, it may not handle concurrent requests efficiently in a
> production environment.
> In production, one can utilise Gunicorn (Green Unicorn), which is a WSGI
> (Web Server Gateway Interface) server that is commonly used to serve Flask
> applications in

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread 刘唯
This is a good addition! +1

On Tue, Jan 9, 2024 at 13:17, Raghu Angadi wrote:

> +1. This is a major improvement to the state API.
>
> Raghu.
>
> On Tue, Jan 9, 2024 at 1:42 AM Mich Talebzadeh 
> wrote:
>
>> +1 for me as well
>>
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>>
>>
>>
>>
>>
>> On Tue, 9 Jan 2024 at 03:24, Anish Shrigondekar
>>  wrote:
>>
>>> Thanks Jungtaek for creating the Vote thread.
>>>
>>> +1 (non-binding) from my side too.
>>>
>>> Thanks,
>>> Anish
>>>
>>> On Tue, Jan 9, 2024 at 6:09 AM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> Starting with my +1 (non-binding). Thanks!
>>>>
>>>> On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim <
>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like to start the vote for SPIP: Structured Streaming - Arbitrary
>>>>> State API v2.
>>>>>
>>>>> References:
>>>>>
>>>>>    - JIRA ticket
>>>>>    - SPIP doc
>>>>>    - Discussion thread
>>>>>
>>>>> Please vote on the SPIP for the next 72 hours:
>>>>>
>>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>> [ ] +0
>>>>> [ ] -1: I don’t think this is a good idea because …
>>>>>
>>>>> Thanks!
>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>



Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Raghu Angadi
+1. This is a major improvement to the state API.

Raghu.

On Tue, Jan 9, 2024 at 1:42 AM Mich Talebzadeh 
wrote:

> +1 for me as well
>
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
>
>
>
>
> On Tue, 9 Jan 2024 at 03:24, Anish Shrigondekar
>  wrote:
>
>> Thanks Jungtaek for creating the Vote thread.
>>
>> +1 (non-binding) from my side too.
>>
>> Thanks,
>> Anish
>>
>> On Tue, Jan 9, 2024 at 6:09 AM Jungtaek Lim 
>> wrote:
>>
>>> Starting with my +1 (non-binding). Thanks!
>>>
>>> On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'd like to start the vote for SPIP: Structured Streaming - Arbitrary
>>>> State API v2.
>>>>
>>>> References:
>>>>
>>>>    - JIRA ticket
>>>>    - SPIP doc
>>>>    - Discussion thread
>>>>
>>>> Please vote on the SPIP for the next 72 hours:
>>>>
>>>> [ ] +1: Accept the proposal as an official SPIP
>>>> [ ] +0
>>>> [ ] -1: I don’t think this is a good idea because …
>>>>
>>>> Thanks!
>>>> Jungtaek Lim (HeartSaVioR)

>>>


RE: Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread 刘唯
+1 This is a good addition!

On 2024/01/09 03:23:35 Anish Shrigondekar wrote:
> Thanks Jungtaek for creating the Vote thread.
>
> +1 (non-binding) from my side too.
>
> Thanks,
> Anish
>
> On Tue, Jan 9, 2024 at 6:09 AM Jungtaek Lim 
> wrote:
>
> > Starting with my +1 (non-binding). Thanks!
> >
> > On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim 
> > wrote:
> >
> >> Hi all,
> >>
> >> I'd like to start the vote for SPIP: Structured Streaming - Arbitrary
> >> State API v2.
> >>
> >> References:
> >>
> >>- JIRA ticket 
> > >>- SPIP doc
> > >><https://docs.google.com/document/d/1QtC5qd4WQEia9kl1Qv74WE0TiXYy3x6zeTykygwPWig/edit?usp=sharing>
> >>- Discussion thread
> >>
> >>
> >> Please vote on the SPIP for the next 72 hours:
> >>
> >> [ ] +1: Accept the proposal as an official SPIP
> >> [ ] +0
> >> [ ] -1: I don’t think this is a good idea because …
> >>
> >> Thanks!
> >> Jungtaek Lim (HeartSaVioR)
> >>
> >
>


Re: AutoReply: Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Mich Talebzadeh
Hi,

Please stop this acknowledgement email. It is spamming the forum
unnecessarily!

Thanks

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom






On Tue, 9 Jan 2024 at 09:44, laglanyue  wrote:

> Thanks for your email; I have received it.


Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Mich Talebzadeh
+1 for me as well


Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom






On Tue, 9 Jan 2024 at 03:24, Anish Shrigondekar
 wrote:

> Thanks Jungtaek for creating the Vote thread.
>
> +1 (non-binding) from my side too.
>
> Thanks,
> Anish
>
> On Tue, Jan 9, 2024 at 6:09 AM Jungtaek Lim 
> wrote:
>
>> Starting with my +1 (non-binding). Thanks!
>>
>> On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim 
>> wrote:
>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: Structured Streaming - Arbitrary
>>> State API v2.
>>>
>>> References:
>>>
>>>- JIRA ticket 
>>>- SPIP doc
>>>
>>> 
>>>- Discussion thread
>>>
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thanks!
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>