Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Anish Shrigondekar
Thanks Jungtaek for creating the Vote thread.

+1 (non-binding) from my side too.

Thanks,
Anish

On Tue, Jan 9, 2024 at 6:09 AM Jungtaek Lim wrote:

> Starting with my +1 (non-binding). Thanks!
>
> On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim wrote:
>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: Structured Streaming - Arbitrary
>> State API v2.
>>
>> References:
>>
>>    - JIRA ticket
>>    - SPIP doc
>>    - Discussion thread
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thanks!
>> Jungtaek Lim (HeartSaVioR)
>>
>


Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
Starting with my +1 (non-binding). Thanks!

On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim wrote:

> Hi all,
>
> I'd like to start the vote for SPIP: Structured Streaming - Arbitrary
> State API v2.
>
> References:
>
>    - JIRA ticket
>    - SPIP doc
>    - Discussion thread
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thanks!
> Jungtaek Lim (HeartSaVioR)
>


[VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
Hi all,

I'd like to start the vote for SPIP: Structured Streaming - Arbitrary State
API v2.

References:

   - JIRA ticket
   - SPIP doc
   - Discussion thread

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Thanks!
Jungtaek Lim (HeartSaVioR)


Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
Thanks everyone for the feedback!

Given that we have received positive feedback with no major concerns, I will
start the vote thread soon. Please cast your vote in that thread as well.

Thanks again!

On Tue, Jan 9, 2024 at 7:44 AM Bhuwan Sahni wrote:

> +1 on the newer APIs. I believe these APIs provide a much more powerful
> mechanism for the user to perform arbitrary state management in Structured
> Streaming queries.
>
> Thanks
> Bhuwan Sahni
>
> --
> 
> *Bhuwan Sahni*
> Staff Software Engineer
>
> bhuwan.sa...@databricks.com
> 500 108th Ave. NE
> Bellevue, WA 98004
> USA
>


Re: Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-08 Thread Mich Talebzadeh
Please also note that Flask's built-in server is meant for development.
While it is suitable for development and small-scale applications, it may
not handle concurrent requests efficiently in a production environment.
In production, one can use Gunicorn (Green Unicorn), a WSGI (Web Server
Gateway Interface) HTTP server that is commonly used to serve Flask
applications. It runs multiple worker processes, each handling one request
at a time with the default synchronous worker class. This lets Gunicorn
serve multiple simultaneous requests and improves the concurrency and
performance of your Flask application.

HTH

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.






Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Bhuwan Sahni
+1 on the newer APIs. I believe these APIs provide a much more powerful
mechanism for the user to perform arbitrary state management in Structured
Streaming queries.

Thanks
Bhuwan Sahni

-- 

*Bhuwan Sahni*
Staff Software Engineer

bhuwan.sa...@databricks.com
500 108th Ave. NE
Bellevue, WA 98004
USA


Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-08 Thread Mich Talebzadeh
Thought it might be useful to share my idea with fellow forum members. During
the break, I worked on the *seamless integration of Spark Structured
Streaming with Flask REST API for real-time data ingestion and analytics*.
The use case revolves around a scenario where data is generated through
REST API requests in real time. The Flask REST API efficiently captures
and processes this data, saving it to a Spark Structured Streaming
DataFrame. Subsequently, the processed data could be channelled into any
sink of your choice, including a Kafka pipeline, giving a robust end-to-end
solution for dynamic and responsive data streaming. I will delve into the
architecture, implementation, and benefits of this combination, enabling
one to build an agile and efficient real-time data application. I will put
the code on GitHub for everyone's benefit. Hopefully your comments will
help me to improve it.

Cheers

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom




Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread L. C. Hsieh
+1

I left some comments in the SPIP doc and got replies quickly. The new
API looks good and more comprehensive. I think it will help Spark
Structured Streaming to be more useful in more complicated streaming
use cases.

On Fri, Jan 5, 2024 at 8:15 PM Burak Yavuz  wrote:
>
> I'm also a +1 on the newer APIs. We had a lot of learnings from using 
> flatMapGroupsWithState and I believe that we can make the APIs a lot easier 
> to use.
>
> On Wed, Nov 29, 2023 at 6:43 PM Anish Shrigondekar 
>  wrote:
>>
>> Hi dev,
>>
>> Addressed the comments that Jungtaek had on the doc. Bumping the thread once 
>> again to see if other folks have any feedback on the proposal.
>>
>> Thanks,
>> Anish
>>
>> On Mon, Nov 27, 2023 at 8:15 PM Jungtaek Lim  
>> wrote:
>>>
>>> Kindly bump for better reach after the long holiday. Please kindly review 
>>> the proposal which opens the chance to address complex use cases of 
>>> streaming. Thanks!
>>>
>>> On Thu, Nov 23, 2023 at 8:19 AM Jungtaek Lim  
>>> wrote:

>>>> Thanks Anish for proposing the SPIP and initiating this thread! I believe
>>>> this SPIP will help a bunch of complex use cases on streaming.
>>>>
>>>> dev@: We are coincidentally initiating this discussion during the
>>>> Thanksgiving holidays. We understand people in the US may not have time to
>>>> review the SPIP, and we plan to bump this thread early next week. We are
>>>> open to any feedback from outside the US during the holiday. We can either
>>>> address the feedback all together after the holiday (Anish is in the US),
>>>> or I can answer if the feedback is more of a question. Thanks!
>>>>
>>>> On Thu, Nov 23, 2023 at 5:27 AM Anish Shrigondekar wrote:
>>>>>
>>>>> Hi dev,
>>>>>
>>>>> I would like to start a discussion on "Structured Streaming - Arbitrary
>>>>> State API v2". This proposal aims to address a bunch of limitations we
>>>>> see today using the mapGroupsWithState/flatMapGroupsWithState operator.
>>>>> The detailed set of limitations is described in the SPIP doc.
>>>>>
>>>>> We propose to support various features such as multiple state variables
>>>>> (flexible data modeling), composite types, enhanced timer functionality,
>>>>> support for chaining operators after the new operator, handling initial
>>>>> state along with state data source, schema evolution, etc. This will
>>>>> allow users to write more powerful streaming state management logic
>>>>> primarily used in operational use-cases. Other built-in stateful
>>>>> operators could also benefit from such changes in the future.
>>>>>
>>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-45939
>>>>> SPIP:
>>>>> https://docs.google.com/document/d/1QtC5qd4WQEia9kl1Qv74WE0TiXYy3x6zeTykygwPWig/edit?usp=sharing
>>>>> Design Doc:
>>>>> https://docs.google.com/document/d/1QjZmNZ-fHBeeCYKninySDIoOEWfX6EmqXs2lK097u9o/edit?usp=sharing
>>>>>
>>>>> cc - @Jungtaek Lim who has graciously agreed to be the shepherd for this
>>>>> project
>>>>>
>>>>> Looking forward to your feedback!
>>>>>
>>>>> Thanks,
>>>>> Anish

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Regression? - UIUtils::formatBatchTime - [SPARK-46611][CORE] Remove ThreadLocal by replace SimpleDateFormat with DateTimeFormatter

2024-01-08 Thread Sean Owen
Agreed, that looks wrong. From the code, it seems that "timezone" is only
used for testing, though apparently no test caught this. I'll submit a PR
to patch it in any event: https://github.com/apache/spark/pull/44619
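
For anyone following along, here is a minimal sketch of the no-op described
in the quoted message below. It is not the actual UIUtils code: the object
name, the two method names, the pattern string, and the fixed UTC default
are all assumptions made for illustration. Only the fact that
DateTimeFormatter.withZone returns a new instance comes from the report.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter
import java.util.{Locale, TimeZone}

object FormatBatchTimeSketch {
  // Hypothetical stand-in for the shared formatter; pattern and UTC default are assumptions.
  private val batchTimeFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss", Locale.US).withZone(ZoneId.of("UTC"))

  // Mirrors the reported no-op: the formatter returned by withZone is discarded.
  def formatIgnoringZone(millis: Long, timezone: TimeZone): String = {
    if (timezone != null) {
      batchTimeFormat.withZone(timezone.toZoneId) // result dropped, shared formatter unchanged
    }
    batchTimeFormat.format(Instant.ofEpochMilli(millis))
  }

  // Follows the suggested fix: keep the zoned formatter in a local value.
  def formatWithZone(millis: Long, timezone: TimeZone): String = {
    val format =
      if (timezone != null) batchTimeFormat.withZone(timezone.toZoneId) else batchTimeFormat
    format.format(Instant.ofEpochMilli(millis))
  }

  def main(args: Array[String]): Unit = {
    val tokyo = TimeZone.getTimeZone("Asia/Tokyo")
    println(formatIgnoringZone(0L, tokyo)) // prints 1970/01/01 00:00:00 (zone argument ignored)
    println(formatWithZone(0L, tokyo))     // prints 1970/01/01 09:00:00
  }
}
```

Holding the zoned formatter in a local value also means there is no shared
state to restore afterwards, which is why the report suggests dropping
"oldTimezones" and the "finally" block.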

On Mon, Jan 8, 2024 at 1:33 AM Janda Martin  wrote:

> I think that
>  [SPARK-46611][CORE] Remove ThreadLocal by replace SimpleDateFormat with
> DateTimeFormatter
>
>   introduced a regression in UIUtils::formatBatchTime when timezone is
> defined.
>
> DateTimeFormatter is thread-safe and immutable according to the JavaDoc, so
> DateTimeFormatter::withZone returns a new instance instead of changing the
> zone of the existing one.
>
> The following code has no effect:
>   val oldTimezones =
>     (batchTimeFormat.getZone, batchTimeFormatWithMilliseconds.getZone)
>   if (timezone != null) {
>     val zoneId = timezone.toZoneId
>     batchTimeFormat.withZone(zoneId)
>     batchTimeFormatWithMilliseconds.withZone(zoneId)
>   }
>
> Suggested fix:
> introduce local variables for "batchTimeFormat" and
> "batchTimeFormatWithMilliseconds" that hold the results of withZone, and
> remove "oldTimezones" and the "finally" block.
>
>   I hope that I'm right. I only read the code; I didn't run any tests.
>
>  Thank you
>Martin
>