Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-04-06 Thread Pavan Kotikalapudi
Hi Jungtaek,

Status of the current SPARK-24815
<https://issues.apache.org/jira/browse/SPARK-24815>:
Thomas Graves is reviewing the draft PR
<https://github.com/apache/spark/pull/42352>. I still need to add
documentation about the configs and usage details; I am planning to do that
this week. He did mention that it would be great if somebody with experience
in structured streaming would take a look at the algorithm. Will you be able
to review it?

Another point I wanted to discuss: as you might have already seen in the
design doc
<https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing>,
we use the traditional DRA configs

spark.dynamicAllocation.enabled,
spark.dynamicAllocation.schedulerBacklogTimeout,
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout,
spark.dynamicAllocation.executorIdleTimeout,
spark.dynamicAllocation.cachedExecutorIdleTimeout

and a few new configs

spark.dynamicAllocation.streaming.enabled,
spark.dynamicAllocation.streaming.executorDeallocationRatio,
spark.dynamicAllocation.streaming.executorDeallocationTimeout

to make DRA work for structured streaming.
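
To make the list above concrete, here is a minimal sketch of how the two
groups of configs might be passed to an application. The config keys are the
ones named in this email; every value below is an illustrative placeholder
(an assumption, not taken from the design doc or the PR), and the
`to_submit_args` helper is hypothetical. The
spark.dynamicAllocation.streaming.* keys come from the draft PR, so they
would only have an effect on a build that includes that change.

```python
# Placeholder values only: none of these numbers come from the thread.
TRADITIONAL_DRA = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.schedulerBacklogTimeout": "54s",
    "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout": "54s",
    "spark.dynamicAllocation.executorIdleTimeout": "30s",
    "spark.dynamicAllocation.cachedExecutorIdleTimeout": "30s",
}

# Keys proposed in the draft PR; the value formats here are assumptions.
STREAMING_DRA = {
    "spark.dynamicAllocation.streaming.enabled": "true",
    "spark.dynamicAllocation.streaming.executorDeallocationRatio": "0.5",
    "spark.dynamicAllocation.streaming.executorDeallocationTimeout": "30s",
}


def to_submit_args(*groups):
    """Flatten config dicts into spark-submit style `--conf key=value` arguments."""
    args = []
    for group in groups:
        for key, value in group.items():
            args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the arguments as one shell-friendly string.
    print(" ".join(to_submit_args(TRADITIONAL_DRA, STREAMING_DRA)))
```

Keeping the two groups in separate dicts makes it easy to see which settings
are existing DRA knobs and which ones depend on the proposed change.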

In the design doc I mentioned that we have to calculate and set the
scale out/back thresholds based on the trigger interval
<https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit#heading=h.15dsvnvbhu2d>.
Internally at the company, we have helper functions that auto-generate
the above configs from the trigger interval and the threshold configs (we
also got similar feedback in reviews
<https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?disco=AAABIH1y18w>).
Here are those configs:

  # required - should be greater than 3 seconds, as that gives enough
  # seconds for the scaleOut and scaleBack thresholds to work with.
  "spark.sql.streaming.triggerInterval.seconds": 
  # optional - value should be between 0 and 1 and greater than
  # scaleBackThreshold; default is 0.9
  "spark.dynamicAllocation.streaming.scaleOutThreshold": 
  # optional - value should be between 0 and 1 and less than
  # scaleOutThreshold; default is 0.5
  "spark.dynamicAllocation.streaming.scaleBackThreshold": 

The above configs help us generate the configs below for apps with
different trigger intervals (or if they change them for some reason):

spark.dynamicAllocation.schedulerBacklogTimeout,
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout,
spark.dynamicAllocation.executorIdleTimeout,
spark.dynamicAllocation.cachedExecutorIdleTimeout.
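
As a rough sketch of what such a helper could look like: the validation rules
(trigger interval greater than 3 seconds, 0 < scaleBackThreshold <
scaleOutThreshold < 1, defaults 0.9 and 0.5) are the ones described above.
The mapping from thresholds to timeouts (threshold multiplied by the trigger
interval, rounded to whole seconds) is purely an assumption for illustration;
this email does not spell out the actual formula. The function name
`derive_dra_timeouts` is hypothetical.

```python
def derive_dra_timeouts(trigger_interval_s, scale_out=0.9, scale_back=0.5):
    """Derive the four traditional DRA timeouts from a trigger interval.

    The validation follows the rules stated in the email. The mapping
    (timeout = threshold * trigger interval) is an ASSUMPTION, not the
    formula from the design doc or the draft PR.
    """
    if trigger_interval_s <= 3:
        raise ValueError("trigger interval should be greater than 3 seconds")
    if not (0 < scale_back < scale_out < 1):
        raise ValueError("need 0 < scaleBackThreshold < scaleOutThreshold < 1")

    # Assumed mapping: scale out once tasks have been backlogged for
    # scale_out * interval; release executors idle for scale_back * interval.
    backlog_s = max(1, round(trigger_interval_s * scale_out))
    idle_s = max(1, round(trigger_interval_s * scale_back))

    return {
        "spark.dynamicAllocation.schedulerBacklogTimeout": f"{backlog_s}s",
        "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout": f"{backlog_s}s",
        "spark.dynamicAllocation.executorIdleTimeout": f"{idle_s}s",
        "spark.dynamicAllocation.cachedExecutorIdleTimeout": f"{idle_s}s",
    }


if __name__ == "__main__":
    # A 60-second trigger interval with the default thresholds.
    for key, value in derive_dra_timeouts(60).items():
        print(f"{key}={value}")
```

With a helper of this shape, an app owner would only state the trigger
interval (and optionally the two thresholds), and the four timeouts would
follow whenever the interval changes.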

Our additional configs have their own limitations, so I would like to get
some feedback: does adding such configs, which automate the process of
calculating the thresholds and their respective configs, make sense?

Thank you,

Pavan

On Thu, Mar 28, 2024 at 3:38 PM Pavan Kotikalapudi 
wrote:

> Hi Jungtaek,
>
> Sorry for the late reply.
>
> I understand the concerns towards finding PMC members, I had similar
> concerns in the past. Do you think we have something to improve in the SPIP
> (certain areas) so that it would get traction from PMC members? Or this
> SPIP might not be a priority to the PMC right now?
>
> I agree this change is small enough that it might not be tagged as an
> SPIP. I started with the template SPIP questions so that it would be easier
> to understand the limitations of the current system, new solution, how it
> works, how to use it, limitations etc. As you might have already
> noticed in the PR, this change is turned off by default, will only work if
> `spark.dynamicAllocation.streaming.enabled` is true.
>
> Regarding the concerns about expertise in DRA,  I will find some core
> contributors of this module/DRA and tag them to this email with details,
> Mich has also highlighted the same in the past. Once we get approval from
> them we can further discuss and enhance this to make the user experience
> better.
>
> Thank you,
>
> Pavan
>
>
> On Tue, Mar 26, 2024 at 8:12 PM Jungtaek Lim 
> wrote:
>
>> Sounds good.
>>
>> One thing I'd like to clarify before shepherding this SPIP is the process
>> itself. Getting enough traction from PMC members is another issue to pass
>> the SPIP vote. Even a vote from committer is not counted. (I don't have a
>> binding vote.) I only see one PMC member (Thomas Graves, not my team) in
>> the design doc and we still don't get positive feedback. So still a long
>> way to go. We need three supporters from PMC members.
>>
>> Another thing is, I get the proposal at a high level, but I don't have
>> actual expertise in DRA. I could review the code in general, but I feel
>> like I'm not qualified to approve the code. We still need an expert on the
>> CORE area, especially who has expertise with DRA. (Could you please
>> an

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-28 Thread Pavan Kotikalapudi
Hi Andrew, Sandy, Jerry, Thomas, Marcelo, Whenchen, YangJie, Shixiong,

My apologies for tagging so many of you (on multiple emails). I am in the
process of finding the core contributors of the Dynamic Resource
Allocation (DRA) feature in apache/spark <https://github.com/apache/spark>,
and I found you listed among the core contributors to this feature.

We (cc'd) would like to extend the current DRA to work for the structured
streaming use case [SPARK-24815 <https://issues.apache.org/jira/browse/SPARK-24815>]
(based on the heuristics of the trigger interval).
Here is the design doc
<https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing>.
We also have a draft PR <https://github.com/apache/spark/pull/42352> with
an initial implementation.

This feature has been running well for the past year at my company
(Twilio), and there are a lot of folks in the community who are interested
in this feature.

To get the PR to a mergeable state, we would love to leverage your
expertise on DRA. I request that you please review the design doc and the
draft PR and let us know your thoughts and concerns, if any. This will
hugely benefit the community using structured streaming applications in
their data pipelines.

Looking forward to hearing back from you.

Thank you,

Pavan

On Thu, Mar 28, 2024 at 3:38 PM Pavan Kotikalapudi 
wrote:

> Hi Jungtaek,
>
> Sorry for the late reply.
>
> I understand the concerns towards finding PMC members, I had similar
> concerns in the past. Do you think we have something to improve in the SPIP
> (certain areas) so that it would get traction from PMC members? Or this
> SPIP might not be a priority to the PMC right now?
>
> I agree this change is small enough that it might not be tagged as an
> SPIP. I started with the template SPIP questions so that it would be easier
> to understand the limitations of the current system, new solution, how it
> works, how to use it, limitations etc. As you might have already
> noticed in the PR, this change is turned off by default, will only work if
> `spark.dynamicAllocation.streaming.enabled` is true.
>
> Regarding the concerns about expertise in DRA,  I will find some core
> contributors of this module/DRA and tag them to this email with details,
> Mich has also highlighted the same in the past. Once we get approval from
> them we can further discuss and enhance this to make the user experience
> better.
>
> Thank you,
>
> Pavan
>
>
> On Tue, Mar 26, 2024 at 8:12 PM Jungtaek Lim 
> wrote:
>
>> Sounds good.
>>
>> One thing I'd like to clarify before shepherding this SPIP is the process
>> itself. Getting enough traction from PMC members is another issue to pass
>> the SPIP vote. Even a vote from committer is not counted. (I don't have a
>> binding vote.) I only see one PMC member (Thomas Graves, not my team) in
>> the design doc and we still don't get positive feedback. So still a long
>> way to go. We need three supporters from PMC members.
>>
>> Another thing is, I get the proposal at a high level, but I don't have
>> actual expertise in DRA. I could review the code in general, but I feel
>> like I'm not qualified to approve the code. We still need an expert on the
>> CORE area, especially who has expertise with DRA. (Could you please
>> annotate the code and enumerate several people who worked on the codebase?)
>> If they need an expertise of streaming to understand how things will work
>> then either you or I can explain, but I can't just approve and merge the
>> code.
>>
>> That said, if we succeed in finding one and they review the code and
>> LGTM, I'd rather say not to go with taking the process of SPIP unless the
>> expert reviewing your code requires us to do so. The change you proposed is
>> rather small and does not seem to be invasive (experts can also weigh), and
>> there must never be the case that this feature is turned on by default (as
>> we pointed out limitation). It doesn't look like requiring SPIP, if we
>> carefully document the new change and also clearly describe the limitation.
>> (Also a warning in the codebase that this must not be enabled by default.)
>>
>>
>> On Tue, Mar 26, 2024 at 7:02 PM Pavan Kotikalapudi <
>> pkotikalap...@twilio.com> wrote:
>>
>>> Hi Bhuwan,
>>>
>>> Glad to hear back from you! Very much appreciate your help on reviewing
>>> the design doc/PR and endorsing this proposal.
>>>
>>> Thank you so much @Jungtaek Lim  , @Mich
>>> Talebzadeh   for graciously agreeing to
>>> mentor/shepherd this effort.
>>>
>>> Regarding Twilio copyright in Notice binary file:

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-28 Thread Pavan Kotikalapudi
Hi Jungtaek,

Sorry for the late reply.

I understand the concerns about finding PMC members; I had similar
concerns in the past. Do you think there is something we could improve in
the SPIP (in certain areas) so that it gets traction from PMC members? Or
might this SPIP simply not be a priority for the PMC right now?

I agree this change is small enough that it might not need to be tagged as
an SPIP. I started with the template SPIP questions so that it would be
easier to understand the limitations of the current system, the new
solution, how it works, how to use it, its limitations, etc. As you might
have already noticed in the PR, this change is turned off by default and
will only work if `spark.dynamicAllocation.streaming.enabled` is true.

Regarding the concerns about expertise in DRA, I will find some core
contributors of this module/DRA and tag them on this email with details;
Mich has also highlighted the same in the past. Once we get approval from
them, we can further discuss and enhance this to make the user experience
better.

Thank you,

Pavan


On Tue, Mar 26, 2024 at 8:12 PM Jungtaek Lim 
wrote:

> Sounds good.
>
> One thing I'd like to clarify before shepherding this SPIP is the process
> itself. Getting enough traction from PMC members is another issue to pass
> the SPIP vote. Even a vote from committer is not counted. (I don't have a
> binding vote.) I only see one PMC member (Thomas Graves, not my team) in
> the design doc and we still don't get positive feedback. So still a long
> way to go. We need three supporters from PMC members.
>
> Another thing is, I get the proposal at a high level, but I don't have
> actual expertise in DRA. I could review the code in general, but I feel
> like I'm not qualified to approve the code. We still need an expert on the
> CORE area, especially who has expertise with DRA. (Could you please
> annotate the code and enumerate several people who worked on the codebase?)
> If they need an expertise of streaming to understand how things will work
> then either you or I can explain, but I can't just approve and merge the
> code.
>
> That said, if we succeed in finding one and they review the code and LGTM,
> I'd rather say not to go with taking the process of SPIP unless the expert
> reviewing your code requires us to do so. The change you proposed is rather
> small and does not seem to be invasive (experts can also weigh), and there
> must never be the case that this feature is turned on by default (as we
> pointed out limitation). It doesn't look like requiring SPIP, if we
> carefully document the new change and also clearly describe the limitation.
> (Also a warning in the codebase that this must not be enabled by default.)
>
>
> On Tue, Mar 26, 2024 at 7:02 PM Pavan Kotikalapudi <
> pkotikalap...@twilio.com> wrote:
>
>> Hi Bhuwan,
>>
>> Glad to hear back from you! Very much appreciate your help on reviewing
>> the design doc/PR and endorsing this proposal.
>>
>> Thank you so much @Jungtaek Lim  , @Mich
>> Talebzadeh   for graciously agreeing to
>> mentor/shepherd this effort.
>>
>> Regarding Twilio copyright in Notice binary file:
>> Twilio Opensource counsel was involved all through the process, I have
>> placed it in the project file prior to Twilio signing a CCLA for the spark
>> project contribution( Aug '23).
>>
>> Since the CCLA is signed now, I have removed the twilio copyright from
>> that file. I didn't get a chance to update the PR after github-actions
>> closed it.
>>
>> Please let me know of next steps needed to bring this draft PR/effort to
>> completion.
>>
>> Thank you,
>>
>> Pavan
>>
>>
>> On Tue, Mar 26, 2024 at 12:01 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> I'm happy to, but it looks like I need to check one more thing about the
>>> license, according to the WIP PR
>>> <https://github.com/apache/spark/pull/42352>.
>>>
>>> @Pavan Kotikalapudi 
>>> I see you've added the copyright of Twilio in the NOTICE-binary file,
>>> which makes me wonder if Twilio had filed CCLA to the Apache Software
>>> Foundation.
>>>
>>> PMC members can correct me if I'm mistaken, but from my understanding
>>> (and experiences of PMC member in other ASF project), code contribution is
>>> considered as code donation and copyright belongs to ASF. That's why you
>>> can't find the copyright of employers for contributors in the codebase.
>>> What you see copyrights in NOTICE-binary is due to the fact we have binary

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Pavan Kotikalapudi
Sounds good.

Thanks again for your help on guiding the effort from discussion/review
through voting phases in the spark dev community.

Thank you,

Pavan

On Tue, Mar 26, 2024 at 4:20 AM Mich Talebzadeh 
wrote:

> Hi Pavan,
>
> Thanks for instigating this proposal. Looks like the proposal is ready and
> has enough votes to be implemented. Having a shepherd will make it more
> fruitful.
>
> I will leave it to @Jungtaek Lim  's
> capable hands to drive it forward.
>
> Will be there to help if needed.
>
> Cheers
>
> Mich Talebzadeh,
> Technologist | Solutions Architect | Data Engineer  | Generative AI
> London
> United Kingdom
>
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>
>
> On Tue, 26 Mar 2024 at 10:02, Pavan Kotikalapudi 
> wrote:
>
>> Hi Bhuwan,
>>
>> Glad to hear back from you! Very much appreciate your help on reviewing
>> the design doc/PR and endorsing this proposal.
>>
>> Thank you so much @Jungtaek Lim  , @Mich
>> Talebzadeh   for graciously agreeing to
>> mentor/shepherd this effort.
>>
>> Regarding Twilio copyright in Notice binary file:
>> Twilio Opensource counsel was involved all through the process, I have
>> placed it in the project file prior to Twilio signing a CCLA for the spark
>> project contribution( Aug '23).
>>
>> Since the CCLA is signed now, I have removed the twilio copyright from
>> that file. I didn't get a chance to update the PR after github-actions
>> closed it.
>>
>> Please let me know of next steps needed to bring this draft PR/effort to
>> completion.
>>
>> Thank you,
>>
>> Pavan
>>
>>
>> On Tue, Mar 26, 2024 at 12:01 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> I'm happy to, but it looks like I need to check one more thing about the
>>> license, according to the WIP PR
>>> <https://github.com/apache/spark/pull/42352>.
>>>
>>> @Pavan Kotikalapudi 
>>> I see you've added the copyright of Twilio in the NOTICE-binary file,
>>> which makes me wonder if Twilio had filed CCLA to the Apache Software
>>> Foundation.
>>>
>>> PMC members can correct me if I'm mistaken, but from my understanding
>>> (and experiences of PMC member in other ASF project), code contribution is
>>> considered as code donation and copyright belongs to ASF. That's why you
>>> can't find the copyright of employers for contributors in the codebase.
>>> What you see copyrights in NOTICE-binary is due to the fact we have binary
>>> dependency and their licenses may require to explicitly mention about
>>> copyright. It's not about direct code contribution.
>>>
>>> Is Twilio aware of this? Also, if Twilio did not file CCLA in prior,
>>> could you please engage with a relevant group in the company (could be a
>>> legal team, or similar with OSS advocate team if there is any) and ensure
>>> that CCLA is filed? The copyright issue is a legal issue, so we have to be
>>> conservative and 100% sure that the employer is aware of what is the
>>> meaning of donating the code to ASF via reviewing CCLA and relevant doc,
>>> and explicitly express that they are OK with it via filing CCLA.
>>>
>>> You can read the description of agreements on contribution and ICLA/CCLA
>>> form from this page.
>>> https://www.apache.org/lice

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Pavan Kotikalapudi
Hi Bhuwan,

Glad to hear back from you! I very much appreciate your help in reviewing
the design doc/PR and endorsing this proposal.

Thank you so much, @Jungtaek Lim and @Mich Talebzadeh, for graciously
agreeing to mentor/shepherd this effort.

Regarding the Twilio copyright in the NOTICE-binary file:
Twilio's open source counsel was involved throughout the process. I had
placed the copyright in the project file before Twilio signed a CCLA for
the Spark project contribution (Aug '23).

Since the CCLA is signed now, I have removed the Twilio copyright from that
file. I didn't get a chance to update the PR after github-actions closed it.

Please let me know the next steps needed to bring this draft PR/effort to
completion.

Thank you,

Pavan


On Tue, Mar 26, 2024 at 12:01 AM Jungtaek Lim 
wrote:

> I'm happy to, but it looks like I need to check one more thing about the
> license, according to the WIP PR
> <https://github.com/apache/spark/pull/42352>.
>
> @Pavan Kotikalapudi 
> I see you've added the copyright of Twilio in the NOTICE-binary file,
> which makes me wonder if Twilio had filed CCLA to the Apache Software
> Foundation.
>
> PMC members can correct me if I'm mistaken, but from my understanding (and
> experiences of PMC member in other ASF project), code contribution is
> considered as code donation and copyright belongs to ASF. That's why you
> can't find the copyright of employers for contributors in the codebase.
> What you see copyrights in NOTICE-binary is due to the fact we have binary
> dependency and their licenses may require to explicitly mention about
> copyright. It's not about direct code contribution.
>
> Is Twilio aware of this? Also, if Twilio did not file CCLA in prior, could
> you please engage with a relevant group in the company (could be a legal
> team, or similar with OSS advocate team if there is any) and ensure that
> CCLA is filed? The copyright issue is a legal issue, so we have to be
> conservative and 100% sure that the employer is aware of what is the
> meaning of donating the code to ASF via reviewing CCLA and relevant doc,
> and explicitly express that they are OK with it via filing CCLA.
>
> You can read the description of agreements on contribution and ICLA/CCLA
> form from this page.
> https://www.apache.org/licenses/contributor-agreements.html
>
> Please let me know if this is resolved. This seems to me as a blocker to
> move on. Please also let me know if the contribution is withdrawn from the
> employer.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
> On Mon, Mar 25, 2024 at 11:47 PM Bhuwan Sahni
>  wrote:
>
>> Hi Pavan,
>>
>> I looked at the PR, and the changes look simple and contained. It would
>> be useful to add dynamic resource allocation to Spark Structured Streaming.
>>
>> Jungtaek. Would you be able to shepherd this change?
>>
>>
>> On Tue, Mar 19, 2024 at 10:38 AM Bhuwan Sahni <
>> bhuwan.sa...@databricks.com> wrote:
>>
>>> Thanks a lot for creating the risk table Pavan. My apologies. I was tied
>>> up with high priority items for the last couple weeks and could not
>>> respond. I will review the PR by tomorrow's end, and get back to you.
>>>
>>> Appreciate your patience.
>>>
>>> Thanks
>>> Bhuwan Sahni
>>>
>>> On Sun, Mar 17, 2024 at 4:42 PM Pavan Kotikalapudi <
>>> pkotikalap...@twilio.com> wrote:
>>>
>>>> Hi Bhuwan,
>>>>
>>>> I hope the team got a chance to review the draft PR, looking for some
>>>> comments to see if the plan looks alright?. I have updated the document
>>>> about the risks
>>>> <https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit#heading=h.577aawlyiedf> (also
>>>> mentioned below). Please confirm if it looks alright?
>>>>
>>>> *Spark application type*
>>>>
>>>> *auto-scaling capability*
>>>>
>>>> *with New auto-scaling capability*
>>>>
>>>> Spark Batch job
>>>>
>>>> Works with current DRA
>>>>
>>>> No - change
>>>>
>>>> Streaming query without trigger inte

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-17 Thread Pavan Kotikalapudi
Hi Bhuwan,

I hope the team got a chance to review the draft PR. I am looking for
comments on whether the plan looks alright. I have updated the document
with the risks
<https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit#heading=h.577aawlyiedf>
(also summarized below). Please confirm whether it looks alright.

*Spark application type* | *Auto-scaling capability* | *With new auto-scaling capability*

Spark Batch job | Works with current DRA | No change

Streaming query without trigger interval | No implementation | Can work with this implementation (have to set certain scale back configs based on previous usage pattern); maybe automate with future work?

Spark Streaming query with trigger interval | No implementation | With this implementation

Spark Streaming query with one-time micro batch | Works with current DRA | No change

Spark Streaming query with AvailableNow micro batch | Works with current DRA | No change

Batch + Streaming query (default/trigger-interval/once/availablenow modes), other notebook use cases | No implementation | No implementation



We are more than happy to collaborate on a call to make better progress
on this enhancement. Please let us know.

Thank you,

Pavan

On Fri, Mar 1, 2024 at 12:26 PM Mich Talebzadeh 
wrote:

>
> Hi Bhuwan et al,
>
> Thank you for passing on the Databricks Structured Streaming team's review
> of the SPIP document. FYI, I work closely with Pavan and other members to
> help deliver this piece of work. We appreciate your insights, especially
> regarding the cost savings potential from the PoC.
>
> Pavan already furnished you with some additional info. Your team's point
> about the SPIP currently addressing a specific use case (single streaming
> query with Processing Time trigger) is well-taken. We agree that
> maintaining simplicity is key, particularly as we explore more general
> resource allocation mechanisms in the future. To address the concerns and
> foster open discussion, the Databricks team is invited to add their
> comments and suggestions directly to the Jira itself:
>
> [SPARK-24815] Structured Streaming should support dynamic allocation - ASF
> JIRA (apache.org)
> <https://issues.apache.org/jira/browse/SPARK-24815>
> This will ensure everyone involved can benefit from your team's expertise
> and facilitate further collaboration.
>
> Thanks
>
> Mich Talebzadeh,
>
>
> On Fri, 1 Mar 2024 at 19:59, Pavan Kotikalapudi
>  wrote:
>
>> Thanks Bhuwan and the rest of the Databricks team for the reviews.
>>
>> I appreciate your reviews; they were very helpful in evaluating a few
>> options that were overlooked earlier (especially mixed Spark apps running
>> on notebooks). Regarding the use cases, it could handle multiple streaming
>> queries provided that they run on the same trigger interval processing
>> time (very similar to how the current batch DRA is set up), but I felt it
>> would be beneficial to separate out streaming queries when setting up
>> production pipelines.
>>
>> Regarding the implementation, here is the draft PR
>> <https://github.com/apache/spark/pull/42352>.
>> (already mentioned in ticket SPARK-24815
>> <https://u

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Pavan Kotikalapudi
>>
>>
>> On Fri, 23 Feb 2024 at 17:41, Pavan Kotikalapudi
>>  wrote:
>>
>>> Thanks for the pointers Mich, will wait for Jungtaek Lee or any other
>>> PMC members to respond.
>>>
>>> aggregating upvotes to this email thread
>>>
>>> +6
>>> Mich Talebzadeh
>>> Adam Hobbs
>>> Pavan Kotikalapudi
>>> Krystal Mitchell
>>> Sona Torosyan
>>> Aaron Kern
>>>
>>> Thank you,
>>>
>>> Pavan
>>>
>>> On Thu, Feb 22, 2024 at 3:07 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> please check this doc
>>>>
>>>> Spark Project Improvement Proposals (SPIP) | Apache Spark
>>>> <https://spark.apache.org/improvement-proposals.html>
>>>>
>>>> and specifically the below extract
>>>>
>>>> Discussing an SPIP
>>>>
>>>> All discussion of an SPIP should take place in a public forum,
>>>> preferably the discussion attached to the Jira. Any discussions that happen
>>>> offline should be made available online for the public via meeting notes
>>>> summarizing the discussions.(done)
>>>>
>>>> During this discussion, one or more shepherds should be identified
>>>> among PMC members. (outstanding)
>>>>
>>>> Once the discussion settles, the shepherd(s) should call for a vote on
>>>> the SPIP moving forward on the dev@ list. The vote should be open for
>>>> at least 72 hours and follows the typical Apache vote process and passes
>>>> upon consensus (at least 3 +1 votes from PMC members and no -1 votes from
>>>> PMC members). dev@ should be notified of the vote result.
>>>>
>>>> If there does not exist at least one PMC member that is committed to
>>>> shepherding the change within a month, the SPIP is rejected.
>>>>
>>>> If a committer does not think a SPIP aligns with long-term project
>>>> goals, or is not practical at the point of proposal, the committer should
>>>> -1 the SPIP explicitly and give technical justifications.
>>>>
>>>> OK, a shepherd from among the PMC members is required. Maybe Jungtaek
>>>> Lim can kindly help with the process.
>>>>
>>>> cheers
>>>>
>>>> Mich Talebzadeh,

Re: Please unlock Jira ticket for SPARK-24815, Dynamic resource allocation for structured streaming

2024-02-26 Thread Pavan Kotikalapudi
Thanks Yuming.

On Mon, Feb 26, 2024 at 9:55 PM Yuming Wang  wrote:

> Unlocked.
>
> On Tue, Feb 27, 2024 at 11:47 AM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>>
>> Hi,
>>
>> Can a committer please unlock this SPIP? It is for dynamic resource
>> allocation for Structured Streaming, which has received 6 votes. It was
>> locked for inactivity by GitHub Actions.
>>
>> [SPARK-24815] Structured Streaming should support dynamic allocation -
>> ASF JIRA (apache.org)
>> <https://issues.apache.org/jira/browse/SPARK-24815>
>>
>> For now I have volunteered to mentor the team until a committer
>> volunteers to take it over. Hopefully this will not be too strenuous.
>>
>> Thanks
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* The information provided is correct to the best of my
>> knowledge but of course cannot be guaranteed. It is essential to note
>> that, as with any advice, quote "one test result is worth one-thousand
>> expert opinions (Wernher von Braun)".
>>
>


Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Pavan Kotikalapudi
Thanks for the pointers Mich, will wait for Jungtaek Lee or any other PMC
members to respond.

Aggregating the upvotes on this email thread:

+6
Mich Talebzadeh
Adam Hobbs
Pavan Kotikalapudi
Krystal Mitchell
Sona Torosyan
Aaron Kern

Thank you,

Pavan

On Thu, Feb 22, 2024 at 3:07 PM Mich Talebzadeh 
wrote:

> Hi,
>
> Please check this doc:
>
> Spark Project Improvement Proposals (SPIP) | Apache Spark
> <https://spark.apache.org/improvement-proposals.html>
>
> and specifically the extract below:
>
> Discussing an SPIP
>
> All discussion of an SPIP should take place in a public forum, preferably
> the discussion attached to the Jira. Any discussions that happen offline
> should be made available online for the public via meeting notes
> summarizing the discussions. (done)
>
> During this discussion, one or more shepherds should be identified among
> PMC members. (outstanding)
>
> Once the discussion settles, the shepherd(s) should call for a vote on the
> SPIP moving forward on the dev@ list. The vote should be open for at
> least 72 hours and follows the typical Apache vote process and passes upon
> consensus (at least 3 +1 votes from PMC members and no -1 votes from PMC
> members). dev@ should be notified of the vote result.
>
> If there does not exist at least one PMC member that is committed to
> shepherding the change within a month, the SPIP is rejected.
>
> If a committer does not think a SPIP aligns with long-term project goals,
> or is not practical at the point of proposal, the committer should -1 the
> SPIP explicitly and give technical justifications.
>
> OK, so a shepherd from among the PMC members is required. Maybe Jungtaek Lee
> can kindly help with the process.
>
> cheers
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
> On Thu, 22 Feb 2024 at 21:52, Pavan Kotikalapudi
>  wrote:
>
>> Hi Mich,
>>
>> We have five +1s so far:
>>
>> Mich Talebzadeh
>> Adam Hobbs
>> Pavan Kotikalapudi
>> Krystal Mitchell
>> Sona Torosyan
>> (plus a few more on the GitHub PR)
>>
>> +0: None
>>
>> -1: None
>>
>> Does it pass the required condition as approved?
>>
>> I am not sure about that, though; none of the past emails mention a
>> minimum number of votes.
>>
>> I would ask Spark PMC members, or anyone else who has done this before,
>> to help us understand the process better.
>>
>> Thank you,
>>
>> Pavan
>>
>> On Thu, Feb 22, 2024 at 3:20 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi Pavan,
>>>
>>> Do you have a list of votes for this feature by any chance? Does it pass
>>> the required condition as approved?
>>>
>>> HTH
>>>
>>> Mich Talebzadeh,
>>> Dad | Technologist | Solutions Architect | Engineer
>>> London
>>> United Kingdom

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Pavan Kotikalapudi
Hi Mich,

We have five +1s so far:

Mich Talebzadeh
Adam Hobbs
Pavan Kotikalapudi
Krystal Mitchell
Sona Torosyan
(plus a few more on the GitHub PR)

+0: None

-1: None

Does it pass the required condition as approved?

I am not sure about that, though; none of the past emails mention a minimum
number of votes.

I would ask Spark PMC members, or anyone else who has done this before, to
help us understand the process better.

Thank you,

Pavan

On Thu, Feb 22, 2024 at 3:20 AM Mich Talebzadeh 
wrote:

> Hi Pavan,
>
> Do you have a list of votes for this feature by any chance? Does it pass
> the required condition as approved?
>
> HTH
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
> On Thu, 22 Feb 2024 at 10:04, Pavan Kotikalapudi
>  wrote:
>
>> Yes. The PR was closed for inactivity by GitHub Actions.
>>
>> The message
>> <https://github.com/apache/spark/pull/42352#issuecomment-1865306284>
>> also says:
>>
>> > If you'd like to revive this PR, please reopen it and ask a committer
>> > to remove the Stale tag!
>>
>> On Thu, Feb 22, 2024 at 1:09 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> I can see it was closed. Was it because of inactivity?
>>>
>>>
>>> Mich Talebzadeh,
>>> Dad | Technologist | Solutions Architect | Engineer
>>> London
>>> United Kingdom
>>>
>>>
>>> On Thu, 22 Feb 2024 at 06:58, Pavan Kotikalapudi
>>>  wrote:
>>>
>>>> Hi Spark PMC members,
>>>>
>>>> I think we have a few upvotes for this effort here, and more people are
>>>> showing interest (see the PR comments
>>>> <https://github.com/apache/spark/pull/42352#issuecomment-1955238640>).
>>>>
>>>> Is anyone interested in mentoring and reviewing this effort?
>>>>
>>>> Also can the repository admin/owner re-open the PR?  (

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Pavan Kotikalapudi
Yes. The PR was closed for inactivity by GitHub Actions.

The message <https://github.com/apache/spark/pull/42352#issuecomment-1865306284>
also says:

> If you'd like to revive this PR, please reopen it and ask a committer to
> remove the Stale tag!

On Thu, Feb 22, 2024 at 1:09 AM Mich Talebzadeh 
wrote:

> I can see it was closed. Was it because of inactivity?
>
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
> On Thu, 22 Feb 2024 at 06:58, Pavan Kotikalapudi
>  wrote:
>
>> Hi Spark PMC members,
>>
>> I think we have a few upvotes for this effort here, and more people are
>> showing interest (see the PR comments
>> <https://github.com/apache/spark/pull/42352#issuecomment-1955238640>).
>>
>> Is anyone interested in mentoring and reviewing this effort?
>>
>> Also, can a repository admin/owner re-open the PR? (I guess only people
>> with admin access to the repository can do that.)
>>
>> Thank you,
>>
>> Pavan
>>
>> On Tue, Feb 20, 2024 at 2:08 PM Krystal Mitchell
>>  wrote:
>>
>>> +1
>>>
>>> On 2024/01/17 17:49:32 Pavan Kotikalapudi wrote:
>>> > Thanks for proposing and voting for the feature Mich.
>>> >
>>> > Adding some references to the thread:
>>> >
>>> >   - Jira ticket - SPARK-24815
>>> >   <https://issues.apache.org/jira/browse/SPARK-24815>
>>> >   - Design Doc
>>> >   <https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing>
>>> >   - discussion thread
>>> >   <https://lists.apache.org/thread/9yx0jnk9h1234joymwlzfx2gh2m8b9bo>
>>> >   - PR with initial implementation -
>>> >   https://github.com/apache/spark/pull/42352
>>> >
>>> > Please vote with:
>>> >
>>> > [ ] +1: Accept the proposal and start with the development.
>>> > [ ] +0
>>> > [ ] -1: I don’t think this is a good idea because …
>>> >
>>> > Thank you,
>>> >
>>> > Pavan
>>> >
>>> > On Wed, Jan 17, 2024 at 9:52 PM Mich Talebzadeh 
>>> > wrote:
>>> >
>>> > >
>>> > > +1 for me  (non binding)
>>> > >
>>> > >
>>> > >
>>> > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
>>> > > loss, damage or destruction of data or any other property which may arise
>>> > > from relying on this email's technical content is explicitly disclaimed.
>>> > > The author will in no case be liable for any monetary damages arising from
>>> > > such loss, damage or destruction.
>>> > >
>>> > >
>>> > >
>>> >
>>>
>>


Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-21 Thread Pavan Kotikalapudi
Hi Spark PMC members,

I think we have a few upvotes for this effort here, and more people are
showing interest (see the PR comments
<https://github.com/apache/spark/pull/42352#issuecomment-1955238640>).

Is anyone interested in mentoring and reviewing this effort?

Also, can a repository admin/owner re-open the PR? (I guess only people
with admin access to the repository can do that.)

Thank you,

Pavan

On Tue, Feb 20, 2024 at 2:08 PM Krystal Mitchell 
wrote:

> +1
>
> On 2024/01/17 17:49:32 Pavan Kotikalapudi wrote:
> > Thanks for proposing and voting for the feature Mich.
> >
> > Adding some references to the thread:
> >
> >   - Jira ticket - SPARK-24815
> >   <https://issues.apache.org/jira/browse/SPARK-24815>
> >   - Design Doc
> >   <https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing>
> >   - discussion thread
> >   <https://lists.apache.org/thread/9yx0jnk9h1234joymwlzfx2gh2m8b9bo>
> >   - PR with initial implementation -
> >   https://github.com/apache/spark/pull/42352
> >
> > Please vote with:
> >
> > [ ] +1: Accept the proposal and start with the development.
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> > Thank you,
> >
> > Pavan
> >
> > On Wed, Jan 17, 2024 at 9:52 PM Mich Talebzadeh 
> > wrote:
> >
> > >
> > > +1 for me  (non binding)
> > >
> > >
> >
>


Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-20 Thread Pavan Kotikalapudi
Here is the link to the voting thread:
https://lists.apache.org/thread/rlwqrw6ddxdkbvkp78kpd0zgvglgbbp8

Thank you,

Pavan

On Wed, Jan 17, 2024 at 7:15 PM Pavan Kotikalapudi 
wrote:

> Thanks for the +1, I will propose voting in a new thread now.
>
> - Pavan
>
> On Wed, Jan 17, 2024 at 5:28 PM Mich Talebzadeh 
> wrote:
>
>> I think we have discussed this enough, and I consider it a useful
>> feature. I propose a vote on it.
>>
>> +1 for me
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>>
>> On Tue, 8 Aug 2023 at 01:30, Pavan Kotikalapudi
>>  wrote:
>>
>>> Hi Spark Dev,
>>>
>>> I have extended traditional DRA to work for the Structured Streaming
>>> use case.
>>>
>>> Here is an initial implementation draft PR,
>>> https://github.com/apache/spark/pull/42352, and the design doc:
>>> https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing
>>>
>>> Please review and let me know what you think.
>>>
>>> Thank you,
>>>
>>> Pavan
>>>
>>


Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-19 Thread Pavan Kotikalapudi
+1, if my vote counts.

Do only Spark PMC votes count?

Thanks,

Pavan

On Thu, Jan 18, 2024 at 3:19 AM Adam Hobbs
 wrote:

> +1
> --
> *From:* Pavan Kotikalapudi 
> *Sent:* Thursday, January 18, 2024 4:19:32 AM
> *To:* Spark dev list 
> *Subject:* Re: Vote on Dynamic resource allocation for structured
> streaming [SPARK-24815]
>
>
> Thanks for proposing and voting for the feature Mich.
>
> Adding some references to the thread:
>
>    - Jira ticket - SPARK-24815
>    <https://issues.apache.org/jira/browse/SPARK-24815>
>    - Design Doc
>    <https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing>
>    - discussion thread
>    <https://lists.apache.org/thread/9yx0jnk9h1234joymwlzfx2gh2m8b9bo>
>    - PR with initial implementation -
>    https://github.com/apache/spark/pull/42352
>
> Please vote with:
>
> [ ] +1: Accept the proposal and start with the development.
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thank you,
>
> Pavan
>
> On Wed, Jan 17, 2024 at 9:52 PM Mich Talebzadeh 
> wrote:
>
>
> +1 for me  (non binding)
>
>


Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Pavan Kotikalapudi
Thanks for proposing and voting for the feature Mich.

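Adding some references to the thread:

   - Jira ticket - SPARK-24815
   <https://issues.apache.org/jira/browse/SPARK-24815>
   - Design Doc
   <https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing>
   - discussion thread
   <https://lists.apache.org/thread/9yx0jnk9h1234joymwlzfx2gh2m8b9bo>
   - PR with initial implementation -
   https://github.com/apache/spark/pull/42352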

Please vote with:

[ ] +1: Accept the proposal and start with the development.
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Thank you,

Pavan

On Wed, Jan 17, 2024 at 9:52 PM Mich Talebzadeh 
wrote:

>
> +1 for me  (non binding)
>
>


Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Pavan Kotikalapudi
Thanks for the +1, I will propose voting in a new thread now.

- Pavan

On Wed, Jan 17, 2024 at 5:28 PM Mich Talebzadeh 
wrote:

> I think we have discussed this enough, and I consider it a useful
> feature. I propose a vote on it.
>
> +1 for me
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
> On Tue, 8 Aug 2023 at 01:30, Pavan Kotikalapudi
>  wrote:
>
>> Hi Spark Dev,
>>
>> I have extended traditional DRA to work for the Structured Streaming
>> use case.
>>
>> Here is an initial implementation draft PR,
>> https://github.com/apache/spark/pull/42352, and the design doc:
>> https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing
>>
>> Please review and let me know what you think.
>>
>> Thank you,
>>
>> Pavan
>>
>


Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-05 Thread Pavan Kotikalapudi
y on
>the accuracy of the trigger interval heuristics used to guide scaling
>decisions.
>- Overhead: Monitoring and scaling processes themselves introduce some
>overhead, which needs to be balanced against the potential performance
>gains. For example, how can we utilise Input Rate, Process Rate and
>Operation Duration from the Streaming Query Statistics page?
>- We ought to consider the potential impact on latency. Scaling
>operations, especially scaling up, may introduce some latency. Ensuring
>minimal impact on the processing time is crucial.
>- Implementing mechanisms for graceful scaling operations, avoiding
>abrupt changes, can contribute to a smoother user experience.
>
> Have any of these points already been considered in your proposal?
>
> HTH
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
> On Mon, 1 Jan 2024 at 10:34, Pavan Kotikalapudi
>  wrote:
>
>> Hi PMC members,
>>
>> Bumping this idea one last time to see if there is any approval to
>> take it forward.
>>
>> Here is an initial implementation draft PR,
>> https://github.com/apache/spark/pull/42352, and the design doc:
>> https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing
>>
>>
>> Thank you,
>>
>> Pavan
>>
>> On Mon, Nov 13, 2023 at 6:57 AM Pavan Kotikalapudi <
>> pkotikalap...@twilio.com> wrote:
>>
>>>
>>>
>>> Here is an initial implementation draft PR,
>>> https://github.com/apache/spark/pull/42352, and the design doc:
>>> https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing
>>>
>>>
>>> On Sun, Nov 12, 2023 at 5:24 PM Pavan Kotikalapudi <
>>> pkotikalap...@twilio.com> wrote:
>>>
>>>> Hi Dev community,
>>>>
>>>> Just bumping to see if there are more reviews to evaluate this idea of
>>>> adding auto-scaling to structured streaming.
>>>>
>>>> Thanks again,
>>>>
>>>> Pavan
>>>>
>>>> On Wed, Aug 23, 2023 at 2:49 PM Pavan Kotikalapudi <
>>>> pkotikalap...@twilio.com> wrote:
>>>>
>>>>> Thanks for the review Mich.
>>>>>
>>>>> I have made the answer to Q4 as concise as possible and moved the
>>>>> detailed explanation to the Appendix.
>>>>>
>>>>> Here is the updated answer to Q4:
>>>>> <https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit#heading=h.xe0x4i9gc1dg>
>>>>>
>>>>> Thank you,
>>>>>

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-01 Thread Pavan Kotikalapudi
Hi PMC members,

Bumping this idea one last time to see if there is any approval to take it
forward.

Here is an initial implementation draft PR,
https://github.com/apache/spark/pull/42352, and the design doc:
https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing


Thank you,

Pavan

On Mon, Nov 13, 2023 at 6:57 AM Pavan Kotikalapudi 
wrote:

>
>
> Here is an initial implementation draft PR,
> https://github.com/apache/spark/pull/42352, and the design doc:
> https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing
>
>
> On Sun, Nov 12, 2023 at 5:24 PM Pavan Kotikalapudi <
> pkotikalap...@twilio.com> wrote:
>
>> Hi Dev community,
>>
>> Just bumping to see if there are more reviews to evaluate this idea of
>> adding auto-scaling to structured streaming.
>>
>> Thanks again,
>>
>> Pavan
>>
>> On Wed, Aug 23, 2023 at 2:49 PM Pavan Kotikalapudi <
>> pkotikalap...@twilio.com> wrote:
>>
>>> Thanks for the review Mich.
>>>
>>> I have made the answer to Q4 as concise as possible and moved the
>>> detailed explanation to the Appendix.
>>>
>>> Here is the updated answer to Q4:
>>> <https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit#heading=h.xe0x4i9gc1dg>
>>>
>>> Thank you,
>>>
>>> Pavan
>>>
>>> On Wed, Aug 23, 2023 at 2:46 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi Pavan,
>>>>
>>>> I started reading your SPIP but have difficulty understanding it in
>>>> detail.
>>>>
>>>> Specifically under Q4, " What is new in your approach and why do you
>>>> think it will be successful?", I believe it would be better to remove the
>>>> plots and focus on "what this proposed solution is going to add to the
>>>> current play". At this stage a concise briefing would be appreciated and
>>>> the specific plots should be left to the Appendix.
>>>>
>>>> HTH
>>>>
>>>>
>>>> Mich Talebzadeh,
>>>> Distinguished Technologist, Solutions Architect & Engineer
>>>> London
>>>> United Kingdom
>>>>
>>>>
>>>> view my LinkedIn profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>>
>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>
>>>>
>>>>
>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>> arising from such loss, damage or destruction.
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, 20 Aug 2023 at 07:40, Pavan Kotikalapudi <
>>>> pkotikalap...@twilio.com> wrote:
>>>>
>>>>> IMO ML might be good for cluster scheduler but for the core DRA
>>>>> algorithm of SSS I believe we should start with some primitives of
>>>>> Structured streaming. I would love to get some reviews on the doc and
>>>>> opinions on the feasibility of the solution.
>>>>>
>>>>> We have seen quite some savings using this solution in our team, Would
>>>>> like to listen to the dev community to see if they are looking
>>>>> for/interested in DRA for structured streaming.
>>>>>
>>>>> On Mon, Aug 14, 2023 at 9:12 AM Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Thank you for your comments.
>>>>>>
>>>>>> My vision of integrating machine learning (ML) into Spark Structured
>>>>>> Streaming (SSS) for capacity planning and performance optimization seems to
>>>>>> be promising. By leveraging ML techniques, I believe that we can
>>>>>> poten

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-11-12 Thread Pavan Kotikalapudi
Here is an initial Implementation draft PR
https://github.com/apache/spark/pull/42352 and design doc:
https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing


On Sun, Nov 12, 2023 at 5:24 PM Pavan Kotikalapudi 
wrote:

> Hi Dev community,
>
> Just bumping to see if there are more reviews to evaluate this idea of
> adding auto-scaling to structured streaming.
>
> Thanks again,
>
> Pavan
>
> On Wed, Aug 23, 2023 at 2:49 PM Pavan Kotikalapudi <
> pkotikalap...@twilio.com> wrote:
>
>> Thanks for the review Mich.
>>
>> I have updated the Q4 with as concise information as possible and left
>> the detailed explanation to Appendix.
>>
>> here is the updated answer to the Q4
>> <https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit#heading=h.xe0x4i9gc1dg>
>>
>> Thank you,
>>
>> Pavan
>>
>> On Wed, Aug 23, 2023 at 2:46 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi Pavan,
>>>
>>> I started reading your SPIP but have difficulty understanding it in
>>> detail.
>>>
>>> Specifically under Q4, " What is new in your approach and why do you
>>> think it will be successful?", I believe it would be better to remove the
>>> plots and focus on "what this proposed solution is going to add to the
>>> current play". At this stage a concise briefing would be appreciated and
>>> the specific plots should be left to the Appendix.
>>>
>>> HTH
>>>
>>>
>>> Mich Talebzadeh,
>>> Distinguished Technologist, Solutions Architect & Engineer
>>> London
>>> United Kingdom
>>>
>>>
>>> view my LinkedIn profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Sun, 20 Aug 2023 at 07:40, Pavan Kotikalapudi <
>>> pkotikalap...@twilio.com> wrote:
>>>
>>>> IMO ML might be good for cluster scheduler but for the core DRA
>>>> algorithm of SSS I believe we should start with some primitives of
>>>> Structured streaming. I would love to get some reviews on the doc and
>>>> opinions on the feasibility of the solution.
>>>>
>>>> We have seen quite some savings using this solution in our team, Would
>>>> like to listen to the dev community to see if they are looking
>>>> for/interested in DRA for structured streaming.
>>>>
>>>> On Mon, Aug 14, 2023 at 9:12 AM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Thank you for your comments.
>>>>>
>>>>> My vision of integrating machine learning (ML) into Spark Structured
>>>>> Streaming (SSS) for capacity planning and performance optimization seems to
>>>>> be promising. By leveraging ML techniques, I believe that we can
>>>>> potentially create predictive models that enhance the efficiency and
>>>>> resource allocation of the data processing pipelines. Here are some
>>>>> potential benefits and considerations for adding ML to SSS for capacity
>>>>> planning. However, I stand corrected
>>>>>
>>>>>    1. *Predictive Capacity Planning:* ML models can analyze historical
>>>>>       data (that we discussed already), workloads, and trends to predict
>>>>>       future resource needs accurately. This enables proactive scaling and
>>>>>       allocation of resources, ensuring optimal performance during
>>>>>       high-demand periods,

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-11-12 Thread Pavan Kotikalapudi
Hi Dev community,

Just bumping to see if there are more reviews to evaluate this idea of
adding auto-scaling to structured streaming.

Thanks again,

Pavan

On Wed, Aug 23, 2023 at 2:49 PM Pavan Kotikalapudi 
wrote:

> Thanks for the review Mich.
>
> I have updated the Q4 with as concise information as possible and left the
> detailed explanation to Appendix.
>
> here is the updated answer to the Q4
> <https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit#heading=h.xe0x4i9gc1dg>
>
> Thank you,
>
> Pavan
>
> On Wed, Aug 23, 2023 at 2:46 AM Mich Talebzadeh 
> wrote:
>
>> Hi Pavan,
>>
>> I started reading your SPIP but have difficulty understanding it in
>> detail.
>>
>> Specifically under Q4, " What is new in your approach and why do you
>> think it will be successful?", I believe it would be better to remove the
>> plots and focus on "what this proposed solution is going to add to the
>> current play". At this stage a concise briefing would be appreciated and
>> the specific plots should be left to the Appendix.
>>
>> HTH
>>
>>
>> Mich Talebzadeh,
>> Distinguished Technologist, Solutions Architect & Engineer
>> London
>> United Kingdom
>>
>>
>> view my LinkedIn profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Sun, 20 Aug 2023 at 07:40, Pavan Kotikalapudi <
>> pkotikalap...@twilio.com> wrote:
>>
>>> IMO ML might be good for cluster scheduler but for the core DRA
>>> algorithm of SSS I believe we should start with some primitives of
>>> Structured streaming. I would love to get some reviews on the doc and
>>> opinions on the feasibility of the solution.
>>>
>>> We have seen quite some savings using this solution in our team, Would
>>> like to listen to the dev community to see if they are looking
>>> for/interested in DRA for structured streaming.
>>>
>>> On Mon, Aug 14, 2023 at 9:12 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Thank you for your comments.
>>>>
>>>> My vision of integrating machine learning (ML) into Spark Structured
>>>> Streaming (SSS) for capacity planning and performance optimization seems to
>>>> be promising. By leveraging ML techniques, I believe that we can
>>>> potentially create predictive models that enhance the efficiency and
>>>> resource allocation of the data processing pipelines. Here are some
>>>> potential benefits and considerations for adding ML to SSS for capacity
>>>> planning. However, I stand corrected
>>>>
>>>>    1. *Predictive Capacity Planning:* ML models can analyze historical
>>>>       data (that we discussed already), workloads, and trends to predict
>>>>       future resource needs accurately. This enables proactive scaling and
>>>>       allocation of resources, ensuring optimal performance during
>>>>       high-demand periods, such as times of high trades.
>>>>    2. *Real-time Decision Making:* ML can be used to make real-time
>>>>       decisions on resource allocation (software and cluster) based on current
>>>>       data and conditions, allowing for dynamic adjustments to meet the
>>>>       processing demands.
>>>>    3. *Complex Data Analysis:* In a heterogeneous setup involving
>>>>       multiple databases, ML can analyze various factors like data read and
>>>>       write times from different databases, data volumes, and data distribution
>>>>       pat

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-23 Thread Pavan Kotikalapudi
Thanks for the review Mich.

I have updated Q4 to be as concise as possible and moved the
detailed explanation to the Appendix.

Here is the updated answer to Q4
<https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit#heading=h.xe0x4i9gc1dg>

Thank you,

Pavan

On Wed, Aug 23, 2023 at 2:46 AM Mich Talebzadeh 
wrote:

> Hi Pavan,
>
> I started reading your SPIP but have difficulty understanding it in detail.
>
> Specifically under Q4, "What is new in your approach and why do you
> think it will be successful?", I believe it would be better to remove the
> plots and focus on "what this proposed solution is going to add to the
> current play". At this stage a concise briefing would be appreciated and
> the specific plots should be left to the Appendix.
>
> HTH
>
>
> Mich Talebzadeh,
> Distinguished Technologist, Solutions Architect & Engineer
> London
> United Kingdom
>
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sun, 20 Aug 2023 at 07:40, Pavan Kotikalapudi 
> wrote:
>
>> IMO ML might be good for cluster scheduler but for the core DRA algorithm
>> of SSS I believe we should start with some primitives of Structured
>> streaming. I would love to get some reviews on the doc and opinions on the
>> feasibility of the solution.
>>
>> We have seen quite some savings using this solution in our team, Would
>> like to listen to the dev community to see if they are looking
>> for/interested in DRA for structured streaming.
>>
>> On Mon, Aug 14, 2023 at 9:12 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Thank you for your comments.
>>>
>>> My vision of integrating machine learning (ML) into Spark Structured
>>> Streaming (SSS) for capacity planning and performance optimization seems to
>>> be promising. By leveraging ML techniques, I believe that we can
>>> potentially create predictive models that enhance the efficiency and
>>> resource allocation of the data processing pipelines. Here are some
>>> potential benefits and considerations for adding ML to SSS for capacity
>>> planning. However, I stand corrected
>>>
>>>    1. *Predictive Capacity Planning:* ML models can analyze historical
>>>       data (that we discussed already), workloads, and trends to predict future
>>>       resource needs accurately. This enables proactive scaling and allocation
>>>       of resources, ensuring optimal performance during high-demand periods,
>>>       such as times of high trades.
>>>    2. *Real-time Decision Making:* ML can be used to make real-time
>>>       decisions on resource allocation (software and cluster) based on current
>>>       data and conditions, allowing for dynamic adjustments to meet the
>>>       processing demands.
>>>    3. *Complex Data Analysis:* In a heterogeneous setup involving multiple
>>>       databases, ML can analyze various factors like data read and write times
>>>       from different databases, data volumes, and data distribution patterns to
>>>       optimize the overall data processing flow.
>>>    4. *Anomaly Detection:* ML models can identify unusual patterns or
>>>       performance deviations, alerting us to potential issues before they
>>>       impact the system.
>>>    5. *Integration with Monitoring:* ML models can work alongside monitoring
>>>       tools, gathering real-time data on various performance metrics, and using
>>>       this data for making intelligent decisions on capacity and resource
>>>       allocation.
>>>
>>> However, there are some important considerations to keep in mind:

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-20 Thread Pavan Kotikalapudi
IMO ML might be good for the cluster scheduler, but for the core DRA algorithm
of SSS I believe we should start with some primitives of Structured
Streaming. I would love to get some reviews on the doc and opinions on the
feasibility of the solution.

We have seen significant savings using this solution in our team. I would like
to hear from the dev community whether they are looking for, or interested in,
DRA for structured streaming.

On Mon, Aug 14, 2023 at 9:12 AM Mich Talebzadeh 
wrote:

> Thank you for your comments.
>
> My vision of integrating machine learning (ML) into Spark Structured
> Streaming (SSS) for capacity planning and performance optimization seems to
> be promising. By leveraging ML techniques, I believe that we can
> potentially create predictive models that enhance the efficiency and
> resource allocation of the data processing pipelines. Here are some
> potential benefits and considerations for adding ML to SSS for capacity
> planning. However, I stand corrected
>
>    1. *Predictive Capacity Planning:* ML models can analyze historical data
>       (that we discussed already), workloads, and trends to predict future
>       resource needs accurately. This enables proactive scaling and allocation of
>       resources, ensuring optimal performance during high-demand periods, such as
>       times of high trades.
>    2. *Real-time Decision Making:* ML can be used to make real-time
>       decisions on resource allocation (software and cluster) based on current
>       data and conditions, allowing for dynamic adjustments to meet the
>       processing demands.
>    3. *Complex Data Analysis:* In a heterogeneous setup involving multiple
>       databases, ML can analyze various factors like data read and write times
>       from different databases, data volumes, and data distribution patterns to
>       optimize the overall data processing flow.
>    4. *Anomaly Detection:* ML models can identify unusual patterns or
>       performance deviations, alerting us to potential issues before they impact
>       the system.
>    5. *Integration with Monitoring:* ML models can work alongside monitoring
>       tools, gathering real-time data on various performance metrics, and using
>       this data for making intelligent decisions on capacity and resource
>       allocation.
>
> However, there are some important considerations to keep in mind:
>
>    1. *Model Training:* ML models require training and validation using
>       relevant data. Our DS colleagues need to define appropriate features,
>       select the right ML algorithms, and fine-tune the model parameters to
>       achieve optimal performance.
>    2. *Complexity:* Integrating ML adds complexity to our architecture.
>       Moreover, we need to have the necessary expertise in both Spark Structured
>       Streaming and machine learning to design, implement, and maintain the
>       system effectively.
>    3. *Resource Overhead:* ML algorithms can be resource-intensive. We ought
>       to consider the additional computational requirements, especially during
>       the model training and inference phases.
>
> In summary, this idea of utilizing ML for capacity planning in Spark
> Structured Streaming can possibly hold significant potential for improving
> system performance and resource utilization. Having said that, I totally
> agree that we need to evaluate the feasibility, potential benefits, and
> challenges, and we will need to involve experts in both Spark and machine
> learning to ensure a successful outcome.
>
> HTH
>
> Mich Talebzadeh,
> Solutions Architect/Engineering Lead
> London
> United Kingdom
>
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Mon, 14 Aug 2023 at 14:58, Martin Andersson 
> wrote:
>
>> IMO, using any kind of machine learning or AI for DRA is overkill. The
>> effort involved would be considerable and likely coun

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-08 Thread Pavan Kotikalapudi
Listeners are the best source of information for the allocation manager,
AFAIK. It already uses a SparkListener
<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L640>,
and we can use it to extract more information (like processing times).
The listener with more information about the streaming query resides in the
sql module
<https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala>
though.
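
To make the listener idea concrete, here is a minimal sketch (an illustration only, not code from the draft PR) of pulling processing times out of streaming progress events. It assumes the JSON form of a streaming query progress report, which carries a `batchId` and a `durationMs` map with a `triggerExecution` entry. The sample payloads and the helper name `processing_times_ms` are made up for illustration; in a real job the strings would come from the progress events delivered to a StreamingQueryListener.

```python
import json
from statistics import mean


def processing_times_ms(progress_json_events):
    """Return (batchId, triggerExecution ms) pairs from progress JSON strings.

    Events that carry no triggerExecution duration are skipped.
    """
    samples = []
    for raw in progress_json_events:
        progress = json.loads(raw)
        duration = progress.get("durationMs", {}).get("triggerExecution")
        if duration is not None:
            samples.append((progress["batchId"], duration))
    return samples


# Hypothetical sample events, trimmed to the fields used above.
events = [
    '{"batchId": 1, "durationMs": {"addBatch": 900, "triggerExecution": 1200}}',
    '{"batchId": 2, "durationMs": {"addBatch": 2500, "triggerExecution": 3100}}',
    '{"batchId": 3, "durationMs": {}}',
]

samples = processing_times_ms(events)
average_ms = mean(ms for _, ms in samples)
print(samples, average_ms)
```

The same per-batch durations could then be compared against the trigger interval to judge whether a query is keeping up.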

Thanks

Pavan

On Tue, Aug 8, 2023 at 5:43 AM Mich Talebzadeh 
wrote:

> Hi Pavan or anyone else
>
> Is there any way to access the metrics displayed on the Spark UI? For example,
> the readings for processing time? Can these be accessed?
>
> Thanks
>
> For example,
> Mich Talebzadeh,
> Solutions Architect/Engineering Lead
> London
> United Kingdom
>
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 8 Aug 2023 at 06:44, Pavan Kotikalapudi 
> wrote:
>
>> Thanks for the review Mich,
>>
>> Yes, the configuration parameters we end up setting would be based on the
>> trigger interval.
>>
>> > If you are going to have additional indicators why not look at
>> > scheduling delay as well
>> Yes. The implementation is based on scheduling delays, measured not on the
>> pending tasks of the current stage but on the pending tasks of all the
>> stages in a micro-batch
>> <https://github.com/apache/spark/pull/42352/files#diff-fdddb0421641035be18233c212f0e3ccd2d6a49d345bd0cd4eac08fc4d911e21R1025>
>> (hence the trigger interval).
>>
>> > we ought to utilise the historical statistics collected under the
>> > checkpointing directory to get more accurate statistics
>> You are right! This is just a simple implementation based on one factor.
>> We should also look into other indicators if that would help build
>> a better scaling algorithm.
>>
>> Thank you,
>>
>> Pavan
>>
>> On Mon, Aug 7, 2023 at 9:55 PM Mich Talebzadeh 
>> wrote:
>>
>>> Hi,
>>>
>>> I glanced over the design doc.
>>>
>>> You are providing certain configuration parameters plus some settings
>>> based on static values. For example:
>>>
>>> "spark.dynamicAllocation.schedulerBacklogTimeout": 54s
>>>
>>> I cannot see any use of  which ought to be at least
>>> half of the batch interval to have the correct margins (confidence level).
>>> If you are going to have additional indicators why not look at scheduling
>>> delay as well. Moreover most of the needed statistics are also available to
>>> set accurate values. My inclination is that this is a great effort but
>>> we ought to utilise the historical statistics collected under the
>>> checkpointing directory to get more accurate statistics. I will review
>>> the design document in due course.
>>>
>>> HTH
>>>
>>> Mich Talebzadeh,
>>> Solutions Architect/Engineering Lead
>>> London
>>> United Kingdom
>>>
>>>
>>> view my LinkedIn profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Pavan Kotikalapudi
Thanks for the review Mich,

Yes, the configuration parameters we end up setting would be based on the
trigger interval.

> If you are going to have additional indicators why not look at scheduling
> delay as well
Yes. The implementation is based on scheduling delays, measured not on the
pending tasks of the current stage but on the pending tasks of all the stages
in a micro-batch
<https://github.com/apache/spark/pull/42352/files#diff-fdddb0421641035be18233c212f0e3ccd2d6a49d345bd0cd4eac08fc4d911e21R1025>
(hence the trigger interval).
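
As a rough illustration of configs that depend on the trigger interval, the sketch below derives two DRA timeouts as fractions of that interval. Everything specific in it is an assumption made for the example: the helper name `dra_timeouts`, the fraction parameters and their defaults, and the choice of which DRA config each fraction feeds. The actual mapping is the one described in the design doc. With a 60 s trigger and a 0.9 scale-out fraction the result happens to be 54s, the value quoted from the design doc in the message below, but that the doc derives it this way is also an assumption.

```python
def dra_timeouts(trigger_interval_s, scale_out_fraction=0.9, scale_back_fraction=0.5):
    """Hypothetical helper: derive DRA timeouts as fractions of the trigger interval.

    The fractions and the config each one maps to are illustrative assumptions,
    not the formula used by the draft PR.
    """
    if trigger_interval_s <= 0:
        raise ValueError("trigger interval must be positive")
    if not 0 < scale_back_fraction < scale_out_fraction <= 1:
        raise ValueError("expected 0 < scale_back_fraction < scale_out_fraction <= 1")

    scale_out_s = round(trigger_interval_s * scale_out_fraction)
    scale_back_s = round(trigger_interval_s * scale_back_fraction)
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Assumed mapping: request executors once the backlog lasts this long.
        "spark.dynamicAllocation.schedulerBacklogTimeout": f"{scale_out_s}s",
        # Assumed mapping: release executors that stay idle this long.
        "spark.dynamicAllocation.executorIdleTimeout": f"{scale_back_s}s",
    }


conf = dra_timeouts(60)
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

The printed lines have the shape of `spark-submit` options, so the derived values can be inspected before they are applied to a job.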

> we ought to utilise the historical statistics collected under the
> checkpointing directory to get more accurate statistics
You are right! This is just a simple implementation based on one factor. We
should also look into other indicators if that would help build a
better scaling algorithm.

Thank you,

Pavan

On Mon, Aug 7, 2023 at 9:55 PM Mich Talebzadeh 
wrote:

> Hi,
>
> I glanced over the design doc.
>
> You are providing certain configuration parameters plus some settings
> based on static values. For example:
>
> "spark.dynamicAllocation.schedulerBacklogTimeout": 54s
>
> I cannot see any use of  which ought to be at least half
> of the batch interval to have the correct margins (confidence level). If
> you are going to have additional indicators why not look at scheduling
> delay as well. Moreover most of the needed statistics are also available to
> set accurate values. My inclination is that this is a great effort but we
> ought to utilise the historical statistics collected under the checkpointing
> directory to get more accurate statistics. I will review the design
> document in due course.
>
> HTH
>
> Mich Talebzadeh,
> Solutions Architect/Engineering Lead
> London
> United Kingdom
>
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 8 Aug 2023 at 01:30, Pavan Kotikalapudi
>  wrote:
>
>> Hi Spark Dev,
>>
>> I have extended traditional DRA to work for structured streaming
>> use-case.
>>
>> Here is an initial Implementation draft PR
>> https://github.com/apache/spark/pull/42352 and design doc:
>> https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing
>>
>> Please review and let me know what you think.
>>
>> Thank you,
>>
>> Pavan
>>
>


Fwd: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Pavan Kotikalapudi
Hi Spark Dev,

I have extended traditional DRA to work for structured streaming use-case.


Here is an initial Implementation draft PR
https://github.com/apache/spark/pull/42352 and design doc:
https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing

Please review and let me know what you think.

Thank you,

Pavan