Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Jungtaek Lim
Sounds good.

One thing I'd like to clarify before shepherding this SPIP is the process
itself. Getting enough traction from PMC members is a separate hurdle for
passing the SPIP vote. Even a vote from a committer is not counted. (I don't
have a binding vote.) I only see one PMC member (Thomas Graves, not on my
team) in the design doc, and we still haven't gotten positive feedback. So
there is still a long way to go. We need three supporters among PMC members.

Another thing is, I get the proposal at a high level, but I don't have
actual expertise in DRA. I could review the code in general, but I don't
feel qualified to approve it. We still need an expert in the CORE area,
especially someone who has expertise with DRA. (Could you please annotate
the code and list several people who worked on that part of the codebase?)
If they need streaming expertise to understand how things will work, then
either you or I can explain, but I can't just approve and merge the code.

That said, if we succeed in finding one and they review the code and give an
LGTM, I'd rather not go through the SPIP process unless the expert reviewing
your code requires us to do so. The change you proposed is rather small and
does not seem to be invasive (experts can also weigh in), and this feature
must never be turned on by default (given the limitation we pointed out). It
doesn't look like it requires an SPIP if we carefully document the new
change and clearly describe the limitation. (We should also add a warning in
the codebase that this must not be enabled by default.)



Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Pavan Kotikalapudi
Sounds good.

Thanks again for your help guiding the effort from discussion/review
through the voting phases in the Spark dev community.

Thank you,

Pavan


Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Mich Talebzadeh
Hi Pavan,

Thanks for instigating this proposal. Looks like the proposal is ready and
has enough votes to be implemented. Having a shepherd will make it more
fruitful.

I will leave it to @Jungtaek Lim's capable hands to drive it forward.

Will be there to help if needed.

Cheers

Mich Talebzadeh,
Technologist | Solutions Architect | Data Engineer  | Generative AI
London
United Kingdom





Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Pavan Kotikalapudi
Hi Bhuwan,

Glad to hear back from you! Very much appreciate your help on reviewing the
design doc/PR and endorsing this proposal.

Thank you so much @Jungtaek Lim, @Mich Talebzadeh for graciously agreeing to
mentor/shepherd this effort.

Regarding the Twilio copyright in the NOTICE-binary file:
Twilio's open source counsel was involved throughout the process. I had
placed the copyright in that file before Twilio signed a CCLA for the Spark
project contribution (Aug '23).

Since the CCLA is signed now, I have removed the Twilio copyright from that
file. I didn't get a chance to update the PR after github-actions closed it.

Please let me know the next steps needed to bring this draft PR/effort to
completion.

Thank you,

Pavan



Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Jungtaek Lim
I'm happy to, but it looks like I need to check one more thing about the
license, according to the WIP PR.

@Pavan Kotikalapudi 
I see you've added the copyright of Twilio in the NOTICE-binary file, which
makes me wonder whether Twilio has filed a CCLA with the Apache Software
Foundation.

PMC members can correct me if I'm mistaken, but from my understanding (and
my experience as a PMC member in another ASF project), a code contribution
is considered a code donation, and the copyright belongs to the ASF. That's
why you can't find copyrights of contributors' employers in the codebase.
The copyrights you see in NOTICE-binary are there because we have binary
dependencies, and their licenses may require explicitly mentioning the
copyright. They are not about direct code contributions.

Is Twilio aware of this? Also, if Twilio did not file a CCLA beforehand,
could you please engage with a relevant group in the company (could be a
legal team, or something like an OSS advocacy team if there is one) and
ensure that a CCLA is filed? The copyright issue is a legal issue, so we
have to be conservative and 100% sure that the employer understands what
donating the code to the ASF means, by reviewing the CCLA and relevant docs,
and explicitly expresses that they are OK with it by filing the CCLA.

You can read the description of the contributor agreements and find the
ICLA/CCLA forms on this page:
https://www.apache.org/licenses/contributor-agreements.html

Please let me know if this is resolved. This seems to me to be a blocker to
moving on. Please also let me know if the employer withdraws the
contribution.

Thanks,
Jungtaek Lim (HeartSaVioR)


On Mon, Mar 25, 2024 at 11:47 PM Bhuwan Sahni
 wrote:

> Hi Pavan,
>
> I looked at the PR, and the changes look simple and contained. It would be
> useful to add dynamic resource allocation to Spark Structured Streaming.
>
> Jungtaek, would you be able to shepherd this change?
>
>
> On Tue, Mar 19, 2024 at 10:38 AM Bhuwan Sahni 
> wrote:
>
>> Thanks a lot for creating the risk table Pavan. My apologies. I was tied
>> up with high priority items for the last couple weeks and could not
>> respond. I will review the PR by tomorrow's end, and get back to you.
>>
>> Appreciate your patience.
>>
>> Thanks
>> Bhuwan Sahni
>>
>> On Sun, Mar 17, 2024 at 4:42 PM Pavan Kotikalapudi <
>> pkotikalap...@twilio.com> wrote:
>>
>>> Hi Bhuwan,
>>>
>>> I hope the team got a chance to review the draft PR. I'm looking for some
>>> comments to see if the plan looks alright. I have updated the document
>>> about the risks (also mentioned below). Please confirm if it looks alright.
>>>
>>> | *Spark application type* | *Auto-scaling capability* | *With new auto-scaling capability* |
>>> | Spark batch job | Works with current DRA | No change |
>>> | Streaming query without trigger interval | No implementation | Can work with this implementation (have to set certain scale-back configs based on previous usage pattern); maybe automate with future work? |
>>> | Spark Streaming query with trigger interval | No implementation | With this implementation |
>>> | Spark Streaming query with one-time micro batch | Works with current DRA | No change |
>>> | Spark Streaming query with AvailableNow micro batch | Works with current DRA | No change |
>>> | Batch + Streaming query (default/trigger-interval/once/availablenow modes), other notebook use cases | No implementation | No implementation |
>>>
>>>
>>>
>>> We are more than happy to collaborate on a call to make better progress
>>> on this enhancement. Please let us know.
>>>
>>> Thank you,
>>>
>>> Pavan
>>>
>>> On Fri, Mar 1, 2024 at 12:26 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>

>>>> Hi Bhuwan et al,
>>>>
>>>> Thank you for passing on the Databricks Structured Streaming team's
>>>> review of the SPIP document. FYI, I work closely with Pavan and other
>>>> members to help deliver this piece of work. We appreciate your insights,
>>>> especially regarding the cost savings potential from the PoC.
>>>>
>>>> Pavan already furnished you with some additional info. Your team's
>>>> point about the SPIP currently addressing a specific use case (single
>>>> streaming query with Processing Time trigger) is well-taken. We agree that
>>>> maintaining simplicity is key, particularly as we explore more general
>>>> resource allocation mechanisms in the future. To address the concerns and
>>>> foster open discussion, the Databricks team is invited to directly add
>>>> their comments and suggestions to the Jira itself:
>>>>
>>>> [SPARK-24815] Structured Streaming should support dynamic allocation -
>>>> ASF JIRA (apache.org)