Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-23 Thread Jungtaek Lim
Thanks for figuring this out. That is my bad. My understanding is that the
3.5.1 RC2 docs were generated correctly for the VOTE, and the problem was
introduced during the finalization step.

I lost the build artifact for the docs (I followed the steps and removed the
docs from dev dist before realizing I shouldn't remove them) and I accidentally
rebuilt the docs from the branch I had used for debugging an issue in the RC.

I'll rebuild the docs from the tag and submit a PR again.

On Sat, Feb 24, 2024 at 7:16 AM Dongjoon Hyun 
wrote:

> Hi, All.
>
> Unfortunately, the Apache Spark `3.5.1 RC2` document artifact seems to be
> generated from unknown source code instead of the correct source code of
> the tag, `3.5.1`.
>
> https://spark.apache.org/docs/3.5.1/
>
> [image: Screenshot 2024-02-23 at 14.13.07.png]
>
> Dongjoon.
>
>
>
> On Wed, Feb 21, 2024 at 7:15 AM Jungtaek Lim 
> wrote:
>
>> Thanks everyone for participating in the vote! The vote passed.
>> I'll send out the vote result and proceed to the next steps.
>>
>> On Wed, Feb 21, 2024 at 4:36 PM Maxim Gekk 
>> wrote:
>>
>>> +1
>>>
>>> On Wed, Feb 21, 2024 at 9:50 AM Hyukjin Kwon 
>>> wrote:
>>>
 +1

 On Tue, 20 Feb 2024 at 22:00, Cheng Pan  wrote:

> +1 (non-binding)
>
> - Build successfully from source code.
> - Pass integration tests with Spark ClickHouse Connector[1]
>
> [1] https://github.com/housepower/spark-clickhouse-connector/pull/299
>
> Thanks,
> Cheng Pan
>
>
> > On Feb 20, 2024, at 10:56, Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
> >
> > Thanks Sean, let's continue the process for this RC.
> >
> > +1 (non-binding)
> >
> > - downloaded all files from URL
> > - checked signature
> > - extracted all archives
> > - ran all tests from source files in source archive file, via
> running "sbt clean test package" - Ubuntu 20.04.4 LTS, OpenJDK 17.0.9.
> >
> > Also bump to dev@ to encourage participation - looks like the
> timing is not good for US folks but let's see more days.
> >
> >
> > On Sat, Feb 17, 2024 at 1:49 AM Sean Owen  wrote:
> > Yeah let's get that fix in, but it seems to be a minor test only
> issue so should not block release.
> >
> > On Fri, Feb 16, 2024, 9:30 AM yangjie01  wrote:
> > Very sorry. When I was fixing SPARK-45242 (
> https://github.com/apache/spark/pull/43594), I noticed that its
> `Affects Version` and `Fix Version` were both 4.0, and I
> didn't realize that it had also been merged into branch-3.5, so I didn't
> advocate for SPARK-45357 to be backported to branch-3.5.
> >  As far as I know, the condition to trigger this test failure is:
> when using Maven to test the `connect` module, if  `sparkTestRelation` in
> `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized,
> then the `id` of `sparkTestRelation` will no longer be 0. So, I think this
> is indeed related to the order in which Maven executes the test cases in
> the `connect` module.
> >  I have submitted a backport PR to branch-3.5, and if necessary, we
> can merge it to fix this test issue.
> >  Jie Yang
> >   From: Jungtaek Lim
> > Date: Friday, February 16, 2024 22:15
> > To: Sean Owen, Rui Wang
> > Cc: dev
> > Subject: Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
> >   I traced back relevant changes and got a sense of what happened.
> >   Yangjie figured out the issue via link. It's a tricky issue
> according to the comments from Yangjie - the test is dependent on ordering
> of execution for test suites. He said it does not fail in sbt, hence CI
> build couldn't catch it.
> > He fixed it via link, but we missed that the offending commit was
> also ported back to 3.5, hence the fix wasn't ported back to 3.5.
> >   Surprisingly, I can't reproduce it locally even with Maven. In my
> attempt to reproduce, SparkConnectProtoSuite was executed third:
> SparkConnectStreamingQueryCacheSuite, then ExecuteEventsManagerSuite, and
> then SparkConnectProtoSuite. Maybe it is very specific to the environment,
> not just Maven? My env: MBP with M1 Pro chip, macOS 14.3.1, OpenJDK 17.0.9.
> I used build/mvn (Maven 3.8.8).
> >   I'm not 100% sure this is something we should fail the release for, as
> it's a test-only issue and sounds very environment dependent, but I'll
> respect your call on the vote.
> >   Btw, looks like Rui also made a relevant fix via link (not to fix
> the failing test but to fix other issues), but this also wasn't ported back
> to 3.5. @Rui Wang Do you think this is a regression issue and warrants a
> new RC?
> > On Fri, Feb 16, 2024 at 11:38 AM Sean Owen 
> wrote:
> > Is anyone seeing this Spark Connect test failure? then again, I have
> some weird issue with this env that always fails 1 or 2 tests that nobody

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-23 Thread Dongjoon Hyun
Hi, All.

Unfortunately, the Apache Spark `3.5.1 RC2` document artifact seems to be
generated from unknown source code instead of the correct source code of
the tag, `3.5.1`.

https://spark.apache.org/docs/3.5.1/

[image: Screenshot 2024-02-23 at 14.13.07.png]

Dongjoon.



On Wed, Feb 21, 2024 at 7:15 AM Jungtaek Lim 
wrote:

> Thanks everyone for participating in the vote! The vote passed.
> I'll send out the vote result and proceed to the next steps.
>
> On Wed, Feb 21, 2024 at 4:36 PM Maxim Gekk 
> wrote:
>
>> +1
>>
>> On Wed, Feb 21, 2024 at 9:50 AM Hyukjin Kwon 
>> wrote:
>>
>>> +1
>>>
>>> On Tue, 20 Feb 2024 at 22:00, Cheng Pan  wrote:
>>>
 +1 (non-binding)

 - Build successfully from source code.
 - Pass integration tests with Spark ClickHouse Connector[1]

 [1] https://github.com/housepower/spark-clickhouse-connector/pull/299

 Thanks,
 Cheng Pan


 > On Feb 20, 2024, at 10:56, Jungtaek Lim 
 wrote:
 >
 > Thanks Sean, let's continue the process for this RC.
 >
 > +1 (non-binding)
 >
 > - downloaded all files from URL
 > - checked signature
 > - extracted all archives
 > - ran all tests from source files in source archive file, via running
 "sbt clean test package" - Ubuntu 20.04.4 LTS, OpenJDK 17.0.9.
 >
 > Also bump to dev@ to encourage participation - looks like the timing
 is not good for US folks but let's see more days.
 >
 >
 > On Sat, Feb 17, 2024 at 1:49 AM Sean Owen  wrote:
 > Yeah let's get that fix in, but it seems to be a minor test only
 issue so should not block release.
 >
 > On Fri, Feb 16, 2024, 9:30 AM yangjie01  wrote:
 > Very sorry. When I was fixing SPARK-45242 (
 https://github.com/apache/spark/pull/43594), I noticed that its
 `Affects Version` and `Fix Version` were both 4.0, and I
 didn't realize that it had also been merged into branch-3.5, so I didn't
 advocate for SPARK-45357 to be backported to branch-3.5.
 >  As far as I know, the condition to trigger this test failure is:
 when using Maven to test the `connect` module, if  `sparkTestRelation` in
 `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized,
 then the `id` of `sparkTestRelation` will no longer be 0. So, I think this
 is indeed related to the order in which Maven executes the test cases in
 the `connect` module.
 >  I have submitted a backport PR to branch-3.5, and if necessary, we
 can merge it to fix this test issue.
 >  Jie Yang
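
To illustrate the ordering dependence described above, here is a minimal Python sketch. It is a toy model and not Spark code: `ToySession`, `create_relation`, and `run_suites` are hypothetical names, and the shared counter only stands in for whatever assigns the `id` that differs in the failing plan. It shows why an assertion that hard-codes `id == 0` holds only when the suite runs first.

```python
import itertools


class ToySession:
    """Hypothetical stand-in for a session shared by all suites in one test run."""

    def __init__(self):
        # Ids are handed out from a single counter shared by every suite.
        self._next_id = itertools.count()

    def create_relation(self, name):
        # Each new relation takes the next id, whatever suite asks for it.
        return {"name": name, "id": next(self._next_id)}


def run_suites(order):
    """Run the named suites in the given order against one shared session.

    Returns a mapping from suite name to the id its relation received.
    """
    session = ToySession()
    return {suite: session.create_relation(suite)["id"] for suite in order}


# When the proto suite runs first, its relation gets id 0.
first = run_suites(["SparkConnectProtoSuite", "OtherSuite"])
# When another suite creates a relation first, the id shifts.
later = run_suites(["OtherSuite", "SparkConnectProtoSuite"])

print(first["SparkConnectProtoSuite"])  # 0
print(later["SparkConnectProtoSuite"])  # 1
```

In this model, a test that compares against a fixed id passes or fails purely as a function of suite execution order, which matches the difference between build tools reported in this thread.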
 >   From: Jungtaek Lim
 > Date: Friday, February 16, 2024 22:15
 > To: Sean Owen, Rui Wang
 > Cc: dev
 > Subject: Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
 >   I traced back relevant changes and got a sense of what happened.
 >   Yangjie figured out the issue via link. It's a tricky issue
 according to the comments from Yangjie - the test is dependent on ordering
 of execution for test suites. He said it does not fail in sbt, hence CI
 build couldn't catch it.
 > He fixed it via link, but we missed that the offending commit was
 also ported back to 3.5, hence the fix wasn't ported back to 3.5.
 >   Surprisingly, I can't reproduce it locally even with Maven. In my
 attempt to reproduce, SparkConnectProtoSuite was executed third:
 SparkConnectStreamingQueryCacheSuite, then ExecuteEventsManagerSuite, and
 then SparkConnectProtoSuite. Maybe it is very specific to the environment,
 not just Maven? My env: MBP with M1 Pro chip, macOS 14.3.1, OpenJDK 17.0.9.
 I used build/mvn (Maven 3.8.8).
 >   I'm not 100% sure this is something we should fail the release for, as
 it's a test-only issue and sounds very environment dependent, but I'll
 respect your call on the vote.
 >   Btw, looks like Rui also made a relevant fix via link (not to fix
 the failing test but to fix other issues), but this also wasn't ported back
 to 3.5. @Rui Wang Do you think this is a regression issue and warrants a
 new RC?
 > On Fri, Feb 16, 2024 at 11:38 AM Sean Owen 
 wrote:
 > Is anyone seeing this Spark Connect test failure? then again, I have
 some weird issue with this env that always fails 1 or 2 tests that nobody
 else can replicate.
 >   - Test observe *** FAILED ***
 >   == FAIL: Plans do not match ===
 >   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 0   CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 44
 >    +- LocalRelation , [id#0, name#0]   +- LocalRelation , [id#0, name#0] (PlanTest.scala:179)
 >   On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim <
 kabhwan.opensou...@gmail.com> wrote:
 > DISCLAIMER: RC for Apache Spark 3.5.1 starts with RC2 as I 

Proposal about moving on from the Shepherd terminology in SPIPs

2024-02-23 Thread Mich Talebzadeh
We had a discussion a few hours ago about getting a Shepherd to assist with
the Structured Streaming SPIP.

As an active member I am proposing that we replace the current
terminology "SPIP Shepherd" with the more respectful and inclusive term
"SPIP Mentor." Over the past few years we have tried to replace some
past terminology with more acceptable terms.

While some may not find "Shepherd" offensive, it can unintentionally imply
passivity or dependence on the part of community members, which might not
accurately reflect their expertise and contributions. Additionally, the
shepherd-sheep dynamic might be interpreted as hierarchical, which does not
align with the collaborative and open nature of the Spark community.

*"SPIP Mentor"* better emphasizes the collaborative nature of the process,
focusing on supporting and guiding members while respecting their strengths
and contributions. It also avoids any potentially offensive or hierarchical
connotations.

It would be great if you could share your thoughts and join the discussion of
this proposal, including any potential challenges or solutions during the
transition period in SPIPs (assuming we accept this or another alternative
proposal).

HTH

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom


 https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, "one test result is worth one-thousand
expert opinions" (Wernher von Braun).


Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Mich Talebzadeh
Hi Pavan and those who kindly voted for this SPIP,

Great to have 6+ votes and no -1 or 0. The so-called mass volume is there.
The rest is an admin matter of how to drive the project forward, and yes,
there is more than one way of skinning the cat. I think we need some
flexibility in the rules given the dwindling (IMO) number of committers who
are willing to participate or are actively participating. For example, on a
similar matter I approached Cody Koeninger, who was one of the founders of
Spark Streaming, to shepherd a project almost a year back. Sadly he is no
longer active and said, "I haven't been involved lately and would be missing
a lot of context." So we need to improvise and see how best we can drive this
and similar ones. Let's wait a short while for a response; otherwise I am
happy to give a hand if needed and work with you guys to drive this. It is
something worthwhile.

HTH

Mich Talebzadeh


On Fri, 23 Feb 2024 at 17:41, Pavan Kotikalapudi
 wrote:

> Thanks for the pointers Mich, will wait for Jungtaek Lim or any other PMC
> members to respond.
>
> Aggregating the upvotes in this email thread:
>
> +6
> Mich Talebzadeh
> Adam Hobbs
> Pavan Kotikalapudi
> Krystal Mitchell
> Sona Torosyan
> Aaron Kern
>
> Thank you,
>
> Pavan
>
> On Thu, Feb 22, 2024 at 3:07 PM Mich Talebzadeh 
> wrote:
>
>> Hi,
>>
>> please check this doc
>>
>> Spark Project Improvement Proposals (SPIP) | Apache Spark
>> 
>>
>> and specifically the below extract
>>
>> Discussing an SPIP
>>
>> All discussion of an SPIP should take place in a public forum, preferably
>> the discussion attached to the Jira. Any discussions that happen offline
>> should be made available online for the public via meeting notes
>> summarizing the discussions. (done)
>>
>> During this discussion, one or more shepherds should be identified among
>> PMC members. (outstanding)
>>
>> Once the discussion settles, the shepherd(s) should call for a vote on
>> the SPIP moving forward on the dev@ list. The vote should be open for at
>> least 72 hours and follows the typical Apache vote process and passes upon
>> consensus (at least 3 +1 votes from PMC members and no -1 votes from PMC
>> members). dev@ should be notified of the vote result.
>>
>> If there does not exist at least one PMC member that is committed to
>> shepherding the change within a month, the SPIP is rejected.
>>
>> If a committer does not think a SPIP aligns with long-term project goals,
>> or is not practical at the point of proposal, the committer should -1 the
>> SPIP explicitly and give technical justifications.
>>
>> OK, a shepherd from among the PMC members is required. Maybe Jungtaek Lim
>> can kindly help with the process.
>>
>> cheers
>>
>> Mich Talebzadeh
>>
>>
>> On Thu, 22 Feb 2024 at 21:52, Pavan Kotikalapudi
>>  wrote:
>>
>>> Hi Mich,
>>>
>>> We have
>>>
>>> five  +1s till now.
>>>
>>> Mich Talebzadeh
>>> Adam Hobbs
>>> Pavan Kotikalapudi
>>> Krystal Mitchell
>>> Sona Torosyan
>>> (few more in github pr)
>>> +0: None
>>>
>>> -1: None
>>>
>>> Does it pass the required condition as approved?
>>>
>>>

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Pavan Kotikalapudi
Thanks for the pointers Mich, will wait for Jungtaek Lim or any other PMC
members to respond.

Aggregating the upvotes in this email thread:

+6
Mich Talebzadeh
Adam Hobbs
Pavan Kotikalapudi
Krystal Mitchell
Sona Torosyan
Aaron Kern

Thank you,

Pavan
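
For reference, the passing rule in the SPIP guidelines quoted in this thread (at least 3 +1 votes from PMC members and no -1 votes from PMC members) can be sketched in Python as below. This is only an illustration of that rule: `Vote`, `spip_vote_passes`, and the voter names are hypothetical, the PMC flags are made-up inputs rather than facts about anyone in this thread, and the sketch ignores the other conditions (a shepherd, the 72-hour window).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str      # hypothetical voter label
    value: int      # +1, 0, or -1
    is_pmc: bool    # assumed to be known to whoever tallies the vote


def spip_vote_passes(votes, min_pmc_plus_ones=3):
    """Return True if the votes satisfy the quoted rule.

    Only PMC votes count toward the threshold, and a single PMC -1
    blocks the vote regardless of how many +1 votes there are.
    """
    pmc_votes = [v for v in votes if v.is_pmc]
    plus_ones = sum(1 for v in pmc_votes if v.value == 1)
    minus_ones = sum(1 for v in pmc_votes if v.value == -1)
    return plus_ones >= min_pmc_plus_ones and minus_ones == 0


# Six +1 votes that are all non-binding do not meet the threshold.
non_binding = [Vote(f"voter-{i}", 1, is_pmc=False) for i in range(6)]
print(spip_vote_passes(non_binding))  # False

# Three binding +1 votes and no binding -1 do.
binding = [Vote(f"pmc-{i}", 1, is_pmc=True) for i in range(3)]
print(spip_vote_passes(non_binding + binding))  # True
```

Under this reading, the count of upvotes alone does not decide the outcome; what matters is how many of them are binding.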

On Thu, Feb 22, 2024 at 3:07 PM Mich Talebzadeh 
wrote:

> Hi,
>
> please check this doc
>
> Spark Project Improvement Proposals (SPIP) | Apache Spark
> 
>
> and specifically the below extract
>
> Discussing an SPIP
>
> All discussion of an SPIP should take place in a public forum, preferably
> the discussion attached to the Jira. Any discussions that happen offline
> should be made available online for the public via meeting notes
> summarizing the discussions. (done)
>
> During this discussion, one or more shepherds should be identified among
> PMC members. (outstanding)
>
> Once the discussion settles, the shepherd(s) should call for a vote on the
> SPIP moving forward on the dev@ list. The vote should be open for at
> least 72 hours and follows the typical Apache vote process and passes upon
> consensus (at least 3 +1 votes from PMC members and no -1 votes from PMC
> members). dev@ should be notified of the vote result.
>
> If there does not exist at least one PMC member that is committed to
> shepherding the change within a month, the SPIP is rejected.
>
> If a committer does not think a SPIP aligns with long-term project goals,
> or is not practical at the point of proposal, the committer should -1 the
> SPIP explicitly and give technical justifications.
>
> OK, a shepherd from among the PMC members is required. Maybe Jungtaek Lim
> can kindly help with the process.
>
> cheers
>
> Mich Talebzadeh
>
>
> On Thu, 22 Feb 2024 at 21:52, Pavan Kotikalapudi
>  wrote:
>
>> Hi Mich,
>>
>> We have
>>
>> five  +1s till now.
>>
>> Mich Talebzadeh
>> Adam Hobbs
>> Pavan Kotikalapudi
>> Krystal Mitchell
>> Sona Torosyan
>> (few more in github pr)
>> +0: None
>>
>> -1: None
>>
>> Does it pass the required condition as approved?
>>
>>
>> Not sure of that though; nothing about the minimum required is mentioned in
>> the past emails.
>>
>> I would ask Spark PMC members or any others who have done this in the
>> past to help us understand the process better.
>>
>> Thank you,
>>
>> Pavan
>>
>> On Thu, Feb 22, 2024 at 3:20 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi Pavan,
>>>
>>> Do you have a list of votes for this feature by any chance? Does it pass
>>> the required condition as approved?
>>>
>>> HTH
>>>
>>> Mich Talebzadeh

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Mich Talebzadeh
+1 for me

Mich Talebzadeh


On Fri, 23 Feb 2024 at 16:05, Aaron Kern  wrote:

> +1
>


Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Aaron Kern
+1