Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-23 Thread Jungtaek Lim
Thanks for figuring this out. That is my bad. To my understanding, the
3.5.1 RC2 docs were generated correctly for the VOTE; this happened during
the finalization step.

I lost the build artifact for the docs (I followed the steps and removed the
docs from the dev dist before realizing I shouldn't have removed them) and
accidentally rebuilt the docs from the branch I had used for debugging an
issue in the RC.

I'll rebuild the docs from the tag and submit a PR again.

On Sat, Feb 24, 2024 at 7:16 AM Dongjoon Hyun 
wrote:

> Hi, All.
>
> Unfortunately, the Apache Spark `3.5.1 RC2` document artifact seems to be
> generated from unknown source code instead of the correct source code of
> the tag, `3.5.1`.
>
> https://spark.apache.org/docs/3.5.1/
>
> [image: Screenshot 2024-02-23 at 14.13.07.png]
>
> Dongjoon.
>
>
>
> On Wed, Feb 21, 2024 at 7:15 AM Jungtaek Lim 
> wrote:
>
>> Thanks everyone for participating the vote! The vote passed.
>> I'll send out the vote result and proceed to the next steps.
>>
>> On Wed, Feb 21, 2024 at 4:36 PM Maxim Gekk 
>> wrote:
>>
>>> +1
>>>
>>> On Wed, Feb 21, 2024 at 9:50 AM Hyukjin Kwon 
>>> wrote:
>>>
>>>> +1
>>>>
>>>> On Tue, 20 Feb 2024 at 22:00, Cheng Pan  wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> - Build successfully from source code.
>>>>> - Pass integration tests with Spark ClickHouse Connector[1]
>>>>>
>>>>> [1] https://github.com/housepower/spark-clickhouse-connector/pull/299
>>>>>
>>>>> Thanks,
>>>>> Cheng Pan
>>>>>
>>>>>
>>>>> > On Feb 20, 2024, at 10:56, Jungtaek Lim <
>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>> >
>>>>> > Thanks Sean, let's continue the process for this RC.
>>>>> >
>>>>> > +1 (non-binding)
>>>>> >
>>>>> > - downloaded all files from URL
>>>>> > - checked signature
>>>>> > - extracted all archives
>>>>> > - ran all tests from source files in source archive file, via
>>>>> running "sbt clean test package" - Ubuntu 20.04.4 LTS, OpenJDK 17.0.9.
>>>>> >
>>>>> > Also bump to dev@ to encourage participation - looks like the
>>>>> timing is not good for US folks but let's see more days.
>>>>> >
>>>>> >
>>>>> > On Sat, Feb 17, 2024 at 1:49 AM Sean Owen  wrote:
>>>>> > Yeah let's get that fix in, but it seems to be a minor test only
>>>>> issue so should not block release.
>>>>> >
>>>>> > On Fri, Feb 16, 2024, 9:30 AM yangjie01  wrote:
>>>>> > Very sorry. When I was fixing `SPARK-45242 (
>>>>> https://github.com/apache/spark/pull/43594)`
>>>>> <https://github.com/apache/spark/pull/43594)>, I noticed that its
>>>>> `Affects Version` and `Fix Version` of SPARK-45242 were both 4.0, and I
>>>>> didn't realize that it had also been merged into branch-3.5, so I didn't
>>>>> advocate for SPARK-45357 to be backported to branch-3.5.
>>>>> >  As far as I know, the condition to trigger this test failure is:
>>>>> when using Maven to test the `connect` module, if  `sparkTestRelation` in
>>>>> `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized,
>>>>> then the `id` of `sparkTestRelation` will no longer be 0. So, I think this
>>>>> is indeed related to the order in which Maven executes the test cases in
>>>>> the `connect` module.
>>>>> >  I have submitted a backport PR to branch-3.5, and if necessary, we
>>>>> can merge it to fix this test issue.
>>>>> >  Jie Yang
>>>>> >   From: Jungtaek Lim
>>>>> > Date: Friday, February 16, 2024, 22:15
>>>>> > To: Sean Owen , Rui Wang
>>>>> > Cc: dev
>>>>> > Subject: Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
>>>>> >   I traced back relevant changes and got a sense of what happened.
>>>>> >   Yangjie figured out the issue via link. It's a tricky issue
>>>>> according to the comments from Yangjie - the test is dependent on ordering
>>>>> of execution for test suites. He said it does not fail in sbt, hence CI
>>>>> build couldn't catch it.
>>>>> > He fixed it via link, but we missed that the off

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-23 Thread Dongjoon Hyun
Hi, All.

Unfortunately, the Apache Spark `3.5.1 RC2` document artifact seems to be
generated from unknown source code instead of the correct source code of
the tag, `3.5.1`.

https://spark.apache.org/docs/3.5.1/

[image: Screenshot 2024-02-23 at 14.13.07.png]

Dongjoon.



On Wed, Feb 21, 2024 at 7:15 AM Jungtaek Lim 
wrote:

> Thanks everyone for participating the vote! The vote passed.
> I'll send out the vote result and proceed to the next steps.
>
> On Wed, Feb 21, 2024 at 4:36 PM Maxim Gekk 
> wrote:
>
>> +1
>>
>> On Wed, Feb 21, 2024 at 9:50 AM Hyukjin Kwon 
>> wrote:
>>
>>> +1
>>>
>>> On Tue, 20 Feb 2024 at 22:00, Cheng Pan  wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> - Build successfully from source code.
>>>> - Pass integration tests with Spark ClickHouse Connector[1]
>>>>
>>>> [1] https://github.com/housepower/spark-clickhouse-connector/pull/299
>>>>
>>>> Thanks,
>>>> Cheng Pan
>>>>
>>>>
>>>> > On Feb 20, 2024, at 10:56, Jungtaek Lim 
>>>> wrote:
>>>> >
>>>> > Thanks Sean, let's continue the process for this RC.
>>>> >
>>>> > +1 (non-binding)
>>>> >
>>>> > - downloaded all files from URL
>>>> > - checked signature
>>>> > - extracted all archives
>>>> > - ran all tests from source files in source archive file, via running
>>>> "sbt clean test package" - Ubuntu 20.04.4 LTS, OpenJDK 17.0.9.
>>>> >
>>>> > Also bump to dev@ to encourage participation - looks like the timing
>>>> is not good for US folks but let's see more days.
>>>> >
>>>> >
>>>> > On Sat, Feb 17, 2024 at 1:49 AM Sean Owen  wrote:
>>>> > Yeah let's get that fix in, but it seems to be a minor test only
>>>> issue so should not block release.
>>>> >
>>>> > On Fri, Feb 16, 2024, 9:30 AM yangjie01  wrote:
>>>> > Very sorry. When I was fixing `SPARK-45242 (
>>>> https://github.com/apache/spark/pull/43594)`
>>>> <https://github.com/apache/spark/pull/43594)>, I noticed that its
>>>> `Affects Version` and `Fix Version` of SPARK-45242 were both 4.0, and I
>>>> didn't realize that it had also been merged into branch-3.5, so I didn't
>>>> advocate for SPARK-45357 to be backported to branch-3.5.
>>>> >  As far as I know, the condition to trigger this test failure is:
>>>> when using Maven to test the `connect` module, if  `sparkTestRelation` in
>>>> `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized,
>>>> then the `id` of `sparkTestRelation` will no longer be 0. So, I think this
>>>> is indeed related to the order in which Maven executes the test cases in
>>>> the `connect` module.
>>>> >  I have submitted a backport PR to branch-3.5, and if necessary, we
>>>> can merge it to fix this test issue.
>>>> >  Jie Yang
>>>> >   From: Jungtaek Lim
>>>> > Date: Friday, February 16, 2024, 22:15
>>>> > To: Sean Owen , Rui Wang
>>>> > Cc: dev
>>>> > Subject: Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
>>>> >   I traced back relevant changes and got a sense of what happened.
>>>> >   Yangjie figured out the issue via link. It's a tricky issue
>>>> according to the comments from Yangjie - the test is dependent on ordering
>>>> of execution for test suites. He said it does not fail in sbt, hence CI
>>>> build couldn't catch it.
>>>> > He fixed it via link, but we missed that the offending commit was
>>>> also ported back to 3.5 as well, hence the fix wasn't ported back to 3.5.
>>>> >   Surprisingly, I can't reproduce locally even with maven. In my
>>>> attempt to reproduce, SparkConnectProtoSuite was executed at third,
>>>> SparkConnectStreamingQueryCacheSuite, and ExecuteEventsManagerSuite, and
>>>> then SparkConnectProtoSuite. Maybe very specific to the environment, not
>>>> just maven? My env: MBP M1 pro chip, MacOS 14.3.1, Openjdk 17.0.9. I used
>>>> build/mvn (Maven 3.8.8).
>>>> >   I'm not 100% sure this is something we should fail the release as
>>>> it's a test only and sounds very environment dependent, but I'll respect
>>>> your call on vote.
>>>> >   Btw, looks like Rui also made a relevant fix via link (not to fix
>&

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-21 Thread Jungtaek Lim
Thanks everyone for participating in the vote! The vote passed.
I'll send out the vote result and proceed to the next steps.

On Wed, Feb 21, 2024 at 4:36 PM Maxim Gekk 
wrote:

> +1
>
> On Wed, Feb 21, 2024 at 9:50 AM Hyukjin Kwon  wrote:
>
>> +1
>>
>> On Tue, 20 Feb 2024 at 22:00, Cheng Pan  wrote:
>>
>>> +1 (non-binding)
>>>
>>> - Build successfully from source code.
>>> - Pass integration tests with Spark ClickHouse Connector[1]
>>>
>>> [1] https://github.com/housepower/spark-clickhouse-connector/pull/299
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>> > On Feb 20, 2024, at 10:56, Jungtaek Lim 
>>> wrote:
>>> >
>>> > Thanks Sean, let's continue the process for this RC.
>>> >
>>> > +1 (non-binding)
>>> >
>>> > - downloaded all files from URL
>>> > - checked signature
>>> > - extracted all archives
>>> > - ran all tests from source files in source archive file, via running
>>> "sbt clean test package" - Ubuntu 20.04.4 LTS, OpenJDK 17.0.9.
>>> >
>>> > Also bump to dev@ to encourage participation - looks like the timing
>>> is not good for US folks but let's see more days.
>>> >
>>> >
>>> > On Sat, Feb 17, 2024 at 1:49 AM Sean Owen  wrote:
>>> > Yeah let's get that fix in, but it seems to be a minor test only issue
>>> so should not block release.
>>> >
>>> > On Fri, Feb 16, 2024, 9:30 AM yangjie01  wrote:
>>> > Very sorry. When I was fixing `SPARK-45242 (
>>> https://github.com/apache/spark/pull/43594)`
>>> <https://github.com/apache/spark/pull/43594)>, I noticed that its
>>> `Affects Version` and `Fix Version` of SPARK-45242 were both 4.0, and I
>>> didn't realize that it had also been merged into branch-3.5, so I didn't
>>> advocate for SPARK-45357 to be backported to branch-3.5.
>>> >  As far as I know, the condition to trigger this test failure is: when
>>> using Maven to test the `connect` module, if  `sparkTestRelation` in
>>> `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized,
>>> then the `id` of `sparkTestRelation` will no longer be 0. So, I think this
>>> is indeed related to the order in which Maven executes the test cases in
>>> the `connect` module.
>>> >  I have submitted a backport PR to branch-3.5, and if necessary, we
>>> can merge it to fix this test issue.
>>> >  Jie Yang
>>> >   From: Jungtaek Lim
>>> > Date: Friday, February 16, 2024, 22:15
>>> > To: Sean Owen , Rui Wang
>>> > Cc: dev
>>> > Subject: Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
>>> >   I traced back relevant changes and got a sense of what happened.
>>> >   Yangjie figured out the issue via link. It's a tricky issue
>>> according to the comments from Yangjie - the test is dependent on ordering
>>> of execution for test suites. He said it does not fail in sbt, hence CI
>>> build couldn't catch it.
>>> > He fixed it via link, but we missed that the offending commit was also
>>> ported back to 3.5 as well, hence the fix wasn't ported back to 3.5.
>>> >   Surprisingly, I can't reproduce locally even with maven. In my
>>> attempt to reproduce, SparkConnectProtoSuite was executed at third,
>>> SparkConnectStreamingQueryCacheSuite, and ExecuteEventsManagerSuite, and
>>> then SparkConnectProtoSuite. Maybe very specific to the environment, not
>>> just maven? My env: MBP M1 pro chip, MacOS 14.3.1, Openjdk 17.0.9. I used
>>> build/mvn (Maven 3.8.8).
>>> >   I'm not 100% sure this is something we should fail the release as
>>> it's a test only and sounds very environment dependent, but I'll respect
>>> your call on vote.
>>> >   Btw, looks like Rui also made a relevant fix via link (not to fix
>>> the failing test but to fix other issues), but this also wasn't ported back
>>> to 3.5. @Rui Wang Do you think this is a regression issue and warrants a
>>> new RC?
>>> > On Fri, Feb 16, 2024 at 11:38 AM Sean Owen 
>>> wrote:
>>> > Is anyone seeing this Spark Connect test failure? then again, I have
>>> some weird issue with this env that always fails 1 or 2 tests that nobody
>>> else can replicate.
>>> >   - Test observe *** FAILED ***
>>> >   == FAIL: Plans do not match ===
>>> >   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS
>>

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-20 Thread Maxim Gekk
+1

On Wed, Feb 21, 2024 at 9:50 AM Hyukjin Kwon  wrote:

> +1
>
> On Tue, 20 Feb 2024 at 22:00, Cheng Pan  wrote:
>
>> +1 (non-binding)
>>
>> - Build successfully from source code.
>> - Pass integration tests with Spark ClickHouse Connector[1]
>>
>> [1] https://github.com/housepower/spark-clickhouse-connector/pull/299
>>
>> Thanks,
>> Cheng Pan
>>
>>
>> > On Feb 20, 2024, at 10:56, Jungtaek Lim 
>> wrote:
>> >
>> > Thanks Sean, let's continue the process for this RC.
>> >
>> > +1 (non-binding)
>> >
>> > - downloaded all files from URL
>> > - checked signature
>> > - extracted all archives
>> > - ran all tests from source files in source archive file, via running
>> "sbt clean test package" - Ubuntu 20.04.4 LTS, OpenJDK 17.0.9.
>> >
>> > Also bump to dev@ to encourage participation - looks like the timing
>> is not good for US folks but let's see more days.
>> >
>> >
>> > On Sat, Feb 17, 2024 at 1:49 AM Sean Owen  wrote:
>> > Yeah let's get that fix in, but it seems to be a minor test only issue
>> so should not block release.
>> >
>> > On Fri, Feb 16, 2024, 9:30 AM yangjie01  wrote:
>> > Very sorry. When I was fixing `SPARK-45242 (
>> https://github.com/apache/spark/pull/43594)`
>> <https://github.com/apache/spark/pull/43594)>, I noticed that its
>> `Affects Version` and `Fix Version` of SPARK-45242 were both 4.0, and I
>> didn't realize that it had also been merged into branch-3.5, so I didn't
>> advocate for SPARK-45357 to be backported to branch-3.5.
>> >  As far as I know, the condition to trigger this test failure is: when
>> using Maven to test the `connect` module, if  `sparkTestRelation` in
>> `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized,
>> then the `id` of `sparkTestRelation` will no longer be 0. So, I think this
>> is indeed related to the order in which Maven executes the test cases in
>> the `connect` module.
>> >  I have submitted a backport PR to branch-3.5, and if necessary, we can
>> merge it to fix this test issue.
>> >  Jie Yang
>> >   From: Jungtaek Lim
>> > Date: Friday, February 16, 2024, 22:15
>> > To: Sean Owen , Rui Wang
>> > Cc: dev
>> > Subject: Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
>> >   I traced back relevant changes and got a sense of what happened.
>> >   Yangjie figured out the issue via link. It's a tricky issue according
>> to the comments from Yangjie - the test is dependent on ordering of
>> execution for test suites. He said it does not fail in sbt, hence CI build
>> couldn't catch it.
>> > He fixed it via link, but we missed that the offending commit was also
>> ported back to 3.5 as well, hence the fix wasn't ported back to 3.5.
>> >   Surprisingly, I can't reproduce locally even with maven. In my
>> attempt to reproduce, SparkConnectProtoSuite was executed at third,
>> SparkConnectStreamingQueryCacheSuite, and ExecuteEventsManagerSuite, and
>> then SparkConnectProtoSuite. Maybe very specific to the environment, not
>> just maven? My env: MBP M1 pro chip, MacOS 14.3.1, Openjdk 17.0.9. I used
>> build/mvn (Maven 3.8.8).
>> >   I'm not 100% sure this is something we should fail the release as
>> it's a test only and sounds very environment dependent, but I'll respect
>> your call on vote.
>> >   Btw, looks like Rui also made a relevant fix via link (not to fix the
>> failing test but to fix other issues), but this also wasn't ported back to
>> 3.5. @Rui Wang Do you think this is a regression issue and warrants a new
>> RC?
>> > On Fri, Feb 16, 2024 at 11:38 AM Sean Owen 
>> wrote:
>> > Is anyone seeing this Spark Connect test failure? then again, I have
>> some weird issue with this env that always fails 1 or 2 tests that nobody
>> else can replicate.
>> >   - Test observe *** FAILED ***
>> >   == FAIL: Plans do not match ===
>> >   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS
>> max_val#0, sum(id#0) AS sum(id)#0L], 0   CollectMetrics my_metric,
>> [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L],
>> 44
>> >+- LocalRelation , [id#0, name#0]
>>  +- LocalRelation , [id#0,
>> name#0] (PlanTest.scala:179)
>> >   On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>> > DISCLAIMER: RC for Apache Spark 3.5.1 starts with RC2 as I lately
>> figure

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-20 Thread Hyukjin Kwon
+1

On Tue, 20 Feb 2024 at 22:00, Cheng Pan  wrote:

> +1 (non-binding)
>
> - Build successfully from source code.
> - Pass integration tests with Spark ClickHouse Connector[1]
>
> [1] https://github.com/housepower/spark-clickhouse-connector/pull/299
>
> Thanks,
> Cheng Pan
>
>
> > On Feb 20, 2024, at 10:56, Jungtaek Lim 
> wrote:
> >
> > Thanks Sean, let's continue the process for this RC.
> >
> > +1 (non-binding)
> >
> > - downloaded all files from URL
> > - checked signature
> > - extracted all archives
> > - ran all tests from source files in source archive file, via running
> "sbt clean test package" - Ubuntu 20.04.4 LTS, OpenJDK 17.0.9.
> >
> > Also bump to dev@ to encourage participation - looks like the timing is
> not good for US folks but let's see more days.
> >
> >
> > On Sat, Feb 17, 2024 at 1:49 AM Sean Owen  wrote:
> > Yeah let's get that fix in, but it seems to be a minor test only issue
> so should not block release.
> >
> > On Fri, Feb 16, 2024, 9:30 AM yangjie01  wrote:
> > Very sorry. When I was fixing `SPARK-45242 (
> https://github.com/apache/spark/pull/43594)`
> <https://github.com/apache/spark/pull/43594)>, I noticed that its
> `Affects Version` and `Fix Version` of SPARK-45242 were both 4.0, and I
> didn't realize that it had also been merged into branch-3.5, so I didn't
> advocate for SPARK-45357 to be backported to branch-3.5.
> >  As far as I know, the condition to trigger this test failure is: when
> using Maven to test the `connect` module, if  `sparkTestRelation` in
> `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized,
> then the `id` of `sparkTestRelation` will no longer be 0. So, I think this
> is indeed related to the order in which Maven executes the test cases in
> the `connect` module.
> >  I have submitted a backport PR to branch-3.5, and if necessary, we can
> merge it to fix this test issue.
> >  Jie Yang
> >   From: Jungtaek Lim
> > Date: Friday, February 16, 2024, 22:15
> > To: Sean Owen , Rui Wang
> > Cc: dev
> > Subject: Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
> >   I traced back relevant changes and got a sense of what happened.
> >   Yangjie figured out the issue via link. It's a tricky issue according
> to the comments from Yangjie - the test is dependent on ordering of
> execution for test suites. He said it does not fail in sbt, hence CI build
> couldn't catch it.
> > He fixed it via link, but we missed that the offending commit was also
> ported back to 3.5 as well, hence the fix wasn't ported back to 3.5.
> >   Surprisingly, I can't reproduce locally even with maven. In my attempt
> to reproduce, SparkConnectProtoSuite was executed at third,
> SparkConnectStreamingQueryCacheSuite, and ExecuteEventsManagerSuite, and
> then SparkConnectProtoSuite. Maybe very specific to the environment, not
> just maven? My env: MBP M1 pro chip, MacOS 14.3.1, Openjdk 17.0.9. I used
> build/mvn (Maven 3.8.8).
> >   I'm not 100% sure this is something we should fail the release as it's
> a test only and sounds very environment dependent, but I'll respect your
> call on vote.
> >   Btw, looks like Rui also made a relevant fix via link (not to fix the
> failing test but to fix other issues), but this also wasn't ported back to
> 3.5. @Rui Wang Do you think this is a regression issue and warrants a new
> RC?
> > On Fri, Feb 16, 2024 at 11:38 AM Sean Owen  wrote:
> > Is anyone seeing this Spark Connect test failure? then again, I have
> some weird issue with this env that always fails 1 or 2 tests that nobody
> else can replicate.
> >   - Test observe *** FAILED ***
> >   == FAIL: Plans do not match ===
> >   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS
> max_val#0, sum(id#0) AS sum(id)#0L], 0   CollectMetrics my_metric,
> [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L],
> 44
> >+- LocalRelation , [id#0, name#0]
>+- LocalRelation , [id#0, name#0]
> (PlanTest.scala:179)
> >   On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
> > DISCLAIMER: RC for Apache Spark 3.5.1 starts with RC2 as I lately
> figured out doc generation issue after tagging RC1.
> >   Please vote on releasing the following candidate as Apache Spark
> version 3.5.1.
> >
> > The vote is open until February 18th 9AM (PST) and passes if a majority
> +1 PMC votes are cast, with
> > a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.5.1
> > [ ] -1 Do not release this package because ...
> >
> >

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-20 Thread Xiao Li
+1

Xiao

On Tue, Feb 20, 2024 at 04:59, Cheng Pan wrote:

> +1 (non-binding)
>
> - Build successfully from source code.
> - Pass integration tests with Spark ClickHouse Connector[1]
>
> [1] https://github.com/housepower/spark-clickhouse-connector/pull/299
>
> Thanks,
> Cheng Pan
>
>
> > On Feb 20, 2024, at 10:56, Jungtaek Lim 
> wrote:
> >
> > Thanks Sean, let's continue the process for this RC.
> >
> > +1 (non-binding)
> >
> > - downloaded all files from URL
> > - checked signature
> > - extracted all archives
> > - ran all tests from source files in source archive file, via running
> "sbt clean test package" - Ubuntu 20.04.4 LTS, OpenJDK 17.0.9.
> >
> > Also bump to dev@ to encourage participation - looks like the timing is
> not good for US folks but let's see more days.
> >
> >
> > On Sat, Feb 17, 2024 at 1:49 AM Sean Owen  wrote:
> > Yeah let's get that fix in, but it seems to be a minor test only issue
> so should not block release.
> >
> > On Fri, Feb 16, 2024, 9:30 AM yangjie01  wrote:
> > Very sorry. When I was fixing `SPARK-45242 (
> https://github.com/apache/spark/pull/43594)`
> <https://github.com/apache/spark/pull/43594)>, I noticed that its
> `Affects Version` and `Fix Version` of SPARK-45242 were both 4.0, and I
> didn't realize that it had also been merged into branch-3.5, so I didn't
> advocate for SPARK-45357 to be backported to branch-3.5.
> >  As far as I know, the condition to trigger this test failure is: when
> using Maven to test the `connect` module, if  `sparkTestRelation` in
> `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized,
> then the `id` of `sparkTestRelation` will no longer be 0. So, I think this
> is indeed related to the order in which Maven executes the test cases in
> the `connect` module.
> >  I have submitted a backport PR to branch-3.5, and if necessary, we can
> merge it to fix this test issue.
> >  Jie Yang
> >   From: Jungtaek Lim
> > Date: Friday, February 16, 2024, 22:15
> > To: Sean Owen , Rui Wang
> > Cc: dev
> > Subject: Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
> >   I traced back relevant changes and got a sense of what happened.
> >   Yangjie figured out the issue via link. It's a tricky issue according
> to the comments from Yangjie - the test is dependent on ordering of
> execution for test suites. He said it does not fail in sbt, hence CI build
> couldn't catch it.
> > He fixed it via link, but we missed that the offending commit was also
> ported back to 3.5 as well, hence the fix wasn't ported back to 3.5.
> >   Surprisingly, I can't reproduce locally even with maven. In my attempt
> to reproduce, SparkConnectProtoSuite was executed at third,
> SparkConnectStreamingQueryCacheSuite, and ExecuteEventsManagerSuite, and
> then SparkConnectProtoSuite. Maybe very specific to the environment, not
> just maven? My env: MBP M1 pro chip, MacOS 14.3.1, Openjdk 17.0.9. I used
> build/mvn (Maven 3.8.8).
> >   I'm not 100% sure this is something we should fail the release as it's
> a test only and sounds very environment dependent, but I'll respect your
> call on vote.
> >   Btw, looks like Rui also made a relevant fix via link (not to fix the
> failing test but to fix other issues), but this also wasn't ported back to
> 3.5. @Rui Wang Do you think this is a regression issue and warrants a new
> RC?
> > On Fri, Feb 16, 2024 at 11:38 AM Sean Owen  wrote:
> > Is anyone seeing this Spark Connect test failure? then again, I have
> some weird issue with this env that always fails 1 or 2 tests that nobody
> else can replicate.
> >   - Test observe *** FAILED ***
> >   == FAIL: Plans do not match ===
> >   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS
> max_val#0, sum(id#0) AS sum(id)#0L], 0   CollectMetrics my_metric,
> [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L],
> 44
> >+- LocalRelation , [id#0, name#0]
>+- LocalRelation , [id#0, name#0]
> (PlanTest.scala:179)
> >   On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
> > DISCLAIMER: RC for Apache Spark 3.5.1 starts with RC2 as I lately
> figured out doc generation issue after tagging RC1.
> >   Please vote on releasing the following candidate as Apache Spark
> version 3.5.1.
> >
> > The vote is open until February 18th 9AM (PST) and passes if a majority
> +1 PMC votes are cast, with
> > a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.5.1
> > [ ] -1 Do not release this package because ...
> >
> > To l

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-20 Thread Cheng Pan
+1 (non-binding)

- Built successfully from source code.
- Passed integration tests with Spark ClickHouse Connector [1]

[1] https://github.com/housepower/spark-clickhouse-connector/pull/299

Thanks,
Cheng Pan


> On Feb 20, 2024, at 10:56, Jungtaek Lim  wrote:
> 
> Thanks Sean, let's continue the process for this RC.
> 
> +1 (non-binding)
> 
> - downloaded all files from URL
> - checked signature
> - extracted all archives
> - ran all tests from source files in source archive file, via running "sbt 
> clean test package" - Ubuntu 20.04.4 LTS, OpenJDK 17.0.9.
> 
> Also bump to dev@ to encourage participation - looks like the timing is not 
> good for US folks but let's see more days.
> 
> 
> On Sat, Feb 17, 2024 at 1:49 AM Sean Owen  wrote:
> Yeah let's get that fix in, but it seems to be a minor test only issue so 
> should not block release.
> 
> On Fri, Feb 16, 2024, 9:30 AM yangjie01  wrote:
> Very sorry. When I was fixing `SPARK-45242 
> (https://github.com/apache/spark/pull/43594)`, I noticed that its `Affects 
> Version` and `Fix Version` of SPARK-45242 were both 4.0, and I didn't realize 
> that it had also been merged into branch-3.5, so I didn't advocate for 
> SPARK-45357 to be backported to branch-3.5.
>  As far as I know, the condition to trigger this test failure is: when using 
> Maven to test the `connect` module, if  `sparkTestRelation` in 
> `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized, then 
> the `id` of `sparkTestRelation` will no longer be 0. So, I think this is 
> indeed related to the order in which Maven executes the test cases in the 
> `connect` module.
>  I have submitted a backport PR to branch-3.5, and if necessary, we can merge 
> it to fix this test issue.
>  Jie Yang
>   From: Jungtaek Lim
> Date: Friday, February 16, 2024, 22:15
> To: Sean Owen , Rui Wang
> Cc: dev
> Subject: Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
>   I traced back relevant changes and got a sense of what happened.
>   Yangjie figured out the issue via link. It's a tricky issue according to 
> the comments from Yangjie - the test is dependent on ordering of execution 
> for test suites. He said it does not fail in sbt, hence CI build couldn't 
> catch it.
> He fixed it via link, but we missed that the offending commit was also ported 
> back to 3.5 as well, hence the fix wasn't ported back to 3.5.
>   Surprisingly, I can't reproduce locally even with maven. In my attempt to 
> reproduce, SparkConnectProtoSuite was executed at third, 
> SparkConnectStreamingQueryCacheSuite, and ExecuteEventsManagerSuite, and then 
> SparkConnectProtoSuite. Maybe very specific to the environment, not just 
> maven? My env: MBP M1 pro chip, MacOS 14.3.1, Openjdk 17.0.9. I used 
> build/mvn (Maven 3.8.8).
>   I'm not 100% sure this is something we should fail the release as it's a 
> test only and sounds very environment dependent, but I'll respect your call 
> on vote.
>   Btw, looks like Rui also made a relevant fix via link (not to fix the 
> failing test but to fix other issues), but this also wasn't ported back to 
> 3.5. @Rui Wang Do you think this is a regression issue and warrants a new RC?
> On Fri, Feb 16, 2024 at 11:38 AM Sean Owen  wrote:
> Is anyone seeing this Spark Connect test failure? then again, I have some 
> weird issue with this env that always fails 1 or 2 tests that nobody else can 
> replicate. 
>   - Test observe *** FAILED ***
>   == FAIL: Plans do not match ===
>   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, 
> sum(id#0) AS sum(id)#0L], 0   CollectMetrics my_metric, [min(id#0) AS 
> min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 44
>+- LocalRelation , [id#0, name#0]   
>   +- LocalRelation , [id#0, name#0] 
> (PlanTest.scala:179)
>   On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim  
> wrote:
> DISCLAIMER: RC for Apache Spark 3.5.1 starts with RC2 as I lately figured out 
> doc generation issue after tagging RC1.
>   Please vote on releasing the following candidate as Apache Spark version 
> 3.5.1.
> 
> The vote is open until February 18th 9AM (PST) and passes if a majority +1 
> PMC votes are cast, with
> a minimum of 3 +1 votes.
> 
> [ ] +1 Release this package as Apache Spark 3.5.1
> [ ] -1 Do not release this package because ...
> 
> To learn more about Apache Spark, please see https://spark.apache.org/
> 
> The tag to be voted on is v3.5.1-rc2 (commit 
> fd86f85e181fc2dc0f50a096855acf83a6cc5d9c):
> https://github.com/apache/spark/tree/v3.5.1-rc2
> 
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-bi

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-19 Thread Wenchen Fan
+1, thanks for making the release!

On Sat, Feb 17, 2024 at 3:54 AM Sean Owen  wrote:

> Yeah let's get that fix in, but it seems to be a minor test only issue so
> should not block release.
>
> On Fri, Feb 16, 2024, 9:30 AM yangjie01  wrote:
>
>> Very sorry. When I was fixing `SPARK-45242 (
>> https://github.com/apache/spark/pull/43594)`
>> <https://github.com/apache/spark/pull/43594)>, I noticed that its
>> `Affects Version` and `Fix Version` of SPARK-45242 were both 4.0, and I
>> didn't realize that it had also been merged into branch-3.5, so I didn't
>> advocate for SPARK-45357 to be backported to branch-3.5.
>>
>>
>>
>> As far as I know, the condition to trigger this test failure is: when
>> using Maven to test the `connect` module, if  `sparkTestRelation` in
>> `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized,
>> then the `id` of `sparkTestRelation` will no longer be 0. So, I think this
>> is indeed related to the order in which Maven executes the test cases in
>> the `connect` module.
>>
>>
>>
>> I have submitted a backport PR
>> <https://github.com/apache/spark/pull/45141> to branch-3.5, and if
>> necessary, we can merge it to fix this test issue.
>>
>>
>>
>> Jie Yang
>>
>>
>>
>> *From:* Jungtaek Lim
>> *Date:* Friday, February 16, 2024, 22:15
>> *To:* Sean Owen , Rui Wang
>> *Cc:* dev
>> *Subject:* Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
>>
>>
>>
>> I traced back relevant changes and got a sense of what happened.
>>
>>
>>
>> Yangjie figured out the issue via link
>> <https://mailshield.baidu.com/check?q=8dOSfwXDFpe5HSp%2b%2bgCPsNQ52B7S7TAFG56Vj3tiFgMkCyOrQEGbg03AVWDX5bwwyIW7sZx3JZox3w8Jz1iw%2bPjaOZYmLWn2>.
>> It's a tricky issue according to the comments from Yangjie - the test is
>> dependent on ordering of execution for test suites. He said it does not
>> fail in sbt, hence CI build couldn't catch it.
>>
>> He fixed it via link
>> <https://mailshield.baidu.com/check?q=ojK3dg%2fDFf3xmQ8SPzsIou3EKaE1ZePctdB%2fUzhWmewnZb5chnQM1%2f8D1JDJnkxF>,
>> but we missed that the offending commit was also ported back to 3.5 as
>> well, hence the fix wasn't ported back to 3.5.
>>
>>
>>
>> Surprisingly, I can't reproduce locally even with maven. In my attempt to
>> reproduce, SparkConnectProtoSuite was executed at
>> third, SparkConnectStreamingQueryCacheSuite, and ExecuteEventsManagerSuite,
>> and then SparkConnectProtoSuite. Maybe very specific to the environment,
>> not just maven? My env: MBP M1 pro chip, MacOS 14.3.1, Openjdk 17.0.9. I
>> used build/mvn (Maven 3.8.8).
>>
>>
>>
>> I'm not 100% sure this is something we should fail the release as it's a
>> test only and sounds very environment dependent, but I'll respect your call
>> on vote.
>>
>>
>>
>> Btw, looks like Rui also made a relevant fix via link
>> <https://mailshield.baidu.com/check?q=TUbVzroxG%2fbi2P4qN0kbggzXuPzSN%2bKDoUFGhS9xMet8aXVw6EH0rMr1MKJqp2E2>
>>  (not
>> to fix the failing test but to fix other issues), but this also wasn't
>> ported back to 3.5. @Rui Wang  Do you think this
>> is a regression issue and warrants a new RC?
>>
>>
>>
>>
>>
>> On Fri, Feb 16, 2024 at 11:38 AM Sean Owen  wrote:
>>
>> Is anyone seeing this Spark Connect test failure? then again, I have some
>> weird issue with this env that always fails 1 or 2 tests that nobody else
>> can replicate.
>>
>>
>>
>> - Test observe *** FAILED ***
>>   == FAIL: Plans do not match ===
>>   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS
>> max_val#0, sum(id#0) AS sum(id)#0L], 0   CollectMetrics my_metric,
>> [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L],
>> 44
>>+- LocalRelation , [id#0, name#0]
>>   +- LocalRelation , [id#0, name#0]
>> (PlanTest.scala:179)
>>
>>
>>
>> On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>> DISCLAIMER: RC for Apache Spark 3.5.1 starts with RC2 as I lately figured
>> out doc generation issue after tagging RC1.
>>
>>
>>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.5.1.
>>
>> The vote is open until February 18th 9AM (PST) and passes if a majority
>> +1 PMC votes are cast, with
>> a minimum of 3 +1 votes.
>>
>> [ ] +

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-19 Thread Jungtaek Lim
Thanks Sean, let's continue the process for this RC.

+1 (non-binding)

- downloaded all files from URL
- checked signature
- extracted all archives
- ran all tests from the source files in the source archive, by running "sbt
clean test package" - Ubuntu 20.04.4 LTS, OpenJDK 17.0.9.

Also bumping dev@ to encourage participation - it looks like the timing is not
good for US folks, but let's give it a few more days.


On Sat, Feb 17, 2024 at 1:49 AM Sean Owen  wrote:

> Yeah let's get that fix in, but it seems to be a minor test only issue so
> should not block release.
>
> On Fri, Feb 16, 2024, 9:30 AM yangjie01  wrote:
>
>> Very sorry. When I was fixing `SPARK-45242 (
>> https://github.com/apache/spark/pull/43594)`
>> <https://github.com/apache/spark/pull/43594)>, I noticed that its
>> `Affects Version` and `Fix Version` of SPARK-45242 were both 4.0, and I
>> didn't realize that it had also been merged into branch-3.5, so I didn't
>> advocate for SPARK-45357 to be backported to branch-3.5.
>>
>>
>>
>> As far as I know, the condition to trigger this test failure is: when
>> using Maven to test the `connect` module, if  `sparkTestRelation` in
>> `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized,
>> then the `id` of `sparkTestRelation` will no longer be 0. So, I think this
>> is indeed related to the order in which Maven executes the test cases in
>> the `connect` module.
>>
>>
>>
>> I have submitted a backport PR
>> <https://github.com/apache/spark/pull/45141> to branch-3.5, and if
>> necessary, we can merge it to fix this test issue.
>>
>>
>>
>> Jie Yang
>>
>>
>>
>> *From:* Jungtaek Lim
>> *Date:* Friday, February 16, 2024, 22:15
>> *To:* Sean Owen , Rui Wang
>> *Cc:* dev
>> *Subject:* Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
>>
>>
>>
>> I traced back relevant changes and got a sense of what happened.
>>
>>
>>
>> Yangjie figured out the issue via link
>> <https://mailshield.baidu.com/check?q=8dOSfwXDFpe5HSp%2b%2bgCPsNQ52B7S7TAFG56Vj3tiFgMkCyOrQEGbg03AVWDX5bwwyIW7sZx3JZox3w8Jz1iw%2bPjaOZYmLWn2>.
>> It's a tricky issue according to the comments from Yangjie - the test is
>> dependent on ordering of execution for test suites. He said it does not
>> fail in sbt, hence CI build couldn't catch it.
>>
>> He fixed it via link
>> <https://mailshield.baidu.com/check?q=ojK3dg%2fDFf3xmQ8SPzsIou3EKaE1ZePctdB%2fUzhWmewnZb5chnQM1%2f8D1JDJnkxF>,
>> but we missed that the offending commit was also ported back to 3.5 as
>> well, hence the fix wasn't ported back to 3.5.
>>
>>
>>
>> Surprisingly, I can't reproduce locally even with maven. In my attempt to
>> reproduce, SparkConnectProtoSuite was executed at
>> third, SparkConnectStreamingQueryCacheSuite, and ExecuteEventsManagerSuite,
>> and then SparkConnectProtoSuite. Maybe very specific to the environment,
>> not just maven? My env: MBP M1 pro chip, MacOS 14.3.1, Openjdk 17.0.9. I
>> used build/mvn (Maven 3.8.8).
>>
>>
>>
>> I'm not 100% sure this is something we should fail the release as it's a
>> test only and sounds very environment dependent, but I'll respect your call
>> on vote.
>>
>>
>>
>> Btw, looks like Rui also made a relevant fix via link
>> <https://mailshield.baidu.com/check?q=TUbVzroxG%2fbi2P4qN0kbggzXuPzSN%2bKDoUFGhS9xMet8aXVw6EH0rMr1MKJqp2E2>
>>  (not
>> to fix the failing test but to fix other issues), but this also wasn't
>> ported back to 3.5. @Rui Wang  Do you think this
>> is a regression issue and warrants a new RC?
>>
>>
>>
>>
>>
>> On Fri, Feb 16, 2024 at 11:38 AM Sean Owen  wrote:
>>
>> Is anyone seeing this Spark Connect test failure? then again, I have some
>> weird issue with this env that always fails 1 or 2 tests that nobody else
>> can replicate.
>>
>>
>>
>> - Test observe *** FAILED ***
>>   == FAIL: Plans do not match ===
>>   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS
>> max_val#0, sum(id#0) AS sum(id)#0L], 0   CollectMetrics my_metric,
>> [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L],
>> 44
>>+- LocalRelation , [id#0, name#0]
>>   +- LocalRelation , [id#0, name#0]
>> (PlanTest.scala:179)
>>
>>
>>
>> On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>> DISCLAIMER: RC for Apache Spark 3.5.1 starts with RC2 as I lately figured
>> out doc gene

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-16 Thread Sean Owen
Yeah, let's get that fix in, but it seems to be a minor, test-only issue, so
it should not block the release.

On Fri, Feb 16, 2024, 9:30 AM yangjie01  wrote:

> Very sorry. When I was fixing `SPARK-45242 (
> https://github.com/apache/spark/pull/43594)`
> <https://github.com/apache/spark/pull/43594)>, I noticed that its
> `Affects Version` and `Fix Version` of SPARK-45242 were both 4.0, and I
> didn't realize that it had also been merged into branch-3.5, so I didn't
> advocate for SPARK-45357 to be backported to branch-3.5.
>
>
>
> As far as I know, the condition to trigger this test failure is: when
> using Maven to test the `connect` module, if  `sparkTestRelation` in
> `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized,
> then the `id` of `sparkTestRelation` will no longer be 0. So, I think this
> is indeed related to the order in which Maven executes the test cases in
> the `connect` module.
>
>
>
> I have submitted a backport PR
> <https://github.com/apache/spark/pull/45141> to branch-3.5, and if
> necessary, we can merge it to fix this test issue.
>
>
>
> Jie Yang
>
>
>
> *From:* Jungtaek Lim
> *Date:* Friday, February 16, 2024, 22:15
> *To:* Sean Owen , Rui Wang
> *Cc:* dev
> *Subject:* Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
>
>
>
> I traced back relevant changes and got a sense of what happened.
>
>
>
> Yangjie figured out the issue via link
> <https://mailshield.baidu.com/check?q=8dOSfwXDFpe5HSp%2b%2bgCPsNQ52B7S7TAFG56Vj3tiFgMkCyOrQEGbg03AVWDX5bwwyIW7sZx3JZox3w8Jz1iw%2bPjaOZYmLWn2>.
> It's a tricky issue according to the comments from Yangjie - the test is
> dependent on ordering of execution for test suites. He said it does not
> fail in sbt, hence CI build couldn't catch it.
>
> He fixed it via link
> <https://mailshield.baidu.com/check?q=ojK3dg%2fDFf3xmQ8SPzsIou3EKaE1ZePctdB%2fUzhWmewnZb5chnQM1%2f8D1JDJnkxF>,
> but we missed that the offending commit was also ported back to 3.5 as
> well, hence the fix wasn't ported back to 3.5.
>
>
>
> Surprisingly, I can't reproduce locally even with maven. In my attempt to
> reproduce, SparkConnectProtoSuite was executed at
> third, SparkConnectStreamingQueryCacheSuite, and ExecuteEventsManagerSuite,
> and then SparkConnectProtoSuite. Maybe very specific to the environment,
> not just maven? My env: MBP M1 pro chip, MacOS 14.3.1, Openjdk 17.0.9. I
> used build/mvn (Maven 3.8.8).
>
>
>
> I'm not 100% sure this is something we should fail the release as it's a
> test only and sounds very environment dependent, but I'll respect your call
> on vote.
>
>
>
> Btw, looks like Rui also made a relevant fix via link
> <https://mailshield.baidu.com/check?q=TUbVzroxG%2fbi2P4qN0kbggzXuPzSN%2bKDoUFGhS9xMet8aXVw6EH0rMr1MKJqp2E2>
>  (not
> to fix the failing test but to fix other issues), but this also wasn't
> ported back to 3.5. @Rui Wang  Do you think this is
> a regression issue and warrants a new RC?
>
>
>
>
>
> On Fri, Feb 16, 2024 at 11:38 AM Sean Owen  wrote:
>
> Is anyone seeing this Spark Connect test failure? then again, I have some
> weird issue with this env that always fails 1 or 2 tests that nobody else
> can replicate.
>
>
>
> - Test observe *** FAILED ***
>   == FAIL: Plans do not match ===
>   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS
> max_val#0, sum(id#0) AS sum(id)#0L], 0   CollectMetrics my_metric,
> [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L],
> 44
>+- LocalRelation , [id#0, name#0]
>   +- LocalRelation , [id#0, name#0]
> (PlanTest.scala:179)
>
>
>
> On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim 
> wrote:
>
> DISCLAIMER: RC for Apache Spark 3.5.1 starts with RC2 as I lately figured
> out doc generation issue after tagging RC1.
>
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.5.1.
>
> The vote is open until February 18th 9AM (PST) and passes if a majority +1
> PMC votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.5.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.5.1-rc2 (commit
> fd86f85e181fc2dc0f50a096855acf83a6cc5d9c):
> https://github.com/apache/spark/tree/v3.5.1-rc2
>
> The release files, including signatures, dige

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-16 Thread yangjie01
Very sorry. When I was fixing `SPARK-45242`
(https://github.com/apache/spark/pull/43594), I noticed that both its `Affects
Version` and `Fix Version` were 4.0, and I didn't realize that it had also been
merged into branch-3.5, so I didn't advocate for SPARK-45357 to be backported to
branch-3.5.

As far as I know, the condition to trigger this test failure is: when using 
Maven to test the `connect` module, if  `sparkTestRelation` in 
`SparkConnectProtoSuite` is not the first `DataFrame` to be initialized, then 
the `id` of `sparkTestRelation` will no longer be 0. So, I think this is indeed 
related to the order in which Maven executes the test cases in the `connect` 
module.
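
To illustrate the order dependency, here is a minimal, self-contained Scala
sketch with made-up names (PlanIdGenerator, TestRelation, and
OrderDependenceDemo are illustrations, not Spark's actual classes): an id drawn
from a process-wide counter is 0 only for the very first relation created in
the JVM, so whichever suite Maven happens to run first decides the value.

  import java.util.concurrent.atomic.AtomicLong

  // Stand-in for a process-wide, monotonically increasing id counter.
  object PlanIdGenerator {
    private val counter = new AtomicLong(0)
    def next(): Long = counter.getAndIncrement()
  }

  // Stand-in for a test relation that captures an id at construction time.
  final case class TestRelation(name: String) {
    val id: Long = PlanIdGenerator.next()
  }

  object OrderDependenceDemo extends App {
    // Simulate other suites running first and consuming ids 0..43.
    val others = (1 to 44).map(i => TestRelation(s"fromOtherSuite$i"))

    // The relation under test no longer gets id 0.
    val sparkTestRelation = TestRelation("sparkTestRelation")
    println(s"sparkTestRelation.id = ${sparkTestRelation.id}")             // prints 44, not 0
    println(s"expectation 'id == 0' holds: ${sparkTestRelation.id == 0}")  // false unless run first
  }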

I have submitted a backport PR (https://github.com/apache/spark/pull/45141) to
branch-3.5, and if necessary, we can merge it to fix this test issue.

Jie Yang

From: Jungtaek Lim
Date: Friday, February 16, 2024, 22:15
To: Sean Owen , Rui Wang
Cc: dev
Subject: Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

I traced back relevant changes and got a sense of what happened.

Yangjie figured out the issue via 
link<https://mailshield.baidu.com/check?q=8dOSfwXDFpe5HSp%2b%2bgCPsNQ52B7S7TAFG56Vj3tiFgMkCyOrQEGbg03AVWDX5bwwyIW7sZx3JZox3w8Jz1iw%2bPjaOZYmLWn2>.
 It's a tricky issue according to the comments from Yangjie - the test is 
dependent on ordering of execution for test suites. He said it does not fail in 
sbt, hence CI build couldn't catch it.
He fixed it via 
link<https://mailshield.baidu.com/check?q=ojK3dg%2fDFf3xmQ8SPzsIou3EKaE1ZePctdB%2fUzhWmewnZb5chnQM1%2f8D1JDJnkxF>,
 but we missed that the offending commit was also ported back to 3.5 as well, 
hence the fix wasn't ported back to 3.5.

Surprisingly, I can't reproduce locally even with maven. In my attempt to 
reproduce, SparkConnectProtoSuite was executed at third, 
SparkConnectStreamingQueryCacheSuite, and ExecuteEventsManagerSuite, and then 
SparkConnectProtoSuite. Maybe very specific to the environment, not just maven? 
My env: MBP M1 pro chip, MacOS 14.3.1, Openjdk 17.0.9. I used build/mvn (Maven 
3.8.8).

I'm not 100% sure this is something we should fail the release as it's a test 
only and sounds very environment dependent, but I'll respect your call on vote.

Btw, looks like Rui also made a relevant fix via 
link<https://mailshield.baidu.com/check?q=TUbVzroxG%2fbi2P4qN0kbggzXuPzSN%2bKDoUFGhS9xMet8aXVw6EH0rMr1MKJqp2E2>
 (not to fix the failing test but to fix other issues), but this also wasn't 
ported back to 3.5. @Rui Wang<mailto:amaliu...@apache.org> Do you think this is 
a regression issue and warrants a new RC?


On Fri, Feb 16, 2024 at 11:38 AM Sean Owen 
mailto:sro...@gmail.com>> wrote:
Is anyone seeing this Spark Connect test failure? then again, I have some weird 
issue with this env that always fails 1 or 2 tests that nobody else can 
replicate.

- Test observe *** FAILED ***
  == FAIL: Plans do not match ===
  !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, 
sum(id#0) AS sum(id)#0L], 0   CollectMetrics my_metric, [min(id#0) AS 
min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 44
   +- LocalRelation , [id#0, name#0] 
+- LocalRelation , [id#0, name#0] 
(PlanTest.scala:179)

On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim 
mailto:kabhwan.opensou...@gmail.com>> wrote:
DISCLAIMER: RC for Apache Spark 3.5.1 starts with RC2 as I lately figured out 
doc generation issue after tagging RC1.

Please vote on releasing the following candidate as Apache Spark version 3.5.1.

The vote is open until February 18th 9AM (PST) and passes if a majority +1 PMC 
votes are cast, with
a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.5.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see 
https://spark.apache.org/

The tag to be voted on is v3.5.1-rc2 (commit 
fd86f85e181fc2dc0f50a096855acf83a6cc5d9c):
https://github.com/apache/spark/tree/v3.5.1-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1452/

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-16 Thread Jungtaek Lim
I traced back relevant changes and got a sense of what happened.

Yangjie figured out the issue via link. It's a tricky issue according to the
comments from Yangjie - the test depends on the execution order of the test
suites. He said it does not fail in sbt, hence the CI build couldn't catch it.
He fixed it via link, but we missed that the offending commit had also been
ported back to 3.5, hence the fix wasn't ported back to 3.5.

Surprisingly, I can't reproduce it locally even with Maven. In my attempt to
reproduce, SparkConnectProtoSuite was executed third:
SparkConnectStreamingQueryCacheSuite ran first, then ExecuteEventsManagerSuite,
and then SparkConnectProtoSuite. Maybe it's very specific to the environment,
not just Maven? My env: MBP with an M1 Pro chip, macOS 14.3.1, OpenJDK 17.0.9.
I used build/mvn (Maven 3.8.8).

I'm not 100% sure this is something we should fail the release for, as it's
test-only and sounds very environment-dependent, but I'll respect your call
on the vote.

Btw, it looks like Rui also made a relevant fix via link (not to fix the
failing test but to fix other issues), but this also wasn't ported back to
3.5. @Rui Wang, do you think this is a regression issue that warrants a new RC?


On Fri, Feb 16, 2024 at 11:38 AM Sean Owen  wrote:

> Is anyone seeing this Spark Connect test failure? then again, I have some
> weird issue with this env that always fails 1 or 2 tests that nobody else
> can replicate.
>
> - Test observe *** FAILED ***
>   == FAIL: Plans do not match ===
>   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS
> max_val#0, sum(id#0) AS sum(id)#0L], 0   CollectMetrics my_metric,
> [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L],
> 44
>+- LocalRelation , [id#0, name#0]
>   +- LocalRelation , [id#0, name#0]
> (PlanTest.scala:179)
>
> On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim 
> wrote:
>
>> DISCLAIMER: RC for Apache Spark 3.5.1 starts with RC2 as I lately figured
>> out doc generation issue after tagging RC1.
>>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.5.1.
>>
>> The vote is open until February 18th 9AM (PST) and passes if a majority
>> +1 PMC votes are cast, with
>> a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.5.1
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v3.5.1-rc2 (commit
>> fd86f85e181fc2dc0f50a096855acf83a6cc5d9c):
>> https://github.com/apache/spark/tree/v3.5.1-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1452/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-docs/
>>
>> The list of bug fixes going into 3.5.1 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12353495
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC via "pip install
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz
>> "
>> and see if anything important breaks.
>> In the Java/Scala, you can add the staging repository to your projects
>> resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with a out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.5.1?
>> ===
>>
>> The current list of open tickets targeted at 3.5.1 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.5.1
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>>
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, 

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-15 Thread Sean Owen
Is anyone seeing this Spark Connect test failure? Then again, I have some
weird issue with this env that always fails 1 or 2 tests that nobody else
can replicate.

- Test observe *** FAILED ***
  == FAIL: Plans do not match ===
  !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 0   CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 44
   +- LocalRelation , [id#0, name#0]   +- LocalRelation , [id#0, name#0] (PlanTest.scala:179)

On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim 
wrote:

> DISCLAIMER: RC for Apache Spark 3.5.1 starts with RC2 as I lately figured
> out doc generation issue after tagging RC1.
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.5.1.
>
> The vote is open until February 18th 9AM (PST) and passes if a majority +1
> PMC votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.5.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.5.1-rc2 (commit
> fd86f85e181fc2dc0f50a096855acf83a6cc5d9c):
> https://github.com/apache/spark/tree/v3.5.1-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1452/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-docs/
>
> The list of bug fixes going into 3.5.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12353495
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC via "pip install
> https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz
> "
> and see if anything important breaks.
> In the Java/Scala, you can add the staging repository to your projects
> resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.5.1?
> ===
>
> The current list of open tickets targeted at 3.5.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.5.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


[VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-15 Thread Jungtaek Lim
DISCLAIMER: The RC for Apache Spark 3.5.1 starts with RC2, as I belatedly
figured out a doc generation issue after tagging RC1.

Please vote on releasing the following candidate as Apache Spark version
3.5.1.

The vote is open until February 18th 9AM (PST) and passes if a majority +1
PMC votes are cast, with
a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.5.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v3.5.1-rc2 (commit
fd86f85e181fc2dc0f50a096855acf83a6cc5d9c):
https://github.com/apache/spark/tree/v3.5.1-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1452/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-docs/

The list of bug fixes going into 3.5.1 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12353495

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.
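
For instance, a minimal Scala smoke-test workload might look like the sketch
below (the object name is just an example; it assumes the RC's spark-sql
artifact is on your classpath, e.g. via the staging-repository setup described
further down):

  import org.apache.spark.sql.SparkSession

  object Rc2SmokeTest {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .master("local[2]")
        .appName("spark-3.5.1-rc2-smoke-test")
        .getOrCreate()

      // Confirm which Spark version was actually resolved.
      println(s"Spark version: ${spark.version}")

      // Run a trivial query end to end.
      val total = spark.range(100).selectExpr("sum(id)").first().getLong(0)
      println(s"sum(0..99) = $total") // expected: 4950

      spark.stop()
    }
  }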

If you're working in PySpark, you can set up a virtual env and install
the current RC via "pip install
https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz"
and see if anything important breaks.
In Java/Scala, you can add the staging repository to your project's
resolvers and test with the RC (make sure to clean up the artifact cache
before/after so you don't end up building with an out-of-date RC going
forward).
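
As an illustration, a minimal build.sbt sketch for that setup might look like
the following (the Scala version and the spark-sql module are example choices;
adjust them to your project):

  // build.sbt - sketch only; swap in whichever Spark modules you actually use.
  ThisBuild / scalaVersion := "2.12.18"

  // Staging repository holding the 3.5.1 RC2 artifacts.
  resolvers += "apache-spark-3.5.1-rc2-staging" at "https://repository.apache.org/content/repositories/orgapachespark-1452/"

  libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1"

With that in place, sbt should resolve the 3.5.1 artifacts from the staging
repository rather than Maven Central, and you can run your tests against them.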

===
What should happen to JIRA tickets still targeting 3.5.1?
===

The current list of open tickets targeted at 3.5.1 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.5.1

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.