Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-23 Thread Gengliang Wang
Hi All,

Thanks for the votes and suggestions!
Because of the issues above and SPARK-36782, I have decided to build RC4
and start a new vote now.


Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Venkatakrishnan Sowrirajan
Yes, that's correct; the failure is observed with both Hadoop-2.7 and
Hadoop-2.10 (internal use).

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Mridul Muralidharan
The failure I observed looks the same as what Venkat mentioned: lz4 tests
in FileSuite in core were failing with the hadoop-2.7 profile.

Regards,
Mridul

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Chao Sun
Hi Venkata, I'm not aware of the FileSuite test failures. In fact I just
tried it locally on the master branch and the tests are all passing. Could
you provide more details?

The reason we want to disable the LZ4 test is that it requires the
native LZ4 library when running with Hadoop 2.x, which the Spark CI doesn't
have.

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Venkatakrishnan Sowrirajan
Hi Chao,

But there are tests failing in core as well, e.g.
org.apache.spark.FileSuite. These tests pass in 3.1, so why do you think
we should disable them for Hadoop versions < 3.x?

Regards
Venkata krishnan


Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Chao Sun
I just created SPARK-36820 for the above LZ4 test issue. Will post a PR
there soon.

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Chao Sun
Mridul, is the LZ4 failure about Parquet? I think Parquet currently uses
the Hadoop compression codec, while Hadoop 2.7 still depends on the native
lib for LZ4. Maybe we should run the test only with the Hadoop 3.2 profile.
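
Chao's suggestion could be expressed as a profile-guarded test exclusion. This is only an illustrative sketch using the generic Maven Surefire mechanism — Spark's own build wires its Scala tests differently, and the pattern `**/FileSuite.*` is an assumption, not the fix that actually landed:

```xml
<!-- Hypothetical sketch: exclude suites needing the native lz4 codec
     whenever the hadoop-2.7 profile is active. -->
<profiles>
  <profile>
    <id>hadoop-2.7</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <configuration>
            <excludes>
              <exclude>**/FileSuite.*</exclude>
            </excludes>
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```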

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Mridul Muralidharan
Signatures, digests, etc. check out fine.
Checked out the tag and built/tested with -Pyarn -Pmesos -Pkubernetes;
this worked fine.

I found that including "-Phadoop-2.7" failed on lz4 tests ("native lz4
library not available").

Regards,
Mridul
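
The digest half of the checks Mridul describes can be sketched as below, run on a stand-in file since the real artifacts live under the dist.apache.org URL in the vote mail; the file name is made up, and for a real RC you would also verify the signature with `gpg --verify <artifact>.asc <artifact>` against the KEYS file:

```shell
# Create a stand-in artifact and its SHA-512 digest file.
echo "release artifact contents" > artifact.tgz
sha512sum artifact.tgz > artifact.tgz.sha512

# Re-check the digest; succeeds only if the file matches.
sha512sum -c artifact.tgz.sha512
```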

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Gengliang Wang
To Stephen: Thanks for pointing that out. I agree with that.
To Sean: I made a PR to remove the test dependency so that we can start
RC4 ASAP.

Gengliang

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Sean Owen
Hm yeah, I tend to agree. See https://github.com/apache/spark/pull/33912
This _is_ a test-only dependency, which makes it less of an issue.
I'm guessing it's not in Maven Central as it's a small one-off utility; we
_could_ just inline the ~100 lines of code in test code instead?

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-20 Thread Stephen Coy
Hi there,

I was going to -1 this because of the com.github.rdblue:brotli-codec:0.1.1 
dependency, which is not available on Maven Central, and therefore is not 
available from our repository manager (Nexus).

Historically, most places I have worked have avoided other public Maven
repositories because they are not well curated; i.e., artifacts with the same
GAV have been known to change over time, which never happens with Maven Central.

I know that I can address this by changing my settings.xml file.
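
For readers who hit the same resolution failure, a settings.xml change along these lines is what Steve alludes to. This is only a sketch under assumptions: the profile id is invented, and the repository URL assumes the artifact is served by JitPack (plausible for a com.github.* GAV, but not confirmed in this thread):

```xml
<!-- Illustrative sketch only: profile id and repository URL are assumptions. -->
<settings>
  <profiles>
    <profile>
      <id>extra-repos</id>
      <repositories>
        <repository>
          <id>jitpack.io</id>
          <url>https://jitpack.io</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>extra-repos</activeProfile>
  </activeProfiles>
</settings>
```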

Anyway, I can see this biting other people so I thought that I would mention it.

Steve C

On 19 Sep 2021, at 1:18 pm, Gengliang Wang  wrote:


Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-20 Thread Michael Heuer
+1 (non-binding)

Spark 3.2.0 RC3, Parquet 1.12.1, and Avro 1.10.2 together remove the need for 
various conflict-preventing workarounds we've needed to maintain for several 
years.

Cheers,

   michael


> On Sep 18, 2021, at 10:18 PM, Gengliang Wang  wrote:
> 



Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-20 Thread Sean Owen
+1 from me, same results as the last RC from my side.
The Scala 2.13 POM issue was resolved and the 2.13 build appears to be OK.

On Sat, Sep 18, 2021 at 10:19 PM Gengliang Wang  wrote:



Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-18 Thread Gengliang Wang
Starting with my +1 (non-binding)

Thanks,
Gengliang

On Sun, Sep 19, 2021 at 11:18 AM Gengliang Wang  wrote:



[VOTE] Release Spark 3.2.0 (RC3)

2021-09-18 Thread Gengliang Wang
Please vote on releasing the following candidate as
Apache Spark version 3.2.0.

The vote is open until 11:59pm Pacific time September 24 and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc3 (commit
96044e97353a079d3a7233ed3795ca82f3d9a101):
https://github.com/apache/spark/tree/v3.2.0-rc3

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1390

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc3-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc3.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running it on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
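
As a concrete illustration of the Java/Scala path above, a minimal sbt sketch for pointing a build at the staging repository listed in this email (the resolver label is arbitrary, and spark-sql is just an example module):

```scala
// build.sbt (sketch): resolve the 3.2.0 RC artifacts from the staging
// repository. The "ASF Staging" label is an arbitrary name.
resolvers += "ASF Staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1390/"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.0"
```

Remember to drop the resolver (and purge `~/.ivy2`/`~/.m2` caches of the RC artifacts) once the vote concludes.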

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.