Re: [VOTE] Release Spark 3.3.1 (RC2)

2022-10-05 Thread Yuming Wang
Hi All,

Thank you all for testing and voting!

There's a -1 vote here, so I think this RC fails. I will prepare for
RC3 soon.

On Tue, Oct 4, 2022 at 6:34 AM Mridul Muralidharan  wrote:

> +1 from me, with a few comments.
>
> I saw the following failures; are these known issues or flaky tests?
>
> * PersistenceEngineSuite.ZooKeeperPersistenceEngine
> A quick look at the logs suggests a port conflict (a conflict when
> starting the admin port at 8080). Is this expected behavior for the test?
> I worked around it by shutting down the process that was using the port,
> though I did not investigate deeply.
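A quick, generic way to check whether something is already listening on the suite's port before re-running it (a standalone sketch, not part of the Spark test harness; port 8080 is taken from the log message above):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds,
        # i.e. when a listener is already bound to the port
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    print(port_in_use(8080))
```

If this reports True before the suite starts, `lsof -i :8080` (or the platform equivalent) will identify the process to shut down.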
>
> * org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite was aborted
> It is expecting these artifacts in $HOME/.m2/repository
>
> 1. tomcat#jasper-compiler;5.5.23!jasper-compiler.jar
> 2. tomcat#jasper-runtime;5.5.23!jasper-runtime.jar
> 3. commons-el#commons-el;1.0!commons-el.jar
> 4. org.apache.hive#hive-exec;2.3.7!hive-exec.jar
>
> I worked around it by adding them locally explicitly; should we
> add them as test dependencies?
> Not sure if this changed in this release though (I had cleaned my local
> .m2 recently).
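If the test-dependency route is taken, a rough sketch of what the additions might look like in the relevant pom.xml (coordinates and versions are copied from the missing-artifact list above; whether `test` is the right scope, and which module's pom should carry them, is unverified):

```xml
<!-- Sketch only: coordinates taken from the missing-artifact list above -->
<dependency>
  <groupId>tomcat</groupId>
  <artifactId>jasper-compiler</artifactId>
  <version>5.5.23</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>tomcat</groupId>
  <artifactId>jasper-runtime</artifactId>
  <version>5.5.23</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>commons-el</groupId>
  <artifactId>commons-el</artifactId>
  <version>1.0</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>2.3.7</version>
  <scope>test</scope>
</dependency>
```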
>
> Other than this, rest looks good to me.
>
> Regards,
> Mridul
>
>
> On Wed, Sep 28, 2022 at 2:56 PM Sean Owen  wrote:
>
>> +1 from me, same result as last RC.
>>
>> On Wed, Sep 28, 2022 at 12:21 AM Yuming Wang  wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version 
>>> 3.3.1.
>>>
>>> The vote is open until 11:59 pm Pacific time on October 3rd and passes if a
>>> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.3.1
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see https://spark.apache.org
>>>
>>> The tag to be voted on is v3.3.1-rc2 (commit 
>>> 1d3b8f7cb15283a1e37ecada6d751e17f30647ce):
>>> https://github.com/apache/spark/tree/v3.3.1-rc2
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc2-bin
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1421
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc2-docs
>>>
>>> The list of bug fixes going into 3.3.1 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12351710
>>>
>>> This release is using the release script of the tag v3.3.1-rc2.
>>>
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload, running it on this release candidate, and
>>> reporting any regressions.
>>>
>>> If you're working in PySpark, you can set up a virtual env, install
>>> the current RC, and see if anything important breaks. In Java/Scala,
>>> you can add the staging repository to your project's resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out-of-date RC going forward).
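For the Java/Scala path, wiring in the staging repository might look roughly like this in a Maven pom.xml (the repository URL is the one from this vote email; the dependency entry is only an illustrative example, assuming a Scala 2.12 build):

```xml
<repositories>
  <repository>
    <id>apache-spark-rc-staging</id>
    <url>https://repository.apache.org/content/repositories/orgapachespark-1421/</url>
  </repository>
</repositories>
<dependencies>
  <!-- Illustrative: any Spark module at the RC version should resolve -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.3.1</version>
  </dependency>
</dependencies>
```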
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.3.1?
>>> ===
>>> The current list of open tickets targeted at 3.3.1 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target 
>>> Version/s" = 3.3.1
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That said, if there is a regression that has not been
>>> correctly targeted, please ping me or a committer to help target the
>>> issue.
>>>
>>>
>>>
>>>


Re: Dropping Apache Spark Hadoop2 Binary Distribution?

2022-10-05 Thread Dongjoon Hyun
Thank you all.

SPARK-40651 is merged to Apache Spark master branch for Apache Spark 3.4.0
now.

Dongjoon.

On Wed, Oct 5, 2022 at 3:24 PM L. C. Hsieh  wrote:

> +1
>
> Thanks Dongjoon.
>
> On Wed, Oct 5, 2022 at 3:11 PM Jungtaek Lim
>  wrote:
> >
> > +1
> >
> > On Thu, Oct 6, 2022 at 5:59 AM Chao Sun  wrote:
> >>
> >> +1
> >>
> >> > and specifically may allow us to finally move off of the ancient
> version of Guava (?)
> >>
> >> I think the Guava issue comes from Hive 2.3 dependency, not Hadoop.
> >>
> >> On Wed, Oct 5, 2022 at 1:55 PM Xinrong Meng 
> wrote:
> >>>
> >>> +1.
> >>>
> >>> On Wed, Oct 5, 2022 at 1:53 PM Xiao Li 
> wrote:
> 
>  +1.
> 
>  Xiao
> 
>  On Wed, Oct 5, 2022 at 12:49 PM Sean Owen  wrote:
> >
> > I'm OK with this. It simplifies maintenance a bit, and specifically
> may allow us to finally move off of the ancient version of Guava (?)
> >
> > On Mon, Oct 3, 2022 at 10:16 PM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >>
> >> Hi, All.
> >>
> >> I'm wondering if the following Apache Spark Hadoop2 Binary
> Distribution
> >> is still used by someone in the community or not. If it's not used
> or not useful,
> >> we may remove it from Apache Spark 3.4.0 release.
> >>
> >>
> https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.tgz
> >>
> >> Here is the background of this question.
> >> Since Apache Spark 2.2.0 (SPARK-19493, SPARK-19550), the Apache
> >> Spark community has been building and releasing with Java 8 only.
> >> I believe that user applications also use Java 8+ these days.
> >> Recently, I received the following message from the Hadoop PMC.
> >>
> >>   > "if you really want to claim hadoop 2.x compatibility, then you
> have to
> >>   > be building against java 7". Otherwise a lot of people with
> hadoop 2.x
> >>   > clusters won't be able to run your code. If your projects are
> java8+
> >>   > only, then they are implicitly hadoop 3.1+, no matter what you
> use
> >>   > in your build. Hence: no need for branch-2 branches except
> >>   > to complicate your build/test/release processes [1]
> >>
> >> If Hadoop2 binary distribution is no longer used as of today,
> >> or incomplete somewhere due to Java 8 building, the following three
> >> existing alternative Hadoop 3 binary distributions could be
> >> the better official solution for old Hadoop 2 clusters.
> >>
> >> 1) Scala 2.12 and without-hadoop distribution
> >> 2) Scala 2.12 and Hadoop 3 distribution
> >> 3) Scala 2.13 and Hadoop 3 distribution
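For reference, pointing the "without-hadoop" distribution at an existing Hadoop installation is a matter of exporting the Hadoop classpath, roughly as described in the "Hadoop Free" build notes in the Spark docs (sketch only; paths depend on the cluster):

```shell
# conf/spark-env.sh (sketch): let the "without-hadoop" build pick up
# the Hadoop client jars already installed on the cluster
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```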
> >>
> >> In short, is there anyone who is using Apache Spark 3.3.0 Hadoop2
> Binary distribution?
> >>
> >> Dongjoon
> >>
> >> [1]
> https://issues.apache.org/jira/browse/ORC-1251?focusedCommentId=17608247&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17608247
> 
> 
> 
>  --
> 
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Dropping Apache Spark Hadoop2 Binary Distribution?

2022-10-05 Thread L. C. Hsieh
+1

Thanks Dongjoon.




Re: Dropping Apache Spark Hadoop2 Binary Distribution?

2022-10-05 Thread Jungtaek Lim
+1



Re: Dropping Apache Spark Hadoop2 Binary Distribution?

2022-10-05 Thread Chao Sun
+1

> and specifically may allow us to finally move off of the ancient version
of Guava (?)

I think the Guava issue comes from Hive 2.3 dependency, not Hadoop.



Re: Dropping Apache Spark Hadoop2 Binary Distribution?

2022-10-05 Thread Xinrong Meng
+1.



Re: Dropping Apache Spark Hadoop2 Binary Distribution?

2022-10-05 Thread Xiao Li
+1.

Xiao



Re: Dropping Apache Spark Hadoop2 Binary Distribution?

2022-10-05 Thread Sean Owen
I'm OK with this. It simplifies maintenance a bit, and specifically may
allow us to finally move off of the ancient version of Guava (?)
