Re: Apache Spark 3.5.0 Expectations (?)

2023-05-29 Thread Jungtaek Lim
Shall we initiate a new discussion thread for Scala 2.13 by default? While
I'm not an expert in this area, it sounds like the change is major and
(probably) breaking. It seems worth having a separate discussion thread
rather than just treating it as one of 25 items.

On Tue, May 30, 2023 at 9:54 AM Sean Owen  wrote:

> It does seem risky; there are still likely libs out there that don't cross
> compile for 2.13. I would make it the default at 4.0, myself.
>
> On Mon, May 29, 2023 at 7:16 PM Hyukjin Kwon  wrote:
>
>> While I support going forward with a higher version, actually making Scala
>> 2.13 the default is a big deal, especially in that:
>>
>>- Users would likely download the built-in version assuming that it’s
>>backward binary compatible.
>>- PyPI doesn't allow specifying the Scala version, meaning that users
>>wouldn’t have a way to 'pip install pyspark' based on Scala 2.12.
>>
>> I wonder if it’s safer to do it in Spark 4 (which I believe will be
>> discussed soon).
>>
>>
>> On Mon, 29 May 2023 at 13:21, Jia Fan  wrote:
>>
>>> Thanks Dongjoon!
>>> There are some tickets I want to share.
>>> SPARK-39420 Support ANALYZE TABLE on v2 tables
>>> SPARK-42750 Support INSERT INTO by name
>>> SPARK-43521 Support CREATE TABLE LIKE FILE
>>>
>>> Dongjoon Hyun wrote on Mon, May 29, 2023 at 08:42:
>>>
 Hi, All.

 Apache Spark 3.5.0 is scheduled for August (1st Release Candidate) and
 currently a few notable things are under discussion on the mailing list.

 I believe it's a good time to share a short summary list (containing
 both completed and in-progress items) to highlight them in advance and to
 collect your targets as well.

 Please share your expectations or working items if you want the community
 to prioritize them in the Apache Spark 3.5.0 timeframe.

 (Sorted by ID)
 SPARK-40497 Upgrade Scala 2.13.11
 SPARK-42452 Remove hadoop-2 profile from Apache Spark 3.5.0
 SPARK-42913 Upgrade to Hadoop 3.3.5 (aws-java-sdk-bundle: 1.12.262 ->
 1.12.316)
 SPARK-43024 Upgrade Pandas to 2.0.0
 SPARK-43200 Remove Hadoop 2 reference in docs
 SPARK-43347 Remove Python 3.7 Support
 SPARK-43348 Support Python 3.8 in PyPy3
 SPARK-43351 Add Spark Connect Go prototype code and example
 SPARK-43379 Deprecate old Java 8 versions prior to 8u371
 SPARK-43394 Upgrade to Maven 3.8.8
 SPARK-43436 Upgrade to RocksDbjni 8.1.1.1
 SPARK-43446 Upgrade to Apache Arrow 12.0.0
 SPARK-43447 Support R 4.3.0
 SPARK-43489 Remove protobuf 2.5.0
 SPARK-43519 Bump Parquet to 1.13.1
 SPARK-43581 Upgrade kubernetes-client to 6.6.2
 SPARK-43588 Upgrade to ASM 9.5
 SPARK-43600 Update K8s doc to recommend K8s 1.24+
 SPARK-43738 Upgrade to DropWizard Metrics 4.2.18
 SPARK-43831 Build and Run Spark on Java 21
 SPARK-43832 Upgrade to Scala 2.12.18
 SPARK-43836 Make Scala 2.13 as default in Spark 3.5
 SPARK-43842 Upgrade gcs-connector to 2.2.14
 SPARK-43844 Update to ORC 1.9.0
 UMBRELLA: Add SQL functions into Scala, Python and R API

 Thanks,
 Dongjoon.

 PS. The above is not a list of release blockers. Instead, it could be a
 nice-to-have from someone's perspective.

>>>

