Re: UPDATE: Apache Spark 3.2 Release

2021-06-17 Thread Dongjoon Hyun
Thank you for the correction, Yikun.
Yes, it's 3.3.1. :)

On 2021/06/17 09:03:55, Yikun Jiang  wrote: 
> - Apache Hadoop 3.3.2 becomes the default Hadoop profile for Apache Spark
> 3.2 via SPARK-29250 today. We are observing big improvements in S3 use
> cases. Please try it and share your experience.
> 
> It should be Apache Hadoop 3.3.1 [1]. :)
> 
> Note that Apache Hadoop 3.3.0 was the first Hadoop release to include both
> x86 and aarch64 support, and 3.3.1 does as well. Very happy to see 3.3.1
> become the default dependency of Spark 3.2.0.
> 
> [1] https://hadoop.apache.org/release/3.3.1.html
> 
> Regards,
> Yikun
> 
> 
> Dongjoon Hyun wrote on Thu, Jun 17, 2021, 5:58 AM:
> 
> > This is a continuation of the previous thread, `Apache Spark 3.2
> > Expectation`, in order to give you updates.
> >
> > -
> > https://lists.apache.org/thread.html/r61897da071729913bf586ddd769311ce8b5b068e7156c352b51f7a33%40%3Cdev.spark.apache.org%3E
> >
> > First of all, the current (AS-IS) schedule is here:
> >
> > - https://spark.apache.org/versioning-policy.html
> >
> >   July 1st Code freeze. Release branch cut.
> >   Mid July QA period. Focus on bug fixes, tests, stability and docs.
> > Generally, no new features merged.
> >   August   Release candidates (RC), voting, etc. until final release passes
> >
> > Second, Gengliang Wang volunteered as the release manager and has started
> > working in that role. Thank you! He shared the ongoing issues, and I want
> > to piggy-back the following items onto his list.
> >
> >
> > # Languages
> >
> > - Scala 2.13 Support: Although SPARK-25075 is almost done and we have a
> > Scala 2.13 Jenkins job on the master branch, we do not support Scala
> > 2.13.6. We should document this if Scala 2.13.7 does not arrive in time.
> >   Please see https://github.com/scala/scala/pull/9641 (Milestone Scala
> > 2.13.7).
> >
> > - SparkR CRAN publishing: Apache SparkR 3.1.2 is on CRAN as of today, but
> > we are getting policy-violation warnings about the cache directory. The
> > fix deadline is 2021-06-28. If the package is removed from CRAN again, we
> > will need to resubmit via Apache Spark 3.2.0 after fixing the issue.
> >   https://cran.r-project.org/web/packages/SparkR/index.html
> >
> >
> > # Dependencies
> >
> > - Apache Hadoop 3.3.2 becomes the default Hadoop profile for Apache Spark
> > 3.2 via SPARK-29250 today. We are observing big improvements in S3 use
> > cases. Please try it and share your experience.
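For anyone who wants to try the new default Hadoop profile against S3, a minimal `spark-defaults.conf` fragment for the S3A connector might look like the following. All values here are illustrative placeholders (endpoint, credentials provider, committer choice), not tuned recommendations:

```
# Hypothetical spark-defaults.conf fragment for testing S3A on Hadoop 3.3.x.
# Values below are placeholders; adjust for your own bucket and auth setup.
spark.hadoop.fs.s3a.endpoint                 s3.us-west-2.amazonaws.com
spark.hadoop.fs.s3a.aws.credentials.provider org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider
# The S3A committers are among the S3-related improvements worth exercising.
spark.hadoop.fs.s3a.committer.name           magic
```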
> >
> > - Apache Hive 2.3.9 becomes the built-in Hive library, with more HMS
> > compatibility fixes added recently. We need to re-evaluate the previous
> > HMS incompatibility reports.
> >
> > - K8s 1.21 was released on May 12th, and K8s Client 5.4.1 supports it in
> > Apache Spark 3.2. In addition, public cloud vendors are starting to
> > support K8s 1.20. Please note that the move from K8s Client 4.x to 5.x is
> > a breaking API change.
> >
> > - SPARK-33913 upgraded the Apache Kafka client dependency to 2.8.0, and
> > the Kafka community is considering deprecating Scala 2.12 support in
> > Apache Kafka 3.0.
> >
> > - SPARK-34542 upgraded the Apache Parquet dependency to 1.12.0. However,
> > we need SPARK-34859 to fix a column index issue before the release. In
> > addition, Apache Parquet encryption is added as a developer API; a custom
> > KMS client should be implemented.
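To illustrate the shape of that developer API: a custom KMS client's job is to wrap (encrypt) a data-encryption key under a named master key held in the KMS, and to unwrap it back; in Parquet this is a Java interface (`org.apache.parquet.crypto.keytools.KmsClient`). The toy Python class below only sketches that wrap/unwrap contract. It is not the real interface, and the XOR-plus-base64 "wrapping" stands in for an actual KMS call:

```python
import base64


class ToyKmsClient:
    """Toy stand-in for a KMS client: 'wraps' a data-encryption key
    under a named master key and unwraps it back. A real client would
    call out to an actual KMS service instead of the XOR below."""

    def __init__(self, master_keys):
        # master_keys: dict mapping master-key id -> key bytes (toy key store)
        self.master_keys = master_keys

    def wrap_key(self, key_bytes, master_key_id):
        master = self.master_keys[master_key_id]
        # Toy "encryption": XOR with the master key, then base64-encode.
        xored = bytes(b ^ master[i % len(master)] for i, b in enumerate(key_bytes))
        return base64.b64encode(xored).decode("ascii")

    def unwrap_key(self, wrapped, master_key_id):
        master = self.master_keys[master_key_id]
        xored = base64.b64decode(wrapped)
        return bytes(b ^ master[i % len(master)] for i, b in enumerate(xored))
```

Parquet only requires that the wrap/unwrap round trip be stable; a real implementation replaces the XOR step with a call to the organization's key-management service.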
> >
> > - SPARK-35489 upgraded the Apache ORC dependency to 1.6.8. We
> > additionally still need ORC-804 for a better masking feature.
> >
> > - SPARK-34651 improved ZStandard support with ZStandard 1.4.9, and we are
> > currently evaluating the newly arrived ZStandard 1.5.0 as well. JDK11
> > performance is under investigation. In addition, SPARK-35181 (use zstd for
> > spark.io.compression.codec by default) is still in progress separately.
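For those who want to try zstd ahead of SPARK-35181, the internal codec can already be switched per application. A minimal `spark-defaults.conf` fragment might look like this (the tuning value shown is an example, not a recommendation):

```
# Use ZStandard for internal block compression (shuffle, broadcast, RDD).
spark.io.compression.codec       zstd
# Optional zstd tuning knob; the default level is usually fine.
spark.io.compression.zstd.level  1
```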
> >
> >
> > # Newly arrived items
> >
> > - SPARK-35779 Dynamic filtering for Data Source V2
> >
> > - SPARK-35781 Support Spark on Apple Silicon on macOS natively
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> >
> 


