Re: Time for Spark 3.4.0 release?

2023-01-03 Thread Yang,Jie(INF)
+1 for me

YangJie

From: Hyukjin Kwon
Date: Wednesday, January 4, 2023 13:16
To: Xinrong Meng
Cc: dev
Subject: Re: Time for Spark 3.4.0 release?

SGTM +1

On Wed, Jan 4, 2023 at 2:13 PM Xinrong Meng <xinrong.apa...@gmail.com> wrote:
Hi All,

Shall we cut branch-3.4 on January 16th, 2023? We proposed January 15th per
https://spark.apache.org/versioning-policy.html, but I would suggest we postpone by one day since January 15th is a Sunday.

I would like to volunteer as the release manager for Apache Spark 3.4.0.

Thanks,

Xinrong Meng



Re: maven build failing in spark sql w/BouncyCastleProvider CNFE

2022-12-07 Thread Yang,Jie(INF)
Steve, after some investigation, I think this problem may not be related to
`scala-maven-plugin`. We can add the following two test dependencies to the
`sql/core` module to make the mvn build succeed:

```
<dependency>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcprov-jdk15on</artifactId>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcpkix-jdk15on</artifactId>
  <scope>test</scope>
</dependency>
```
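With those dependencies in place, one way to confirm the fix might be to recompile just the affected module — a sketch only, using standard Maven flags from the Spark source root:

```
# Rebuild sql/core (and the modules it depends on) including test sources,
# to confirm the BouncyCastleProvider CNFE is gone (sketch, not a verified recipe).
build/mvn -pl sql/core -am test-compile
```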

Yang Jie

发件人: "Yang,Jie(INF)" 
日期: 2022年12月6日 星期二 18:27
收件人: Steve Loughran 
抄送: Hyukjin Kwon , Apache Spark Dev 
主题: Re: maven build failing in spark sql w/BouncyCastleProvider CNFE

I think we can try scala-maven-plugin 4.8.0

From: Steve Loughran
Date: Tuesday, December 6, 2022 18:19
To: "Yang,Jie(INF)"
Cc: Hyukjin Kwon, Apache Spark Dev
Subject: Re: maven build failing in spark sql w/BouncyCastleProvider CNFE



On Tue, 6 Dec 2022 at 04:10, Yang,Jie(INF) <yangji...@baidu.com> wrote:
Steve, did the compile failure happen when building Spark master with Hadoop 3.4.0-SNAPSHOT using Maven?

Yes. It doesn't happen with:
* Hadoop branch-3.3 snapshot (3.3.9-SNAPSHOT)
* Hadoop branch-3.3.5 RC0 "pre-rc" in ASF staging.

Trying the 4.8.0 plugin would be worth it... not something I'll do this week,
as I'm really trying to get the RC0 out rather than anything else.


From: Hyukjin Kwon <gurwls...@gmail.com>
Date: Tuesday, December 6, 2022 10:27
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: maven build failing in spark sql w/BouncyCastleProvider CNFE

Steve, does a lower version of the scala plugin work for you? If that solves it, we
could temporarily downgrade for now.

On Mon, 5 Dec 2022 at 22:23, Steve Loughran wrote:
Trying to build Spark master with Hadoop trunk, and the Maven scala plugin
(scala-maven-plugin) is failing. This doesn't happen with the Hadoop 3.3.5 RC0.

I note that the only mention of this anywhere was me in March.

Clearly something in Hadoop trunk has changed in a way which is incompatible.

Has anyone else tried such a build or seen this problem? Any suggestions for a fix?

Created SPARK-41392 to cover this...

[INFO] 
[ERROR] Failed to execute goal 
net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile 
(scala-test-compile-first) on project spark-sql_2.12: Execution 
scala-test-compile-first of goal 
net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile failed: A required 
class was missing while executing 
net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: 
org/bouncycastle/jce/provider/BouncyCastleProvider
[


Re: maven build failing in spark sql w/BouncyCastleProvider CNFE

2022-12-06 Thread Yang,Jie(INF)
I think we can try scala-maven-plugin 4.8.0
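For reference, pinning the newer plugin version in the root pom.xml might look like the sketch below; the exact placement is an assumption, and Spark's pom may manage this via a property instead:

```
<!-- Sketch only: overriding scala-maven-plugin 4.7.2 with 4.8.0. -->
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <version>4.8.0</version>
</plugin>
```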

From: Steve Loughran
Date: Tuesday, December 6, 2022 18:19
To: "Yang,Jie(INF)"
Cc: Hyukjin Kwon, Apache Spark Dev
Subject: Re: maven build failing in spark sql w/BouncyCastleProvider CNFE



On Tue, 6 Dec 2022 at 04:10, Yang,Jie(INF) <yangji...@baidu.com> wrote:
Steve, did the compile failure happen when building Spark master with Hadoop 3.4.0-SNAPSHOT using Maven?

Yes. It doesn't happen with:
* Hadoop branch-3.3 snapshot (3.3.9-SNAPSHOT)
* Hadoop branch-3.3.5 RC0 "pre-rc" in ASF staging.

Trying the 4.8.0 plugin would be worth it... not something I'll do this week,
as I'm really trying to get the RC0 out rather than anything else.


From: Hyukjin Kwon <gurwls...@gmail.com>
Date: Tuesday, December 6, 2022 10:27
Cc: Apache Spark Dev <dev@spark.apache.org>
Subject: Re: maven build failing in spark sql w/BouncyCastleProvider CNFE

Steve, does a lower version of the scala plugin work for you? If that solves it, we
could temporarily downgrade for now.

On Mon, 5 Dec 2022 at 22:23, Steve Loughran wrote:
Trying to build Spark master with Hadoop trunk, and the Maven scala plugin
(scala-maven-plugin) is failing. This doesn't happen with the Hadoop 3.3.5 RC0.

I note that the only mention of this anywhere was me in March.

Clearly something in Hadoop trunk has changed in a way which is incompatible.

Has anyone else tried such a build or seen this problem? Any suggestions for a fix?

Created SPARK-41392 to cover this...

[INFO] 
[ERROR] Failed to execute goal 
net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile 
(scala-test-compile-first) on project spark-sql_2.12: Execution 
scala-test-compile-first of goal 
net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile failed: A required 
class was missing while executing 
net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: 
org/bouncycastle/jce/provider/BouncyCastleProvider
[


Re: maven build failing in spark sql w/BouncyCastleProvider CNFE

2022-12-05 Thread Yang,Jie(INF)
Steve, did the compile failure happen when building Spark master with Hadoop 3.4.0-SNAPSHOT using Maven?

From: Hyukjin Kwon
Date: Tuesday, December 6, 2022 10:27
Cc: Apache Spark Dev
Subject: Re: maven build failing in spark sql w/BouncyCastleProvider CNFE

Steve, does a lower version of the scala plugin work for you? If that solves it, we
could temporarily downgrade for now.

On Mon, 5 Dec 2022 at 22:23, Steve Loughran wrote:
Trying to build Spark master with Hadoop trunk, and the Maven scala plugin
(scala-maven-plugin) is failing. This doesn't happen with the Hadoop 3.3.5 RC0.

I note that the only mention of this anywhere was me in March.

Clearly something in Hadoop trunk has changed in a way which is incompatible.

Has anyone else tried such a build or seen this problem? Any suggestions for a fix?

Created SPARK-41392 to cover this...

[INFO] 
[ERROR] Failed to execute goal 
net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile 
(scala-test-compile-first) on project spark-sql_2.12: Execution 
scala-test-compile-first of goal 
net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile failed: A required 
class was missing while executing 
net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: 
org/bouncycastle/jce/provider/BouncyCastleProvider
[ERROR] -
[ERROR] realm = plugin>net.alchim31.maven:scala-maven-plugin:4.7.2
[ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
[ERROR] urls[0] = 
file:/Users/stevel/.m2/repository/net/alchim31/maven/scala-maven-plugin/4.7.2/scala-maven-plugin-4.7.2.jar
[ERROR] urls[1] = 
file:/Users/stevel/.m2/repository/org/apache/maven/shared/maven-dependency-tree/3.2.0/maven-dependency-tree-3.2.0.jar
[ERROR] urls[2] = 
file:/Users/stevel/.m2/repository/org/eclipse/aether/aether-util/1.0.0.v20140518/aether-util-1.0.0.v20140518.jar
[ERROR] urls[3] = 
file:/Users/stevel/.m2/repository/org/apache/maven/reporting/maven-reporting-api/3.1.1/maven-reporting-api-3.1.1.jar
[ERROR] urls[4] = 
file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-sink-api/1.11.1/doxia-sink-api-1.11.1.jar
[ERROR] urls[5] = 
file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-logging-api/1.11.1/doxia-logging-api-1.11.1.jar
[ERROR] urls[6] = 
file:/Users/stevel/.m2/repository/org/apache/maven/maven-archiver/3.6.0/maven-archiver-3.6.0.jar
[ERROR] urls[7] = 
file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-io/3.4.0/plexus-io-3.4.0.jar
[ERROR] urls[8] = 
file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.26/plexus-interpolation-1.26.jar
[ERROR] urls[9] = 
file:/Users/stevel/.m2/repository/org/apache/commons/commons-exec/1.3/commons-exec-1.3.jar
[ERROR] urls[10] = 
file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-utils/3.4.2/plexus-utils-3.4.2.jar
[ERROR] urls[11] = 
file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-archiver/4.5.0/plexus-archiver-4.5.0.jar
[ERROR] urls[12] = 
file:/Users/stevel/.m2/repository/commons-io/commons-io/2.11.0/commons-io-2.11.0.jar
[ERROR] urls[13] = 
file:/Users/stevel/.m2/repository/org/apache/commons/commons-compress/1.21/commons-compress-1.21.jar
[ERROR] urls[14] = 
file:/Users/stevel/.m2/repository/org/iq80/snappy/snappy/0.4/snappy-0.4.jar
[ERROR] urls[15] = 
file:/Users/stevel/.m2/repository/org/tukaani/xz/1.9/xz-1.9.jar
[ERROR] urls[16] = 
file:/Users/stevel/.m2/repository/com/github/luben/zstd-jni/1.5.2-4/zstd-jni-1.5.2-4.jar
[ERROR] urls[17] = 
file:/Users/stevel/.m2/repository/org/scala-sbt/zinc_2.13/1.7.1/zinc_2.13-1.7.1.jar
[ERROR] urls[18] = 
file:/Users/stevel/.m2/repository/org/scala-lang/scala-library/2.13.8/scala-library-2.13.8.jar
[ERROR] urls[19] = 
file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-core_2.13/1.7.1/zinc-core_2.13-1.7.1.jar
[ERROR] urls[20] = 
file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-apiinfo_2.13/1.7.1/zinc-apiinfo_2.13-1.7.1.jar
[ERROR] urls[21] = 
file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-bridge_2.13/1.7.1/compiler-bridge_2.13-1.7.1.jar
[ERROR] urls[22] = 
file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-classpath_2.13/1.7.1/zinc-classpath_2.13-1.7.1.jar
[ERROR] urls[23] = 
file:/Users/stevel/.m2/repository/org/scala-lang/scala-compiler/2.13.8/scala-compiler-2.13.8.jar
[ERROR] urls[24] = 
file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-interface/1.7.1/compiler-interface-1.7.1.jar
[ERROR] urls[25] = 
file:/Users/stevel/.m2/repository/org/scala-sbt/util-interface/1.7.0/util-interface-1.7.0.jar
[ERROR] urls[26] = 
file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-persist-core-assembly/1.7.1/zinc-persist-core-assembly-1.7.1.jar
[ERROR] urls[27] = 
file:/Users/stevel/.m2/repository/org/scala-lang/modules/scala-parallel-collections_2.13/0.2.0/scala-parallel-collections_2.13-0.2.0.jar
[ERROR] urls[28] = 
file:/Users/stevel/.m2/repository/org/scala-sbt/io_2.13/1.7.0/io_2.13-1.7.0.jar
[ERROR] urls[29] = 

Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread Yang,Jie(INF)
Thanks, Chao!

From: Maxim Gekk
Date: Wednesday, November 30, 2022 19:40
To: Jungtaek Lim
Cc: Wenchen Fan, Chao Sun, dev, user
Subject: Re: [ANNOUNCE] Apache Spark 3.2.3 released

Thank you, Chao!

On Wed, Nov 30, 2022 at 12:42 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
Thanks Chao for driving the release!

On Wed, Nov 30, 2022 at 6:03 PM Wenchen Fan <cloud0...@gmail.com> wrote:
Thanks, Chao!

On Wed, Nov 30, 2022 at 1:33 AM Chao Sun <sunc...@apache.org> wrote:
We are happy to announce the availability of Apache Spark 3.2.3!

Spark 3.2.3 is a maintenance release containing stability fixes. This
release is based on the branch-3.2 maintenance branch of Spark. We strongly
recommend that all 3.2 users upgrade to this stable release.

To download Spark 3.2.3, head over to the download page:
https://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-3-2-3.html

We would like to acknowledge all community members for contributing to this
release. This release would not have been possible without you.

Chao

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org


Re: [VOTE] Release Spark 3.2.3 (RC1)

2022-11-16 Thread Yang,Jie(INF)
+1, non-binding

The test combinations of Java 11 + Scala 2.12 and Java 11 + Scala 2.13 both
passed.

Yang Jie


From: Chris Nauroth
Date: Thursday, November 17, 2022 04:27
To: Yuming Wang
Cc: "Yang,Jie(INF)", Dongjoon Hyun, huaxin gao, "L. C. Hsieh", Chao Sun, dev
Subject: Re: [VOTE] Release Spark 3.2.3 (RC1)

+1 (non-binding)

* Verified all checksums.
* Verified all signatures (typical commands are sketched after this list).
* Built from source, with multiple profiles, to full success, for Java 11 and 
Scala 2.12:
* build/mvn -Phadoop-3.2 -Phadoop-cloud -Phive-2.3 -Phive-thriftserver 
-Pkubernetes -Pscala-2.12 -Psparkr -Pyarn -DskipTests clean package
* Tests passed.
* Ran several examples successfully:
* bin/spark-submit --class org.apache.spark.examples.SparkPi 
examples/jars/spark-examples_2.12-3.2.3.jar
* bin/spark-submit --class 
org.apache.spark.examples.sql.hive.SparkHiveExample 
examples/jars/spark-examples_2.12-3.2.3.jar
* bin/spark-submit examples/src/main/python/streaming/network_wordcount.py 
localhost 
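For readers unfamiliar with the checksum and signature checks listed above, they typically boil down to something like the following sketch; the artifact file name is assumed for illustration:

```
# Import the release signing keys from the Spark dist area, verify the GPG
# signature, then check the SHA-512 digest (file names are illustrative).
gpg --import KEYS
gpg --verify spark-3.2.3-bin-hadoop3.2.tgz.asc spark-3.2.3-bin-hadoop3.2.tgz
shasum -a 512 -c spark-3.2.3-bin-hadoop3.2.tgz.sha512
```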

Chao, thank you for preparing the release.

Chris Nauroth


On Wed, Nov 16, 2022 at 5:22 AM Yuming Wang <wgy...@gmail.com> wrote:
+1

On Wed, Nov 16, 2022 at 2:28 PM Yang,Jie(INF) <yangji...@baidu.com> wrote:
I switched from Scala 2.13 to Scala 2.12 today. The test is still in progress and it
has not hung.

Yang Jie

From: Dongjoon Hyun <dongjoon.h...@gmail.com>
Date: Wednesday, November 16, 2022 01:17
To: "Yang,Jie(INF)" <yangji...@baidu.com>
Cc: huaxin gao <huaxin.ga...@gmail.com>, "L. C. Hsieh" <vii...@gmail.com>, Chao Sun <sunc...@apache.org>, dev <dev@spark.apache.org>
Subject: Re: [VOTE] Release Spark 3.2.3 (RC1)

Did you hit that in Scala 2.12, too?

Dongjoon.

On Tue, Nov 15, 2022 at 4:36 AM Yang,Jie(INF) <yangji...@baidu.com> wrote:
Hi, all

I tested v3.2.3 with the following command:

```

dev/change-scala-version.sh 2.13
build/mvn clean install -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl 
-Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive  -Pscala-2.13 -fn
```

The testing environment is:

OS: CentOS 6u3 Final
Java: zulu 11.0.17
Python: 3.9.7
Scala: 2.13

I ran the above test command twice, and both runs hung with the following stack:

```
"ScalaTest-main-running-JoinSuite" #1 prio=5 os_prio=0 cpu=312870.06ms 
elapsed=1552.65s tid=0x7f2ddc02d000 nid=0x7132 waiting on condition  
[0x7f2de3929000]
   java.lang.Thread.State: WAITING (parking)
   at jdk.internal.misc.Unsafe.park(java.base@11.0.17/Native Method)
   - parking to wait for  <0x000790d00050> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
   at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.17/LockSupport.java:194)
   at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.17/AbstractQueuedSynchronizer.java:2081)
   at 
java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.17/LinkedBlockingQueue.java:433)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:275)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$$Lambda$9429/0x000802269840.apply(Unknown
 Source)
   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:228)
   - locked <0x000790d00208> (a java.lang.Object)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:370)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.doExecute(AdaptiveSparkPlanExec.scala:355)
   at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
   at 
org.apache.spark.sql.execution.SparkPlan$$Lambda$8573/0x000801f99c40.apply(Unknown
 Source)
   at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
   at 
org.apache.spark.sql.execution.SparkPlan$$Lambda$8574/0x000801f9a040.apply(Unknown
 Source)
   at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
   at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:172)
   - locked <0x000790d00218> (a 
org.apache.spark.sql.execution.QueryExecution)
   at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:171)
   at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3247)
   - locked <0x000790d002d8> (a org.apache.spark.sql.Dataset)
   at org.apa

Re: [VOTE][SPIP] Better Spark UI scalability and Driver stability for large applications

2022-11-16 Thread Yang,Jie(INF)
+1, non-binding

Yang Jie

From: Mridul Muralidharan
Date: Wednesday, November 16, 2022 17:35
To: Kent Yao
Cc: Gengliang Wang, dev
Subject: Re: [VOTE][SPIP] Better Spark UI scalability and Driver stability for large applications


+1

Would be great to see history server performance improvements and lower
resource utilization at the driver!

Regards,
Mridul

On Wed, Nov 16, 2022 at 2:38 AM Kent Yao <y...@apache.org> wrote:
+1, non-binding

Gengliang Wang <ltn...@gmail.com> wrote on Wed, Nov 16, 2022, 16:36:
>
> Hi all,
>
> I’d like to start a vote for SPIP: "Better Spark UI scalability and Driver 
> stability for large applications"
>
> The goal of the SPIP is to improve the Driver's stability by supporting
> storing Spark's UI data in RocksDB. Furthermore, to speed up the read and write
> operations on RocksDB, it introduces a new Protobuf serializer.
>
> Please also refer to the following:
>
> Previous discussion in the dev mailing list: [DISCUSS] SPIP: Better Spark UI 
> scalability and Driver stability for large applications
> Design Doc: Better Spark UI scalability and Driver stability for large 
> applications
> JIRA: SPARK-41053
>
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Kind Regards,
> Gengliang

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org


Re: [VOTE] Release Spark 3.2.3 (RC1)

2022-11-15 Thread Yang,Jie(INF)
I switched from Scala 2.13 to Scala 2.12 today. The test is still in progress and it
has not hung.

Yang Jie

From: Dongjoon Hyun
Date: Wednesday, November 16, 2022 01:17
To: "Yang,Jie(INF)"
Cc: huaxin gao, "L. C. Hsieh", Chao Sun, dev
Subject: Re: [VOTE] Release Spark 3.2.3 (RC1)

Did you hit that in Scala 2.12, too?

Dongjoon.

On Tue, Nov 15, 2022 at 4:36 AM Yang,Jie(INF) <yangji...@baidu.com> wrote:
Hi, all

I tested v3.2.3 with the following command:

```

dev/change-scala-version.sh 2.13
build/mvn clean install -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl 
-Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive  -Pscala-2.13 -fn
```

The testing environment is:

OS: CentOS 6u3 Final
Java: zulu 11.0.17
Python: 3.9.7
Scala: 2.13

I ran the above test command twice, and both runs hung with the following stack:

```
"ScalaTest-main-running-JoinSuite" #1 prio=5 os_prio=0 cpu=312870.06ms 
elapsed=1552.65s tid=0x7f2ddc02d000 nid=0x7132 waiting on condition  
[0x7f2de3929000]
   java.lang.Thread.State: WAITING (parking)
   at jdk.internal.misc.Unsafe.park(java.base@11.0.17/Native Method)
   - parking to wait for  <0x000790d00050> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
   at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.17/LockSupport.java:194)
   at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.17/AbstractQueuedSynchronizer.java:2081)
   at 
java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.17/LinkedBlockingQueue.java:433)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:275)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$$Lambda$9429/0x000802269840.apply(Unknown
 Source)
   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:228)
   - locked <0x000790d00208> (a java.lang.Object)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:370)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.doExecute(AdaptiveSparkPlanExec.scala:355)
   at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
   at 
org.apache.spark.sql.execution.SparkPlan$$Lambda$8573/0x000801f99c40.apply(Unknown
 Source)
   at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
   at 
org.apache.spark.sql.execution.SparkPlan$$Lambda$8574/0x000801f9a040.apply(Unknown
 Source)
   at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
   at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:172)
   - locked <0x000790d00218> (a 
org.apache.spark.sql.execution.QueryExecution)
   at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:171)
   at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3247)
   - locked <0x000790d002d8> (a org.apache.spark.sql.Dataset)
   at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3245)
   at 
org.apache.spark.sql.QueryTest$.$anonfun$getErrorMessageInCheckAnswer$1(QueryTest.scala:265)
   at 
org.apache.spark.sql.QueryTest$$$Lambda$8564/0x000801f94440.apply$mcJ$sp(Unknown
 Source)
   at 
scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
   at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   at 
org.apache.spark.sql.QueryTest$.getErrorMessageInCheckAnswer(QueryTest.scala:265)
   at org.apache.spark.sql.QueryTest$.checkAnswer(QueryTest.scala:242)
   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:151)
   at org.apache.spark.sql.JoinSuite.checkAnswer(JoinSuite.scala:58)
   at org.apache.spark.sql.JoinSuite.$anonfun$new$138(JoinSuite.scala:1062)
   at 
org.apache.spark.sql.JoinSuite$$Lambda$2827/0x0008013d5840.apply$mcV$sp(Unknown
 Source)
   at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
   at org.scalatest.Transformer.apply(Transformer.scala:22)
   at org.scalatest.Transformer.apply(Transformer.scala:20)
   at 
org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
   at org.ap

Re: [VOTE] Release Spark 3.2.3 (RC1)

2022-11-15 Thread Yang,Jie(INF)
Hi, all

I tested v3.2.3 with the following command:

```

dev/change-scala-version.sh 2.13
build/mvn clean install -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl 
-Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive  -Pscala-2.13 -fn
```

The testing environment is:

OS: CentOS 6u3 Final
Java: zulu 11.0.17
Python: 3.9.7
Scala: 2.13

I ran the above test command twice, and both runs hung with the following stack:

```
"ScalaTest-main-running-JoinSuite" #1 prio=5 os_prio=0 cpu=312870.06ms 
elapsed=1552.65s tid=0x7f2ddc02d000 nid=0x7132 waiting on condition  
[0x7f2de3929000]
   java.lang.Thread.State: WAITING (parking)
   at jdk.internal.misc.Unsafe.park(java.base@11.0.17/Native Method)
   - parking to wait for  <0x000790d00050> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
   at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.17/LockSupport.java:194)
   at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.17/AbstractQueuedSynchronizer.java:2081)
   at 
java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.17/LinkedBlockingQueue.java:433)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:275)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$$Lambda$9429/0x000802269840.apply(Unknown
 Source)
   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:228)
   - locked <0x000790d00208> (a java.lang.Object)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:370)
   at 
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.doExecute(AdaptiveSparkPlanExec.scala:355)
   at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
   at 
org.apache.spark.sql.execution.SparkPlan$$Lambda$8573/0x000801f99c40.apply(Unknown
 Source)
   at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
   at 
org.apache.spark.sql.execution.SparkPlan$$Lambda$8574/0x000801f9a040.apply(Unknown
 Source)
   at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
   at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:172)
   - locked <0x000790d00218> (a 
org.apache.spark.sql.execution.QueryExecution)
   at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:171)
   at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3247)
   - locked <0x000790d002d8> (a org.apache.spark.sql.Dataset)
   at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3245)
   at 
org.apache.spark.sql.QueryTest$.$anonfun$getErrorMessageInCheckAnswer$1(QueryTest.scala:265)
   at 
org.apache.spark.sql.QueryTest$$$Lambda$8564/0x000801f94440.apply$mcJ$sp(Unknown
 Source)
   at 
scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
   at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   at 
org.apache.spark.sql.QueryTest$.getErrorMessageInCheckAnswer(QueryTest.scala:265)
   at org.apache.spark.sql.QueryTest$.checkAnswer(QueryTest.scala:242)
   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:151)
   at org.apache.spark.sql.JoinSuite.checkAnswer(JoinSuite.scala:58)
   at org.apache.spark.sql.JoinSuite.$anonfun$new$138(JoinSuite.scala:1062)
   at 
org.apache.spark.sql.JoinSuite$$Lambda$2827/0x0008013d5840.apply$mcV$sp(Unknown
 Source)
   at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
   at org.scalatest.Transformer.apply(Transformer.scala:22)
   at org.scalatest.Transformer.apply(Transformer.scala:20)
   at 
org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
   at 
org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
   at 
org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
   at 
org.scalatest.funsuite.AnyFunSuiteLike$$Lambda$8386/0x000801f0a840.apply(Unknown
 Source)
   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
   at 

Re: Upgrade guava to 31.1-jre and remove hadoop2

2022-11-06 Thread Yang,Jie(INF)
Hi, Bjørn

Currently, I don't think we can consider upgrading Guava, for the following reasons:

1. Although Spark 3.4.0 will no longer release the hadoop-2 distribution, the hadoop-2 build and testing process is still running. We need to keep it, and should not consider upgrading Guava until it is really removed.

2. As far as I know, Hive 2.3 still depends on Guava 14.0.1. Someone tried to solve this problem before, but it was never completed; this is another issue we need to solve before upgrading Guava.


YangJie


From: Bjørn Jørgensen
Date: Sunday, November 6, 2022 22:17
To: dev
Subject: Upgrade guava to 31.1-jre and remove hadoop2

Hi, has anyone tried to upgrade Guava now that we have stopped supporting Hadoop 2?
And is there a plan for removing the Hadoop 2 code from the code base?



Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Yang,Jie(INF)
Thanks Yuming and all developers ~

Yang Jie

From: Maxim Gekk
Date: Wednesday, October 26, 2022 15:19
To: Hyukjin Kwon
Cc: "L. C. Hsieh", Dongjoon Hyun, Yuming Wang, dev, User
Subject: Re: [ANNOUNCE] Apache Spark 3.3.1 released

Congratulations everyone with the new release, and thanks to Yuming for his 
efforts.

Maxim Gekk

Software Engineer

Databricks, Inc.


On Wed, Oct 26, 2022 at 10:14 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
Thanks, Yuming.

On Wed, 26 Oct 2022 at 16:01, L. C. Hsieh <vii...@gmail.com> wrote:
Thank you for driving the release of Apache Spark 3.3.1, Yuming!

On Tue, Oct 25, 2022 at 11:38 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
> It's great. Thank you so much, Yuming!
>
> Dongjoon
>
> On Tue, Oct 25, 2022 at 11:23 PM Yuming Wang <wgy...@gmail.com> wrote:
>>
>> We are happy to announce the availability of Apache Spark 3.3.1!
>>
>> Spark 3.3.1 is a maintenance release containing stability fixes. This
>> release is based on the branch-3.3 maintenance branch of Spark. We strongly
>> recommend that all 3.3 users upgrade to this stable release.
>>
>> To download Spark 3.3.1, head over to the download page:
>> https://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-3-1.html
>>
>> We would like to acknowledge all community members for contributing to this
>> release. This release would not have been possible without you.
>>
>>

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org


Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread Yang,Jie(INF)
+1

From: vaquar khan
Date: Wednesday, October 19, 2022 10:08
To: "416161...@qq.com"
Cc: Yuming Wang, kazuyuki tanimura, Gengliang Wang, huaxin gao, Dongjoon Hyun, Sean Owen, Chao Sun, dev
Subject: Re: Apache Spark 3.2.3 Release?
主题: Re: Apache Spark 3.2.3 Release?

+1

On Tue, Oct 18, 2022, 8:58 PM 416161...@qq.com <ruife...@foxmail.com> wrote:
+1

[Image removed by sender.]

Ruifeng Zheng
ruife...@foxmail.com




-- Original --
From: "Yuming Wang" <wgy...@gmail.com>
Date: Wed, Oct 19, 2022 09:35 AM
To: "kazuyuki tanimura"
Cc: "Gengliang Wang" <ltn...@gmail.com>, "huaxin gao" <huaxin.ga...@gmail.com>, "Dongjoon Hyun" <dongjoon.h...@gmail.com>, "Sean Owen" <sro...@gmail.com>, "Chao Sun" <sunc...@apache.org>, "dev" <dev@spark.apache.org>
Subject: Re: Apache Spark 3.2.3 Release?

+1

On Wed, Oct 19, 2022 at 4:17 AM kazuyuki tanimura wrote:
+1 Thanks Chao!


Kazu

On Oct 18, 2022, at 11:48 AM, Gengliang Wang <ltn...@gmail.com> wrote:

+1. Thanks Chao!

On Tue, Oct 18, 2022 at 11:45 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
+1 Thanks Chao!

Huaxin

On Tue, Oct 18, 2022 at 11:29 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
+1

Thank you for volunteering, Chao!

Dongjoon.


On Tue, Oct 18, 2022 at 9:55 AM Sean Owen <sro...@gmail.com> wrote:
OK by me, if someone is willing to drive it.

On Tue, Oct 18, 2022 at 11:47 AM Chao Sun <sunc...@apache.org> wrote:
Hi All,

It's been more than 3 months since 3.2.2 (tagged on Jul 11) was
released. There are now 66 patches accumulated in branch-3.2, including
2 correctness fixes.

Is it a good time to start a new release? If there's no objection, I'd
like to volunteer as the release manager for the 3.2.3 release, and
start preparing the first RC next week.

# Correctness issues

SPARK-39833: Filtered parquet data frame count() and show() produce
inconsistent results when spark.sql.parquet.filterPushdown is true
SPARK-40002: Limit improperly pushed down through window using ntile function

Best,
Chao

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Yang,Jie(INF)
Tested Java 17 + Scala 2.13 using Maven, and the tests passed; +1 from me.

From: Sean Owen
Date: Monday, October 17, 2022 21:34
To: Yuming Wang
Cc: dev
Subject: Re: [VOTE] Release Spark 3.3.1 (RC4)
主题: Re: [VOTE] Release Spark 3.3.1 (RC4)

+1 from me, same as last time

On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang <wgy...@gmail.com> wrote:

Please vote on releasing the following candidate as Apache Spark version 3.3.1.

The vote is open until 11:59pm Pacific time October 21th and passes if a 
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.3.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see 
https://spark.apache.org

The tag to be voted on is v3.3.1-rc4 (commit 
fbbcf9434ac070dd4ced4fb9efe32899c6db12a9):
https://github.com/apache/spark/tree/v3.3.1-rc4

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-bin

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1430

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-docs

The list of bug fixes going into 3.3.1 can be found at the following URL:
https://s.apache.org/ttgz6

This release is using the release script of the tag v3.3.1-rc4.


FAQ

==
What happened to v3.3.1-rc3?
==
A performance regression (SPARK-40703) was found after tagging v3.3.1-rc3, which
the Iceberg community hopes Spark 3.3.1 could fix.
So we skipped the vote on v3.3.1-rc3.

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
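For example, adding the staging repository above to a Maven project's pom.xml might look like this sketch; the repository id is arbitrary, and the URL is the staging URL given in this vote thread:

```
<!-- Sketch: resolve the 3.3.1 RC4 artifacts from the Apache staging repo. -->
<repositories>
  <repository>
    <id>spark-3.3.1-rc4-staging</id>
    <url>https://repository.apache.org/content/repositories/orgapachespark-1430</url>
  </repository>
</repositories>
```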

===
What should happen to JIRA tickets still targeting 3.3.1?
===
The current list of open tickets targeted at 3.3.1 can be found at:
https://issues.apache.org/jira/projects/SPARK
 and search for "Target Version/s" = 3.3.1

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.




Re: Welcome Yikun Jiang as a Spark committer

2022-10-08 Thread Yang,Jie(INF)
Congratulations Yikun!

Regards,
Yang Jie


From: Mridul Muralidharan
Sent: October 8, 2022, 14:16:02
To: Yuming Wang
Cc: Hyukjin Kwon; dev; Yikun Jiang
Subject: Re: Welcome Yikun Jiang as a Spark committer


Congratulations !

Regards,
Mridul

On Sat, Oct 8, 2022 at 12:19 AM Yuming Wang <wgy...@gmail.com> wrote:
Congratulations Yikun!

On Sat, Oct 8, 2022 at 12:40 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
Hi all,

The Spark PMC recently added Yikun Jiang as a committer on the project.
Yikun is the major contributor to the infrastructure and GitHub Actions in
Apache Spark, as well as Kubernetes and PySpark.
He has put a lot of effort into stabilizing and optimizing the builds so we all
can work together on Apache Spark more efficiently and effectively. He's also
driving the SPIP for the Docker official image in Apache Spark for users and developers.
Please join me in welcoming Yikun!



Re: Dropping Apache Spark Hadoop2 Binary Distribution?

2022-10-03 Thread Yang,Jie(INF)
Hi, Dongjoon

Our company (Baidu) is still using the combination of Spark 3.3 + Hadoop 2.7.4
in the production environment. Hadoop 2.7.4 is an internally maintained version
compiled with Java 8. Although we are using Hadoop 2, I still support this
proposal because it is positive and exciting.

Regards,
YangJie

From: Dongjoon Hyun
Date: Tuesday, October 4, 2022 11:16
To: dev
Subject: Dropping Apache Spark Hadoop2 Binary Distribution?

Hi, All.

I'm wondering if the following Apache Spark Hadoop2 binary distribution
is still used by someone in the community or not. If it's not used or not useful,
we may remove it from the Apache Spark 3.4.0 release.


https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.tgz

Here is the background of this question.
Since Apache Spark 2.2.0 (SPARK-19493, SPARK-19550), the Apache
Spark community has been building and releasing with Java 8 only.
I believe that the user applications also use Java8+ in these days.
Recently, I received the following message from the Hadoop PMC.

  > "if you really want to claim hadoop 2.x compatibility, then you have to
  > be building against java 7". Otherwise a lot of people with hadoop 2.x
  > clusters won't be able to run your code. If your projects are java8+
  > only, then they are implicitly hadoop 3.1+, no matter what you use
  > in your build. Hence: no need for branch-2 branches except
  > to complicate your build/test/release processes [1]

If Hadoop2 binary distribution is no longer used as of today,
or incomplete somewhere due to Java 8 building, the following three
existing alternative Hadoop 3 binary distributions could be
the better official solution for old Hadoop 2 clusters.

1) Scala 2.12 and without-hadoop distribution
2) Scala 2.12 and Hadoop 3 distribution
3) Scala 2.13 and Hadoop 3 distribution

In short, is there anyone who is using Apache Spark 3.3.0 Hadoop2 Binary 
distribution?

Dongjoon

[1] 
https://issues.apache.org/jira/browse/ORC-1251?focusedCommentId=17608247&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17608247


Re: [VOTE] SPIP: Support Docker Official Image for Spark

2022-09-21 Thread Yang,Jie(INF)
+1 (non-binding)

Regards,
Yang Jie

From: Gengliang Wang
Date: Thursday, September 22, 2022 12:22
To: Xiangrui Meng
Cc: Kent Yao, Hyukjin Kwon, dev
Subject: Re: [VOTE] SPIP: Support Docker Official Image for Spark

+1

On Wed, Sep 21, 2022 at 7:26 PM Xiangrui Meng <men...@gmail.com> wrote:
+1

On Wed, Sep 21, 2022 at 6:53 PM Kent Yao <yaooq...@gmail.com> wrote:
+1

Kent Yao
@ Data Science Center, Hangzhou Research Institute, NetEase Corp.
a spark enthusiast
kyuubi: a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark.
spark-authorizer: a Spark SQL extension which provides SQL Standard Authorization for Apache Spark.
spark-postgres: a library for reading data from and transferring data to Postgres / Greenplum with Spark SQL and DataFrames, 10~100x faster.
itatchi: a library that brings useful functions from various modern database management systems to Apache Spark.



 Replied Message 
From: Hyukjin Kwon
Date: 09/22/2022 09:43
To: dev
Subject: Re: [VOTE] SPIP: Support Docker Official Image for Spark

Starting with my +1.

On Thu, 22 Sept 2022 at 10:41, Hyukjin Kwon <gurwls...@gmail.com> wrote:
Hi all,

I would like to start a vote for SPIP: "Support Docker Official Image for Spark"

The goal of the SPIP is to add a Docker Official Image (DOI) to ensure the Spark Docker images
meet the quality standards for Docker images, and to provide these Docker images for users
who want to use Apache Spark via a Docker image.

Please also refer to:

- Previous discussion in the dev mailing list: [DISCUSS] SPIP: Support Docker Official Image for Spark
- SPIP doc: SPIP: Support Docker Official Image for Spark
- JIRA: SPARK-40513

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-19 Thread Yang,Jie(INF)
+1 (non-binding)



Yang Jie


From: Yikun Jiang
Sent: September 19, 2022, 14:23:14
To: Denny Lee
Cc: bo zhaobo; Yuming Wang; Kent Yao; Gengliang Wang; Hyukjin Kwon; dev; zrf
Subject: Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

Thanks for your support, all!

> Count me in to help as well, eh?! :)

@Denny Sure, it would be great to have your help! I'm going to create a JIRA
and tasks if the SPIP vote passes.


On Mon, Sep 19, 2022 at 10:34 AM Denny Lee <denny.g@gmail.com> wrote:
+1 (non-binding).

This is a great idea and we should definitely do this.  Count me in to help as 
well, eh?! :)

On Sun, Sep 18, 2022 at 7:24 PM bo zhaobo <bzhaojyathousa...@gmail.com> wrote:
+1 (non-binding)

This will bring a great experience to customers. So excited about this. ;-)

Yuming Wang <wgy...@gmail.com> wrote on Mon, Sep 19, 2022, 10:18:
+1.

On Mon, Sep 19, 2022 at 9:44 AM Kent Yao <y...@apache.org> wrote:
+1

Gengliang Wang <ltn...@gmail.com> wrote on Mon, Sep 19, 2022, 09:23:
>
> +1, thanks for the work!
>
> On Sun, Sep 18, 2022 at 6:20 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>> +1
>>
>> On Mon, 19 Sept 2022 at 09:15, Yikun Jiang <yikunk...@gmail.com> wrote:
>>>
>>> Hi, all
>>>
>>>
>>> I would like to start the discussion for supporting Docker Official Image 
>>> for Spark.
>>>
>>>
>>> This SPIP is proposed to add Docker Official Image(DOI) to ensure the Spark 
>>> Docker images meet the quality standards for Docker images, to provide 
>>> these Docker images for users who want to use Apache Spark via Docker image.
>>>
>>>
>>> There are also several Apache projects that release Docker Official
>>> Images, such as flink, storm, solr, zookeeper, and httpd (with 50M+ to 1B+
>>> downloads each). From the huge download statistics, we can see the real
>>> demands of users, and from the support of other Apache projects, we should
>>> also be able to do it.
>>>
>>>
>>> After support:
>>>
>>> The Dockerfile will still be maintained by the Apache Spark community and 
>>> reviewed by Docker.
>>>
>>> The images will be maintained by the Docker community to ensure the quality 
>>> standards for Docker images of the Docker community.
>>>
>>>
>>> It will also reduce the extra docker images maintenance effort (such as 
>>> frequently rebuilding, image security update) of the Apache Spark community.
>>>
>>>
>>> See more in SPIP DOC: 
>>> https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o
>>>
>>>
>>> cc: Ruifeng (co-author) and Hyukjin (shepherd)
>>>
>>>
>>> Regards,
>>> Yikun

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org



Re: Time for Spark 3.3.1 release?

2022-09-12 Thread Yang,Jie(INF)
+1

Thanks Yuming ~

From: Hyukjin Kwon
Date: Tuesday, September 13, 2022 08:19
To: Gengliang Wang
Cc: "L. C. Hsieh", Dongjoon Hyun, Yuming Wang, dev
Subject: Re: Time for Spark 3.3.1 release?
主题: Re: Time for Spark 3.3.1 release?

+1

On Tue, 13 Sept 2022 at 06:45, Gengliang Wang <ltn...@gmail.com> wrote:
+1.
Thank you, Yuming!

On Mon, Sep 12, 2022 at 12:10 PM L. C. Hsieh <vii...@gmail.com> wrote:
+1

Thanks Yuming!

On Mon, Sep 12, 2022 at 11:50 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
> +1
>
> Thanks,
> Dongjoon.
>
> On Mon, Sep 12, 2022 at 6:38 AM Yuming Wang <wgy...@gmail.com> wrote:
>>
>> Hi, All.
>>
>>
>>
>> Since the Apache Spark 3.3.0 tag creation (Jun 10), 138 new patches, including 7
>> correctness patches, have arrived at branch-3.3.
>>
>>
>>
>> Shall we make a new release, Apache Spark 3.3.1, as the second release at 
>> branch-3.3? I'd like to volunteer as the release manager for Apache Spark 
>> 3.3.1.
>>
>>
>>
>> All changes:
>>
>> https://github.com/apache/spark/compare/v3.3.0...branch-3.3
>>
>>
>>
>> Correctness issues:
>>
>> SPARK-40149: Propagate metadata columns through Project
>>
>> SPARK-40002: Don't push down limit through window using ntile
>>
>> SPARK-39976: ArrayIntersect should handle null in left expression correctly
>>
>> SPARK-39833: Disable Parquet column index in DSv1 to fix a correctness issue 
>> in the case of overlapping partition and data columns
>>
>> SPARK-39061: Set nullable correctly for Inline output attributes
>>
>> SPARK-39887: RemoveRedundantAliases should keep aliases that make the output 
>> of projection nodes unique
>>
>> SPARK-38614: Don't push down limit through window that's using percent_rank

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org


Re: memory module of yarn container

2022-08-25 Thread Yang,Jie(INF)
Hi, vtygoss

As I recall, the memoryOverhead in Spark 2.3 includes all the memory that is
not executor on-heap memory: the memory used by the Spark off-heap memory pool
(executorOffHeapMemory — this concept also exists in Spark 2.3), the PySpark
worker, PipeRDD, the Netty memory pool, JVM direct memory, and so on.

In Spark 2.3, there is no strong size relationship between memoryOverhead and
executorOffHeapMemory. For example, if the user configures executorMemory=1g,
executorOffHeapMemory=2g, and executorMemoryOverhead=1g, this does not raise an
error at the resource request stage and the requested memory is 3g, but at
least 4g is actually required.

In Spark 3.x, executorMemoryOverhead no longer includes the memory used by the
Spark off-heap memory pool (executorOffHeapMemory). I think this ensures that
the Spark off-heap memory pool has enough memory.
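To make the arithmetic concrete, here is a hypothetical spark-submit configuration (values are illustrative only, not recommendations) and the resulting Spark 3.x container request:

```
# Sketch: in Spark 3.x the YARN container request is roughly
#   executorMemory + executorOffHeapMemory + executorMemoryOverhead + pysparkWorkerMemory
spark-submit \
  --conf spark.executor.memory=4g \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=2g \
  --conf spark.executor.memoryOverhead=1g \
  ...
# => requested container size ~ 4g + 2g + 1g = 7g (plus PySpark worker memory, if any)
```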

Warm regards,
YangJie


From: vtygoss
Date: Thursday, August 25, 2022 20:02
To: spark
Subject: memory module of yarn container


Hi, community!



I noticed a change in the memory calculation for YARN containers between spark-2.3.0
and spark-3.2.1 when requesting containers from YARN:

org.apache.spark.deploy.yarn.Client # verifyClusterResources



```
// spark-2.3.0
val executorMem = executorMemory + executorMemoryOverhead
```

```
// spark-3.2.1
val executorMem =
  executorMemory + executorOffHeapMemory + executorMemoryOverhead + pysparkWorkerMemory
```

And I have these questions:

1. In spark-2.3.0 and spark-3.2.1, what is memoryOverhead and where is it used?
2. What is the difference between memoryOverhead and off-heap memory, native memory, and direct memory? There is no such concept in Apache Flink; is it a concept unique to Spark?
3. In spark-2.3.0, I think that memoryOverhead contains all non-heap memory, including off-heap / native / direct. Am I wrong?

Thanks for any replies.

Best Regards!




Re: Welcoming three new PMC members

2022-08-10 Thread Yang,Jie(INF)
Congratulations!

Regards,
Yang Jie


From: Yuming Wang
Date: Wednesday, August 10, 2022 16:42
To: Yikun Jiang
Cc: dev
Subject: Re: Welcoming three new PMC members

Congratulations!

On Wed, Aug 10, 2022 at 4:35 PM Yikun Jiang <yikunk...@gmail.com> wrote:
Congratulations!

Regards,
Yikun


On Wed, Aug 10, 2022 at 3:19 PM Maciej <mszymkiew...@gmail.com> wrote:
Congratulations!

On 8/10/22 08:14, Yi Wu wrote:
> Congrats everyone!
>
> On Wed, Aug 10, 2022 at 11:33 AM Yuanjian Li <xyliyuanj...@gmail.com> wrote:
>> Congrats everyone!
>>
>> L. C. Hsieh <vii...@gmail.com> wrote on Tue, Aug 9, 2022, 19:01:
>>> Congrats!
>>>
>>> On Tue, Aug 9, 2022 at 5:38 PM Chao Sun <sunc...@apache.org> wrote:
>>>> Congrats everyone!
>>>>
>>>> On Tue, Aug 9, 2022 at 5:36 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>> Congrats to all!
>>>>>
>>>>> Dongjoon.
>>>>>
>>>>> On Tue, Aug 9, 2022 at 5:13 PM Takuya UESHIN <ues...@happy-camper.st> wrote:
>>>>>> Congratulations!
>>>>>>
>>>>>> On Tue, Aug 9, 2022 at 4:57 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>>>> Congrats everybody!
>>>>>>>
>>>>>>> On Wed, 10 Aug 2022 at 05:50, Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>>>>> Congratulations!
>>>>>>>> Great to have you join the PMC!!
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Mridul
>>>>>>>>
>>>>>>>> On Tue, Aug 9, 2022 at 11:57 AM vaquar khan <vaquar.k...@gmail.com> wrote:
>>>>>>>>> Congratulations
>>>>>>>>>
>>>>>>>>> On Tue, Aug 9, 2022, 11:40 AM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> The Spark PMC recently voted to add three new PMC members. Join me in welcoming them to their new roles!
>>>>>>>>>>
>>>>>>>>>> New PMC members: Huaxin Gao, Gengliang Wang and Maxim Gekk
>>>>>>>>>>
>>>>>>>>>> The Spark PMC
>>>>>>
>>>>>> --
>>>>>> Takuya UESHIN

--
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC


Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Yang,Jie(INF)
Congratulations!

Regards,
Yang Jie


From: Hyukjin Kwon
Date: Tuesday, August 9, 2022 16:12
To: dev
Subject: Welcome Xinrong Meng as a Spark committer

Hi all,

The Spark PMC recently added Xinrong Meng as a committer on the project.
Xinrong is the major contributor to PySpark, especially the Pandas API on Spark. She
has guided a lot of new contributors enthusiastically. Please join me in
welcoming Xinrong!



Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-12 Thread Yang,Jie(INF)
+1 (non-binding)

Yang Jie


From: Dongjoon Hyun
Date: Tuesday, July 12, 2022 16:03
To: dev
Cc: Cheng Su, "Yang,Jie(INF)", Sean Owen
Subject: Re: [VOTE] Release Spark 3.2.2 (RC1)
主题: Re: [VOTE] Release Spark 3.2.2 (RC1)

+1

Dongjoon.

On Mon, Jul 11, 2022 at 11:34 PM Cheng Su <scnj...@gmail.com> wrote:
+1 (non-binding). Built from source, and ran some Scala unit tests on an M1 Mac,
with OpenJDK 8 and Scala 2.12.

Thanks,
Cheng Su

On Mon, Jul 11, 2022 at 10:31 PM Yang,Jie(INF) <yangji...@baidu.com> wrote:
Does this happen when running all UTs? I ran this suite alone several times
using OpenJDK (Zulu) 8u322-b06 on my Mac, but no similar error occurred.

From: Sean Owen <sro...@gmail.com>
Date: Tuesday, July 12, 2022 10:45
To: Dongjoon Hyun <dongjoon.h...@gmail.com>
Cc: dev <dev@spark.apache.org>
Subject: Re: [VOTE] Release Spark 3.2.2 (RC1)

Is anyone seeing this error? I'm on OpenJDK 8 on a Mac:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000101ca8ace, pid=11962, tid=0x1603
#
# JRE version: OpenJDK Runtime Environment (8.0_322) (build 
1.8.0_322-bre_2022_02_28_15_01-b00)
# Java VM: OpenJDK 64-Bit Server VM (25.322-b00 mixed mode bsd-amd64 compressed 
oops)
# Problematic frame:
# V  [libjvm.dylib+0x549ace]
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /private/tmp/spark-3.2.2/sql/core/hs_err_pid11962.log
ColumnVectorSuite:
- boolean
- byte
Compiled method (nm)  885897 75403 n 0   sun.misc.Unsafe::putShort 
(native)
 total in heap  [0x000102fdaa10,0x000102fdad48] = 824
 relocation [0x000102fdab38,0x000102fdab78] = 64
 main code  [0x000102fdab80,0x000102fdad48] = 456
Compiled method (nm)  885897 75403 n 0   sun.misc.Unsafe::putShort 
(native)
 total in heap  [0x000102fdaa10,0x000102fdad48] = 824
 relocation [0x000102fdab38,0x000102fdab78] = 64
 main code  [0x000102fdab80,0x000102fdad48] = 456

On Mon, Jul 11, 2022 at 4:58 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
Please vote on releasing the following candidate as Apache Spark version 3.2.2.

The vote is open until July 15th 1AM (PST) and passes if a majority +1 PMC 
votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.2
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see 
https://spark.apache.org/

The tag to be voted on is v3.2.2-rc1 (commit
78a5825fe266c0884d2dd18cbca9625fa258d7f7):
https://github.com/apache/spark/tree/v3.2.2-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1409/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/

The list of bug fixes going into 3.2.2 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12351232

This release is using the release script of the tag v3.2.2-rc1.

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-11 Thread Yang,Jie(INF)
Does this happen when running all UTs? I ran this suite alone several times
using OpenJDK (Zulu) 8u322-b06 on my Mac, but no similar error occurred.

From: Sean Owen
Date: Tuesday, July 12, 2022 10:45
To: Dongjoon Hyun
Cc: dev
Subject: Re: [VOTE] Release Spark 3.2.2 (RC1)

Is anyone seeing this error? I'm on OpenJDK 8 on a Mac:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000101ca8ace, pid=11962, tid=0x1603
#
# JRE version: OpenJDK Runtime Environment (8.0_322) (build 
1.8.0_322-bre_2022_02_28_15_01-b00)
# Java VM: OpenJDK 64-Bit Server VM (25.322-b00 mixed mode bsd-amd64 compressed 
oops)
# Problematic frame:
# V  [libjvm.dylib+0x549ace]
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /private/tmp/spark-3.2.2/sql/core/hs_err_pid11962.log
ColumnVectorSuite:
- boolean
- byte
Compiled method (nm)  885897 75403 n 0   sun.misc.Unsafe::putShort 
(native)
 total in heap  [0x000102fdaa10,0x000102fdad48] = 824
 relocation [0x000102fdab38,0x000102fdab78] = 64
 main code  [0x000102fdab80,0x000102fdad48] = 456
Compiled method (nm)  885897 75403 n 0   sun.misc.Unsafe::putShort 
(native)
 total in heap  [0x000102fdaa10,0x000102fdad48] = 824
 relocation [0x000102fdab38,0x000102fdab78] = 64
 main code  [0x000102fdab80,0x000102fdad48] = 456

On Mon, Jul 11, 2022 at 4:58 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
Please vote on releasing the following candidate as Apache Spark version 3.2.2.

The vote is open until July 15th 1AM (PST) and passes if a majority +1 PMC 
votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.2
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see 
https://spark.apache.org/

The tag to be voted on is v3.2.2-rc1 (commit 
78a5825fe266c0884d2dd18cbca9625fa258d7f7):
https://github.com/apache/spark/tree/v3.2.2-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1409/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/

The list of bug fixes going into 3.2.2 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12351232

This release is using the release script of the tag v3.2.2-rc1.

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks. In the Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
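
For example, one hedged way to clear those caches between runs (paths assume the default Ivy and Maven locations):

```
# remove cached Spark artifacts so the staging RC is resolved fresh
rm -rf ~/.ivy2/cache/org.apache.spark
rm -rf ~/.m2/repository/org/apache/spark
```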

===
What should happen to JIRA tickets still targeting 3.2.2?
===

The current list of open tickets targeted at 3.2.2 can be found at:
https://issues.apache.org/jira/projects/SPARK
 and search for "Target Version/s" = 3.2.2

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.


Re: Apache Spark 3.2.2 Release?

2022-07-07 Thread Yang,Jie(INF)
+1 (non-binding) Thank you Dongjoon ~

From: Ruifeng Zheng 
Date: Thursday, July 7, 2022, 16:28
To: dev 
Subject: Re: Apache Spark 3.2.2 Release?

+1 thank you Dongjoon!


Ruifeng Zheng
ruife...@foxmail.com




-- Original --
From: "Yikun Jiang" ;
Date: Thu, Jul 7, 2022 04:16 PM
To: "Mridul Muralidharan";
Cc: "Gengliang Wang";"Cheng Su";"Maxim 
Gekk";"Wenchen 
Fan";"Xiao Li";"Xinrong 
Meng";"Yuming 
Wang";"dev";
Subject: Re: Apache Spark 3.2.2 Release?

+1  (non-binding)

Thanks!

Regards,
Yikun


On Thu, Jul 7, 2022 at 1:57 PM Mridul Muralidharan 
mailto:mri...@gmail.com>> wrote:
+1

Thanks for driving this Dongjoon !

Regards,
Mridul

On Thu, Jul 7, 2022 at 12:36 AM Gengliang Wang 
mailto:ltn...@gmail.com>> wrote:
+1.
Thank you, Dongjoon.

On Wed, Jul 6, 2022 at 10:21 PM Wenchen Fan 
mailto:cloud0...@gmail.com>> wrote:
+1

On Thu, Jul 7, 2022 at 10:41 AM Xinrong Meng 
 wrote:
+1


Thanks!



Xinrong Meng

Software Engineer

Databricks


On Wed, Jul 6, 2022 at 7:25 PM Xiao Li 
mailto:gatorsm...@gmail.com>> wrote:
+1

Xiao

Cheng Su mailto:scnj...@gmail.com>> 于2022年7月6日周三 19:16写道:
+1 (non-binding)

Thanks,
Cheng Su

On Wed, Jul 6, 2022 at 6:01 PM Yuming Wang 
mailto:wgy...@gmail.com>> wrote:
+1

On Thu, Jul 7, 2022 at 5:53 AM Maxim Gekk  
wrote:
+1

On Thu, Jul 7, 2022 at 12:26 AM John Zhuge 
mailto:jzh...@apache.org>> wrote:
+1  Thanks for the effort!

On Wed, Jul 6, 2022 at 2:23 PM Bjørn Jørgensen 
mailto:bjornjorgen...@gmail.com>> wrote:
+1

ons. 6. jul. 2022, 23:05 skrev Hyukjin Kwon 
mailto:gurwls...@gmail.com>>:
Yeah +1

On Thu, Jul 7, 2022 at 5:40 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All.

Since the Apache Spark 3.2.1 tag creation (Jan 19), 197 new patches,
including 11 correctness patches, have arrived at branch-3.2.

Shall we make a new release, Apache Spark 3.2.2, as the third release
in the 3.2 line? I'd like to volunteer as the release manager for Apache
Spark 3.2.2. I'm thinking about starting the first RC next week.

$ git log --oneline v3.2.1..HEAD | wc -l
 197

# Correctness issues

SPARK-38075 Hive script transform with order by and limit will
return fake rows
SPARK-38204 All state operators are at a risk of inconsistency
between state partitioning and operator partitioning
SPARK-38309 SHS has incorrect percentiles for shuffle read bytes
and shuffle total blocks metrics
SPARK-38320 (flat)MapGroupsWithState can timeout groups which just
received inputs in the same microbatch
SPARK-38614 After Spark update, df.show() shows incorrect
F.percent_rank results
SPARK-38655 OffsetWindowFunctionFrameBase cannot find the offset
row whose input is not null
SPARK-38684 Stream-stream outer join has a possible correctness
issue due to weakly read consistent on outer iterators
SPARK-39061 Incorrect results or NPE when using Inline function
against an array of dynamically created structs
SPARK-39107 Silent change in regexp_replace's handling of empty strings
SPARK-39259 Timestamps returned by now() and equivalent functions
are not consistent in subqueries
SPARK-39293 The accumulator of ArrayAggregate should copy the
intermediate result if string, struct, array, or map

Best,
Dongjoon.

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org
--
John Zhuge


Contributor data in github-page no longer updated after May 1

2022-05-11 Thread Yang,Jie(INF)
Hi, teams

The contributor data on the following page seems to have stopped updating after May 
1. Can anyone fix it?

https://github.com/apache/spark/graphs/contributors?from=2022-05-01&to=2022-05-11&type=c

Warm regards,
YangJie



Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-19 Thread Yang,Jie(INF)
+1

From: Gengliang Wang 
Date: Tuesday, January 19, 2021, 3:04 PM
To: Jungtaek Lim 
Cc: Yuming Wang, Hyukjin Kwon, dev 
Subject: Re: [VOTE] Release Spark 3.1.1 (RC1)

+1 (non-binding)


On Tue, Jan 19, 2021 at 2:05 PM Jungtaek Lim 
mailto:kabhwan.opensou...@gmail.com>> wrote:
+1 (non-binding)

* verified signature and sha for all files (there's a glitch, which I'll 
describe below)
* built source (DISCLAIMER: didn't run tests) and made a custom distribution, and 
built a docker image based on the distribution (see the sketch after this list)
  - used profiles: kubernetes, hadoop-3.2, hadoop-cloud
* ran some SS PySpark queries (Rate to Kafka, Kafka to Kafka) with Spark on k8s 
(used MinIO - s3 compatible - as checkpoint location)
  - for Kafka reader, tested both approaches: newer (offset via admin client) 
and older (offset via consumer)
* ran simple batch query with magic committer against MinIO storage & dynamic 
volume provisioning (with NFS)
* verified DataStreamReader.table & DataStreamWriter.toTable works in PySpark 
(which also verifies on Scala API as well)
* ran stateful SS test queries and checked the new additions to the SS UI (state 
store & watermark information)

One glitch from verifying the sha: the sha512 file format differs between the 
source tar.gz and the other artifacts. My tool succeeded with the others and failed 
with the source tar.gz, though I confirmed the sha itself is the same. Not a 
blocker, but it would be ideal to make it consistent.
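
For anyone repeating the verification, a minimal sketch (the artifact name below is an assumed example from the RC bin directory, not a literal instruction):

```
gpg --import KEYS                                # KEYS file from dist.apache.org (linked in the vote mail)
gpg --verify spark-3.1.1-bin-hadoop3.2.tgz.asc   # checks the detached signature
shasum -a 512 spark-3.1.1-bin-hadoop3.2.tgz      # compare by eye against the .sha512 file, which
                                                 # sidesteps the format mismatch noted above
```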

Thanks for driving the release process!

On Tue, Jan 19, 2021 at 2:25 PM Yuming Wang 
mailto:wgy...@gmail.com>> wrote:
+1.

On Tue, Jan 19, 2021 at 7:54 AM Hyukjin Kwon 
mailto:gurwls...@gmail.com>> wrote:
I forgot to say :). I'll start with my +1.

On Mon, 18 Jan 2021, 21:06 Hyukjin Kwon, 
mailto:gurwls...@gmail.com>> wrote:
Please vote on releasing the following candidate as Apache Spark version 3.1.1.

The vote is open until January 22nd 4PM PST and passes if a majority +1 PMC 
votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.1.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see 
http://spark.apache.org/

The tag to be voted on is v3.1.1-rc1 (commit 
53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
https://github.com/apache/spark/tree/v3.1.1-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1364

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/

The list of bug fixes going into 3.1.1 can be found at the following URL:
https://s.apache.org/41kf2

This release is using the release script of the tag v3.1.1-rc1.

FAQ

===
What happened to 3.1.0?
===

There was a technical issue during Apache Spark 3.1.0 preparation, and it was 
discussed and decided to skip 3.1.0.
Please see 
https://spark.apache.org/news/next-official-release-spark-3.1.1.html
 for more details.

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC via "pip install 
https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz"
and see if anything important breaks.
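
For example, a minimal sketch of that setup in a POSIX shell (the venv path is an arbitrary choice):

```
python -m venv /tmp/spark-3.1.1-rc1     # isolated env so the RC doesn't touch site-packages
source /tmp/spark-3.1.1-rc1/bin/activate
pip install https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz
python -c "import pyspark; print(pyspark.__version__)"   # should print 3.1.1
deactivate
```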
In the Java/Scala, you can add the staging repository to your project's
resolvers and test with the RC (make sure to clean up the artifact cache
before/after so you don't end up building with an out-of-date RC going
forward).