[spark] branch branch-3.0 updated: [SPARK-33019][CORE] Use spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 by default
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f3b80f8  [SPARK-33019][CORE] Use spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 by default
f3b80f8 is described below

commit f3b80f88324e8a1a76d01d13cfc1fc7082238214
Author: Dongjoon Hyun
AuthorDate: Tue Sep 29 12:02:45 2020 -0700

    [SPARK-33019][CORE] Use spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 by default

    ### What changes were proposed in this pull request?

    Apache Spark 3.1's default Hadoop profile is `hadoop-3.2`. Instead of relying on a documentation warning, this PR makes Spark use the consistent and safer version of the Apache Hadoop file output committer algorithm, `v1`, by default. This prevents a silent correctness regression when migrating from Apache Spark 2.4/3.0 to Apache Spark 3.1.0. If a user explicitly sets `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2`, that configuration is still honored.

    ### Why are the changes needed?

    Apache Spark publishes distributions built against both Hadoop 2.7 and Hadoop 3.2, and the default of `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version` depends on the Hadoop version: Apache Hadoop 3.0 switched the default algorithm from `v1` to `v2`, and there is now a discussion about removing `v2`. Spark should provide a consistent default behavior of `v1` across its distributions.

    - [MAPREDUCE-7282](https://issues.apache.org/jira/browse/MAPREDUCE-7282) MR v2 commit algorithm should be deprecated and not the default

    ### Does this PR introduce _any_ user-facing change?

    Yes. This changes the default behavior. Users can override this conf.

    ### How was this patch tested?

    Manual.

    **BEFORE (spark-3.0.1-bin-hadoop3.2)**
    ```scala
    scala> sc.version
    res0: String = 3.0.1

    scala> sc.hadoopConfiguration.get("mapreduce.fileoutputcommitter.algorithm.version")
    res1: String = 2
    ```

    **AFTER**
    ```scala
    scala> sc.hadoopConfiguration.get("mapreduce.fileoutputcommitter.algorithm.version")
    res0: String = 1
    ```

    Closes #29895 from dongjoon-hyun/SPARK-DEFAUT-COMMITTER.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit cc06266ade5a4eb35089501a3b32736624208d4c)
Signed-off-by: Dongjoon Hyun
---
 .../main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala |  3 +++
 docs/configuration.md                                        | 10 ++--------
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
index 1180501..6f799a5 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
@@ -462,6 +462,9 @@ private[spark] object SparkHadoopUtil {
     for ((key, value) <- conf.getAll if key.startsWith("spark.hadoop.")) {
       hadoopConf.set(key.substring("spark.hadoop.".length), value)
     }
+    if (conf.getOption("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version").isEmpty) {
+      hadoopConf.set("mapreduce.fileoutputcommitter.algorithm.version", "1")
+    }
   }

   private def appendSparkHiveConfigs(conf: SparkConf, hadoopConf: Configuration): Unit = {
diff --git a/docs/configuration.md b/docs/configuration.md
index 95ff282..36e4f45 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1761,16 +1761,10 @@ Apart from these, the following properties are also available, and may be useful
   <td><code>spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version</code></td>
-  <td>Dependent on environment</td>
+  <td>1</td>
   <td>
     The file output committer algorithm version, valid algorithm version number: 1 or 2.
-    Version 2 may have better performance, but version 1 may handle failures better in certain situations,
-    as per <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4815">MAPREDUCE-4815</a>.
-    The default value depends on the Hadoop version used in an environment:
-    1 for Hadoop versions lower than 3.0
-    2 for Hadoop versions 3.0 and higher
-    It's important to note that this can change back to 1 again in the future once <a href="https://issues.apache.org/jira/browse/MAPREDUCE-7282">MAPREDUCE-7282</a>
-    is fixed and merged.
+    Note that 2 may cause a correctness issue like <a href="https://issues.apache.org/jira/browse/MAPREDUCE-7282">MAPREDUCE-7282</a>.
   </td>
   <td>2.2.0</td>
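For readers upgrading, the override path mentioned above works through the normal `spark.hadoop.*` prefix. A minimal sketch of opting back into `v2`, assuming a standard SparkSession setup (the application name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Explicitly opt back into the v2 committer algorithm, overriding the new
// v1 default. Any "spark.hadoop.*" key is copied into the Hadoop
// Configuration with the "spark.hadoop." prefix stripped.
val spark = SparkSession.builder()
  .appName("committer-v2-example") // illustrative name
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
  .getOrCreate()

// The user-provided value wins over the v1 default:
println(spark.sparkContext.hadoopConfiguration
  .get("mapreduce.fileoutputcommitter.algorithm.version")) // prints 2
```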
[spark] branch master updated (711d8dd -> cc06266)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 711d8dd  [SPARK-33018][SQL] Fix estimate statistics issue if child has 0 bytes
 add cc06266  [SPARK-33019][CORE] Use spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 by default

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala |  3 +++
 docs/configuration.md                                        | 10 ++--------
 2 files changed, 5 insertions(+), 8 deletions(-)
[spark] branch branch-3.0 updated (39bfae2 -> ae8b35a)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 39bfae2  [MINOR][DOCS] Fixing log message for better clarity
 add ae8b35a  [SPARK-33018][SQL] Fix estimate statistics issue if child has 0 bytes

No new revisions were added by this update.

Summary of changes:
 .../SizeInBytesOnlyStatsPlanVisitor.scala          |  3 ++-
 .../statsEstimation/JoinEstimationSuite.scala      | 22 ++++++++++++++++++++++
 .../statsEstimation/StatsEstimationTestBase.scala  |  9 ++++++---
 3 files changed, 30 insertions(+), 4 deletions(-)
[spark] branch master updated (7766fd1 -> 711d8dd)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 7766fd1  [MINOR][DOCS] Fixing log message for better clarity
 add 711d8dd  [SPARK-33018][SQL] Fix estimate statistics issue if child has 0 bytes

No new revisions were added by this update.

Summary of changes:
 .../SizeInBytesOnlyStatsPlanVisitor.scala          |  3 ++-
 .../statsEstimation/JoinEstimationSuite.scala      | 22 ++++++++++++++++++++++
 .../statsEstimation/StatsEstimationTestBase.scala  |  9 ++++++---
 3 files changed, 30 insertions(+), 4 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-33018][SQL] Fix estimate statistics issue if child has 0 bytes
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new ae8b35a  [SPARK-33018][SQL] Fix estimate statistics issue if child has 0 bytes
ae8b35a is described below

commit ae8b35a0d24f8c83597d668875793a8dbca6
Author: Yuming Wang
AuthorDate: Tue Sep 29 16:46:04 2020 +0000

    [SPARK-33018][SQL] Fix estimate statistics issue if child has 0 bytes

    ### What changes were proposed in this pull request?
    This PR fixes an estimate statistics issue when a child has 0 bytes.

    ### Why are the changes needed?
    The `sizeInBytes` can be `0` when AQE and CBO are enabled (`spark.sql.adaptive.enabled`=true, `spark.sql.cbo.enabled`=true and `spark.sql.cbo.planStats.enabled`=true). This can produce an incorrect BroadcastJoin, resulting in a driver OOM. For example:
    ![SPARK-33018](https://user-images.githubusercontent.com/5399861/94457606-647e3d00-01e7-11eb-85ee-812ae6efe7bb.jpg)

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Manual test.

    Closes #29894 from wangyum/SPARK-33018.

    Authored-by: Yuming Wang
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 711d8dd28afd9af92b025f9908534e5f1d575042)
    Signed-off-by: Wenchen Fan
---
 .../SizeInBytesOnlyStatsPlanVisitor.scala          |  3 ++-
 .../statsEstimation/JoinEstimationSuite.scala      | 22 ++++++++++++++++++++++
 .../statsEstimation/StatsEstimationTestBase.scala  |  9 ++++++---
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala
index da36db7..a586988 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala
@@ -53,7 +53,8 @@ object SizeInBytesOnlyStatsPlanVisitor extends LogicalPlanVisitor[Statistics] {
    */
   override def default(p: LogicalPlan): Statistics = p match {
     case p: LeafNode => p.computeStats()
-    case _: LogicalPlan => Statistics(sizeInBytes = p.children.map(_.stats.sizeInBytes).product)
+    case _: LogicalPlan =>
+      Statistics(sizeInBytes = p.children.map(_.stats.sizeInBytes).filter(_ > 0L).product)
   }

   override def visitAggregate(p: Aggregate): Statistics = {
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala
index 6c5a2b2..cdfc863 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala
@@ -551,4 +551,26 @@ class JoinEstimationSuite extends StatsEstimationTestBase {
       attributeStats = AttributeMap(Nil))
     assert(join.stats == expectedStats)
   }
+
+  test("SPARK-33018 Fix estimate statistics issue if child has 0 bytes") {
+    case class MyStatsTestPlan(
+        outputList: Seq[Attribute],
+        sizeInBytes: BigInt) extends LeafNode {
+      override def output: Seq[Attribute] = outputList
+      override def computeStats(): Statistics = Statistics(sizeInBytes = sizeInBytes)
+    }
+
+    val left = MyStatsTestPlan(
+      outputList = Seq("key-1-2", "key-2-4").map(nameToAttr),
+      sizeInBytes = BigInt(100))
+
+    val right = MyStatsTestPlan(
+      outputList = Seq("key-1-2", "key-2-3").map(nameToAttr),
+      sizeInBytes = BigInt(0))
+
+    val join = Join(left, right, LeftOuter,
+      Some(EqualTo(nameToAttr("key-2-4"), nameToAttr("key-2-3"))), JoinHint.NONE)
+
+    assert(join.stats == Statistics(sizeInBytes = 100))
+  }
 }
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/StatsEstimationTestBase.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/StatsEstimationTestBase.scala
index 9dceca5..0a27e31 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/StatsEstimationTestBase.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/StatsEstimationTestBase.scala
@@ -26,17 +26,20 @@ import org.apache.spark.sql.types.{IntegerType, StringType}

 trait StatsEstimationTestBase extends SparkFunSuite {

-  var originalValue: Boolean = false
+  var originalCBOValue: Boolean = false
+  var originalPlanStatsValue: Boolean = false

   override def
[spark] branch master updated (7766fd1 -> 711d8dd)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7766fd1 [MINOR][DOCS] Fixing log message for better clarity add 711d8dd [SPARK-33018][SQL] Fix estimate statistics issue if child has 0 bytes No new revisions were added by this update. Summary of changes: .../SizeInBytesOnlyStatsPlanVisitor.scala | 3 ++- .../statsEstimation/JoinEstimationSuite.scala | 22 ++ .../statsEstimation/StatsEstimationTestBase.scala | 9 ++--- 3 files changed, 30 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [MINOR][DOCS] Fixing log message for better clarity
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 39bfae2  [MINOR][DOCS] Fixing log message for better clarity
39bfae2 is described below

commit 39bfae25979aecbe8058beb2a4882fde9f141eba
Author: Akshat Bordia
AuthorDate: Tue Sep 29 08:38:43 2020 -0500

    [MINOR][DOCS] Fixing log message for better clarity

    Fixing the log message for better clarity.

    Closes #29870 from akshatb1/master.

    Lead-authored-by: Akshat Bordia
    Co-authored-by: Akshat Bordia
    Signed-off-by: Sean Owen
    (cherry picked from commit 7766fd13c9e7cb72b97fdfee224d3958fbe882a0)
    Signed-off-by: Sean Owen
---
 core/src/main/scala/org/apache/spark/SparkConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala b/core/src/main/scala/org/apache/spark/SparkConf.scala
index 40915e3..802100e 100644
--- a/core/src/main/scala/org/apache/spark/SparkConf.scala
+++ b/core/src/main/scala/org/apache/spark/SparkConf.scala
@@ -577,7 +577,7 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
     // If spark.executor.heartbeatInterval bigger than spark.network.timeout,
     // it will almost always cause ExecutorLostFailure. See SPARK-22754.
     require(executorTimeoutThresholdMs > executorHeartbeatIntervalMs, "The value of " +
-      s"${networkTimeout}=${executorTimeoutThresholdMs}ms must be no less than the value of " +
+      s"${networkTimeout}=${executorTimeoutThresholdMs}ms must be greater than the value of " +
       s"${EXECUTOR_HEARTBEAT_INTERVAL.key}=${executorHeartbeatIntervalMs}ms.")
   }
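The check guarded by this message compares two user-settable timeouts. A minimal sketch of a configuration that satisfies it (the values are illustrative; the defaults already do):

```scala
import org.apache.spark.SparkConf

// spark.network.timeout is used as the executor timeout threshold, and it
// must be strictly greater than the executor heartbeat interval, or
// SparkConf validation fails with the message shown in the diff above.
val conf = new SparkConf()
  .set("spark.network.timeout", "120s")           // 120000 ms threshold
  .set("spark.executor.heartbeatInterval", "10s") // 10000 ms, well below it
```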
[spark] branch master updated (f167002 -> 7766fd1)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from f167002  [SPARK-32901][CORE] Do not allocate memory while spilling UnsafeExternalSorter
 add 7766fd1  [MINOR][DOCS] Fixing log message for better clarity

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/SparkConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch branch-3.0 updated: [SPARK-32901][CORE] Do not allocate memory while spilling UnsafeExternalSorter
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new d3cc564  [SPARK-32901][CORE] Do not allocate memory while spilling UnsafeExternalSorter
d3cc564 is described below

commit d3cc564de2e27d6f40d360a35ac86d17b39f1498
Author: Tom van Bussel
AuthorDate: Tue Sep 29 13:05:33 2020 +0200

    [SPARK-32901][CORE] Do not allocate memory while spilling UnsafeExternalSorter

    ### What changes were proposed in this pull request?

    This PR changes `UnsafeExternalSorter` so that it no longer allocates any memory while spilling. In particular, it removes the allocation of a new pointer array in `UnsafeInMemorySorter`; the new pointer array is instead allocated whenever the next record is inserted into the sorter.

    ### Why are the changes needed?

    Without this change, `UnsafeExternalSorter` could throw an OOM while spilling. The following sequence of events would trigger an OOM:

    1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
    2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
    3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages, and to reset its `UnsafeInMemorySorter`.
    4. `UnsafeInMemorySorter` frees the old pointer array and tries to allocate a new small pointer array.
    5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array.
    6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill.
    7. `UnsafeInMemorySorter` receives less memory than it requested and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail.

    With the changes in the PR the following happens instead:

    1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
    2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
    3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages, and to reset its `UnsafeInMemorySorter`.
    4. `UnsafeInMemorySorter` frees the old pointer array.
    5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the new large array or by throwing a `SparkOutOfMemoryError`).
    6. `UnsafeExternalSorter` either frees the new large array or ignores the `SparkOutOfMemoryError`, depending on what happened in the previous step.
    7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`.

    Closes #29785 from tomvanbussel/SPARK-32901.

Authored-by: Tom van Bussel
Signed-off-by: herman
(cherry picked from commit f167002522d50eefb261c8ba2d66a23b781a38c4)
Signed-off-by: herman
---
 .../unsafe/sort/UnsafeExternalSorter.java          | 96 --
 .../unsafe/sort/UnsafeInMemorySorter.java          | 55 +++--
 .../unsafe/sort/UnsafeExternalSorterSuite.java     | 46 ---
 .../unsafe/sort/UnsafeInMemorySorterSuite.java     | 40 -
 .../apache/spark/memory/TestMemoryManager.scala    |  8 ++
 5 files changed, 143 insertions(+), 102 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index 71b9a5b..2096453 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -203,6 +203,10 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
     }

     if (inMemSorter == null || inMemSorter.numRecords() <= 0) {
+      // There could still be some memory allocated when there are no records in the in-memory
+      // sorter. We will not spill it however, to
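The pattern the PR describes, deferring the pointer-array allocation from spill time to insert time, can be summarized in a small standalone sketch. This is illustrative only (the names and structure are invented, not Spark's actual internals):

```scala
// After a spill/reset the sorter keeps no pointer array at all; the
// replacement array is allocated lazily on the next insert, when the
// memory freed by the spill is actually available again.
final class TinySorter(allocate: Int => Array[Long]) {
  private var pointers: Array[Long] = _ // null after reset()
  private var used = 0

  def reset(): Unit = { // called while spilling
    pointers = null     // free the old array; allocate nothing here
    used = 0
  }

  def insert(recordPointer: Long): Unit = {
    if (pointers == null) {
      pointers = allocate(1024)                  // deferred allocation on insert
    } else if (used == pointers.length) {
      val bigger = allocate(pointers.length * 2) // grow on demand
      Array.copy(pointers, 0, bigger, 0, used)
      pointers = bigger
    }
    pointers(used) = recordPointer
    used += 1
  }
}
```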
[spark] branch branch-3.0 updated: [SPARK-32901][CORE] Do not allocate memory while spilling UnsafeExternalSorter
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new d3cc564 [SPARK-32901][CORE] Do not allocate memory while spilling UnsafeExternalSorter d3cc564 is described below commit d3cc564de2e27d6f40d360a35ac86d17b39f1498 Author: Tom van Bussel AuthorDate: Tue Sep 29 13:05:33 2020 +0200 [SPARK-32901][CORE] Do not allocate memory while spilling UnsafeExternalSorter ### What changes were proposed in this pull request? This PR changes `UnsafeExternalSorter` to no longer allocate any memory while spilling. In particular it removes the allocation of a new pointer array in `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever the next record is inserted into the sorter. ### Why are the changes needed? Without this change the `UnsafeExternalSorter` could throw an OOM while spilling. The following sequence of events would have triggered an OOM: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array. 5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array. 6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill. 7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail. With the changes in the PR the following will happen instead: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array. 5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the the new large array or by throwing a `SparkOutOfMemoryError`). 6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError` depending on what happened in the previous step. 7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`. Closes #29785 from tomvanbussel/SPARK-32901. 
    Authored-by: Tom van Bussel
    Signed-off-by: herman
    (cherry picked from commit f167002522d50eefb261c8ba2d66a23b781a38c4)
    Signed-off-by: herman
---
 .../unsafe/sort/UnsafeExternalSorter.java          | 96 --
 .../unsafe/sort/UnsafeInMemorySorter.java          | 55 +++--
 .../unsafe/sort/UnsafeExternalSorterSuite.java     | 46 ---
 .../unsafe/sort/UnsafeInMemorySorterSuite.java     | 40 -
 .../apache/spark/memory/TestMemoryManager.scala    |  8 ++
 5 files changed, 143 insertions(+), 102 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index 71b9a5b..2096453 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -203,6 +203,10 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
     }

     if (inMemSorter == null || inMemSorter.numRecords() <= 0) {
+      // There could still be some memory allocated when there are no records in the in-memory
+      // sorter. We will not spill it however, to
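To make the fixed control flow concrete, here is a minimal, self-contained Scala sketch of the pattern the PR adopts: the spill path only releases memory and never requests any, so it cannot fail while another consumer holds a partially granted allocation; the small replacement array is allocated lazily on the next insert. All names below (`Sorter`, `insert`, `spill`) are illustrative only, not the actual Spark classes or their memory accounting.

```scala
// Illustrative sketch only -- not the real UnsafeExternalSorter API.
// Key invariant: spill() frees memory but never allocates.
object NoAllocOnSpillSketch {
  final class Sorter {
    private var pointerArray: Array[Long] = new Array[Long](4)
    private var numRecords = 0

    def spill(): Unit = {
      // ... write the current run to disk (elided) ...
      pointerArray = null // free the old array; do NOT reallocate here
      numRecords = 0
    }

    def insert(recordPointer: Long): Unit = {
      // The small replacement array is (re)allocated lazily, after any memory
      // grabbed for a failed "grow" attempt has already been released.
      if (pointerArray == null) pointerArray = new Array[Long](4)
      if (numRecords == pointerArray.length) {
        val bigger = new Array[Long](pointerArray.length * 2)
        System.arraycopy(pointerArray, 0, bigger, 0, numRecords)
        pointerArray = bigger
      }
      pointerArray(numRecords) = recordPointer
      numRecords += 1
    }
  }

  def main(args: Array[String]): Unit = {
    val sorter = new Sorter
    (1L to 10L).foreach(sorter.insert)
    sorter.spill()     // only frees memory
    sorter.insert(42L) // the array comes back here, not inside spill()
  }
}
```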
[spark] branch master updated (90e86f6 -> f167002)
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 90e86f6  [SPARK-32970][SPARK-32019][SQL][TEST] Reduce the runtime of an UT for
     add f167002  [SPARK-32901][CORE] Do not allocate memory while spilling UnsafeExternalSorter

No new revisions were added by this update.

Summary of changes:
 .../unsafe/sort/UnsafeExternalSorter.java          | 96 --
 .../unsafe/sort/UnsafeInMemorySorter.java          | 55 +++--
 .../unsafe/sort/UnsafeExternalSorterSuite.java     | 46 ---
 .../unsafe/sort/UnsafeInMemorySorterSuite.java     | 40 -
 .../apache/spark/memory/TestMemoryManager.scala    |  8 ++
 5 files changed, 143 insertions(+), 102 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-33015][SQL][FOLLOWUP][3.0] Use millisToDays() in the ComputeCurrentTime rule
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 2160dc5  [SPARK-33015][SQL][FOLLOWUP][3.0] Use millisToDays() in the ComputeCurrentTime rule
2160dc5 is described below

commit 2160dc52163f017bc164ad18ca6ebe6868070402
Author: Max Gekk
AuthorDate: Tue Sep 29 19:34:43 2020 +0900

    [SPARK-33015][SQL][FOLLOWUP][3.0] Use millisToDays() in the ComputeCurrentTime rule

    ### What changes were proposed in this pull request?

    Use `millisToDays()` instead of `microsToDays()` because the latter is not available in `branch-3.0`.

    ### Why are the changes needed?

    To fix the build failure:
    ```
    [ERROR] [Error] /home/jenkins/workspace/spark-branch-3.0-maven-snapshots/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala:85: value microsToDays is not a member of object org.apache.spark.sql.catalyst.util.DateTimeUtils
    ```

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    By running `./build/sbt clean package` and `ComputeCurrentTimeSuite`.

    Closes #29901 from MaxGekk/fix-current_date-3.0.

    Authored-by: Max Gekk
    Signed-off-by: HyukjinKwon
---
 .../scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala
index 09e0118..ba7e852 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala
@@ -82,7 +82,7 @@ object ComputeCurrentTime extends Rule[LogicalPlan] {
     case currentDate @ CurrentDate(Some(timeZoneId)) =>
       currentDates.getOrElseUpdate(timeZoneId, {
         Literal.create(
-          DateTimeUtils.microsToDays(timestamp, currentDate.zoneId),
+          DateTimeUtils.millisToDays(DateTimeUtils.toMillis(timestamp), currentDate.zoneId),
           DateType)
       })
     case CurrentTimestamp() | Now() => currentTime
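The follow-up relies on the fact that converting microseconds to days via milliseconds lands on the same local date. A small self-contained Scala sketch of the assumed semantics follows; the real `DateTimeUtils` helpers differ in detail, and the constant, object, and method names here are illustrative only.

```scala
import java.time.{Instant, ZoneId}

object MicrosToDaysSketch {
  // Assumed semantics of DateTimeUtils.toMillis: floor division, so instants
  // before the epoch still truncate toward the past rather than toward zero.
  def toMillis(micros: Long): Long = Math.floorDiv(micros, 1000L)

  // Assumed semantics of DateTimeUtils.millisToDays: the local date the
  // instant falls on in the given zone, counted as days since 1970-01-01.
  def millisToDays(millis: Long, zoneId: ZoneId): Int =
    Instant.ofEpochMilli(millis).atZone(zoneId).toLocalDate.toEpochDay.toInt

  def main(args: Array[String]): Unit = {
    val micros = 1601337600L * 1000000L // 2020-09-29 00:00:00 UTC, in microseconds
    val zone = ZoneId.of("Asia/Seoul")  // UTC+9, so the local date is still 2020-09-29
    println(millisToDays(toMillis(micros), zone)) // prints the days-since-epoch value
  }
}
```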
[spark] branch master updated (202115e -> 90e86f6)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 202115e  [SPARK-32948][SQL] Optimize to_json and from_json expression chain
     add 90e86f6  [SPARK-32970][SPARK-32019][SQL][TEST] Reduce the runtime of an UT for

No new revisions were added by this update.

Summary of changes:
 .../datasources/FileSourceStrategySuite.scala      | 25 +-
 1 file changed, 15 insertions(+), 10 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-33021][PYTHON][TESTS] Move functions related test cases into test_functions.py
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 97d8634  [SPARK-33021][PYTHON][TESTS] Move functions related test cases into test_functions.py
97d8634 is described below

commit 97d8634450b39c1f4e5308b8a5308650e1e7489a
Author: HyukjinKwon
AuthorDate: Mon Sep 28 21:54:00 2020 -0700

    [SPARK-33021][PYTHON][TESTS] Move functions related test cases into test_functions.py

    Move functions-related test cases from `test_context.py` to `test_functions.py`, to group the similar test cases together. No user-facing change; this is test-only. Jenkins and GitHub Actions should test.

    Closes #29898 from HyukjinKwon/SPARK-33021.

    Authored-by: HyukjinKwon
    Signed-off-by: Dongjoon Hyun
---
 python/pyspark/sql/tests/test_context.py   | 101
 python/pyspark/sql/tests/test_functions.py | 102 -
 2 files changed, 101 insertions(+), 102 deletions(-)

diff --git a/python/pyspark/sql/tests/test_context.py b/python/pyspark/sql/tests/test_context.py
index 92e5434..3a0c7bb 100644
--- a/python/pyspark/sql/tests/test_context.py
+++ b/python/pyspark/sql/tests/test_context.py
@@ -30,7 +30,6 @@ import py4j
 from pyspark import SparkContext, SQLContext
 from pyspark.sql import Row, SparkSession
 from pyspark.sql.types import *
-from pyspark.sql.window import Window
 from pyspark.testing.utils import ReusedPySparkTestCase


@@ -112,99 +111,6 @@ class HiveContextSQLTests(ReusedPySparkTestCase):
         shutil.rmtree(tmpPath)

-    def test_window_functions(self):
-        df = self.spark.createDataFrame([(1, "1"), (2, "2"), (1, "2"), (1, "2")], ["key", "value"])
-        w = Window.partitionBy("value").orderBy("key")
-        from pyspark.sql import functions as F
-        sel = df.select(df.value, df.key,
-                        F.max("key").over(w.rowsBetween(0, 1)),
-                        F.min("key").over(w.rowsBetween(0, 1)),
-                        F.count("key").over(w.rowsBetween(float('-inf'), float('inf'))),
-                        F.row_number().over(w),
-                        F.rank().over(w),
-                        F.dense_rank().over(w),
-                        F.ntile(2).over(w))
-        rs = sorted(sel.collect())
-        expected = [
-            ("1", 1, 1, 1, 1, 1, 1, 1, 1),
-            ("2", 1, 1, 1, 3, 1, 1, 1, 1),
-            ("2", 1, 2, 1, 3, 2, 1, 1, 1),
-            ("2", 2, 2, 2, 3, 3, 3, 2, 2)
-        ]
-        for r, ex in zip(rs, expected):
-            self.assertEqual(tuple(r), ex[:len(r)])
-
-    def test_window_functions_without_partitionBy(self):
-        df = self.spark.createDataFrame([(1, "1"), (2, "2"), (1, "2"), (1, "2")], ["key", "value"])
-        w = Window.orderBy("key", df.value)
-        from pyspark.sql import functions as F
-        sel = df.select(df.value, df.key,
-                        F.max("key").over(w.rowsBetween(0, 1)),
-                        F.min("key").over(w.rowsBetween(0, 1)),
-                        F.count("key").over(w.rowsBetween(float('-inf'), float('inf'))),
-                        F.row_number().over(w),
-                        F.rank().over(w),
-                        F.dense_rank().over(w),
-                        F.ntile(2).over(w))
-        rs = sorted(sel.collect())
-        expected = [
-            ("1", 1, 1, 1, 4, 1, 1, 1, 1),
-            ("2", 1, 1, 1, 4, 2, 2, 2, 1),
-            ("2", 1, 2, 1, 4, 3, 2, 2, 2),
-            ("2", 2, 2, 2, 4, 4, 4, 3, 2)
-        ]
-        for r, ex in zip(rs, expected):
-            self.assertEqual(tuple(r), ex[:len(r)])
-
-    def test_window_functions_cumulative_sum(self):
-        df = self.spark.createDataFrame([("one", 1), ("two", 2)], ["key", "value"])
-        from pyspark.sql import functions as F
-
-        # Test cumulative sum
-        sel = df.select(
-            df.key,
-            F.sum(df.value).over(Window.rowsBetween(Window.unboundedPreceding, 0)))
-        rs = sorted(sel.collect())
-        expected = [("one", 1), ("two", 3)]
-        for r, ex in zip(rs, expected):
-            self.assertEqual(tuple(r), ex[:len(r)])
-
-        # Test boundary values less than JVM's Long.MinValue and make sure we don't overflow
-        sel = df.select(
-            df.key,
-            F.sum(df.value).over(Window.rowsBetween(Window.unboundedPreceding - 1, 0)))
-        rs = sorted(sel.collect())
-        expected = [("one", 1), ("two", 3)]
-        for r, ex in zip(rs, expected):
-            self.assertEqual(tuple(r), ex[:len(r)])
-
-        # Test boundary values greater than JVM's Long.MaxValue and make sure we don't overflow
-        frame_end = Window.unboundedFollowing + 1
-        sel = df.select(
-            df.key,
-
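The relocated tests exercise window frames whose boundaries fall outside JVM long bounds, checking that values like `Window.unboundedPreceding - 1` clamp instead of overflowing. A hedged Scala analogue of the cumulative-sum case follows, using the public Dataset API; the object name and the locally created `SparkSession` are assumptions for the sake of a runnable sketch.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

object CumulativeSumSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(("one", 1), ("two", 2)).toDF("key", "value")
    // Cumulative sum: everything from the start of the frame up to the current row.
    // Boundaries beyond Long.MinValue/MaxValue are expected to behave like the
    // unbounded markers rather than wrapping around.
    val w = Window.orderBy("key").rowsBetween(Window.unboundedPreceding, Window.currentRow)
    df.select($"key", sum($"value").over(w).as("running_total")).show()
    // Expected rows: ("one", 1), ("two", 3)
    spark.stop()
  }
}
```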
[spark] branch master updated (1b60ff5 -> 202115e)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 1b60ff5  [MINOR][DOCS] Document when `current_date` and `current_timestamp` are evaluated
     add 202115e  [SPARK-32948][SQL] Optimize to_json and from_json expression chain

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/optimizer/OptimizeJsonExprs.scala | 43 ++
 .../spark/sql/catalyst/optimizer/Optimizer.scala   |  1 +
 .../optimizer/OptimizeJsonExprsSuite.scala         | 144 +
 3 files changed, 188 insertions(+)
 create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJsonExprs.scala
 create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJsonExprsSuite.scala
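The new `OptimizeJsonExprs` rule targets chains where a struct is serialized to JSON and immediately parsed back. A hedged illustration of the kind of round-trip it can simplify follows; whether the collapse fires depends on the rule's preconditions (e.g. matching schemas and default options), and the object name and local `SparkSession` here are assumptions for the sketch.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{from_json, struct, to_json}
import org.apache.spark.sql.types.StructType

object JsonChainSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
      .select(struct($"id", $"name").as("s"))
    val schema = df.schema("s").dataType.asInstanceOf[StructType]

    // Round-trip: to_json followed by from_json with the same schema. In builds
    // that include this rule, the optimizer can collapse the chain back to just
    // `s`, skipping the serialize/parse work; the result is identical either way.
    val roundTrip = df.select(from_json(to_json($"s"), schema).as("s"))
    roundTrip.explain(true) // inspect the optimized plan to see whether it fired
    roundTrip.show()
    spark.stop()
  }
}
```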
[spark] branch branch-3.0 updated (424f16e -> 118de10)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 424f16e  [SPARK-33015][SQL] Compute the current date only once
     add 118de10  [MINOR][DOCS] Document when `current_date` and `current_timestamp` are evaluated

No new revisions were added by this update.

Summary of changes:
 R/pkg/R/functions.R                                          |  6 --
 python/pyspark/sql/functions.py                              |  6 --
 .../spark/sql/catalyst/expressions/datetimeExpressions.scala | 12 ++--
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala |  6 --
 4 files changed, 18 insertions(+), 12 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (6868b40 -> 1b60ff5)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 6868b40  [SPARK-33020][PYTHON] Add nth_value as a PySpark function
     add 1b60ff5  [MINOR][DOCS] Document when `current_date` and `current_timestamp` are evaluated

No new revisions were added by this update.

Summary of changes:
 R/pkg/R/functions.R                                          |  6 --
 python/pyspark/sql/functions.py                              |  6 --
 .../spark/sql/catalyst/expressions/datetimeExpressions.scala | 12 ++--
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala |  6 --
 4 files changed, 18 insertions(+), 12 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
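For context, the behavior these doc changes spell out: `current_date()` and `current_timestamp()` are evaluated once at the start of query evaluation, so every row produced by a single query sees the same value. A small PySpark sketch of the observable effect; the local session is illustrative only:

```python
# current_timestamp() is evaluated at query start, not per row, so a single
# query yields exactly one distinct timestamp across all of its rows.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local[1]").getOrCreate()
distinct_ts = spark.range(3).select(F.current_timestamp().alias("ts")).distinct()
assert distinct_ts.count() == 1
```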
[spark] branch master updated (68cd567 -> 6868b40)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 68cd567  [SPARK-33015][SQL] Compute the current date only once
     add 6868b40  [SPARK-33020][PYTHON] Add nth_value as a PySpark function

No new revisions were added by this update.

Summary of changes:
 python/docs/source/reference/pyspark.sql.rst |  1 +
 python/pyspark/sql/functions.py              | 20
 python/pyspark/sql/functions.pyi             |  3 +++
 python/pyspark/sql/tests/test_functions.py   | 34
 4 files changed, 58 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
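A minimal usage sketch for the newly added function, exposed as `pyspark.sql.functions.nth_value` on master (Spark 3.1+); the data and window spec are invented for illustration:

```python
# nth_value returns the value of the column at the given ordinal within the
# window frame, and null while the frame holds fewer than that many rows.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.createDataFrame([("a", 1), ("a", 2), ("a", 3), ("b", 8)], ["grp", "x"])

w = Window.partitionBy("grp").orderBy("x")
df.select("grp", "x", F.nth_value("x", 2).over(w).alias("second_x")).show()
# grp=a rows: second_x is null for x=1, then 2 for x=2 and x=3;
# grp=b has a single row, so second_x stays null.
```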