[spark] branch branch-3.2 updated (1c0bd4c15a2 -> be891ad9908)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git from 1c0bd4c15a2 [SPARK-39656][SQL][3.2] Fix wrong namespace in DescribeNamespaceExec add be891ad9908 [SPARK-39551][SQL][3.2] Add AQE invalid plan check No new revisions were added by this update. Summary of changes: .../execution/adaptive/AdaptiveSparkPlanExec.scala | 72 -- .../adaptive/InvalidAQEPlanException.scala | 17 +++-- .../sql/execution/adaptive/ValidateSparkPlan.scala | 68 .../adaptive/AdaptiveQueryExecSuite.scala | 25 +++- 4 files changed, 141 insertions(+), 41 deletions(-) copy core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala => sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InvalidAQEPlanException.scala (61%) create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ValidateSparkPlan.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39503][SQL][FOLLOWUP] Fix ansi golden files and typo
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 88b983d9f2a [SPARK-39503][SQL][FOLLOWUP] Fix ansi golden files and typo 88b983d9f2a is described below commit 88b983d9f2a7190b8d74a6176740afb65fa08223 Author: ulysses-you AuthorDate: Thu Jul 7 13:17:11 2022 +0900 [SPARK-39503][SQL][FOLLOWUP] Fix ansi golden files and typo ### What changes were proposed in this pull request? - re-generate ansi golden files - fix FunctionIdentifier parameter name typo ### Why are the changes needed? Fix ansi golden files and typo ### Does this PR introduce _any_ user-facing change? no, not released ### How was this patch tested? pass CI Closes #37111 from ulysses-you/catalog-followup. Authored-by: ulysses-you Signed-off-by: Hyukjin Kwon --- .../apache/spark/sql/catalyst/identifiers.scala| 2 +- .../approved-plans-v1_4/q83.ansi/explain.txt | 28 +++--- .../approved-plans-v1_4/q83.ansi/simplified.txt| 14 +-- .../approved-plans-v1_4/q83.sf100.ansi/explain.txt | 28 +++--- .../q83.sf100.ansi/simplified.txt | 14 +-- 5 files changed, 43 insertions(+), 43 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala index 9cae2b622a7..2de44d6f349 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala @@ -142,7 +142,7 @@ case class FunctionIdentifier(funcName: String, database: Option[String], catalo override val identifier: String = funcName def this(funcName: String) = this(funcName, None, None) - def this(table: String, database: Option[String]) = this(table, database, None) + def this(funcName: String, database: Option[String]) = this(funcName, database, None) override 
def toString: String = unquotedString } diff --git a/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q83.ansi/explain.txt b/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q83.ansi/explain.txt index d281e59c727..905d29293a3 100644 --- a/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q83.ansi/explain.txt +++ b/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q83.ansi/explain.txt @@ -13,11 +13,11 @@ TakeOrderedAndProject (46) : : : +- * BroadcastHashJoin Inner BuildRight (8) : : : :- * Filter (3) : : : : +- * ColumnarToRow (2) - : : : : +- Scan parquet default.store_returns (1) + : : : : +- Scan parquet spark_catalog.default.store_returns (1) : : : +- BroadcastExchange (7) : : :+- * Filter (6) : : : +- * ColumnarToRow (5) - : : : +- Scan parquet default.item (4) + : : : +- Scan parquet spark_catalog.default.item (4) : : +- ReusedExchange (10) : +- BroadcastExchange (28) :+- * HashAggregate (27) @@ -29,7 +29,7 @@ TakeOrderedAndProject (46) : : +- * BroadcastHashJoin Inner BuildRight (20) : : :- * Filter (18) : : : +- * ColumnarToRow (17) - : : : +- Scan parquet default.catalog_returns (16) + : : : +- Scan parquet spark_catalog.default.catalog_returns (16) : : +- ReusedExchange (19) : +- ReusedExchange (22) +- BroadcastExchange (43) @@ -42,12 +42,12 @@ TakeOrderedAndProject (46) : +- * BroadcastHashJoin Inner BuildRight (35) : :- * Filter (33) : : +- * ColumnarToRow (32) -: : +- Scan parquet default.web_returns (31) +: : +- Scan parquet spark_catalog.default.web_returns (31) : +- ReusedExchange (34) +- ReusedExchange (37) -(1) Scan parquet default.store_returns +(1) Scan parquet spark_catalog.default.store_returns Output [3]: [sr_item_sk#1, sr_return_quantity#2, sr_returned_date_sk#3] Batched: true Location: InMemoryFileIndex [] @@ -62,7 +62,7 @@ Input [3]: [sr_item_sk#1, sr_return_quantity#2, sr_returned_date_sk#3] Input [3]: [sr_item_sk#1,
[spark] branch branch-3.3 updated: [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` when they are used with `DISTINCT`
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 016dfeb760d [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` when they are used with `DISTINCT` 016dfeb760d is described below commit 016dfeb760dbe1109e3c81c39bcd1bf3316a3e20 Author: Jiaan Geng AuthorDate: Thu Jul 7 09:55:45 2022 +0800 [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` when they are used with `DISTINCT` https://github.com/apache/spark/pull/35145 taught `H2Dialect` to compile COVAR_POP, COVAR_SAMP and CORR. Because H2 does not support COVAR_POP, COVAR_SAMP or CORR combined with DISTINCT, that PR introduced a bug: these aggregate functions are compiled even when they are used with DISTINCT. This change fixes the bug by compiling them only when DISTINCT is absent. User-facing change: yes, the bug is fixed. Tested with new test cases. Closes #37090 from beliefer/SPARK-37527_followup2. 
Authored-by: Jiaan Geng Signed-off-by: Wenchen Fan (cherry picked from commit 14f2bae208c093dea58e3f947fb660e8345fb256) Signed-off-by: Wenchen Fan --- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 15 - .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 38 +++--- 2 files changed, 32 insertions(+), 21 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala index 4a88203ec59..967df112af2 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala @@ -55,18 +55,15 @@ private object H2Dialect extends JdbcDialect { assert(f.children().length == 1) val distinct = if (f.isDistinct) "DISTINCT " else "" Some(s"STDDEV_SAMP($distinct${f.children().head})") -case f: GeneralAggregateFunc if f.name() == "COVAR_POP" => +case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"COVAR_POP($distinct${f.children().head}, ${f.children().last})") -case f: GeneralAggregateFunc if f.name() == "COVAR_SAMP" => + Some(s"COVAR_POP(${f.children().head}, ${f.children().last})") +case f: GeneralAggregateFunc if f.name() == "COVAR_SAMP" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"COVAR_SAMP($distinct${f.children().head}, ${f.children().last})") -case f: GeneralAggregateFunc if f.name() == "CORR" => + Some(s"COVAR_SAMP(${f.children().head}, ${f.children().last})") +case f: GeneralAggregateFunc if f.name() == "CORR" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"CORR($distinct${f.children().head}, ${f.children().last})") + Some(s"CORR(${f.children().head}, ${f.children().last})") case _ => None } ) diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala index 2f94f9ef31e..293334084af 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala @@ -1028,23 +1028,37 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel } test("scan with aggregate push-down: COVAR_POP COVAR_SAMP with filter and group by") { -val df = sql("select COVAR_POP(bonus, bonus), COVAR_SAMP(bonus, bonus)" + - " FROM h2.test.employee where dept > 0 group by DePt") -checkFiltersRemoved(df) -checkAggregateRemoved(df) -checkPushedInfo(df, "PushedAggregates: [COVAR_POP(BONUS, BONUS), COVAR_SAMP(BONUS, BONUS)], " + +val df1 = sql("SELECT COVAR_POP(bonus, bonus), COVAR_SAMP(bonus, bonus)" + + " FROM h2.test.employee WHERE dept > 0 GROUP BY DePt") +checkFiltersRemoved(df1) +checkAggregateRemoved(df1) +checkPushedInfo(df1, "PushedAggregates: [COVAR_POP(BONUS, BONUS), COVAR_SAMP(BONUS, BONUS)], " + "PushedFilters: [DEPT IS NOT NULL, DEPT > 0], PushedGroupByExpressions: [DEPT]") -checkAnswer(df, Seq(Row(1d, 2d), Row(2500d, 5000d), Row(0d, null))) +checkAnswer(df1, Seq(Row(1d, 2d), Row(2500d, 5000d), Row(0d, null))) + +val df2 = sql("SELECT COVAR_POP(DISTINCT bonus, bonus), COVAR_SAMP(DISTINCT bonus, bonus)" + + " FROM h2.test.employee WHERE dept > 0
[spark] branch master updated: [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` when they are used with `DISTINCT`
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 14f2bae208c [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` when they are used with `DISTINCT` 14f2bae208c is described below commit 14f2bae208c093dea58e3f947fb660e8345fb256 Author: Jiaan Geng AuthorDate: Thu Jul 7 09:55:45 2022 +0800 [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` when they are used with `DISTINCT` ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/35145 taught `H2Dialect` to compile COVAR_POP, COVAR_SAMP and CORR. Because H2 does not support COVAR_POP, COVAR_SAMP or CORR combined with DISTINCT, that PR introduced a bug: these aggregate functions are compiled even when they are used with DISTINCT. ### Why are the changes needed? Fix the bug that compiles COVAR_POP, COVAR_SAMP and CORR when these aggregate functions are used with DISTINCT. ### Does this PR introduce _any_ user-facing change? Yes. The bug is fixed. ### How was this patch tested? New test cases. Closes #37090 from beliefer/SPARK-37527_followup2. 
Authored-by: Jiaan Geng Signed-off-by: Wenchen Fan --- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 15 -- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 34 +++--- 2 files changed, 30 insertions(+), 19 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala index 124cb001b5c..5dfc64d7b6c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala @@ -62,18 +62,15 @@ private[sql] object H2Dialect extends JdbcDialect { assert(f.children().length == 1) val distinct = if (f.isDistinct) "DISTINCT " else "" Some(s"STDDEV_SAMP($distinct${f.children().head})") -case f: GeneralAggregateFunc if f.name() == "COVAR_POP" => +case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"COVAR_POP($distinct${f.children().head}, ${f.children().last})") -case f: GeneralAggregateFunc if f.name() == "COVAR_SAMP" => + Some(s"COVAR_POP(${f.children().head}, ${f.children().last})") +case f: GeneralAggregateFunc if f.name() == "COVAR_SAMP" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"COVAR_SAMP($distinct${f.children().head}, ${f.children().last})") -case f: GeneralAggregateFunc if f.name() == "CORR" => + Some(s"COVAR_SAMP(${f.children().head}, ${f.children().last})") +case f: GeneralAggregateFunc if f.name() == "CORR" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"CORR($distinct${f.children().head}, ${f.children().last})") + Some(s"CORR(${f.children().head}, ${f.children().last})") case _ => None } ) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala index 108348fbcd3..0a713bdb76c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala @@ -1652,23 +1652,37 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel } test("scan with aggregate push-down: COVAR_POP COVAR_SAMP with filter and group by") { -val df = sql("SELECT COVAR_POP(bonus, bonus), COVAR_SAMP(bonus, bonus)" + +val df1 = sql("SELECT COVAR_POP(bonus, bonus), COVAR_SAMP(bonus, bonus)" + " FROM h2.test.employee WHERE dept > 0 GROUP BY DePt") -checkFiltersRemoved(df) -checkAggregateRemoved(df) -checkPushedInfo(df, "PushedAggregates: [COVAR_POP(BONUS, BONUS), COVAR_SAMP(BONUS, BONUS)], " + +checkFiltersRemoved(df1) +checkAggregateRemoved(df1) +checkPushedInfo(df1, "PushedAggregates: [COVAR_POP(BONUS, BONUS), COVAR_SAMP(BONUS, BONUS)], " + "PushedFilters: [DEPT IS NOT NULL, DEPT > 0], PushedGroupByExpressions: [DEPT]") -checkAnswer(df, Seq(Row(1d, 2d), Row(2500d, 5000d), Row(0d, null))) +checkAnswer(df1, Seq(Row(1d, 2d), Row(2500d, 5000d), Row(0d, null))) + +val df2 = sql("SELECT COVAR_POP(DISTINCT bonus, bonus), COVAR_SAMP(DISTINCT bonus, bonus)" + + " FROM
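The rule behind the fix above is general: a JDBC dialect should compile an aggregate for push-down only when the remote engine can actually express it, and otherwise return nothing so Spark keeps the aggregate on its own side. A minimal Python sketch of that guard (`Agg` and `compile_aggregate` are illustrative names, not Spark's API):

```python
from typing import Optional, Sequence

class Agg:
    """Hypothetical mini-model of Spark's GeneralAggregateFunc (illustrative only)."""
    def __init__(self, name: str, children: Sequence[str], is_distinct: bool = False) -> None:
        self.name = name
        self.children = list(children)
        self.is_distinct = is_distinct

def compile_aggregate(f: Agg) -> Optional[str]:
    # The fixed rule: push the two-argument statistical aggregates down to H2
    # only when DISTINCT is absent; otherwise return None, so Spark evaluates
    # the aggregate itself instead of generating SQL that H2 cannot run.
    if f.name in ("COVAR_POP", "COVAR_SAMP", "CORR") and not f.is_distinct:
        assert len(f.children) == 2
        return f"{f.name}({f.children[0]}, {f.children[1]})"
    return None

print(compile_aggregate(Agg("CORR", ["a", "b"])))        # CORR(a, b)
print(compile_aggregate(Agg("CORR", ["a", "b"], True)))  # None
```

Returning `None` for the DISTINCT case is what makes the new `checkAggregateRemoved(df2, false)`-style assertions in the test suite pass: the aggregate simply stays in the Spark plan.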
[spark] branch master updated: [SPARK-39697][INFRA] Add REFRESH_DATE flag and use previous cache to build cache image
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2a2608ad557 [SPARK-39697][INFRA] Add REFRESH_DATE flag and use previous cache to build cache image 2a2608ad557 is described below commit 2a2608ad557e3ebb160287b7d7fd9d14c251b3c2 Author: Yikun Jiang AuthorDate: Thu Jul 7 08:59:38 2022 +0900 [SPARK-39697][INFRA] Add REFRESH_DATE flag and use previous cache to build cache image ### What changes were proposed in this pull request? This patch has two improvements: - Add `cache-from`: this speeds up the cache build and ensures the image will NOT be fully refreshed unless `REFRESH_DATE` is changed intentionally. - Add `FULL_REFRESH_DATE` in the Dockerfile: this makes it possible to force a full refresh. ### Why are the changes needed? Without this PR, the cache image is **completely refreshed** whenever the Dockerfile changes in any way. This causes different behavior between the CI temporary image (cache-based refresh, in the pyspark/sparkr/lint jobs) and the infra cache (full refresh, in the build-infra-cache job). As a result, if a PR changes the Dockerfile, the pyspark/sparkr/lint CI might succeed, but the next pyspark/sparkr/lint run might fail after the cache is refreshed (because dependencies may change when the image is fully rebuilt). After this PR, when the Dockerfile changes, the cache image job does a cache-based refresh (reusing the previous cache as much as possible and rebuilding only the layers whose cache misses) to keep the behavior consistent with the pyspark/sparkr/lint job results. This behavior is, to some extent, similar to a **static image**: you can bump `FULL_REFRESH_DATE` to force a complete cache refresh, with the advantage that you can see the pyspark/sparkr/lint CI results in GitHub Actions when you do a full refresh. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 
Tested locally. Closes #37103 from Yikun/SPARK-39522-FOLLOWUP. Authored-by: Yikun Jiang Signed-off-by: Hyukjin Kwon --- .github/workflows/build_infra_images_cache.yml | 1 + dev/infra/Dockerfile | 2 ++ 2 files changed, 3 insertions(+) diff --git a/.github/workflows/build_infra_images_cache.yml b/.github/workflows/build_infra_images_cache.yml index 4ab27da7bdf..145769d1506 100644 --- a/.github/workflows/build_infra_images_cache.yml +++ b/.github/workflows/build_infra_images_cache.yml @@ -57,6 +57,7 @@ jobs: context: ./dev/infra/ push: true tags: ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }} + cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }} cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }},mode=max - name: Image digest diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 8968b097251..e3ba4f6110b 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -18,6 +18,8 @@ # Image for building and testing Spark branches. Based on Ubuntu 20.04. FROM ubuntu:20.04 +ENV FULL_REFRESH_DATE 20220706 + ENV DEBIAN_FRONTEND noninteractive ENV DEBCONF_NONINTERACTIVE_SEEN true
[spark] branch master updated: [SPARK-39648][PYTHON][PS][DOC] Fix type hints of `like`, `rlike`, `ilike` of Column
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new be7dab12677 [SPARK-39648][PYTHON][PS][DOC] Fix type hints of `like`, `rlike`, `ilike` of Column be7dab12677 is described below commit be7dab12677a180908b6ce37847abdda12adeb9b Author: Xinrong Meng AuthorDate: Thu Jul 7 08:53:50 2022 +0900 [SPARK-39648][PYTHON][PS][DOC] Fix type hints of `like`, `rlike`, `ilike` of Column ### What changes were proposed in this pull request? Fix type hints of `like`, `rlike`, `ilike` of Column. ### Why are the changes needed? Current type hints are incorrect, so the doc is confusing: `Union["Column", "LiteralType", "DecimalLiteral", "DateTimeLiteral"]` is hinted whereas only `str` is accepted. This PR proposes to fix that by introducing `_bin_op_other_str`. ### Does this PR introduce _any_ user-facing change? No. Doc change only. ### How was this patch tested? Manual tests. Closes #37038 from xinrong-databricks/like_rlike. 
Authored-by: Xinrong Meng Signed-off-by: Hyukjin Kwon --- python/pyspark/pandas/series.py | 2 +- python/pyspark/sql/column.py| 117 +--- 2 files changed, 64 insertions(+), 55 deletions(-) diff --git a/python/pyspark/pandas/series.py b/python/pyspark/pandas/series.py index a7852c110f7..838077ed7cd 100644 --- a/python/pyspark/pandas/series.py +++ b/python/pyspark/pandas/series.py @@ -5024,7 +5024,7 @@ class Series(Frame, IndexOpsMixin, Generic[T]): else: if regex: # to_replace must be a string -cond = self.spark.column.rlike(to_replace) +cond = self.spark.column.rlike(cast(str, to_replace)) else: cond = self.spark.column.isin(to_replace) # to_replace may be a scalar diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py index 04458d560ee..31954a95690 100644 --- a/python/pyspark/sql/column.py +++ b/python/pyspark/sql/column.py @@ -573,57 +573,6 @@ class Column: >>> df.filter(df.name.contains('o')).collect() [Row(age=5, name='Bob')] """ -_rlike_doc = """ -SQL RLIKE expression (LIKE with Regex). Returns a boolean :class:`Column` based on a regex -match. - -Parameters --- -other : str -an extended regex expression - -Examples - ->>> df.filter(df.name.rlike('ice$')).collect() -[Row(age=2, name='Alice')] -""" -_like_doc = """ -SQL like expression. Returns a boolean :class:`Column` based on a SQL LIKE match. - -Parameters --- -other : str -a SQL LIKE pattern - -See Also - -pyspark.sql.Column.rlike - -Examples - ->>> df.filter(df.name.like('Al%')).collect() -[Row(age=2, name='Alice')] -""" -_ilike_doc = """ -SQL ILIKE expression (case insensitive LIKE). Returns a boolean :class:`Column` -based on a case insensitive match. - -.. versionadded:: 3.3.0 - -Parameters --- -other : str -a SQL LIKE pattern - -See Also - -pyspark.sql.Column.rlike - -Examples - ->>> df.filter(df.name.ilike('%Ice')).collect() -[Row(age=2, name='Alice')] -""" _startswith_doc = """ String starts with. Returns a boolean :class:`Column` based on a string match. 
@@ -656,12 +605,72 @@ class Column: """ contains = _bin_op("contains", _contains_doc) -rlike = _bin_op("rlike", _rlike_doc) -like = _bin_op("like", _like_doc) -ilike = _bin_op("ilike", _ilike_doc) startswith = _bin_op("startsWith", _startswith_doc) endswith = _bin_op("endsWith", _endswith_doc) +def like(self: "Column", other: str) -> "Column": +""" +SQL like expression. Returns a boolean :class:`Column` based on a SQL LIKE match. + +Parameters +-- +other : str +a SQL LIKE pattern + +See Also + +pyspark.sql.Column.rlike + +Examples + +>>> df.filter(df.name.like('Al%')).collect() +[Row(age=2, name='Alice')] +""" +njc = getattr(self._jc, "like")(other) +return Column(njc) + +def rlike(self: "Column", other: str) -> "Column": +""" +SQL RLIKE expression (LIKE with Regex). Returns a boolean :class:`Column` based on a regex +match. + +Parameters +-- +other : str +an extended regex expression + +Examples + +>>> df.filter(df.name.rlike('ice$')).collect() +[Row(age=2, name='Alice')] +""" +njc = getattr(self._jc,
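The diff above replaces `_bin_op`-generated attributes with explicit `def like/rlike/ilike` methods, because a shared factory can only give every operator one generic signature, while an explicit method can carry the precise hint the operator supports. A toy Python sketch of the same idea (this `Column` class is a hypothetical stand-in, not pyspark's):

```python
class Column:
    """Hypothetical stand-in for pyspark's Column; only the hint pattern is modeled."""

    def __init__(self, name: str) -> None:
        self._name = name

    # Defining the method explicitly (instead of generating it from a shared
    # binary-op factory) lets the signature declare the precise `other: str`
    # type that the underlying operation actually accepts.
    def rlike(self, other: str) -> "Column":
        if not isinstance(other, str):
            raise TypeError("rlike expects a str regex pattern")
        return Column(f"RLIKE({self._name}, {other!r})")

print(Column("name").rlike("ice$")._name)  # RLIKE(name, 'ice$')
```

The runtime check here is only for illustration; in the real PR the benefit is in the static hint and the generated documentation, not in added runtime validation.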
[spark] branch master updated: [SPARK-39701][CORE][K8S][TESTS] Move `withSecretFile` to `SparkFunSuite` to reuse
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1cf4fe5cd4d [SPARK-39701][CORE][K8S][TESTS] Move `withSecretFile` to `SparkFunSuite` to reuse 1cf4fe5cd4d is described below commit 1cf4fe5cd4dedd6ccd38fc9c159069f7c5a72191 Author: Dongjoon Hyun AuthorDate: Wed Jul 6 16:27:58 2022 -0700 [SPARK-39701][CORE][K8S][TESTS] Move `withSecretFile` to `SparkFunSuite` to reuse ### What changes were proposed in this pull request? This PR aims to move `withSecretFile` to `SparkFunSuite` and reuse it in Kubernetes tests. ### Why are the changes needed? Currently, the K8s unit tests leave files behind because they don't clean up the temporary secret files. By reusing the existing method, we can avoid this: ``` $ build/sbt -Pkubernetes "kubernetes/test" $ git status On branch master Your branch is up to date with 'apache/master'. Untracked files: (use "git add ..." to include in what will be committed) resource-managers/kubernetes/core/temp-secret/ ``` ### Does this PR introduce _any_ user-facing change? No. This is a test-only change. ### How was this patch tested? Pass the CIs. Closes #37106 from dongjoon-hyun/SPARK-39701. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/SecurityManagerSuite.scala| 11 +--- .../scala/org/apache/spark/SparkFunSuite.scala | 16 ++- .../features/BasicExecutorFeatureStepSuite.scala | 31 +- 3 files changed, 29 insertions(+), 29 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala b/core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala index 44e338c6f00..a11ecc22d0b 100644 --- a/core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala +++ b/core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala @@ -29,7 +29,7 @@ import org.apache.spark.internal.config._ import org.apache.spark.internal.config.UI._ import org.apache.spark.launcher.SparkLauncher import org.apache.spark.security.GroupMappingServiceProvider -import org.apache.spark.util.{ResetSystemProperties, SparkConfWithEnv, Utils} +import org.apache.spark.util.{ResetSystemProperties, SparkConfWithEnv} class DummyGroupMappingServiceProvider extends GroupMappingServiceProvider { @@ -513,14 +513,5 @@ class SecurityManagerSuite extends SparkFunSuite with ResetSystemProperties { private def encodeFileAsBase64(secretFile: File) = { Base64.getEncoder.encodeToString(Files.readAllBytes(secretFile.toPath)) } - - private def withSecretFile(contents: String = "test-secret")(f: File => Unit): Unit = { -val secretDir = Utils.createTempDir("temp-secrets") -val secretFile = new File(secretDir, "temp-secret.txt") -Files.write(secretFile.toPath, contents.getBytes(UTF_8)) -try f(secretFile) finally { - Utils.deleteRecursively(secretDir) -} - } } diff --git a/core/src/test/scala/org/apache/spark/SparkFunSuite.scala b/core/src/test/scala/org/apache/spark/SparkFunSuite.scala index 7922e13db69..b17aacc0a9f 100644 --- a/core/src/test/scala/org/apache/spark/SparkFunSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkFunSuite.scala @@ -18,7 +18,8 @@ package org.apache.spark import java.io.File -import java.nio.file.Path +import 
java.nio.charset.StandardCharsets.UTF_8 +import java.nio.file.{Files, Path} import java.util.{Locale, TimeZone} import scala.annotation.tailrec @@ -223,6 +224,19 @@ abstract class SparkFunSuite } } + /** + * Creates a temporary directory containing a secret file, which is then passed to `f` and + * will be deleted after `f` returns. + */ + protected def withSecretFile(contents: String = "test-secret")(f: File => Unit): Unit = { +val secretDir = Utils.createTempDir("temp-secrets") +val secretFile = new File(secretDir, "temp-secret.txt") +Files.write(secretFile.toPath, contents.getBytes(UTF_8)) +try f(secretFile) finally { + Utils.deleteRecursively(secretDir) +} + } + /** * Adds a log appender and optionally sets a log level to the root logger or the logger with * the specified name, then executes the specified function, and in the end removes the log diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala index 84c4f3b8ba3..420edddb693 100644 --- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala +++
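The `withSecretFile` helper moved into `SparkFunSuite` is a standard loan-pattern fixture: create the secret, hand it to the test body, and delete it unconditionally so no `temp-secret` directory survives the test. A rough Python equivalent of the Scala helper (names are illustrative):

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

def with_secret_file(f: Callable[[Path], None], contents: str = "test-secret") -> None:
    # Create a temp dir holding one secret file, run the body, then clean up
    # in a `finally` block so leftovers cannot escape even if the body raises.
    secret_dir = Path(tempfile.mkdtemp(prefix="temp-secrets-"))
    secret_file = secret_dir / "temp-secret.txt"
    secret_file.write_text(contents, encoding="utf-8")
    try:
        f(secret_file)
    finally:
        shutil.rmtree(secret_dir)
```

Hoisting the helper into the shared base suite is what lets the K8s `BasicExecutorFeatureStepSuite` reuse the same cleanup instead of re-creating (and forgetting to delete) its own secret directory.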
[GitHub] [spark-website] holdenk commented on pull request #400: [SPARK-39512] Document docker image release steps
holdenk commented on PR #400: URL: https://github.com/apache/spark-website/pull/400#issuecomment-1176618299 ping @MaxGekk & @tgravescs @gengliangwang since y'all had comments on the first draft, this one looking ok? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[spark] branch master updated: [SPARK-39663][SQL][TESTS] Add UT for MysqlDialect listIndexes method
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9983bdb3b88 [SPARK-39663][SQL][TESTS] Add UT for MysqlDialect listIndexes method 9983bdb3b88 is described below commit 9983bdb3b882a083cba9785392c3ba5d7a36496a Author: panbingkun AuthorDate: Wed Jul 6 11:27:17 2022 -0500 [SPARK-39663][SQL][TESTS] Add UT for MysqlDialect listIndexes method ### What changes were proposed in this pull request? Add a complementary UT for MysqlDialect's listIndexes method. ### Why are the changes needed? Add a UT for an existing function and improve test coverage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #37060 from panbingkun/SPARK-39663. Authored-by: panbingkun Signed-off-by: Sean Owen --- .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 2 ++ .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 30 ++ .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 2 +- 3 files changed, 28 insertions(+), 6 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala index 97f521a378e..6e76b74c7d8 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala @@ -119,6 +119,8 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest override def supportsIndex: Boolean = true + override def supportListIndexes: Boolean = true + override def indexOptions: String = "KEY_BLOCK_SIZE=10" testVarPop() diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala index 5f0033490d5..0f85bd534c3 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala @@ -197,6 +197,8 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu def supportsIndex: Boolean = false + def supportListIndexes: Boolean = false + def indexOptions: String = "" test("SPARK-36895: Test INDEX Using SQL") { @@ -219,11 +221,21 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu s" The supported Index Types are:")) sql(s"CREATE index i1 ON $catalogName.new_table USING BTREE (col1)") +assert(jdbcTable.indexExists("i1")) +if (supportListIndexes) { + val indexes = jdbcTable.listIndexes() + assert(indexes.size == 1) + assert(indexes.head.indexName() == "i1") +} + sql(s"CREATE index i2 ON $catalogName.new_table (col2, col3, col5)" + s" OPTIONS ($indexOptions)") - -assert(jdbcTable.indexExists("i1") == true) -assert(jdbcTable.indexExists("i2") == true) +assert(jdbcTable.indexExists("i2")) +if (supportListIndexes) { + val indexes = jdbcTable.listIndexes() + assert(indexes.size == 2) + assert(indexes.map(_.indexName()).sorted === Array("i1", "i2")) +} // This should pass without exception sql(s"CREATE index IF NOT EXISTS i1 ON $catalogName.new_table (col1)") @@ -234,10 +246,18 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu assert(m.contains("Failed to create index i1 in new_table")) sql(s"DROP index i1 ON $catalogName.new_table") -sql(s"DROP index i2 ON $catalogName.new_table") - assert(jdbcTable.indexExists("i1") == false) +if (supportListIndexes) { + val indexes = jdbcTable.listIndexes() + assert(indexes.size == 1) + 
assert(indexes.head.indexName() == "i2") +} + +sql(s"DROP index i2 ON $catalogName.new_table") assert(jdbcTable.indexExists("i2") == false) +if (supportListIndexes) { + assert(jdbcTable.listIndexes().isEmpty) +} // This should pass without exception sql(s"DROP index IF EXISTS i1 ON $catalogName.new_table") diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala index 24f9bac74f8..c4cb5369af9 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala +++
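The suite above threads `listIndexes()` assertions through each CREATE/DROP step. The bookkeeping those assertions expect can be sketched with a toy in-memory table (hypothetical names; the real logic lives in `MySQLDialect` and the JDBC catalog):

```python
class ToyTable:
    """Minimal in-memory model of the indexExists/listIndexes surface the test exercises."""

    def __init__(self):
        self._indexes = set()

    def create_index(self, name):
        self._indexes.add(name)

    def drop_index(self, name):
        self._indexes.discard(name)

    def index_exists(self, name):
        return name in self._indexes

    def list_indexes(self):
        # Return a stable ordering, mirroring the test's `.sorted` comparison.
        return sorted(self._indexes)

t = ToyTable()
t.create_index("i1")
t.create_index("i2")
assert t.list_indexes() == ["i1", "i2"]
t.drop_index("i1")
assert t.list_indexes() == ["i2"]
t.drop_index("i2")
assert t.list_indexes() == []
```

Each `listIndexes()` call in the real suite is checked immediately after the DDL statement, so a regression in the dialect's index listing fails at the exact step that broke it.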
[spark] branch master updated: [SPARK-39691][TESTS] Update `MapStatusesConvertBenchmark` result files
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3cf07c3e031 [SPARK-39691][TESTS] Update `MapStatusesConvertBenchmark` result files
3cf07c3e031 is described below

commit 3cf07c3e03195b95a37d3635f736fe78b70d22f7
Author: yangjie01
AuthorDate: Wed Jul 6 09:23:37 2022 -0700

    [SPARK-39691][TESTS] Update `MapStatusesConvertBenchmark` result files

    ### What changes were proposed in this pull request?
    SPARK-39325 added `MapStatusesConvertBenchmark` but only uploaded the result file generated by Java 8, so this PR supplements the `MapStatusesConvertBenchmark` results generated by Java 11 and 17. On the other hand, SPARK-39626 upgraded `RoaringBitmap` from `0.9.28` to `0.9.30`, and `IntelliJ Profiler` sampling shows that the hotspot path of `MapStatusesConvertBenchmark` contains `RoaringBitmap#contains`, so this PR also updates the result file generated by Java 8.

    ### Why are the changes needed?
    Update `MapStatusesConvertBenchmark` result files.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass GitHub Actions

    Closes #37100 from LuciferYang/MapStatusesConvertBenchmark-result.
    Authored-by: yangjie01
    Signed-off-by: Dongjoon Hyun
---
 .../MapStatusesConvertBenchmark-jdk11-results.txt           | 13 +
 ...ts.txt => MapStatusesConvertBenchmark-jdk17-results.txt} |  8
 core/benchmarks/MapStatusesConvertBenchmark-results.txt     | 10 +-
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/core/benchmarks/MapStatusesConvertBenchmark-jdk11-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-jdk11-results.txt
new file mode 100644
index 000..96fa24175c5
--- /dev/null
+++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk11-results.txt
@@ -0,0 +1,13 @@
+MapStatuses Convert Benchmark
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+MapStatuses Convert:                Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)   Relative
+Num Maps: 5 Fetch partitions:500             1324          1333          7        0.0  1324283680.0      1.0X
+Num Maps: 5 Fetch partitions:1000            2650          2670         32        0.0  2650318387.0      0.5X
+Num Maps: 5 Fetch partitions:1500            4018          4059         53        0.0  4017921009.0      0.3X
+
+
diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-jdk17-results.txt
similarity index 54%
copy from core/benchmarks/MapStatusesConvertBenchmark-results.txt
copy to core/benchmarks/MapStatusesConvertBenchmark-jdk17-results.txt
index f41401bbe2e..0ba8d756dfc 100644
--- a/core/benchmarks/MapStatusesConvertBenchmark-results.txt
+++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk17-results.txt
@@ -2,12 +2,12 @@
 MapStatuses Convert Benchmark
 
-OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1025-azure
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1031-azure
 Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
 MapStatuses Convert:                Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)   Relative
-Num Maps: 5 Fetch partitions:500             1330          1359         26        0.0  1329827185.0      1.0X
-Num Maps: 5 Fetch partitions:1000            2648          2666         20        0.0  2647944453.0      0.5X
-Num Maps: 5 Fetch partitions:1500            4155          4436        383        0.0  4154563448.0      0.3X
+Num Maps: 5 Fetch partitions:500             1092          1104         22        0.0  1091691925.0      1.0X
+Num Maps: 5 Fetch partitions:1000            2172          2192         29        0.0  2171702137.0      0.5X
+Num Maps: 5 Fetch partitions:1500            3268          3291         27        0.0  3267904436.0      0.3X

diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-results.txt
index f41401bbe2e..ae84abfdcc2 100644
---
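Reading the benchmark tables: assuming the `Relative` column is the baseline (first) row's best time divided by each row's best time — which the figures in these files are consistent with — it can be recomputed with a short sketch; the numbers below are copied from the jdk11 results in this commit:

```python
# Recompute the "Relative" column from the Best Time(ms) column of the
# jdk11 results: baseline best time / this row's best time, rounded to 1dp.
best_ms = [
    ("Num Maps: 5 Fetch partitions:500", 1324),
    ("Num Maps: 5 Fetch partitions:1000", 2650),
    ("Num Maps: 5 Fetch partitions:1500", 4018),
]
baseline = best_ms[0][1]  # the first row is the 1.0X reference
relative = {name: round(baseline / t, 1) for name, t in best_ms}
# 1324/1324 -> 1.0X, 1324/2650 -> 0.5X, 1324/4018 -> 0.3X
```

This also explains why the jdk17 `Relative` values are unchanged (1.0X / 0.5X / 0.3X) even though every absolute time dropped: the ratios between the three partition counts stayed the same.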