[spark] branch branch-3.2 updated (1c0bd4c15a2 -> be891ad9908)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git from 1c0bd4c15a2 [SPARK-39656][SQL][3.2] Fix wrong namespace in DescribeNamespaceExec add be891ad9908 [SPARK-39551][SQL][3.2] Add AQE invalid plan check No new revisions were added by this update. Summary of changes: .../execution/adaptive/AdaptiveSparkPlanExec.scala | 72 -- .../adaptive/InvalidAQEPlanException.scala | 17 +++-- .../sql/execution/adaptive/ValidateSparkPlan.scala | 68 .../adaptive/AdaptiveQueryExecSuite.scala | 25 +++- 4 files changed, 141 insertions(+), 41 deletions(-) copy core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala => sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InvalidAQEPlanException.scala (61%) create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ValidateSparkPlan.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39503][SQL][FOLLOWUP] Fix ansi golden files and typo
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 88b983d9f2a [SPARK-39503][SQL][FOLLOWUP] Fix ansi golden files and typo 88b983d9f2a is described below commit 88b983d9f2a7190b8d74a6176740afb65fa08223 Author: ulysses-you AuthorDate: Thu Jul 7 13:17:11 2022 +0900 [SPARK-39503][SQL][FOLLOWUP] Fix ansi golden files and typo ### What changes were proposed in this pull request? - re-generate ansi golden files - fix FunctionIdentifier parameter name typo ### Why are the changes needed? Fix ansi golden files and typo ### Does this PR introduce _any_ user-facing change? no, not released ### How was this patch tested? pass CI Closes #37111 from ulysses-you/catalog-followup. Authored-by: ulysses-you Signed-off-by: Hyukjin Kwon --- .../apache/spark/sql/catalyst/identifiers.scala| 2 +- .../approved-plans-v1_4/q83.ansi/explain.txt | 28 +++--- .../approved-plans-v1_4/q83.ansi/simplified.txt| 14 +-- .../approved-plans-v1_4/q83.sf100.ansi/explain.txt | 28 +++--- .../q83.sf100.ansi/simplified.txt | 14 +-- 5 files changed, 43 insertions(+), 43 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala index 9cae2b622a7..2de44d6f349 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala @@ -142,7 +142,7 @@ case class FunctionIdentifier(funcName: String, database: Option[String], catalo override val identifier: String = funcName def this(funcName: String) = this(funcName, None, None) - def this(table: String, database: Option[String]) = this(table, database, None) + def this(funcName: String, database: Option[String]) = this(funcName, database, None) override 
def toString: String = unquotedString } diff --git a/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q83.ansi/explain.txt b/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q83.ansi/explain.txt index d281e59c727..905d29293a3 100644 --- a/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q83.ansi/explain.txt +++ b/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q83.ansi/explain.txt @@ -13,11 +13,11 @@ TakeOrderedAndProject (46) : : : +- * BroadcastHashJoin Inner BuildRight (8) : : : :- * Filter (3) : : : : +- * ColumnarToRow (2) - : : : : +- Scan parquet default.store_returns (1) + : : : : +- Scan parquet spark_catalog.default.store_returns (1) : : : +- BroadcastExchange (7) : : :+- * Filter (6) : : : +- * ColumnarToRow (5) - : : : +- Scan parquet default.item (4) + : : : +- Scan parquet spark_catalog.default.item (4) : : +- ReusedExchange (10) : +- BroadcastExchange (28) :+- * HashAggregate (27) @@ -29,7 +29,7 @@ TakeOrderedAndProject (46) : : +- * BroadcastHashJoin Inner BuildRight (20) : : :- * Filter (18) : : : +- * ColumnarToRow (17) - : : : +- Scan parquet default.catalog_returns (16) + : : : +- Scan parquet spark_catalog.default.catalog_returns (16) : : +- ReusedExchange (19) : +- ReusedExchange (22) +- BroadcastExchange (43) @@ -42,12 +42,12 @@ TakeOrderedAndProject (46) : +- * BroadcastHashJoin Inner BuildRight (35) : :- * Filter (33) : : +- * ColumnarToRow (32) -: : +- Scan parquet default.web_returns (31) +: : +- Scan parquet spark_catalog.default.web_returns (31) : +- ReusedExchange (34) +- ReusedExchange (37) -(1) Scan parquet default.store_returns +(1) Scan parquet spark_catalog.default.store_returns Output [3]: [sr_item_sk#1, sr_return_quantity#2, sr_returned_date_sk#3] Batched: true Location: InMemoryFileIndex [] @@ -62,7 +62,7 @@ Input [3]: [sr_item_sk#1, sr_return_quantity#2, sr_returned_date_sk#3] Input [3]: [sr_item_sk#1,
[spark] branch branch-3.3 updated: [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` when they are used with `DISTINCT`
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 016dfeb760d [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` when they are used with `DISTINCT` 016dfeb760d is described below commit 016dfeb760dbe1109e3c81c39bcd1bf3316a3e20 Author: Jiaan Geng AuthorDate: Thu Jul 7 09:55:45 2022 +0800 [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` when they are used with `DISTINCT` https://github.com/apache/spark/pull/35145 taught `H2Dialect` to compile COVAR_POP, COVAR_SAMP and CORR. Because H2 does not support COVAR_POP, COVAR_SAMP or CORR combined with DISTINCT, that PR introduced a bug: these aggregate functions are compiled even when they are used with DISTINCT. This change fixes the bug by compiling them only when DISTINCT is absent. User-facing change: yes, the bug is fixed. Tested with new test cases. Closes #37090 from beliefer/SPARK-37527_followup2. 
Authored-by: Jiaan Geng Signed-off-by: Wenchen Fan (cherry picked from commit 14f2bae208c093dea58e3f947fb660e8345fb256) Signed-off-by: Wenchen Fan --- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 15 - .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 38 +++--- 2 files changed, 32 insertions(+), 21 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala index 4a88203ec59..967df112af2 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala @@ -55,18 +55,15 @@ private object H2Dialect extends JdbcDialect { assert(f.children().length == 1) val distinct = if (f.isDistinct) "DISTINCT " else "" Some(s"STDDEV_SAMP($distinct${f.children().head})") -case f: GeneralAggregateFunc if f.name() == "COVAR_POP" => +case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"COVAR_POP($distinct${f.children().head}, ${f.children().last})") -case f: GeneralAggregateFunc if f.name() == "COVAR_SAMP" => + Some(s"COVAR_POP(${f.children().head}, ${f.children().last})") +case f: GeneralAggregateFunc if f.name() == "COVAR_SAMP" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"COVAR_SAMP($distinct${f.children().head}, ${f.children().last})") -case f: GeneralAggregateFunc if f.name() == "CORR" => + Some(s"COVAR_SAMP(${f.children().head}, ${f.children().last})") +case f: GeneralAggregateFunc if f.name() == "CORR" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"CORR($distinct${f.children().head}, ${f.children().last})") + Some(s"CORR(${f.children().head}, ${f.children().last})") case _ => None } ) diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala index 2f94f9ef31e..293334084af 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala @@ -1028,23 +1028,37 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel } test("scan with aggregate push-down: COVAR_POP COVAR_SAMP with filter and group by") { -val df = sql("select COVAR_POP(bonus, bonus), COVAR_SAMP(bonus, bonus)" + - " FROM h2.test.employee where dept > 0 group by DePt") -checkFiltersRemoved(df) -checkAggregateRemoved(df) -checkPushedInfo(df, "PushedAggregates: [COVAR_POP(BONUS, BONUS), COVAR_SAMP(BONUS, BONUS)], " + +val df1 = sql("SELECT COVAR_POP(bonus, bonus), COVAR_SAMP(bonus, bonus)" + + " FROM h2.test.employee WHERE dept > 0 GROUP BY DePt") +checkFiltersRemoved(df1) +checkAggregateRemoved(df1) +checkPushedInfo(df1, "PushedAggregates: [COVAR_POP(BONUS, BONUS), COVAR_SAMP(BONUS, BONUS)], " + "PushedFilters: [DEPT IS NOT NULL, DEPT > 0], PushedGroupByExpressions: [DEPT]") -checkAnswer(df, Seq(Row(1d, 2d), Row(2500d, 5000d), Row(0d, null))) +checkAnswer(df1, Seq(Row(1d, 2d), Row(2500d, 5000d), Row(0d, null))) + +val df2 = sql("SELECT COVAR_POP(DISTINCT bonus, bonus), COVAR_SAMP(DISTINCT bonus, bonus)" + + " FROM h2.test.employee WHERE dept > 0
[spark] branch master updated: [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` when they are used with `DISTINCT`
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 14f2bae208c [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` when they are used with `DISTINCT` 14f2bae208c is described below commit 14f2bae208c093dea58e3f947fb660e8345fb256 Author: Jiaan Geng AuthorDate: Thu Jul 7 09:55:45 2022 +0800 [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` when they are used with `DISTINCT` ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/35145 taught `H2Dialect` to compile COVAR_POP, COVAR_SAMP and CORR. Because H2 does not support COVAR_POP, COVAR_SAMP or CORR combined with DISTINCT, that PR introduced a bug: these aggregate functions are compiled even when they are used with DISTINCT. ### Why are the changes needed? Fix the bug that compiles COVAR_POP, COVAR_SAMP and CORR when these aggregate functions are used with DISTINCT. ### Does this PR introduce _any_ user-facing change? Yes. The bug is fixed. ### How was this patch tested? New test cases. Closes #37090 from beliefer/SPARK-37527_followup2. 
Authored-by: Jiaan Geng Signed-off-by: Wenchen Fan --- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 15 -- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 34 +++--- 2 files changed, 30 insertions(+), 19 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala index 124cb001b5c..5dfc64d7b6c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala @@ -62,18 +62,15 @@ private[sql] object H2Dialect extends JdbcDialect { assert(f.children().length == 1) val distinct = if (f.isDistinct) "DISTINCT " else "" Some(s"STDDEV_SAMP($distinct${f.children().head})") -case f: GeneralAggregateFunc if f.name() == "COVAR_POP" => +case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"COVAR_POP($distinct${f.children().head}, ${f.children().last})") -case f: GeneralAggregateFunc if f.name() == "COVAR_SAMP" => + Some(s"COVAR_POP(${f.children().head}, ${f.children().last})") +case f: GeneralAggregateFunc if f.name() == "COVAR_SAMP" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"COVAR_SAMP($distinct${f.children().head}, ${f.children().last})") -case f: GeneralAggregateFunc if f.name() == "CORR" => + Some(s"COVAR_SAMP(${f.children().head}, ${f.children().last})") +case f: GeneralAggregateFunc if f.name() == "CORR" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"CORR($distinct${f.children().head}, ${f.children().last})") + Some(s"CORR(${f.children().head}, ${f.children().last})") case _ => None } ) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala index 108348fbcd3..0a713bdb76c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala @@ -1652,23 +1652,37 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel } test("scan with aggregate push-down: COVAR_POP COVAR_SAMP with filter and group by") { -val df = sql("SELECT COVAR_POP(bonus, bonus), COVAR_SAMP(bonus, bonus)" + +val df1 = sql("SELECT COVAR_POP(bonus, bonus), COVAR_SAMP(bonus, bonus)" + " FROM h2.test.employee WHERE dept > 0 GROUP BY DePt") -checkFiltersRemoved(df) -checkAggregateRemoved(df) -checkPushedInfo(df, "PushedAggregates: [COVAR_POP(BONUS, BONUS), COVAR_SAMP(BONUS, BONUS)], " + +checkFiltersRemoved(df1) +checkAggregateRemoved(df1) +checkPushedInfo(df1, "PushedAggregates: [COVAR_POP(BONUS, BONUS), COVAR_SAMP(BONUS, BONUS)], " + "PushedFilters: [DEPT IS NOT NULL, DEPT > 0], PushedGroupByExpressions: [DEPT]") -checkAnswer(df, Seq(Row(1d, 2d), Row(2500d, 5000d), Row(0d, null))) +checkAnswer(df1, Seq(Row(1d, 2d), Row(2500d, 5000d), Row(0d, null))) + +val df2 = sql("SELECT COVAR_POP(DISTINCT bonus, bonus), COVAR_SAMP(DISTINCT bonus, bonus)" + + " FROM
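The rule behind the fix above is general: a JDBC dialect should compile an aggregate for push-down only when the remote engine can actually express it, and otherwise return nothing so Spark keeps the aggregate on its own side. A minimal Python sketch of that guard (`Agg` and `compile_aggregate` are illustrative names, not Spark's API):

```python
from typing import Optional, Sequence

class Agg:
    """Hypothetical mini-model of Spark's GeneralAggregateFunc (illustrative only)."""
    def __init__(self, name: str, children: Sequence[str], is_distinct: bool = False) -> None:
        self.name = name
        self.children = list(children)
        self.is_distinct = is_distinct

def compile_aggregate(f: Agg) -> Optional[str]:
    # The fixed rule: push the two-argument statistical aggregates down to H2
    # only when DISTINCT is absent; otherwise return None, so Spark evaluates
    # the aggregate itself instead of generating SQL that H2 cannot run.
    if f.name in ("COVAR_POP", "COVAR_SAMP", "CORR") and not f.is_distinct:
        assert len(f.children) == 2
        return f"{f.name}({f.children[0]}, {f.children[1]})"
    return None

print(compile_aggregate(Agg("CORR", ["a", "b"])))        # CORR(a, b)
print(compile_aggregate(Agg("CORR", ["a", "b"], True)))  # None
```

Returning `None` for the DISTINCT case is what makes the new `checkAggregateRemoved(df2, false)`-style assertions in the test suite pass: the aggregate simply stays in the Spark plan.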
[spark] branch master updated: [SPARK-39697][INFRA] Add REFRESH_DATE flag and use previous cache to build cache image
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2a2608ad557 [SPARK-39697][INFRA] Add REFRESH_DATE flag and use previous cache to build cache image 2a2608ad557 is described below commit 2a2608ad557e3ebb160287b7d7fd9d14c251b3c2 Author: Yikun Jiang AuthorDate: Thu Jul 7 08:59:38 2022 +0900 [SPARK-39697][INFRA] Add REFRESH_DATE flag and use previous cache to build cache image ### What changes were proposed in this pull request? This patch has two improvements: - Add `cache-from`: this speeds up the cache build and ensures the image will NOT be fully refreshed unless `REFRESH_DATE` is changed intentionally. - Add `FULL_REFRESH_DATE` in the Dockerfile: this makes it possible to force a full refresh. ### Why are the changes needed? Without this PR, the cache image is **completely refreshed** whenever the Dockerfile changes in any way. This causes different behavior between the CI temporary image (cache-based refresh, in the pyspark/sparkr/lint jobs) and the infra cache (full refresh, in the build-infra-cache job). As a result, if a PR changes the Dockerfile, the pyspark/sparkr/lint CI might succeed, but the next pyspark/sparkr/lint run might fail after the cache is refreshed (because dependencies may change when the image is fully rebuilt). After this PR, when the Dockerfile changes, the cache image job does a cache-based refresh (reusing the previous cache as much as possible and rebuilding only the layers whose cache misses) to keep the behavior consistent with the pyspark/sparkr/lint job results. This behavior is, to some extent, similar to a **static image**: you can bump `FULL_REFRESH_DATE` to force a complete cache refresh, with the advantage that you can see the pyspark/sparkr/lint CI results in GitHub Actions when you do a full refresh. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 
Tested locally. Closes #37103 from Yikun/SPARK-39522-FOLLOWUP. Authored-by: Yikun Jiang Signed-off-by: Hyukjin Kwon --- .github/workflows/build_infra_images_cache.yml | 1 + dev/infra/Dockerfile | 2 ++ 2 files changed, 3 insertions(+) diff --git a/.github/workflows/build_infra_images_cache.yml b/.github/workflows/build_infra_images_cache.yml index 4ab27da7bdf..145769d1506 100644 --- a/.github/workflows/build_infra_images_cache.yml +++ b/.github/workflows/build_infra_images_cache.yml @@ -57,6 +57,7 @@ jobs: context: ./dev/infra/ push: true tags: ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }} + cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }} cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }},mode=max - name: Image digest diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 8968b097251..e3ba4f6110b 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -18,6 +18,8 @@ # Image for building and testing Spark branches. Based on Ubuntu 20.04. FROM ubuntu:20.04 +ENV FULL_REFRESH_DATE 20220706 + ENV DEBIAN_FRONTEND noninteractive ENV DEBCONF_NONINTERACTIVE_SEEN true
[spark] branch master updated: [SPARK-39648][PYTHON][PS][DOC] Fix type hints of `like`, `rlike`, `ilike` of Column
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new be7dab12677 [SPARK-39648][PYTHON][PS][DOC] Fix type hints of `like`, `rlike`, `ilike` of Column be7dab12677 is described below commit be7dab12677a180908b6ce37847abdda12adeb9b Author: Xinrong Meng AuthorDate: Thu Jul 7 08:53:50 2022 +0900 [SPARK-39648][PYTHON][PS][DOC] Fix type hints of `like`, `rlike`, `ilike` of Column ### What changes were proposed in this pull request? Fix type hints of `like`, `rlike`, `ilike` of Column. ### Why are the changes needed? Current type hints are incorrect, so the doc is confusing: `Union["Column", "LiteralType", "DecimalLiteral", "DateTimeLiteral"]` is hinted whereas only `str` is accepted. This PR proposes to fix that by introducing `_bin_op_other_str`. ### Does this PR introduce _any_ user-facing change? No. Doc change only. ### How was this patch tested? Manual tests. Closes #37038 from xinrong-databricks/like_rlike. 
Authored-by: Xinrong Meng Signed-off-by: Hyukjin Kwon --- python/pyspark/pandas/series.py | 2 +- python/pyspark/sql/column.py| 117 +--- 2 files changed, 64 insertions(+), 55 deletions(-) diff --git a/python/pyspark/pandas/series.py b/python/pyspark/pandas/series.py index a7852c110f7..838077ed7cd 100644 --- a/python/pyspark/pandas/series.py +++ b/python/pyspark/pandas/series.py @@ -5024,7 +5024,7 @@ class Series(Frame, IndexOpsMixin, Generic[T]): else: if regex: # to_replace must be a string -cond = self.spark.column.rlike(to_replace) +cond = self.spark.column.rlike(cast(str, to_replace)) else: cond = self.spark.column.isin(to_replace) # to_replace may be a scalar diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py index 04458d560ee..31954a95690 100644 --- a/python/pyspark/sql/column.py +++ b/python/pyspark/sql/column.py @@ -573,57 +573,6 @@ class Column: >>> df.filter(df.name.contains('o')).collect() [Row(age=5, name='Bob')] """ -_rlike_doc = """ -SQL RLIKE expression (LIKE with Regex). Returns a boolean :class:`Column` based on a regex -match. - -Parameters --- -other : str -an extended regex expression - -Examples - ->>> df.filter(df.name.rlike('ice$')).collect() -[Row(age=2, name='Alice')] -""" -_like_doc = """ -SQL like expression. Returns a boolean :class:`Column` based on a SQL LIKE match. - -Parameters --- -other : str -a SQL LIKE pattern - -See Also - -pyspark.sql.Column.rlike - -Examples - ->>> df.filter(df.name.like('Al%')).collect() -[Row(age=2, name='Alice')] -""" -_ilike_doc = """ -SQL ILIKE expression (case insensitive LIKE). Returns a boolean :class:`Column` -based on a case insensitive match. - -.. versionadded:: 3.3.0 - -Parameters --- -other : str -a SQL LIKE pattern - -See Also - -pyspark.sql.Column.rlike - -Examples - ->>> df.filter(df.name.ilike('%Ice')).collect() -[Row(age=2, name='Alice')] -""" _startswith_doc = """ String starts with. Returns a boolean :class:`Column` based on a string match. 
@@ -656,12 +605,72 @@ class Column: """ contains = _bin_op("contains", _contains_doc) -rlike = _bin_op("rlike", _rlike_doc) -like = _bin_op("like", _like_doc) -ilike = _bin_op("ilike", _ilike_doc) startswith = _bin_op("startsWith", _startswith_doc) endswith = _bin_op("endsWith", _endswith_doc) +def like(self: "Column", other: str) -> "Column": +""" +SQL like expression. Returns a boolean :class:`Column` based on a SQL LIKE match. + +Parameters +-- +other : str +a SQL LIKE pattern + +See Also + +pyspark.sql.Column.rlike + +Examples + +>>> df.filter(df.name.like('Al%')).collect() +[Row(age=2, name='Alice')] +""" +njc = getattr(self._jc, "like")(other) +return Column(njc) + +def rlike(self: "Column", other: str) -> "Column": +""" +SQL RLIKE expression (LIKE with Regex). Returns a boolean :class:`Column` based on a regex +match. + +Parameters +-- +other : str +an extended regex expression + +Examples + +>>> df.filter(df.name.rlike('ice$')).collect() +[Row(age=2, name='Alice')] +""" +njc = getattr(self._jc,
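The diff above replaces `_bin_op`-generated attributes with explicit `def like/rlike/ilike` methods, because a shared factory can only give every operator one generic signature, while an explicit method can carry the precise hint the operator supports. A toy Python sketch of the same idea (this `Column` class is a hypothetical stand-in, not pyspark's):

```python
class Column:
    """Hypothetical stand-in for pyspark's Column; only the hint pattern is modeled."""

    def __init__(self, name: str) -> None:
        self._name = name

    # Defining the method explicitly (instead of generating it from a shared
    # binary-op factory) lets the signature declare the precise `other: str`
    # type that the underlying operation actually accepts.
    def rlike(self, other: str) -> "Column":
        if not isinstance(other, str):
            raise TypeError("rlike expects a str regex pattern")
        return Column(f"RLIKE({self._name}, {other!r})")

print(Column("name").rlike("ice$")._name)  # RLIKE(name, 'ice$')
```

The runtime check here is only for illustration; in the real PR the benefit is in the static hint and the generated documentation, not in added runtime validation.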
[spark] branch master updated: [SPARK-39701][CORE][K8S][TESTS] Move `withSecretFile` to `SparkFunSuite` to reuse
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1cf4fe5cd4d [SPARK-39701][CORE][K8S][TESTS] Move `withSecretFile` to `SparkFunSuite` to reuse 1cf4fe5cd4d is described below commit 1cf4fe5cd4dedd6ccd38fc9c159069f7c5a72191 Author: Dongjoon Hyun AuthorDate: Wed Jul 6 16:27:58 2022 -0700 [SPARK-39701][CORE][K8S][TESTS] Move `withSecretFile` to `SparkFunSuite` to reuse ### What changes were proposed in this pull request? This PR aims to move `withSecretFile` to `SparkFunSuite` and reuse it in Kubernetes tests. ### Why are the changes needed? Currently, the K8s unit tests leave files behind because they don't clean up the temporary secret files. By reusing the existing method, we can avoid this: ``` $ build/sbt -Pkubernetes "kubernetes/test" $ git status On branch master Your branch is up to date with 'apache/master'. Untracked files: (use "git add ..." to include in what will be committed) resource-managers/kubernetes/core/temp-secret/ ``` ### Does this PR introduce _any_ user-facing change? No. This is a test-only change. ### How was this patch tested? Pass the CIs. Closes #37106 from dongjoon-hyun/SPARK-39701. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/SecurityManagerSuite.scala| 11 +--- .../scala/org/apache/spark/SparkFunSuite.scala | 16 ++- .../features/BasicExecutorFeatureStepSuite.scala | 31 +- 3 files changed, 29 insertions(+), 29 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala b/core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala index 44e338c6f00..a11ecc22d0b 100644 --- a/core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala +++ b/core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala @@ -29,7 +29,7 @@ import org.apache.spark.internal.config._ import org.apache.spark.internal.config.UI._ import org.apache.spark.launcher.SparkLauncher import org.apache.spark.security.GroupMappingServiceProvider -import org.apache.spark.util.{ResetSystemProperties, SparkConfWithEnv, Utils} +import org.apache.spark.util.{ResetSystemProperties, SparkConfWithEnv} class DummyGroupMappingServiceProvider extends GroupMappingServiceProvider { @@ -513,14 +513,5 @@ class SecurityManagerSuite extends SparkFunSuite with ResetSystemProperties { private def encodeFileAsBase64(secretFile: File) = { Base64.getEncoder.encodeToString(Files.readAllBytes(secretFile.toPath)) } - - private def withSecretFile(contents: String = "test-secret")(f: File => Unit): Unit = { -val secretDir = Utils.createTempDir("temp-secrets") -val secretFile = new File(secretDir, "temp-secret.txt") -Files.write(secretFile.toPath, contents.getBytes(UTF_8)) -try f(secretFile) finally { - Utils.deleteRecursively(secretDir) -} - } } diff --git a/core/src/test/scala/org/apache/spark/SparkFunSuite.scala b/core/src/test/scala/org/apache/spark/SparkFunSuite.scala index 7922e13db69..b17aacc0a9f 100644 --- a/core/src/test/scala/org/apache/spark/SparkFunSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkFunSuite.scala @@ -18,7 +18,8 @@ package org.apache.spark import java.io.File -import java.nio.file.Path +import 
java.nio.charset.StandardCharsets.UTF_8 +import java.nio.file.{Files, Path} import java.util.{Locale, TimeZone} import scala.annotation.tailrec @@ -223,6 +224,19 @@ abstract class SparkFunSuite } } + /** + * Creates a temporary directory containing a secret file, which is then passed to `f` and + * will be deleted after `f` returns. + */ + protected def withSecretFile(contents: String = "test-secret")(f: File => Unit): Unit = { +val secretDir = Utils.createTempDir("temp-secrets") +val secretFile = new File(secretDir, "temp-secret.txt") +Files.write(secretFile.toPath, contents.getBytes(UTF_8)) +try f(secretFile) finally { + Utils.deleteRecursively(secretDir) +} + } + /** * Adds a log appender and optionally sets a log level to the root logger or the logger with * the specified name, then executes the specified function, and in the end removes the log diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala index 84c4f3b8ba3..420edddb693 100644 --- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala +++
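The `withSecretFile` helper moved into `SparkFunSuite` is a standard loan-pattern fixture: create the secret, hand it to the test body, and delete it unconditionally so no `temp-secret` directory survives the test. A rough Python equivalent of the Scala helper (names are illustrative):

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

def with_secret_file(f: Callable[[Path], None], contents: str = "test-secret") -> None:
    # Create a temp dir holding one secret file, run the body, then clean up
    # in a `finally` block so leftovers cannot escape even if the body raises.
    secret_dir = Path(tempfile.mkdtemp(prefix="temp-secrets-"))
    secret_file = secret_dir / "temp-secret.txt"
    secret_file.write_text(contents, encoding="utf-8")
    try:
        f(secret_file)
    finally:
        shutil.rmtree(secret_dir)
```

Hoisting the helper into the shared base suite is what lets the K8s `BasicExecutorFeatureStepSuite` reuse the same cleanup instead of re-creating (and forgetting to delete) its own secret directory.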
[GitHub] [spark-website] holdenk commented on pull request #400: [SPARK-39512] Document docker image release steps
holdenk commented on PR #400: URL: https://github.com/apache/spark-website/pull/400#issuecomment-1176618299 ping @MaxGekk & @tgravescs @gengliangwang since y'all had comments on the first draft, this one looking ok? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[spark] branch master updated: [SPARK-39663][SQL][TESTS] Add UT for MysqlDialect listIndexes method
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9983bdb3b88 [SPARK-39663][SQL][TESTS] Add UT for MysqlDialect listIndexes method 9983bdb3b88 is described below commit 9983bdb3b882a083cba9785392c3ba5d7a36496a Author: panbingkun AuthorDate: Wed Jul 6 11:27:17 2022 -0500 [SPARK-39663][SQL][TESTS] Add UT for MysqlDialect listIndexes method ### What changes were proposed in this pull request? Add a complementary UT for MysqlDialect's listIndexes method. ### Why are the changes needed? Add a UT for an existing function and improve test coverage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #37060 from panbingkun/SPARK-39663. Authored-by: panbingkun Signed-off-by: Sean Owen --- .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 2 ++ .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 30 ++ .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 2 +- 3 files changed, 28 insertions(+), 6 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala index 97f521a378e..6e76b74c7d8 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala @@ -119,6 +119,8 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest override def supportsIndex: Boolean = true + override def supportListIndexes: Boolean = true + override def indexOptions: String = "KEY_BLOCK_SIZE=10" testVarPop() diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala index 5f0033490d5..0f85bd534c3 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala @@ -197,6 +197,8 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu def supportsIndex: Boolean = false + def supportListIndexes: Boolean = false + def indexOptions: String = "" test("SPARK-36895: Test INDEX Using SQL") { @@ -219,11 +221,21 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu s" The supported Index Types are:")) sql(s"CREATE index i1 ON $catalogName.new_table USING BTREE (col1)") +assert(jdbcTable.indexExists("i1")) +if (supportListIndexes) { + val indexes = jdbcTable.listIndexes() + assert(indexes.size == 1) + assert(indexes.head.indexName() == "i1") +} + sql(s"CREATE index i2 ON $catalogName.new_table (col2, col3, col5)" + s" OPTIONS ($indexOptions)") - -assert(jdbcTable.indexExists("i1") == true) -assert(jdbcTable.indexExists("i2") == true) +assert(jdbcTable.indexExists("i2")) +if (supportListIndexes) { + val indexes = jdbcTable.listIndexes() + assert(indexes.size == 2) + assert(indexes.map(_.indexName()).sorted === Array("i1", "i2")) +} // This should pass without exception sql(s"CREATE index IF NOT EXISTS i1 ON $catalogName.new_table (col1)") @@ -234,10 +246,18 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu assert(m.contains("Failed to create index i1 in new_table")) sql(s"DROP index i1 ON $catalogName.new_table") -sql(s"DROP index i2 ON $catalogName.new_table") - assert(jdbcTable.indexExists("i1") == false) +if (supportListIndexes) { + val indexes = jdbcTable.listIndexes() + assert(indexes.size == 1) + 
assert(indexes.head.indexName() == "i2") +} + +sql(s"DROP index i2 ON $catalogName.new_table") assert(jdbcTable.indexExists("i2") == false) +if (supportListIndexes) { + assert(jdbcTable.listIndexes().isEmpty) +} // This should pass without exception sql(s"DROP index IF EXISTS i1 ON $catalogName.new_table") diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala index 24f9bac74f8..c4cb5369af9 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala +++
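The suite above threads `listIndexes()` assertions through each CREATE/DROP step. The bookkeeping those assertions expect can be sketched with a toy in-memory table (hypothetical names; the real logic lives in `MySQLDialect` and the JDBC catalog):

```python
class ToyTable:
    """Minimal in-memory model of the indexExists/listIndexes surface the test exercises."""

    def __init__(self):
        self._indexes = set()

    def create_index(self, name):
        self._indexes.add(name)

    def drop_index(self, name):
        self._indexes.discard(name)

    def index_exists(self, name):
        return name in self._indexes

    def list_indexes(self):
        # Return a stable ordering, mirroring the test's `.sorted` comparison.
        return sorted(self._indexes)

t = ToyTable()
t.create_index("i1")
t.create_index("i2")
assert t.list_indexes() == ["i1", "i2"]
t.drop_index("i1")
assert t.list_indexes() == ["i2"]
t.drop_index("i2")
assert t.list_indexes() == []
```

Each `listIndexes()` call in the real suite is checked immediately after the DDL statement, so a regression in the dialect's index listing fails at the exact step that broke it.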
[spark] branch master updated: [SPARK-39691][TESTS] Update `MapStatusesConvertBenchmark` result files
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3cf07c3e031 [SPARK-39691][TESTS] Update `MapStatusesConvertBenchmark` result files
3cf07c3e031 is described below

commit 3cf07c3e03195b95a37d3635f736fe78b70d22f7
Author: yangjie01
AuthorDate: Wed Jul 6 09:23:37 2022 -0700

    [SPARK-39691][TESTS] Update `MapStatusesConvertBenchmark` result files

    ### What changes were proposed in this pull request?
    SPARK-39325 added `MapStatusesConvertBenchmark` but only uploaded the result file generated by Java 8, so this PR supplements the `MapStatusesConvertBenchmark` results generated by Java 11 and 17. On the other hand, SPARK-39626 upgraded `RoaringBitmap` from `0.9.28` to `0.9.30`, and `IntelliJ Profiler` sampling shows that the hotspot path of `MapStatusesConvertBenchmark` contains `RoaringBitmap#contains`, so this PR also updates the result file generated by Java 8.

    ### Why are the changes needed?
    Update `MapStatusesConvertBenchmark` result files.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass GitHub Actions

    Closes #37100 from LuciferYang/MapStatusesConvertBenchmark-result.
    Authored-by: yangjie01
    Signed-off-by: Dongjoon Hyun
---
 .../MapStatusesConvertBenchmark-jdk11-results.txt           | 13 +
 ...ts.txt => MapStatusesConvertBenchmark-jdk17-results.txt} |  8
 core/benchmarks/MapStatusesConvertBenchmark-results.txt     | 10 +-
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/core/benchmarks/MapStatusesConvertBenchmark-jdk11-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-jdk11-results.txt
new file mode 100644
index 000..96fa24175c5
--- /dev/null
+++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk11-results.txt
@@ -0,0 +1,13 @@
+MapStatuses Convert Benchmark
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+MapStatuses Convert:                Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)   Relative
+Num Maps: 5 Fetch partitions:500             1324          1333          7        0.0  1324283680.0      1.0X
+Num Maps: 5 Fetch partitions:1000            2650          2670         32        0.0  2650318387.0      0.5X
+Num Maps: 5 Fetch partitions:1500            4018          4059         53        0.0  4017921009.0      0.3X
+
+
diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-jdk17-results.txt
similarity index 54%
copy from core/benchmarks/MapStatusesConvertBenchmark-results.txt
copy to core/benchmarks/MapStatusesConvertBenchmark-jdk17-results.txt
index f41401bbe2e..0ba8d756dfc 100644
--- a/core/benchmarks/MapStatusesConvertBenchmark-results.txt
+++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk17-results.txt
@@ -2,12 +2,12 @@
 MapStatuses Convert Benchmark
 
-OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1025-azure
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1031-azure
 Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
 MapStatuses Convert:                Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)   Relative
-Num Maps: 5 Fetch partitions:500             1330          1359         26        0.0  1329827185.0      1.0X
-Num Maps: 5 Fetch partitions:1000            2648          2666         20        0.0  2647944453.0      0.5X
-Num Maps: 5 Fetch partitions:1500            4155          4436        383        0.0  4154563448.0      0.3X
+Num Maps: 5 Fetch partitions:500             1092          1104         22        0.0  1091691925.0      1.0X
+Num Maps: 5 Fetch partitions:1000            2172          2192         29        0.0  2171702137.0      0.5X
+Num Maps: 5 Fetch partitions:1500            3268          3291         27        0.0  3267904436.0      0.3X

diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-results.txt
index f41401bbe2e..ae84abfdcc2 100644
---
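Reading the benchmark tables: assuming the `Relative` column is the baseline (first) row's best time divided by each row's best time — which the figures in these files are consistent with — it can be recomputed with a short sketch; the numbers below are copied from the jdk11 results in this commit:

```python
# Recompute the "Relative" column from the Best Time(ms) column of the
# jdk11 results: baseline best time / this row's best time, rounded to 1dp.
best_ms = [
    ("Num Maps: 5 Fetch partitions:500", 1324),
    ("Num Maps: 5 Fetch partitions:1000", 2650),
    ("Num Maps: 5 Fetch partitions:1500", 4018),
]
baseline = best_ms[0][1]  # the first row is the 1.0X reference
relative = {name: round(baseline / t, 1) for name, t in best_ms}
# 1324/1324 -> 1.0X, 1324/2650 -> 0.5X, 1324/4018 -> 0.3X
```

This also explains why the jdk17 `Relative` values are unchanged (1.0X / 0.5X / 0.3X) even though every absolute time dropped: the ratios between the three partition counts stayed the same.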