[spark] branch master updated: [SPARK-40152][SQL][TESTS] Move tests from SplitPart to elementAt
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 06997d6eb73 [SPARK-40152][SQL][TESTS] Move tests from SplitPart to elementAt
06997d6eb73 is described below

commit 06997d6eb73f271aede5b159d86d1db80a73b89f
Author: Yuming Wang
AuthorDate: Wed Aug 24 13:33:26 2022 +0900

    [SPARK-40152][SQL][TESTS] Move tests from SplitPart to elementAt

    ### What changes were proposed in this pull request?

    Move tests from SplitPart to elementAt in CollectionExpressionsSuite.

    ### Why are the changes needed?

    Simplify test.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A.

    Closes #37637 from wangyum/SPARK-40152-3.

    Authored-by: Yuming Wang
    Signed-off-by: Hyukjin Kwon
---
 .../expressions/CollectionExpressionsSuite.scala   | 38 ++++++++++------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
index 94cf0a74467..229e698fb2e 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
@@ -1535,6 +1535,24 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
     }
     checkEvaluation(ElementAt(mb0, Literal(Array[Byte](2, 1), BinaryType)), "2")
     checkEvaluation(ElementAt(mb0, Literal(Array[Byte](3, 4))), null)
+
+    // test defaultValueOutOfBound
+    val delimiter = Literal.create(".", StringType)
+    val str = StringSplitSQL(Literal.create("11.12.13", StringType), delimiter)
+    val outOfBoundValue = Some(Literal.create("", StringType))
+
+    checkEvaluation(ElementAt(str, Literal(3), outOfBoundValue), UTF8String.fromString("13"))
+    checkEvaluation(ElementAt(str, Literal(1), outOfBoundValue), UTF8String.fromString("11"))
+    checkEvaluation(ElementAt(str, Literal(10), outOfBoundValue), UTF8String.fromString(""))
+    checkEvaluation(ElementAt(str, Literal(-10), outOfBoundValue), UTF8String.fromString(""))
+
+    checkEvaluation(ElementAt(StringSplitSQL(Literal.create(null, StringType), delimiter),
+      Literal(1), outOfBoundValue), null)
+    checkEvaluation(ElementAt(StringSplitSQL(Literal.create("11.12.13", StringType),
+      Literal.create(null, StringType)), Literal(1), outOfBoundValue), null)
+
+    checkExceptionInExpression[Exception](
+      ElementAt(str, Literal(0), outOfBoundValue), "The index 0 is invalid")
   }

   test("correctly handles ElementAt nullability for arrays") {
@@ -2522,24 +2540,4 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
         Date.valueOf("2017-02-12")))
     }
   }
-
-  test("SplitPart") {
-    val delimiter = Literal.create(".", StringType)
-    val str = StringSplitSQL(Literal.create("11.12.13", StringType), delimiter)
-    val outOfBoundValue = Some(Literal.create("", StringType))
-
-    checkEvaluation(ElementAt(str, Literal(3), outOfBoundValue), UTF8String.fromString("13"))
-    checkEvaluation(ElementAt(str, Literal(1), outOfBoundValue), UTF8String.fromString("11"))
-    checkEvaluation(ElementAt(str, Literal(10), outOfBoundValue), UTF8String.fromString(""))
-    checkEvaluation(ElementAt(str, Literal(-10), outOfBoundValue), UTF8String.fromString(""))
-
-    checkEvaluation(ElementAt(StringSplitSQL(Literal.create(null, StringType), delimiter),
-      Literal(1), outOfBoundValue), null)
-    checkEvaluation(ElementAt(StringSplitSQL(Literal.create("11.12.13", StringType),
-      Literal.create(null, StringType)), Literal(1), outOfBoundValue), null)
-
-    intercept[Exception] {
-      checkEvaluation(ElementAt(str, Literal(0), outOfBoundValue), null)
-    }.getMessage.contains("The index 0 is invalid")
-  }
 }
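Worth noting about the hunks above: the old `SplitPart` test ended with `intercept[Exception] { ... }.getMessage.contains(...)`, which computes a Boolean and silently discards it, so the message was never actually verified; the moved test switches to the suite's `checkExceptionInExpression` helper, which does assert on it. A minimal sketch of the pitfall in plain ScalaTest (the names here are illustrative, not from the commit):

    import org.scalatest.funsuite.AnyFunSuite

    class InterceptPitfallSuite extends AnyFunSuite {
      test("a discarded contains() never fails the test") {
        // The Boolean result of contains(...) is dropped, so this passes
        // even though the expected message clearly does not match.
        intercept[IllegalArgumentException] {
          throw new IllegalArgumentException("The index 0 is invalid")
        }.getMessage.contains("a completely different message")

        // Asserting on the Boolean is what the old test was missing.
        val e = intercept[IllegalArgumentException] {
          throw new IllegalArgumentException("The index 0 is invalid")
        }
        assert(e.getMessage.contains("The index 0 is invalid"))
      }
    }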
[spark] branch master updated (d32a67f92cf -> 8b11439663c)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from d32a67f92cf Revert "[SPARK-39150][PS] Enable doctest which was disabled when pandas 1.4 upgrade"
     add 8b11439663c [SPARK-40198][CORE] Enable `spark.storage.decommission.(rdd|shuffle)Blocks.enabled` by default

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 4 ++--
 docs/core-migration-guide.md                                       | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)
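SPARK-40198 flips the defaults of the two storage-decommission configs to `true`, so jobs that depended on the old behavior now have to opt out explicitly. A minimal sketch of doing so via `SparkConf` (the key names come from the commit title; everything else is illustrative):

    import org.apache.spark.SparkConf

    // Restore the pre-SPARK-40198 defaults by turning both features back off.
    val conf = new SparkConf()
      .set("spark.storage.decommission.rddBlocks.enabled", "false")
      .set("spark.storage.decommission.shuffleBlocks.enabled", "false")

The same keys can equally be passed as `--conf` options to spark-submit.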
[spark] branch branch-3.3 updated: [SPARK-40124][SQL][TEST][3.3] Update TPCDS v1.4 q32 for Plan Stability tests
dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new d725d9c20e3 [SPARK-40124][SQL][TEST][3.3] Update TPCDS v1.4 q32 for Plan Stability tests
d725d9c20e3 is described below

commit d725d9c20e33e3c68d9c7ec84b74fea2952814b6
Author: Kapil Kumar Singh
AuthorDate: Tue Aug 23 17:02:36 2022 -0700

    [SPARK-40124][SQL][TEST][3.3] Update TPCDS v1.4 q32 for Plan Stability tests

    ### What changes were proposed in this pull request?

    This is a port of SPARK-40124 to Spark 3.3. Fix query 32 for TPCDS v1.4.

    ### Why are the changes needed?

    The current q32.sql seems to be wrong: it just selects `1`. Reference for the query template:
    https://github.com/databricks/tpcds-kit/blob/eff5de2c30337b71cc0dc1976147742d2c65d378/query_templates/query32.tpl#L41

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Test change only

    Closes #37615 from mskapilks/change-q32-3.3.

    Authored-by: Kapil Kumar Singh
    Signed-off-by: Dongjoon Hyun
---
 .../approved-plans-v1_4/q32.sf100/explain.txt      | 120 ++++++++++-----------
 .../approved-plans-v1_4/q32.sf100/simplified.txt   |  94 +++++++++--------
 .../approved-plans-v1_4/q32/explain.txt            | 120 ++++++++++-----------
 .../approved-plans-v1_4/q32/simplified.txt         |  92 +++++++--------
 .../resources/tpcds-query-results/v1_4/q32.sql.out |   4 +-
 sql/core/src/test/resources/tpcds/q32.sql          |   2 +-
 6 files changed, 236 insertions(+), 196 deletions(-)

diff --git a/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q32.sf100/explain.txt b/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q32.sf100/explain.txt
index e7ae6145b43..0af12591bda 100644
--- a/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q32.sf100/explain.txt
+++ b/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q32.sf100/explain.txt
@@ -1,31 +1,33 @@
 == Physical Plan ==
-CollectLimit (27)
-+- * Project (26)
-   +- * BroadcastHashJoin Inner BuildRight (25)
-      :- * Project (23)
-      :  +- * BroadcastHashJoin Inner BuildLeft (22)
-      :     :- BroadcastExchange (18)
-      :     :  +- * Project (17)
-      :     :     +- * BroadcastHashJoin Inner BuildLeft (16)
-      :     :        :- BroadcastExchange (5)
-      :     :        :  +- * Project (4)
-      :     :        :     +- * Filter (3)
-      :     :        :        +- * ColumnarToRow (2)
-      :     :        :           +- Scan parquet default.item (1)
-      :     :        +- * Filter (15)
-      :     :           +- * HashAggregate (14)
-      :     :              +- Exchange (13)
-      :     :                 +- * HashAggregate (12)
-      :     :                    +- * Project (11)
-      :     :                       +- * BroadcastHashJoin Inner BuildRight (10)
-      :     :                          :- * Filter (8)
-      :     :                          :  +- * ColumnarToRow (7)
-      :     :                          :     +- Scan parquet default.catalog_sales (6)
-      :     :                          +- ReusedExchange (9)
-      :     +- * Filter (21)
-      :        +- * ColumnarToRow (20)
-      :           +- Scan parquet default.catalog_sales (19)
-      +- ReusedExchange (24)
+* HashAggregate (29)
++- Exchange (28)
+   +- * HashAggregate (27)
+      +- * Project (26)
+         +- * BroadcastHashJoin Inner BuildRight (25)
+            :- * Project (23)
+            :  +- * BroadcastHashJoin Inner BuildLeft (22)
+            :     :- BroadcastExchange (18)
+            :     :  +- * Project (17)
+            :     :     +- * BroadcastHashJoin Inner BuildLeft (16)
+            :     :        :- BroadcastExchange (5)
+            :     :        :  +- * Project (4)
+            :     :        :     +- * Filter (3)
+            :     :        :        +- * ColumnarToRow (2)
+            :     :        :           +- Scan parquet default.item (1)
+            :     :        +- * Filter (15)
+            :     :           +- * HashAggregate (14)
+            :     :              +- Exchange (13)
+            :     :                 +- * HashAggregate (12)
+            :     :                    +- * Project (11)
+            :     :                       +- * BroadcastHashJoin Inner BuildRight (10)
+            :     :                          :- * Filter (8)
+            :     :                          :  +- * ColumnarToRow (7)
+            :     :                          :     +- Scan parquet default.catalog_sales (6)
+            :     :                          +- ReusedExchange (9)
+            :     +- * Filter (21)
+            :        +- * ColumnarToRow (20)
+            :           +- Scan parquet default.catalog_sales (19)
+
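For orientation, the corrected q32 computes the TPCDS "excess discount amount" rather than a constant. A rough sketch of the query shape based on the referenced query32.tpl — every literal below is a placeholder produced by template substitution, so treat the whole statement as illustrative:

    -- Hypothetical rendering of the query32.tpl shape; constants are placeholders.
    SELECT SUM(cs_ext_discount_amt) AS `excess discount amount`
    FROM catalog_sales, item, date_dim
    WHERE i_manufact_id = 977
      AND i_item_sk = cs_item_sk
      AND d_date BETWEEN '2000-01-27' AND DATE_ADD(CAST('2000-01-27' AS DATE), 90)
      AND d_date_sk = cs_sold_date_sk
      AND cs_ext_discount_amt > (
        SELECT 1.3 * AVG(cs_ext_discount_amt)
        FROM catalog_sales, date_dim
        WHERE cs_item_sk = i_item_sk
          AND d_date BETWEEN '2000-01-27' AND DATE_ADD(CAST('2000-01-27' AS DATE), 90)
          AND d_date_sk = cs_sold_date_sk)
    LIMIT 100

The global aggregate is consistent with the new `HashAggregate / Exchange / HashAggregate` nodes at the top of the updated plans above.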
[spark] branch master updated: Revert "[SPARK-39150][PS] Enable doctest which was disabled when pandas 1.4 upgrade"
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d32a67f92cf Revert "[SPARK-39150][PS] Enable doctest which was disabled when pandas 1.4 upgrade"
d32a67f92cf is described below

commit d32a67f92cfcc7c67f44e682d4c3612d60ba1b3a
Author: Hyukjin Kwon
AuthorDate: Wed Aug 24 09:01:48 2022 +0900

    Revert "[SPARK-39150][PS] Enable doctest which was disabled when pandas 1.4 upgrade"

    This reverts commit def17af41f4480191c9a197b853d5e79a8387177.
---
 python/pyspark/pandas/frame.py        | 10 +++++-----
 python/pyspark/pandas/groupby.py      |  4 ++--
 python/pyspark/pandas/indexes/base.py |  2 +-
 python/pyspark/pandas/series.py       |  8 ++++----
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index 72913bc17d3..00a8aa0ec99 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -568,7 +568,7 @@ class DataFrame(Frame, Generic[T]):
         >>> df = ps.DataFrame([[1, 2], [4, 5], [7, 8]],
         ...                   index=['cobra', 'viper', None],
         ...                   columns=['max_speed', 'shield'])
-        >>> df
+        >>> df  # doctest: +SKIP
               max_speed  shield
         cobra         1       2
         viper         4       5
@@ -7248,19 +7248,19 @@ defaultdict(<class 'list'>, {'col..., 'col...})]

         >>> df = ps.DataFrame({'A': [2, 1, np.nan]}, index=['b', 'a', np.nan])

-        >>> df.sort_index()
+        >>> df.sort_index()  # doctest: +SKIP
                 A
         a     1.0
         b     2.0
         None  NaN

-        >>> df.sort_index(ascending=False)
+        >>> df.sort_index(ascending=False)  # doctest: +SKIP
                 A
         b     2.0
         a     1.0
         None  NaN

-        >>> df.sort_index(na_position='first')
+        >>> df.sort_index(na_position='first')  # doctest: +SKIP
                 A
         None  NaN
         a     1.0
@@ -7273,7 +7273,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
         2     NaN

         >>> df.sort_index(inplace=True)
-        >>> df
+        >>> df  # doctest: +SKIP
                 A
         a     1.0
         b     2.0

diff --git a/python/pyspark/pandas/groupby.py b/python/pyspark/pandas/groupby.py
index c343ceb92ee..4377ad6a5c9 100644
--- a/python/pyspark/pandas/groupby.py
+++ b/python/pyspark/pandas/groupby.py
@@ -2382,7 +2382,7 @@ class GroupBy(Generic[FrameLike], metaclass=ABCMeta):
         ...                    ["g", "g3"],
         ...                    ["h", "h0"],
         ...                    ["h", "h1"]], columns=["A", "B"])
-        >>> df.groupby("A").head(-1)
+        >>> df.groupby("A").head(-1)  # doctest: +SKIP
            A   B
         0  g  g0
         1  g  g1
@@ -2450,7 +2450,7 @@ class GroupBy(Generic[FrameLike], metaclass=ABCMeta):
         ...                    ["g", "g3"],
         ...                    ["h", "h0"],
         ...                    ["h", "h1"]], columns=["A", "B"])
-        >>> df.groupby("A").tail(-1)
+        >>> df.groupby("A").tail(-1)  # doctest: +SKIP
            A   B
         3  g  g3
         2  g  g2

diff --git a/python/pyspark/pandas/indexes/base.py b/python/pyspark/pandas/indexes/base.py
index 7e219d577c0..facedb1dc91 100644
--- a/python/pyspark/pandas/indexes/base.py
+++ b/python/pyspark/pandas/indexes/base.py
@@ -1163,7 +1163,7 @@ class Index(IndexOpsMixin):
         >>> df = ps.DataFrame([[1, 2], [4, 5], [7, 8]],
         ...                   index=['cobra', 'viper', None],
         ...                   columns=['max_speed', 'shield'])
-        >>> df
+        >>> df  # doctest: +SKIP
               max_speed  shield
         cobra         1       2
         viper         4       5

diff --git a/python/pyspark/pandas/series.py b/python/pyspark/pandas/series.py
index fa99ddf76ce..c24edf0d976 100644
--- a/python/pyspark/pandas/series.py
+++ b/python/pyspark/pandas/series.py
@@ -2969,7 +2969,7 @@ class Series(Frame, IndexOpsMixin, Generic[T]):

         >>> s = ps.Series([2, 1, np.nan], index=['b', 'a', np.nan])

-        >>> s.sort_index()
+        >>> s.sort_index()  # doctest: +SKIP
         a       1.0
         b       2.0
         None    NaN
@@ -2981,20 +2981,20 @@ class Series(Frame, IndexOpsMixin, Generic[T]):
         2       NaN
         dtype: float64

-        >>> s.sort_index(ascending=False)
+        >>> s.sort_index(ascending=False)  # doctest: +SKIP
         b       2.0
         a       1.0
         None    NaN
         dtype: float64

-        >>> s.sort_index(na_position='first')
+        >>> s.sort_index(na_position='first')  # doctest: +SKIP
         None    NaN
         a       1.0
         b       2.0
         dtype: float64

         >>> s.sort_index(inplace=True)
-        >>> s
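The revert works by re-adding `# doctest: +SKIP` markers: the doctest runner still parses such an example but never executes it, so output that changed under pandas 1.4 cannot fail the build. A self-contained sketch of the mechanism with plain `doctest`, independent of pandas-on-Spark (the helper name is made up):

    import doctest

    def demo():
        """
        >>> 1 + 1
        2
        >>> unstable_repr()  # doctest: +SKIP
        """

    if __name__ == "__main__":
        # The skipped example is never run, so the undefined unstable_repr()
        # cannot fail; only the first example counts as attempted.
        print(doctest.testmod())  # TestResults(failed=0, attempted=1)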
[spark] branch master updated: [SPARK-40078][PYTHON][DOCS] Make pyspark.sql.column examples self-contained
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new fdc11ab0494 [SPARK-40078][PYTHON][DOCS] Make pyspark.sql.column examples self-contained
fdc11ab0494 is described below

commit fdc11ab0494a681444e7a7e13f3f99d25fa6cf2f
Author: Qian.Sun
AuthorDate: Wed Aug 24 08:57:33 2022 +0900

    [SPARK-40078][PYTHON][DOCS] Make pyspark.sql.column examples self-contained

    ### What changes were proposed in this pull request?

    This PR proposes to add parameters/returns and improve the examples in `pyspark.sql.column` by making each example self-contained with a brief explanation and a bit more realistic example.

    ### Why are the changes needed?

    To make the documentation more readable and able to copy and paste directly in PySpark shell.

    ### Does this PR introduce _any_ user-facing change?

    Yes, it changes the documentation.

    ### How was this patch tested?

    Manually ran each doctest.

    Closes #37521 from dcoliversun/SPARK-40078.

    Authored-by: Qian.Sun
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/sql/column.py | 185 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 172 insertions(+), 13 deletions(-)

diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py
index 31954a95690..3746d8eba12 100644
--- a/python/pyspark/sql/column.py
+++ b/python/pyspark/sql/column.py
@@ -35,7 +35,7 @@ from py4j.java_gateway import JavaObject

 from pyspark import copy_func
 from pyspark.context import SparkContext
-from pyspark.sql.types import DataType, StructField, StructType, IntegerType, StringType
+from pyspark.sql.types import DataType

 if TYPE_CHECKING:
     from pyspark.sql._typing import ColumnOrName, LiteralType, DecimalLiteral, DateTimeLiteral
@@ -187,18 +187,28 @@ class Column:
     """
     A column in a DataFrame.

-    :class:`Column` instances can be created by::
+    .. versionadded:: 1.3.0
+
+    Examples
+    --------
+    Column instances can be created by

-        # 1. Select a column out of a DataFrame
+    >>> df = spark.createDataFrame(
+    ...     [(2, "Alice"), (5, "Bob")], ["age", "name"])

-        df.colName
-        df["colName"]
+    Select a column out of a DataFrame

-        # 2. Create from an expression
-        df.colName + 1
-        1 / df.colName
+    >>> df.name
+    Column<'name'>
+    >>> df["name"]
+    Column<'name'>

-    .. versionadded:: 1.3.0
+    Create from an expression
+
+    >>> df.age + 1
+    Column<'(age + 1)'>
+    >>> 1 / df.age
+    Column<'(1 / age)'>
     """

     def __init__(self, jc: JavaObject) -> None:
@@ -405,6 +415,20 @@ class Column:

         .. versionadded:: 1.3.0

+        Parameters
+        ----------
+        key
+            a literal value, or a :class:`Column` expression.
+            The result will only be true at a location if item matches in the column.
+
+             .. deprecated:: 3.0.0
+                 :class:`Column` as a parameter is deprecated.
+
+        Returns
+        -------
+        :class:`Column`
+            Column representing the item(s) got at position out of a list or by key out of a dict.
+
         Examples
         --------
         >>> df = spark.createDataFrame([([1, 2], {"key": "value"})], ["l", "d"])
@@ -430,6 +454,19 @@ class Column:

         .. versionadded:: 1.3.0

+        Parameters
+        ----------
+        name
+            a literal value, or a :class:`Column` expression.
+            The result will only be true at a location if field matches in the Column.
+
+             .. deprecated:: 3.0.0
+                 :class:`Column` as a parameter is deprecated.
+        Returns
+        -------
+        :class:`Column`
+            Column representing whether each element of Column gotten by name.
+
         Examples
         --------
         >>> from pyspark.sql import Row
@@ -462,6 +499,20 @@ class Column:

         .. versionadded:: 3.1.0

+        Parameters
+        ----------
+        fieldName : str
+            a literal value.
+            The result will only be true at a location if any field matches in the Column.
+        col : :class:`Column`
+            A :class:`Column` expression for the column with `fieldName`.
+
+        Returns
+        -------
+        :class:`Column`
+            Column representing whether each element of Column
+            which field added/replaced by fieldName.
+
         Examples
         --------
         >>> from pyspark.sql import Row
@@ -495,6 +546,17 @@ class Column:

         .. versionadded:: 3.1.0

+        Parameters
+        ----------
+        fieldNames : str
+            Desired field names (collects all positional arguments passed)
+            The result will drop at a location if any field matches in the Column.
+
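The "self-contained" pattern above means every doctest creates whatever it needs, so it can be pasted into a PySpark shell verbatim. A minimal sketch of the same flow as a standalone script (assuming only a local Spark install; the expected reprs are the ones shown in the diff):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])

    # Column objects are unevaluated expressions; their repr shows the expression.
    print(df.name)     # Column<'name'>
    print(df.age + 1)  # Column<'(age + 1)'>
    print(1 / df.age)  # Column<'(1 / age)'>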
[spark] branch master updated: [SPARK-40191][PYTHON][CORE][DOCS] Make pyspark.resource examples self-contained
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 5ae24203187 [SPARK-40191][PYTHON][CORE][DOCS] Make pyspark.resource examples self-contained
5ae24203187 is described below

commit 5ae242031872e5e8ca8e6353e33666a31ccdf407
Author: Hyukjin Kwon
AuthorDate: Tue Aug 23 16:37:13 2022 -0700

    [SPARK-40191][PYTHON][CORE][DOCS] Make pyspark.resource examples self-contained

    ### What changes were proposed in this pull request?

    This PR proposes to add a working example to `pyspark.resource.ResourceProfile`. In addition, it adds return and parameter descriptions, and fixes a typo on the Scaladoc side.

    ### Why are the changes needed?

    To make the documentation more readable and able to copy and paste directly in PySpark shell.

    ### Does this PR introduce _any_ user-facing change?

    Yes, it changes the documentation.

    ### How was this patch tested?

    Manually ran each doctest. CI also runs this.

    Closes #37627 from HyukjinKwon/SPARK-40191.

    Lead-authored-by: Hyukjin Kwon
    Co-authored-by: Hyukjin Kwon
    Signed-off-by: Dongjoon Hyun
---
 .../spark/resource/ExecutorResourceRequests.scala |   2 +-
 dev/sparktestsupport/modules.py                   |   2 +
 python/pyspark/resource/information.py            |  18 +-
 python/pyspark/resource/profile.py                | 131 +++++++++++++--
 python/pyspark/resource/requests.py               | 202 +++++++++++++++++++++-
 5 files changed, 335 insertions(+), 20 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/resource/ExecutorResourceRequests.scala b/core/src/main/scala/org/apache/spark/resource/ExecutorResourceRequests.scala
index b6992f4f883..28ff79ce1f4 100644
--- a/core/src/main/scala/org/apache/spark/resource/ExecutorResourceRequests.scala
+++ b/core/src/main/scala/org/apache/spark/resource/ExecutorResourceRequests.scala
@@ -38,7 +38,7 @@ class ExecutorResourceRequests() extends Serializable {
   private val _executorResources = new ConcurrentHashMap[String, ExecutorResourceRequest]()

   /**
-   * Returns all the resource requests for the task.
+   * Returns all the resource requests for the executor.
    */
   def requests: Map[String, ExecutorResourceRequest] = _executorResources.asScala.toMap

diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py
index a4531cf5157..2b9d5269379 100644
--- a/dev/sparktestsupport/modules.py
+++ b/dev/sparktestsupport/modules.py
@@ -478,6 +478,8 @@ pyspark_resource = Module(
     dependencies=[pyspark_core],
     source_file_regexes=["python/pyspark/resource"],
     python_test_goals=[
+        # doctests
+        "pyspark.resource.profile",
         # unittests
         "pyspark.resource.tests.test_resources",
     ],

diff --git a/python/pyspark/resource/information.py b/python/pyspark/resource/information.py
index bcd78ebdc18..92cfc5a6e8b 100644
--- a/python/pyspark/resource/information.py
+++ b/python/pyspark/resource/information.py
@@ -33,11 +33,15 @@ class ResourceInformation:
     name : str
         the name of the resource
     addresses : list
-        an array of strings describing the addresses of the resource
+        a list of strings describing the addresses of the resource

     Notes
     -----
     This API is evolving.
+
+    See Also
+    --------
+    :class:`pyspark.resource.ResourceProfile`
     """

     def __init__(self, name: str, addresses: List[str]):
@@ -46,8 +50,20 @@ class ResourceInformation:

     @property
     def name(self) -> str:
+        """
+        Returns
+        -------
+        str
+            the name of the resource
+        """
         return self._name

     @property
     def addresses(self) -> List[str]:
+        """
+        Returns
+        -------
+        list
+            a list of strings describing the addresses of the resource
+        """
         return self._addresses

diff --git a/python/pyspark/resource/profile.py b/python/pyspark/resource/profile.py
index 37e8ee85ea2..0b2de444832 100644
--- a/python/pyspark/resource/profile.py
+++ b/python/pyspark/resource/profile.py
@@ -39,6 +39,44 @@ class ResourceProfile:
     Notes
     -----
     This API is evolving.
+
+    Examples
+    --------
+    Create Executor resource requests.
+
+    >>> executor_requests = (
+    ...     ExecutorResourceRequests()
+    ...     .cores(2)
+    ...     .memory("6g")
+    ...     .memoryOverhead("1g")
+    ...     .pysparkMemory("2g")
+    ...     .offheapMemory("3g")
+    ...     .resource("gpu", 2, "testGpus", "nvidia.com")
+    ... )
+
+    Create task resource requests.
+
+    >>> task_requests = TaskResourceRequests().cpus(2).resource("gpu", 2)
+
+    Create a resource profile.
+
+    >>> builder = ResourceProfileBuilder()
+    >>>
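Pieced together, the new docstring example can be exercised end to end. A sketch under the assumption that the profile is attached with `RDD.withResources` on a cluster manager that supports stage-level scheduling (the builder calls are taken from the diff above; note that `build` is a property, not a method):

    from pyspark.resource import (
        ExecutorResourceRequests,
        ResourceProfileBuilder,
        TaskResourceRequests,
    )

    # Executor-side needs for a stage.
    executor_requests = ExecutorResourceRequests().cores(2).memory("6g")
    # Per-task needs.
    task_requests = TaskResourceRequests().cpus(2)

    profile = (
        ResourceProfileBuilder()
        .require(executor_requests)
        .require(task_requests)
        .build  # a property on the builder
    )
    # rdd.withResources(profile) would then request executors matching it.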
[spark] branch branch-3.2 updated: [SPARK-40172][ML][TESTS] Temporarily disable flaky test cases in ImageFileFormatSuite
dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 0a2a568e739 [SPARK-40172][ML][TESTS] Temporarily disable flaky test cases in ImageFileFormatSuite
0a2a568e739 is described below

commit 0a2a568e73993acddbac3fb7cefcc05acbcc4620
Author: Gengliang Wang
AuthorDate: Mon Aug 22 16:16:03 2022 +0900

    [SPARK-40172][ML][TESTS] Temporarily disable flaky test cases in ImageFileFormatSuite

    ### What changes were proposed in this pull request?

    3 test cases in ImageFileFormatSuite became flaky in the GitHub Actions tests:
    https://github.com/apache/spark/runs/7941765326?check_suite_focus=true
    https://github.com/gengliangwang/spark/runs/7928658069

    Before they are fixed (https://issues.apache.org/jira/browse/SPARK-40171), I suggest disabling them in OSS.

    ### Why are the changes needed?

    Disable flaky tests before they are fixed. The test cases keep failing from time to time, while they always pass on local env.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Existing CI

    Closes #37605 from gengliangwang/disableFlakyTest.

    Authored-by: Gengliang Wang
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit 50f2f506327b7d51af9fb0ae1316135905d2f87d)
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 6572c66d01e3db00858f0b4743670a1243d3c44f)
    Signed-off-by: Dongjoon Hyun
---
 .../org/apache/spark/ml/source/image/ImageFileFormatSuite.scala | 9 ++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala
index 0ec2747be65..edb0ea27192 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala
@@ -50,7 +50,8 @@ class ImageFileFormatSuite extends SparkFunSuite with MLlibTestSparkContext {
     assert(df.schema("image").dataType == columnSchema, "data do not fit ImageSchema")
   }

-  test("image datasource count test") {
+  // TODO(SPARK-40171): Re-enable the following flaky test case after being fixed.
+  ignore("image datasource count test") {
     val df1 = spark.read.format("image").load(imagePath)
     assert(df1.count === 9)

@@ -88,7 +89,8 @@ class ImageFileFormatSuite extends SparkFunSuite with MLlibTestSparkContext {
     assert(result === invalidImageRow(resultOrigin))
   }

-  test("image datasource partition test") {
+  // TODO(SPARK-40171): Re-enable the following flaky test case after being fixed.
+  ignore("image datasource partition test") {
     val result = spark.read.format("image")
       .option("dropInvalid", true).load(imagePath)
       .select(substring_index(col("image.origin"), "/", -1).as("origin"), col("cls"), col("date"))
@@ -106,8 +108,9 @@ class ImageFileFormatSuite extends SparkFunSuite with MLlibTestSparkContext {
     ))
   }

+  // TODO(SPARK-40171): Re-enable the following flaky test case after being fixed.
   // Images with the different number of channels
-  test("readImages pixel values test") {
+  ignore("readImages pixel values test") {
     val images = spark.read.format("image").option("dropInvalid", true)
       .load(imagePath + "/cls=multichannel/").collect()
[spark] branch branch-3.3 updated: [SPARK-40172][ML][TESTS] Temporarily disable flaky test cases in ImageFileFormatSuite
dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 6572c66d01e [SPARK-40172][ML][TESTS] Temporarily disable flaky test cases in ImageFileFormatSuite
6572c66d01e is described below

commit 6572c66d01e3db00858f0b4743670a1243d3c44f
Author: Gengliang Wang
AuthorDate: Mon Aug 22 16:16:03 2022 +0900

    [SPARK-40172][ML][TESTS] Temporarily disable flaky test cases in ImageFileFormatSuite

    ### What changes were proposed in this pull request?

    3 test cases in ImageFileFormatSuite became flaky in the GitHub Actions tests:
    https://github.com/apache/spark/runs/7941765326?check_suite_focus=true
    https://github.com/gengliangwang/spark/runs/7928658069

    Before they are fixed (https://issues.apache.org/jira/browse/SPARK-40171), I suggest disabling them in OSS.

    ### Why are the changes needed?

    Disable flaky tests before they are fixed. The test cases keep failing from time to time, while they always pass on local env.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Existing CI

    Closes #37605 from gengliangwang/disableFlakyTest.

    Authored-by: Gengliang Wang
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit 50f2f506327b7d51af9fb0ae1316135905d2f87d)
    Signed-off-by: Dongjoon Hyun
---
 .../org/apache/spark/ml/source/image/ImageFileFormatSuite.scala | 9 ++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala
index 10b9bbb0bfe..7981296e210 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala
@@ -49,7 +49,8 @@ class ImageFileFormatSuite extends SparkFunSuite with MLlibTestSparkContext {
     assert(df.schema("image").dataType == columnSchema, "data do not fit ImageSchema")
   }

-  test("image datasource count test") {
+  // TODO(SPARK-40171): Re-enable the following flaky test case after being fixed.
+  ignore("image datasource count test") {
     val df1 = spark.read.format("image").load(imagePath)
     assert(df1.count === 9)

@@ -87,7 +88,8 @@ class ImageFileFormatSuite extends SparkFunSuite with MLlibTestSparkContext {
     assert(result === invalidImageRow(resultOrigin))
   }

-  test("image datasource partition test") {
+  // TODO(SPARK-40171): Re-enable the following flaky test case after being fixed.
+  ignore("image datasource partition test") {
     val result = spark.read.format("image")
       .option("dropInvalid", true).load(imagePath)
       .select(substring_index(col("image.origin"), "/", -1).as("origin"), col("cls"), col("date"))
@@ -105,8 +107,9 @@ class ImageFileFormatSuite extends SparkFunSuite with MLlibTestSparkContext {
     ))
   }

+  // TODO(SPARK-40171): Re-enable the following flaky test case after being fixed.
   // Images with the different number of channels
-  test("readImages pixel values test") {
+  ignore("readImages pixel values test") {
     val images = spark.read.format("image").option("dropInvalid", true)
       .load(imagePath + "/cls=multichannel/").collect()
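The mechanism in both backports is ScalaTest's `ignore`, which registers the test — so the body must still compile and the runner reports it as ignored — but never executes it. A minimal sketch outside Spark's test harness (the helper is a stand-in, not from the suite):

    import org.scalatest.funsuite.AnyFunSuite

    class FlakyDemoSuite extends AnyFunSuite {
      // TODO(SPARK-40171): re-enable once the underlying flakiness is fixed.
      ignore("image datasource count test") {
        // Compiled but never run; a failing assert here cannot break CI.
        assert(loadImages().length == 9)
      }

      private def loadImages(): Seq[String] = Seq.empty  // stand-in helper
    }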
[spark] branch master updated: [SPARK-40183][SQL] Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion
gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a12d121159d [SPARK-40183][SQL] Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion
a12d121159d is described below

commit a12d121159d0ab8293f70b819cb489cf6126224d
Author: Gengliang Wang
AuthorDate: Tue Aug 23 10:01:15 2022 -0700

    [SPARK-40183][SQL] Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion

    ### What changes were proposed in this pull request?

    Use the error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion, instead of the confusing error class `CANNOT_CHANGE_DECIMAL_PRECISION`. Also, use `decimal.toPlainString` instead of `decimal.toDebugString` in these error messages.

    ### Why are the changes needed?

    * The error class `CANNOT_CHANGE_DECIMAL_PRECISION` is confusing.
    * The output of `decimal.toDebugString` contains internal details; users don't need to know it.

    ### Does this PR introduce _any_ user-facing change?

    Yes, but minor: it enhances the error message of the overflow exception in decimal conversions.

    ### How was this patch tested?

    Existing UT

    Closes #37620 from gengliangwang/reviseDecimalError.

    Authored-by: Gengliang Wang
    Signed-off-by: Gengliang Wang
---
 core/src/main/resources/error/error-classes.json   | 12 +++---
 .../spark/sql/errors/QueryExecutionErrors.scala    |  4 +--
 .../resources/sql-tests/results/ansi/cast.sql.out  |  8 ++---
 .../ansi/decimalArithmeticOperations.sql.out       | 40 +++++++++++-----------
 .../sql-tests/results/ansi/interval.sql.out        |  4 +--
 .../test/resources/sql-tests/results/cast.sql.out  |  4 +--
 .../sql/errors/QueryExecutionAnsiErrorsSuite.scala |  6 ++--
 7 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index 3f6c1ca0362..d13849a6c7c 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -17,12 +17,6 @@
     ],
     "sqlState" : "22005"
   },
-  "CANNOT_CHANGE_DECIMAL_PRECISION" : {
-    "message" : [
-      "<value> cannot be represented as Decimal(<precision>, <scale>). If necessary set <config> to \"false\" to bypass this error."
-    ],
-    "sqlState" : "22005"
-  },
   "CANNOT_INFER_DATE" : {
     "message" : [
       "Cannot infer date in schema inference when LegacyTimeParserPolicy is \"LEGACY\". Legacy Date formatter does not support strict date format matching which is required to avoid inferring timestamps and other non-date entries to date."
@@ -342,6 +336,12 @@
       "The comparison result is null. If you want to handle null as 0 (equal), you can set \"spark.sql.legacy.allowNullComparisonResultInArraySort\" to \"true\"."
     ]
   },
+  "NUMERIC_VALUE_OUT_OF_RANGE" : {
+    "message" : [
+      "<value> cannot be represented as Decimal(<precision>, <scale>). If necessary set <config> to \"false\" to bypass this error."
+    ],
+    "sqlState" : "22005"
+  },
   "PARSE_CHAR_MISSING_LENGTH" : {
     "message" : [
       "DataType <type> requires a length parameter, for example <type>(10). Please specify the length."

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index e4481a4c783..19e7a371f8f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -114,9 +114,9 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase {
       decimalScale: Int,
       context: SQLQueryContext = null): ArithmeticException = {
     new SparkArithmeticException(
-      errorClass = "CANNOT_CHANGE_DECIMAL_PRECISION",
+      errorClass = "NUMERIC_VALUE_OUT_OF_RANGE",
       messageParameters = Array(
-        value.toDebugString,
+        value.toPlainString,
         decimalPrecision.toString,
         decimalScale.toString,
         toSQLConf(SQLConf.ANSI_ENABLED.key)),

diff --git a/sql/core/src/test/resources/sql-tests/results/ansi/cast.sql.out b/sql/core/src/test/resources/sql-tests/results/ansi/cast.sql.out
index 95b2e0ef42b..8f53e557b59 100644
--- a/sql/core/src/test/resources/sql-tests/results/ansi/cast.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/ansi/cast.sql.out
@@ -1027,10 +1027,10 @@ struct<>
 -- !query output
 org.apache.spark.SparkArithmeticException
 {
-  "errorClass" : "CANNOT_CHANGE_DECIMAL_PRECISION",
+  "errorClass" : "NUMERIC_VALUE_OUT_OF_RANGE",
   "sqlState" : "22005",
   "messageParameters" : {
-    "value" : "Decimal(expanded, 123.45, 5, 2)",
+    "value" : "123.45",
     "precision" : "4",
     "scale" : "2",
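The cast.sql.out hunk shows the user-visible effect. A quick way to reproduce it in a SQL session (ANSI mode must be on; the message text follows the new error-class template above):

    SET spark.sql.ansi.enabled=true;
    SELECT CAST(123.45 AS DECIMAL(4, 2));
    -- org.apache.spark.SparkArithmeticException: [NUMERIC_VALUE_OUT_OF_RANGE]
    -- 123.45 cannot be represented as Decimal(4, 2). If necessary set
    -- "spark.sql.ansi.enabled" to "false" to bypass this error.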
[spark] branch master updated: [SPARK-40152][SQL][TESTS] Add tests for SplitPart
srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4f525eed7d5 [SPARK-40152][SQL][TESTS] Add tests for SplitPart
4f525eed7d5 is described below

commit 4f525eed7d5d461498aee68c4d3e57941f9aae2c
Author: Yuming Wang
AuthorDate: Tue Aug 23 08:55:27 2022 -0500

    [SPARK-40152][SQL][TESTS] Add tests for SplitPart

    ### What changes were proposed in this pull request?

    Add tests for `SplitPart`.

    ### Why are the changes needed?

    Improve test coverage.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A.

    Closes #37626 from wangyum/SPARK-40152-2.

    Authored-by: Yuming Wang
    Signed-off-by: Sean Owen
---
 .../catalyst/expressions/collectionOperations.scala |  2 +-
 .../expressions/CollectionExpressionsSuite.scala    | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index 870f58b4396..78496c98dec 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -2270,7 +2270,7 @@ case class ElementAt(
           case Some(value) =>
             val defaultValueEval = value.genCode(ctx)
             s"""
-              ${defaultValueEval.code};
+              ${defaultValueEval.code}
               ${ev.isNull} = ${defaultValueEval.isNull};
               ${ev.value} = ${defaultValueEval.value};
            """.stripMargin

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
index 2b0b9647665..94cf0a74467 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
@@ -2522,4 +2522,24 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
         Date.valueOf("2017-02-12")))
     }
   }
+
+  test("SplitPart") {
+    val delimiter = Literal.create(".", StringType)
+    val str = StringSplitSQL(Literal.create("11.12.13", StringType), delimiter)
+    val outOfBoundValue = Some(Literal.create("", StringType))
+
+    checkEvaluation(ElementAt(str, Literal(3), outOfBoundValue), UTF8String.fromString("13"))
+    checkEvaluation(ElementAt(str, Literal(1), outOfBoundValue), UTF8String.fromString("11"))
+    checkEvaluation(ElementAt(str, Literal(10), outOfBoundValue), UTF8String.fromString(""))
+    checkEvaluation(ElementAt(str, Literal(-10), outOfBoundValue), UTF8String.fromString(""))
+
+    checkEvaluation(ElementAt(StringSplitSQL(Literal.create(null, StringType), delimiter),
+      Literal(1), outOfBoundValue), null)
+    checkEvaluation(ElementAt(StringSplitSQL(Literal.create("11.12.13", StringType),
+      Literal.create(null, StringType)), Literal(1), outOfBoundValue), null)
+
+    intercept[Exception] {
+      checkEvaluation(ElementAt(str, Literal(0), outOfBoundValue), null)
+    }.getMessage.contains("The index 0 is invalid")
+  }
 }
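Beyond the new test, the one-character change in collectionOperations.scala drops a semicolon that was appended after the interpolated `defaultValueEval.code`. A rough illustration of the effect on the generated Java — hypothetical output, not the exact codegen of ElementAt:

    // Before: s"${defaultValueEval.code};" rendered the sub-expression's
    // statements followed by an extra ';', i.e. an empty Java statement:
    boolean isNull_1 = false;
    UTF8String value_1 = ((UTF8String) references[1]);;  // stray ';'
    // After: s"${defaultValueEval.code}" emits only the statements that the
    // sub-expression itself generated.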
[spark] branch branch-3.3 updated: [SPARK-40152][SQL][TESTS] Add tests for SplitPart
srowen pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 008b3a34759 [SPARK-40152][SQL][TESTS] Add tests for SplitPart
008b3a34759 is described below

commit 008b3a347595cc47ff30853d7141b17bf7be4f13
Author: Yuming Wang
AuthorDate: Tue Aug 23 08:55:27 2022 -0500

    [SPARK-40152][SQL][TESTS] Add tests for SplitPart

    ### What changes were proposed in this pull request?

    Add tests for `SplitPart`.

    ### Why are the changes needed?

    Improve test coverage.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A.

    Closes #37626 from wangyum/SPARK-40152-2.

    Authored-by: Yuming Wang
    Signed-off-by: Sean Owen
    (cherry picked from commit 4f525eed7d5d461498aee68c4d3e57941f9aae2c)
    Signed-off-by: Sean Owen
---
 .../catalyst/expressions/collectionOperations.scala |  2 +-
 .../expressions/CollectionExpressionsSuite.scala    | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index 8186d006296..53bda0cbdc7 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -2225,7 +2225,7 @@ case class ElementAt(
           case Some(value) =>
             val defaultValueEval = value.genCode(ctx)
             s"""
-              ${defaultValueEval.code};
+              ${defaultValueEval.code}
               ${ev.isNull} = ${defaultValueEval.isNull};
               ${ev.value} = ${defaultValueEval.value};
            """.stripMargin

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
index 802988038a6..8fb04cd1ac7 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
@@ -2532,4 +2532,24 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
         Date.valueOf("2017-02-12")))
     }
   }
+
+  test("SplitPart") {
+    val delimiter = Literal.create(".", StringType)
+    val str = StringSplitSQL(Literal.create("11.12.13", StringType), delimiter)
+    val outOfBoundValue = Some(Literal.create("", StringType))
+
+    checkEvaluation(ElementAt(str, Literal(3), outOfBoundValue), UTF8String.fromString("13"))
+    checkEvaluation(ElementAt(str, Literal(1), outOfBoundValue), UTF8String.fromString("11"))
+    checkEvaluation(ElementAt(str, Literal(10), outOfBoundValue), UTF8String.fromString(""))
+    checkEvaluation(ElementAt(str, Literal(-10), outOfBoundValue), UTF8String.fromString(""))
+
+    checkEvaluation(ElementAt(StringSplitSQL(Literal.create(null, StringType), delimiter),
+      Literal(1), outOfBoundValue), null)
+    checkEvaluation(ElementAt(StringSplitSQL(Literal.create("11.12.13", StringType),
+      Literal.create(null, StringType)), Literal(1), outOfBoundValue), null)
+
+    intercept[Exception] {
+      checkEvaluation(ElementAt(str, Literal(0), outOfBoundValue), null)
+    }.getMessage.contains("The index 0 is invalid")
+  }
 }
[spark] branch master updated (5a71a7f7b5c -> 4f8654b70fc)
gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 5a71a7f7b5c [SPARK-40020][SQL][FOLLOWUP] Some more code cleanup
     add 4f8654b70fc [SPARK-40173][PYTHON][CORE][DOCS] Make pyspark.taskcontext examples self-contained

No new revisions were added by this update.

Summary of changes:
 dev/sparktestsupport/modules.py |   1 +
 python/pyspark/taskcontext.py   | 234 +++++++++++++++++++++++++++++++++------
 2 files changed, 210 insertions(+), 25 deletions(-)
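A minimal sketch of the kind of self-contained example SPARK-40173 adds to `pyspark.taskcontext` (assuming a local PySpark session; `TaskContext.get()` returns None on the driver and a context object only inside a running task):

    from pyspark import SparkContext, TaskContext

    sc = SparkContext.getOrCreate()

    # TaskContext is populated on executors, so query it from within a task.
    ids = (
        sc.parallelize(range(4), 2)
        .map(lambda _: TaskContext.get().partitionId())
        .distinct()
        .collect()
    )
    print(sorted(ids))  # [0, 1]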