[spark] branch master updated (dff5c2f2e9c -> d4c58159925)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from dff5c2f2e9c [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly
     add d4c58159925 [SPARK-39635][SQL] Support driver metrics in DS v2 custom metric API

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/connector/read/Scan.java  | 10
 .../execution/datasources/v2/BatchScanExec.scala   |  4 +-
 .../datasources/v2/ContinuousScanExec.scala        |  4 +-
 .../datasources/v2/DataSourceV2ScanExecBase.scala  | 14 -
 .../datasources/v2/MicroBatchScanExec.scala        |  5 +-
 .../execution/ui/SQLAppStatusListenerSuite.scala   | 70 ++
 6 files changed, 103 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new d7af1d20f06 [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly
d7af1d20f06 is described below

commit d7af1d20f06412f80798c53d8588356ee1490afe
Author: Angerszh
AuthorDate: Fri Aug 12 10:52:33 2022 +0800

    [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly

    ### What changes were proposed in this pull request?
    `ArrayIntersect` misjudges whether null is contained in the right expression's hash set.
    ```
    >>> a = [1, 2, 3]
    >>> b = [3, None, 5]
    >>> df = spark.sparkContext.parallelize(data).toDF(["a","b"])
    >>> df.show()
    +---------+------------+
    |        a|           b|
    +---------+------------+
    |[1, 2, 3]|[3, null, 5]|
    +---------+------------+
    >>> df.selectExpr("array_intersect(a,b)").show()
    +---------------------+
    |array_intersect(a, b)|
    +---------------------+
    |                  [3]|
    +---------------------+
    >>> df.selectExpr("array_intersect(b,a)").show()
    +---------------------+
    |array_intersect(b, a)|
    +---------------------+
    |            [3, null]|
    +---------------------+
    ```
    In the original codegen path, `ArrayIntersect` handles array1 with the code below:
    ```
    def withArray1NullAssignment(body: String) =
      if (left.dataType.asInstanceOf[ArrayType].containsNull) {
        if (right.dataType.asInstanceOf[ArrayType].containsNull) {
          s"""
             |if ($array1.isNullAt($i)) {
             |  if ($foundNullElement) {
             |    $nullElementIndex = $size;
             |    $foundNullElement = false;
             |    $size++;
             |    $builder.$$plus$$eq($nullValueHolder);
             |  }
             |} else {
             |  $body
             |}
           """.stripMargin
        } else {
          s"""
             |if (!$array1.isNullAt($i)) {
             |  $body
             |}
           """.stripMargin
        }
      } else {
        body
      }
    ```
    The flag `foundNullElement` indicates whether array2 really contains a null value. But when implementing https://issues.apache.org/jira/browse/SPARK-36829, the meaning of `ArrayType.containsNull` was misunderstood, so `SQLOpenHashSet.withNullCheckCode()` was implemented as:
    ```
    def withNullCheckCode(
        arrayContainsNull: Boolean,
        setContainsNull: Boolean,
        array: String,
        index: String,
        hashSet: String,
        handleNotNull: (String, String) => String,
        handleNull: String): String = {
      if (arrayContainsNull) {
        if (setContainsNull) {
          s"""
             |if ($array.isNullAt($index)) {
             |  if (!$hashSet.containsNull()) {
             |    $hashSet.addNull();
             |    $handleNull
             |  }
             |} else {
             |  ${handleNotNull(array, index)}
             |}
           """.stripMargin
        } else {
          s"""
             |if (!$array.isNullAt($index)) {
             |  ${handleNotNull(array, index)}
             |}
           """.stripMargin
        }
      } else {
        handleNotNull(array, index)
      }
    }
    ```
    The `if (arrayContainsNull && setContainsNull)` path wrongly assumes that the array's open hash set really contains a null value. This PR adds a new parameter `additionalCondition` to complement the previous `foundNullElement` implementation; it also renames the method's parameters.

    ### Why are the changes needed?
    Fix a data correctness issue.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Added UT

    Closes #37436 from AngersZh/SPARK-39776-FOLLOW_UP.
Lead-authored-by: Angerszh Co-authored-by: AngersZh Signed-off-by: Wenchen Fan (cherry picked from commit dff5c2f2e9ce233e270e0e5cde0a40f682ba9534) Signed-off-by: Wenchen Fan --- .../expressions/collectionOperations.scala | 8 +++-- .../org/apache/spark/sql/util/SQLOpenHashSet.scala | 8 ++--- .../expressions/CollectionExpressionsSuite.scala | 34 ++ 3 files changed, 43 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index f38beb480e6..650cfc7bca8
[spark] branch master updated: [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new dff5c2f2e9c [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly
dff5c2f2e9c is described below

commit dff5c2f2e9ce233e270e0e5cde0a40f682ba9534
Author: Angerszh
AuthorDate: Fri Aug 12 10:52:33 2022 +0800

    [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly

    ### What changes were proposed in this pull request?
    `ArrayIntersect` misjudges whether null is contained in the right expression's hash set.
    ```
    >>> a = [1, 2, 3]
    >>> b = [3, None, 5]
    >>> df = spark.sparkContext.parallelize(data).toDF(["a","b"])
    >>> df.show()
    +---------+------------+
    |        a|           b|
    +---------+------------+
    |[1, 2, 3]|[3, null, 5]|
    +---------+------------+
    >>> df.selectExpr("array_intersect(a,b)").show()
    +---------------------+
    |array_intersect(a, b)|
    +---------------------+
    |                  [3]|
    +---------------------+
    >>> df.selectExpr("array_intersect(b,a)").show()
    +---------------------+
    |array_intersect(b, a)|
    +---------------------+
    |            [3, null]|
    +---------------------+
    ```
    In the original codegen path, `ArrayIntersect` handles array1 with the code below:
    ```
    def withArray1NullAssignment(body: String) =
      if (left.dataType.asInstanceOf[ArrayType].containsNull) {
        if (right.dataType.asInstanceOf[ArrayType].containsNull) {
          s"""
             |if ($array1.isNullAt($i)) {
             |  if ($foundNullElement) {
             |    $nullElementIndex = $size;
             |    $foundNullElement = false;
             |    $size++;
             |    $builder.$$plus$$eq($nullValueHolder);
             |  }
             |} else {
             |  $body
             |}
           """.stripMargin
        } else {
          s"""
             |if (!$array1.isNullAt($i)) {
             |  $body
             |}
           """.stripMargin
        }
      } else {
        body
      }
    ```
    The flag `foundNullElement` indicates whether array2 really contains a null value. But when implementing https://issues.apache.org/jira/browse/SPARK-36829, the meaning of `ArrayType.containsNull` was misunderstood, so `SQLOpenHashSet.withNullCheckCode()` was implemented as:
    ```
    def withNullCheckCode(
        arrayContainsNull: Boolean,
        setContainsNull: Boolean,
        array: String,
        index: String,
        hashSet: String,
        handleNotNull: (String, String) => String,
        handleNull: String): String = {
      if (arrayContainsNull) {
        if (setContainsNull) {
          s"""
             |if ($array.isNullAt($index)) {
             |  if (!$hashSet.containsNull()) {
             |    $hashSet.addNull();
             |    $handleNull
             |  }
             |} else {
             |  ${handleNotNull(array, index)}
             |}
           """.stripMargin
        } else {
          s"""
             |if (!$array.isNullAt($index)) {
             |  ${handleNotNull(array, index)}
             |}
           """.stripMargin
        }
      } else {
        handleNotNull(array, index)
      }
    }
    ```
    The `if (arrayContainsNull && setContainsNull)` path wrongly assumes that the array's open hash set really contains a null value. This PR adds a new parameter `additionalCondition` to complement the previous `foundNullElement` implementation; it also renames the method's parameters.

    ### Why are the changes needed?
    Fix a data correctness issue.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Added UT

    Closes #37436 from AngersZh/SPARK-39776-FOLLOW_UP.
Lead-authored-by: Angerszh Co-authored-by: AngersZh Signed-off-by: Wenchen Fan --- .../expressions/collectionOperations.scala | 8 +++-- .../org/apache/spark/sql/util/SQLOpenHashSet.scala | 8 ++--- .../expressions/CollectionExpressionsSuite.scala | 34 ++ 3 files changed, 43 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index ae23775b62d..d6a9601f884 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
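For reference, a minimal sketch of the behavior SPARK-39976 targets, using standard `array_intersect` semantics (null belongs in the result only when both input arrays contain it); the DataFrame below is illustrative and is not taken from the patch's test suite. With the fix, swapping the arguments should no longer leak a spurious null:

```
>>> df = spark.createDataFrame([([1, 2, 3], [3, None, 5])], ["a", "b"])
>>> df.selectExpr("array_intersect(a, b)", "array_intersect(b, a)").show()
+---------------------+---------------------+
|array_intersect(a, b)|array_intersect(b, a)|
+---------------------+---------------------+
|                  [3]|                  [3]|
+---------------------+---------------------+
```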
[spark] branch branch-3.3 updated (221fee8973c -> 21d9db39b7a)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git from 221fee8973c [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit` add 21d9db39b7a [SPARK-39887][SQL][3.3] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/optimizer/Optimizer.scala | 26 +- .../RemoveRedundantAliasAndProjectSuite.scala | 2 +- .../approved-plans-v1_4/q14a.sf100/explain.txt | 270 ++--- .../approved-plans-v1_4/q14a/explain.txt | 254 +-- .../org/apache/spark/sql/DataFrameSuite.scala | 61 + .../sql/execution/metric/SQLMetricsSuite.scala | 5 +- 6 files changed, 347 insertions(+), 271 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39985][SQL] Enable implicit DEFAULT column values in inserts from DataFrames
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 31a132bcaee [SPARK-39985][SQL] Enable implicit DEFAULT column values in inserts from DataFrames 31a132bcaee is described below commit 31a132bcaeea2340d57515dd0f99406b06528cb4 Author: Daniel Tenedorio AuthorDate: Thu Aug 11 19:03:36 2022 -0700 [SPARK-39985][SQL] Enable implicit DEFAULT column values in inserts from DataFrames ### What changes were proposed in this pull request? Enable implicit DEFAULT column values in inserts from DataFrames. This mostly already worked since the DataFrame inserts already converted to LogicalPlans. I added testing and a small analysis change since the operators are resolved one-by-one instead of all at once. Note that explicit column "default" references are not supported in write operations from DataFrames: since the operators are resolved one-by-one, any `.select` referring to "default" generates a "column not found" error before any following `.insertInto`. ### Why are the changes needed? This makes inserts from DataFrames produce the same results as those from SQL commands, for consistency and correctness. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Extended the `InsertSuite` in this PR. Closes #37423 from dtenedor/defaults-in-dataframes. Authored-by: Daniel Tenedorio Signed-off-by: Gengliang Wang --- .../catalyst/analysis/ResolveDefaultColumns.scala | 39 ++- .../org/apache/spark/sql/sources/InsertSuite.scala | 311 +++-- 2 files changed, 259 insertions(+), 91 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala index c340359f2ca..b7c7f0d3772 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala @@ -83,6 +83,9 @@ case class ResolveDefaultColumns(catalog: SessionCatalog) extends Rule[LogicalPl case u: UnresolvedInlineTable if u.rows.nonEmpty && u.rows.forall(_.size == u.rows(0).size) => true + case r: LocalRelation +if r.data.nonEmpty && r.data.forall(_.numFields == r.data(0).numFields) => +true case _ => false } @@ -99,14 +102,13 @@ case class ResolveDefaultColumns(catalog: SessionCatalog) extends Rule[LogicalPl children.append(node) node = node.children(0) } -val table = node.asInstanceOf[UnresolvedInlineTable] val insertTableSchemaWithoutPartitionColumns: Option[StructType] = getInsertTableSchemaWithoutPartitionColumns(i) insertTableSchemaWithoutPartitionColumns.map { schema: StructType => val regenerated: InsertIntoStatement = regenerateUserSpecifiedCols(i, schema) - val expanded: UnresolvedInlineTable = -addMissingDefaultValuesForInsertFromInlineTable(table, schema) + val expanded: LogicalPlan = +addMissingDefaultValuesForInsertFromInlineTable(node, schema) val replaced: Option[LogicalPlan] = replaceExplicitDefaultValuesForInputOfInsertInto(schema, expanded) replaced.map { r: LogicalPlan => @@ -262,16 +264,33 @@ case class ResolveDefaultColumns(catalog: SessionCatalog) extends Rule[LogicalPl * Updates an inline table to generate missing default column values. 
*/ private def addMissingDefaultValuesForInsertFromInlineTable( - table: UnresolvedInlineTable, - insertTableSchemaWithoutPartitionColumns: StructType): UnresolvedInlineTable = { -val numQueryOutputs: Int = table.rows(0).size + node: LogicalPlan, + insertTableSchemaWithoutPartitionColumns: StructType): LogicalPlan = { +val numQueryOutputs: Int = node match { + case table: UnresolvedInlineTable => table.rows(0).size + case local: LocalRelation => local.data(0).numFields +} val schema = insertTableSchemaWithoutPartitionColumns val newDefaultExpressions: Seq[Expression] = getDefaultExpressionsForInsert(numQueryOutputs, schema) val newNames: Seq[String] = schema.fields.drop(numQueryOutputs).map { _.name } -table.copy( - names = table.names ++ newNames, - rows = table.rows.map { row => row ++ newDefaultExpressions }) +node match { + case _ if newDefaultExpressions.isEmpty => node + case table: UnresolvedInlineTable => +table.copy( + names = table.names ++ newNames, + rows = table.rows.map { row => row ++ newDefaultExpressions }) + case local:
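As a rough illustration of the user-visible effect of SPARK-39985 (a sketch only: the table name, the `USING parquet` source, and the assumption that DEFAULT column support is enabled are illustrative, not taken from the patch), inserting a DataFrame with fewer columns than the target table should now fill the missing trailing columns with their declared defaults, matching the equivalent SQL INSERT:

```
>>> spark.sql("CREATE TABLE t (a INT, b INT DEFAULT 42) USING parquet")  # hypothetical table
DataFrame[]
>>> spark.createDataFrame([(1,), (2,)], ["a"]).write.insertInto("t")
>>> spark.sql("SELECT * FROM t").show()
+---+---+
|  a|  b|
+---+---+
|  1| 42|
|  2| 42|
+---+---+
```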
[spark] branch master updated (7f3baa77acb -> dd49a775d54)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 7f3baa77acb [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit` add dd49a775d54 [SPARK-40029][PYTHON][DOC] Make pyspark.sql.types examples self-contained No new revisions were added by this update. Summary of changes: python/pyspark/sql/types.py | 54 - 1 file changed, 48 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
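The change above is documentation-only; for context, a small self-contained schema example in the same spirit as the reworked docstrings (the field names here are made up for illustration):

```
>>> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>>> schema = StructType([
...     StructField("name", StringType(), True),
...     StructField("age", IntegerType(), True),
... ])
>>> spark.createDataFrame([("Alice", 2), ("Bob", 5)], schema).printSchema()
root
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)
```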
[spark] branch branch-3.2 updated: [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 45d42e17199 [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`
45d42e17199 is described below

commit 45d42e1719933667981e7e490a2a21623501e6dd
Author: yangjie01
AuthorDate: Thu Aug 11 15:10:42 2022 -0700

    [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`

    ### What changes were proposed in this pull request?
    This PR excludes `xalan` from `htmlunit` to clean up the CVE-2022-34169 warning:
    ```
    Provides transitive vulnerable dependency xalan:xalan:2.7.2
    CVE-2022-34169 7.5 Integer Coercion Error vulnerability with medium severity found
    Results powered by Checkmarx(c)
    ```
    `xalan:xalan:2.7.2` is the latest version and its code base has not been updated for 5 years, so this cannot be solved by upgrading `xalan`.

    ### Why are the changes needed?
    The vulnerability is described in [CVE-2022-34169](https://github.com/advisories/GHSA-9339-86wc-4qgf); it is better to exclude it even though it is only a test dependency for Spark.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Pass GitHub Actions
    - Manual test: run `mvn dependency:tree -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive | grep xalan` to check that `xalan` is no longer matched after this PR

    Closes #37481 from LuciferYang/exclude-xalan.

    Authored-by: yangjie01
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 7f3baa77acbf7747963a95d0f24e3b8868c7b16a)
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/pom.xml b/pom.xml
index 9165d62fd18..81c5d5f6370 100644
--- a/pom.xml
+++ b/pom.xml
@@ -670,6 +670,12 @@
         <groupId>net.sourceforge.htmlunit</groupId>
         <artifactId>htmlunit</artifactId>
         <version>${htmlunit.version}</version>
+        <exclusions>
+          <exclusion>
+            <groupId>xalan</groupId>
+            <artifactId>xalan</artifactId>
+          </exclusion>
+        </exclusions>
         <scope>test</scope>

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 221fee8973c [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`
221fee8973c is described below

commit 221fee8973ce438b089fae769dd054c47f6774ed
Author: yangjie01
AuthorDate: Thu Aug 11 15:10:42 2022 -0700

    [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`

    ### What changes were proposed in this pull request?
    This PR excludes `xalan` from `htmlunit` to clean up the CVE-2022-34169 warning:
    ```
    Provides transitive vulnerable dependency xalan:xalan:2.7.2
    CVE-2022-34169 7.5 Integer Coercion Error vulnerability with medium severity found
    Results powered by Checkmarx(c)
    ```
    `xalan:xalan:2.7.2` is the latest version and its code base has not been updated for 5 years, so this cannot be solved by upgrading `xalan`.

    ### Why are the changes needed?
    The vulnerability is described in [CVE-2022-34169](https://github.com/advisories/GHSA-9339-86wc-4qgf); it is better to exclude it even though it is only a test dependency for Spark.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Pass GitHub Actions
    - Manual test: run `mvn dependency:tree -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive | grep xalan` to check that `xalan` is no longer matched after this PR

    Closes #37481 from LuciferYang/exclude-xalan.

    Authored-by: yangjie01
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 7f3baa77acbf7747963a95d0f24e3b8868c7b16a)
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/pom.xml b/pom.xml
index 206cad9eb98..f639e5e5444 100644
--- a/pom.xml
+++ b/pom.xml
@@ -709,6 +709,12 @@
         <groupId>net.sourceforge.htmlunit</groupId>
         <artifactId>htmlunit</artifactId>
         <version>${htmlunit.version}</version>
+        <exclusions>
+          <exclusion>
+            <groupId>xalan</groupId>
+            <artifactId>xalan</artifactId>
+          </exclusion>
+        </exclusions>
         <scope>test</scope>

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7f3baa77acb [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`
7f3baa77acb is described below

commit 7f3baa77acbf7747963a95d0f24e3b8868c7b16a
Author: yangjie01
AuthorDate: Thu Aug 11 15:10:42 2022 -0700

    [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`

    ### What changes were proposed in this pull request?
    This PR excludes `xalan` from `htmlunit` to clean up the CVE-2022-34169 warning:
    ```
    Provides transitive vulnerable dependency xalan:xalan:2.7.2
    CVE-2022-34169 7.5 Integer Coercion Error vulnerability with medium severity found
    Results powered by Checkmarx(c)
    ```
    `xalan:xalan:2.7.2` is the latest version and its code base has not been updated for 5 years, so this cannot be solved by upgrading `xalan`.

    ### Why are the changes needed?
    The vulnerability is described in [CVE-2022-34169](https://github.com/advisories/GHSA-9339-86wc-4qgf); it is better to exclude it even though it is only a test dependency for Spark.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Pass GitHub Actions
    - Manual test: run `mvn dependency:tree -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive | grep xalan` to check that `xalan` is no longer matched after this PR

    Closes #37481 from LuciferYang/exclude-xalan.

    Authored-by: yangjie01
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/pom.xml b/pom.xml
index c197987cd53..b6bbfef1854 100644
--- a/pom.xml
+++ b/pom.xml
@@ -712,6 +712,12 @@
         <groupId>net.sourceforge.htmlunit</groupId>
         <artifactId>htmlunit</artifactId>
         <version>${htmlunit.version}</version>
+        <exclusions>
+          <exclusion>
+            <groupId>xalan</groupId>
+            <artifactId>xalan</artifactId>
+          </exclusion>
+        </exclusions>
         <scope>test</scope>

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39927][BUILD] Upgrade to Avro 1.11.1
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4394e244bbd [SPARK-39927][BUILD] Upgrade to Avro 1.11.1 4394e244bbd is described below commit 4394e244bbd50d0b625b373351d38508f4debf41 Author: Ismaël Mejía AuthorDate: Thu Aug 11 15:05:41 2022 -0700 [SPARK-39927][BUILD] Upgrade to Avro 1.11.1 ### What changes were proposed in this pull request? Update the Avro version to 1.11.1 ### Why are the changes needed? To stay up to date with upstream ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests Closes #37352 from iemejia/SPARK-39927-avro-1.11.1. Authored-by: Ismaël Mejía Signed-off-by: Dongjoon Hyun --- .../avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala | 4 ++-- .../avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala | 5 ++--- dev/deps/spark-deps-hadoop-2-hive-2.3 | 6 +++--- dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++--- docs/sql-data-sources-avro.md | 4 ++-- pom.xml | 2 +- project/SparkBuild.scala| 2 +- .../scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala| 2 +- 8 files changed, 15 insertions(+), 16 deletions(-) diff --git a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala index 3c68cbd537a..540420974f5 100644 --- a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala +++ b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala @@ -79,14 +79,14 @@ private[sql] class AvroOptions( /** * Top level record name in write result, which is required in Avro spec. - * See https://avro.apache.org/docs/1.11.0/spec.html#schema_record . + * See https://avro.apache.org/docs/1.11.1/spec.html#schema_record . * Default value is "topLevelRecord" */ val recordName: String = parameters.getOrElse("recordName", "topLevelRecord") /** * Record namespace in write result. Default value is "". - * See Avro spec for details: https://avro.apache.org/docs/1.11.0/spec.html#schema_record . + * See Avro spec for details: https://avro.apache.org/docs/1.11.1/spec.html#schema_record . 
*/ val recordNamespace: String = parameters.getOrElse("recordNamespace", "") diff --git a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala index 8a088a43579..4a1749533ab 100644 --- a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala +++ b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala @@ -875,7 +875,7 @@ abstract class AvroSuite dfWithNull.write.format("avro") .option("avroSchema", avroSchema).save(s"$tempDir/${UUID.randomUUID()}") } - assertExceptionMsg[AvroTypeException](e1, "Not an enum: null") + assertExceptionMsg[AvroTypeException](e1, "value null is not a SuitEnumType") // Writing df containing data not in the enum will throw an exception val e2 = intercept[SparkException] { @@ -1075,8 +1075,7 @@ abstract class AvroSuite .save(s"$tempDir/${UUID.randomUUID()}") }.getCause.getMessage assert(message.contains("Caused by: java.lang.NullPointerException: ")) - assert(message.contains( -"null of string in string in field Name of test_schema in test_schema")) + assert(message.contains("null in string in field Name")) } } diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3 index a86bbc52431..13e5e56ab0e 100644 --- a/dev/deps/spark-deps-hadoop-2-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-2-hive-2.3 @@ -23,9 +23,9 @@ arrow-memory-netty/9.0.0//arrow-memory-netty-9.0.0.jar arrow-vector/9.0.0//arrow-vector-9.0.0.jar audience-annotations/0.5.0//audience-annotations-0.5.0.jar automaton/1.11-8//automaton-1.11-8.jar -avro-ipc/1.11.0//avro-ipc-1.11.0.jar -avro-mapred/1.11.0//avro-mapred-1.11.0.jar -avro/1.11.0//avro-1.11.0.jar +avro-ipc/1.11.1//avro-ipc-1.11.1.jar +avro-mapred/1.11.1//avro-mapred-1.11.1.jar +avro/1.11.1//avro-1.11.1.jar azure-storage/2.0.0//azure-storage-2.0.0.jar blas/2.2.1//blas-2.2.1.jar bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 854919a9af6..c221e092806 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -22,9 +22,9 @@
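For context on the `recordName` and `recordNamespace` options whose documentation links are touched in the `AvroOptions.scala` hunk above, a brief usage sketch (the output path and namespace are illustrative; assumes the `spark-avro` connector is available as the `avro` format):

```
>>> df = spark.createDataFrame([(1, "a")], ["id", "value"])
>>> (df.write.format("avro")
...     .option("recordName", "topLevelRecord")    # top-level record name in the written Avro schema
...     .option("recordNamespace", "com.example")  # record namespace; defaults to ""
...     .save("/tmp/avro-out"))
```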
[spark] branch master updated (a49f66fe49d -> 126870eddc5)
This is an automated email from the ASF dual-hosted git repository. mridulm80 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from a49f66fe49d [SPARK-38921][K8S][TESTS] Use k8s-client to create queue resource in Volcano IT add 126870eddc5 [SPARK-39955][CORE] Improve LaunchTask process to avoid Stage failures caused by fail-to-send LaunchTask messages No new revisions were added by this update. Summary of changes: .../org/apache/spark/scheduler/TaskInfo.scala | 6 +++ .../apache/spark/scheduler/TaskSchedulerImpl.scala | 3 ++ .../apache/spark/scheduler/TaskSetManager.scala| 7 +++- .../spark/scheduler/TaskSchedulerImplSuite.scala | 47 ++ .../spark/scheduler/TaskSetManagerSuite.scala | 12 ++ 5 files changed, 73 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-38921][K8S][TESTS] Use k8s-client to create queue resource in Volcano IT
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a49f66fe49d [SPARK-38921][K8S][TESTS] Use k8s-client to create queue resource in Volcano IT a49f66fe49d is described below commit a49f66fe49d4d4bbfb41da2e5bbb5af4bd64d1da Author: Yikun Jiang AuthorDate: Thu Aug 11 08:28:57 2022 -0700 [SPARK-38921][K8S][TESTS] Use k8s-client to create queue resource in Volcano IT ### What changes were proposed in this pull request? Use fabric8io/k8s-client to create queue resource in Volcano IT. ### Why are the changes needed? Use k8s-client to create volcano queue to - Make code easy to understand - Enable abity to set queue capacity dynamically. This will help to support running Volcano test in a resource limited env (such as github action). ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Volcano IT passed Closes #36219 from Yikun/SPARK-38921. Authored-by: Yikun Jiang Signed-off-by: Dongjoon Hyun --- .../src/test/resources/volcano/disable-queue.yml | 24 --- .../volcano/disable-queue0-enable-queue1.yml | 31 - .../volcano/driver-podgroup-template-cpu-2u.yml| 2 +- .../volcano/driver-podgroup-template-memory-3g.yml | 2 +- .../src/test/resources/volcano/enable-queue.yml| 24 --- .../volcano/enable-queue0-enable-queue1.yml| 29 - .../src/test/resources/volcano/queue-2u-3g.yml | 25 .../k8s/integrationtest/VolcanoTestsSuite.scala| 74 +++--- 8 files changed, 52 insertions(+), 159 deletions(-) diff --git a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue.yml b/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue.yml deleted file mode 100644 index d9f8c36471e..000 --- a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue.yml +++ /dev/null @@ -1,24 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -apiVersion: scheduling.volcano.sh/v1beta1 -kind: Queue -metadata: - name: queue -spec: - weight: 1 - capability: -cpu: "0.001" diff --git a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue0-enable-queue1.yml b/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue0-enable-queue1.yml deleted file mode 100644 index 82e479478cc..000 --- a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue0-enable-queue1.yml +++ /dev/null @@ -1,31 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. 
See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -apiVersion: scheduling.volcano.sh/v1beta1 -kind: Queue -metadata: - name: queue0 -spec: - weight: 1 - capability: -cpu: "0.001" -apiVersion: scheduling.volcano.sh/v1beta1 -kind: Queue -metadata: - name: queue1 -spec: - weight: 1 diff --git a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/driver-podgroup-template-cpu-2u.yml b/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/driver-podgroup-template-cpu-2u.yml index e6d53ddc8b5..4a784f0f864 100644 ---
[spark] branch master updated (71792411083 -> 9dff034bdef)
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 71792411083 [SPARK-40027][PYTHON][SS][DOCS] Add self-contained examples for pyspark.sql.streaming.readwriter add 9dff034bdef [SPARK-39966][SQL] Use V2 Filter in SupportsDelete No new revisions were added by this update. Summary of changes: .../sql/connector/catalog/SupportsDelete.java | 14 +- .../{SupportsDelete.java => SupportsDeleteV2.java} | 32 +- .../connector/read/SupportsRuntimeFiltering.java | 19 +- .../catalyst/analysis/RewriteDeleteFromTable.scala | 6 +- .../sql/catalyst/plans/logical/v2Commands.scala| 5 +- .../spark/sql/errors/QueryCompilationErrors.scala | 3 +- .../datasources/v2/DataSourceV2Implicits.scala | 6 +- .../sql/internal/connector/PredicateUtils.scala| 8 +- ...InMemoryTable.scala => InMemoryBaseTable.scala} | 38 +- .../sql/connector/catalog/InMemoryTable.scala | 646 + .../catalog/InMemoryTableWithV2Filter.scala| 81 ++- .../datasources/v2/DataSourceV2Strategy.scala | 8 +- .../datasources/v2/DeleteFromTableExec.scala | 8 +- .../v2/OptimizeMetadataOnlyDeleteFromTable.scala | 12 +- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 89 ++- .../spark/sql/connector/DatasourceV2SQLBase.scala | 4 +- .../spark/sql/connector/V1WriteFallbackSuite.scala | 4 +- 17 files changed, 265 insertions(+), 718 deletions(-) copy sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/{SupportsDelete.java => SupportsDeleteV2.java} (74%) copy sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/{InMemoryTable.scala => InMemoryBaseTable.scala} (94%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-40027][PYTHON][SS][DOCS] Add self-contained examples for pyspark.sql.streaming.readwriter
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 71792411083 [SPARK-40027][PYTHON][SS][DOCS] Add self-contained examples for pyspark.sql.streaming.readwriter 71792411083 is described below commit 71792411083a71bcfd7a0d94ddf754bf09a27054 Author: Hyukjin Kwon AuthorDate: Thu Aug 11 20:19:24 2022 +0900 [SPARK-40027][PYTHON][SS][DOCS] Add self-contained examples for pyspark.sql.streaming.readwriter ### What changes were proposed in this pull request? This PR proposes to improve the examples in `pyspark.sql.streaming.readwriter` by making each example self-contained with a brief explanation and a bit more realistic example. ### Why are the changes needed? To make the documentation more readable and able to copy and paste directly in PySpark shell. ### Does this PR introduce _any_ user-facing change? Yes, it changes the documentation ### How was this patch tested? Manually ran each doctest. Closes #37461 from HyukjinKwon/SPARK-40027. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- python/pyspark/sql/streaming/readwriter.py | 441 +++-- 1 file changed, 357 insertions(+), 84 deletions(-) diff --git a/python/pyspark/sql/streaming/readwriter.py b/python/pyspark/sql/streaming/readwriter.py index 74b89dbe46c..ef3b7e525e3 100644 --- a/python/pyspark/sql/streaming/readwriter.py +++ b/python/pyspark/sql/streaming/readwriter.py @@ -24,7 +24,7 @@ from py4j.java_gateway import java_import, JavaObject from pyspark.sql.column import _to_seq from pyspark.sql.readwriter import OptionUtils, to_str from pyspark.sql.streaming.query import StreamingQuery -from pyspark.sql.types import Row, StructType, StructField, StringType +from pyspark.sql.types import Row, StructType from pyspark.sql.utils import ForeachBatchFunction if TYPE_CHECKING: @@ -46,6 +46,22 @@ class DataStreamReader(OptionUtils): Notes - This API is evolving. + +Examples + +>>> spark.readStream + + +The example below uses Rate source that generates rows continously. +After that, we operate a modulo by 3, and then writes the stream out to the console. +The streaming query stops in 3 seconds. + +>>> import time +>>> df = spark.readStream.format("rate").load() +>>> df = df.selectExpr("value % 3 as v") +>>> q = df.writeStream.format("console").start() +>>> time.sleep(3) +>>> q.stop() """ def __init__(self, spark: "SparkSession") -> None: @@ -73,7 +89,23 @@ class DataStreamReader(OptionUtils): Examples ->>> s = spark.readStream.format("text") +>>> spark.readStream.format("text") + + +This API allows to configure other sources to read. The example below writes a small text +file, and reads it back via Text source. + +>>> import tempfile +>>> import time +>>> with tempfile.TemporaryDirectory() as d: +... # Write a temporary text file to read it. +... spark.createDataFrame( +... [("hello",), ("this",)]).write.mode("overwrite").format("text").save(d) +... +... # Start a streaming query to read the text file. +... q = spark.readStream.format("text").load(d).writeStream.format("console").start() +... time.sleep(3) +... 
q.stop() """ self._jreader = self._jreader.format(source) return self @@ -99,8 +131,22 @@ class DataStreamReader(OptionUtils): Examples ->>> s = spark.readStream.schema(sdf_schema) ->>> s = spark.readStream.schema("col0 INT, col1 DOUBLE") +>>> from pyspark.sql.types import StructField, StructType, StringType +>>> spark.readStream.schema(StructType([StructField("data", StringType(), True)])) + +>>> spark.readStream.schema("col0 INT, col1 DOUBLE") + + +The example below specifies a different schema to CSV file. + +>>> import tempfile +>>> import time +>>> with tempfile.TemporaryDirectory() as d: +... # Start a streaming query to read the CSV file. +... spark.readStream.schema("col0 INT, col1 STRING").format("csv").load(d).printSchema() +root + |-- col0: integer (nullable = true) + |-- col1: string (nullable = true) """ from pyspark.sql import SparkSession @@ -125,7 +171,17 @@ class DataStreamReader(OptionUtils): Examples ->>> s = spark.readStream.option("x", 1) +>>> spark.readStream.option("x", 1) + + +The example below specifies 'rowsPerSecond' option to Rate source
[spark] branch branch-3.1 updated: [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new 624277640ff [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table 624277640ff is described below commit 624277640ffcf0ce6bff76179126ccb8f9340ca2 Author: Hyukjin Kwon AuthorDate: Thu Aug 11 15:01:05 2022 +0900 [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/30835 that adds `DataStreamWriter.toTable` and `DataStreamReader.table` into PySpark documentation. ### Why are the changes needed? To document both features. ### Does this PR introduce _any_ user-facing change? Yes, both API will be shown in PySpark reference documentation. ### How was this patch tested? Manually built the documentation and checked. Closes #37477 from HyukjinKwon/SPARK-40043. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 447003324d2cf9f2bfa799ef3a1e744a5bc9277d) Signed-off-by: Hyukjin Kwon --- python/docs/source/reference/pyspark.ss.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/python/docs/source/reference/pyspark.ss.rst b/python/docs/source/reference/pyspark.ss.rst index a7936a4f2a5..c3e532646ac 100644 --- a/python/docs/source/reference/pyspark.ss.rst +++ b/python/docs/source/reference/pyspark.ss.rst @@ -52,6 +52,7 @@ Input and Output DataStreamReader.orc DataStreamReader.parquet DataStreamReader.schema +DataStreamReader.table DataStreamReader.text DataStreamWriter.foreach DataStreamWriter.foreachBatch @@ -62,6 +63,7 @@ Input and Output DataStreamWriter.partitionBy DataStreamWriter.queryName DataStreamWriter.start +DataStreamWriter.toTable DataStreamWriter.trigger Query Management - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated: [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 0f3107d5f5b [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table 0f3107d5f5b is described below commit 0f3107d5f5bdcc3887aa2df05e99c121a76bd07e Author: Hyukjin Kwon AuthorDate: Thu Aug 11 15:01:05 2022 +0900 [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/30835 that adds `DataStreamWriter.toTable` and `DataStreamReader.table` into PySpark documentation. ### Why are the changes needed? To document both features. ### Does this PR introduce _any_ user-facing change? Yes, both API will be shown in PySpark reference documentation. ### How was this patch tested? Manually built the documentation and checked. Closes #37477 from HyukjinKwon/SPARK-40043. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 447003324d2cf9f2bfa799ef3a1e744a5bc9277d) Signed-off-by: Hyukjin Kwon --- python/docs/source/reference/pyspark.ss.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/python/docs/source/reference/pyspark.ss.rst b/python/docs/source/reference/pyspark.ss.rst index a7936a4f2a5..c3e532646ac 100644 --- a/python/docs/source/reference/pyspark.ss.rst +++ b/python/docs/source/reference/pyspark.ss.rst @@ -52,6 +52,7 @@ Input and Output DataStreamReader.orc DataStreamReader.parquet DataStreamReader.schema +DataStreamReader.table DataStreamReader.text DataStreamWriter.foreach DataStreamWriter.foreachBatch @@ -62,6 +63,7 @@ Input and Output DataStreamWriter.partitionBy DataStreamWriter.queryName DataStreamWriter.start +DataStreamWriter.toTable DataStreamWriter.trigger Query Management - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 248e8b49b11 [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table 248e8b49b11 is described below commit 248e8b49b114d725e7a94bc8193f371b89270af7 Author: Hyukjin Kwon AuthorDate: Thu Aug 11 15:01:05 2022 +0900 [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/30835 that adds `DataStreamWriter.toTable` and `DataStreamReader.table` into PySpark documentation. ### Why are the changes needed? To document both features. ### Does this PR introduce _any_ user-facing change? Yes, both API will be shown in PySpark reference documentation. ### How was this patch tested? Manually built the documentation and checked. Closes #37477 from HyukjinKwon/SPARK-40043. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 447003324d2cf9f2bfa799ef3a1e744a5bc9277d) Signed-off-by: Hyukjin Kwon --- python/docs/source/reference/pyspark.ss/io.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/python/docs/source/reference/pyspark.ss/io.rst b/python/docs/source/reference/pyspark.ss/io.rst index da476fb6fac..7a20777fdc7 100644 --- a/python/docs/source/reference/pyspark.ss/io.rst +++ b/python/docs/source/reference/pyspark.ss/io.rst @@ -34,6 +34,7 @@ Input/Output DataStreamReader.orc DataStreamReader.parquet DataStreamReader.schema +DataStreamReader.table DataStreamReader.text DataStreamWriter.foreach DataStreamWriter.foreachBatch @@ -44,4 +45,5 @@ Input/Output DataStreamWriter.partitionBy DataStreamWriter.queryName DataStreamWriter.start +DataStreamWriter.toTable DataStreamWriter.trigger - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 447003324d2 [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table 447003324d2 is described below commit 447003324d2cf9f2bfa799ef3a1e744a5bc9277d Author: Hyukjin Kwon AuthorDate: Thu Aug 11 15:01:05 2022 +0900 [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/30835 that adds `DataStreamWriter.toTable` and `DataStreamReader.table` into PySpark documentation. ### Why are the changes needed? To document both features. ### Does this PR introduce _any_ user-facing change? Yes, both API will be shown in PySpark reference documentation. ### How was this patch tested? Manually built the documentation and checked. Closes #37477 from HyukjinKwon/SPARK-40043. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- python/docs/source/reference/pyspark.ss/io.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/python/docs/source/reference/pyspark.ss/io.rst b/python/docs/source/reference/pyspark.ss/io.rst index da476fb6fac..7a20777fdc7 100644 --- a/python/docs/source/reference/pyspark.ss/io.rst +++ b/python/docs/source/reference/pyspark.ss/io.rst @@ -34,6 +34,7 @@ Input/Output DataStreamReader.orc DataStreamReader.parquet DataStreamReader.schema +DataStreamReader.table DataStreamReader.text DataStreamWriter.foreach DataStreamWriter.foreachBatch @@ -44,4 +45,5 @@ Input/Output DataStreamWriter.partitionBy DataStreamWriter.queryName DataStreamWriter.start +DataStreamWriter.toTable DataStreamWriter.trigger - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
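The entries above only add the two APIs to the reference index; a short usage sketch may help (the Rate source, table name, and checkpoint handling are illustrative assumptions, not part of the patch):

```
>>> import tempfile, time
>>> df = spark.readStream.format("rate").load()
>>> with tempfile.TemporaryDirectory() as d:
...     # DataStreamWriter.toTable: write the stream out as a table.
...     q = df.writeStream.option("checkpointLocation", d).toTable("rate_sink")
...     time.sleep(3)
...     q.stop()
>>> # DataStreamReader.table: read the same table back as a streaming DataFrame.
>>> spark.readStream.table("rate_sink").isStreaming
True
```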