[spark] branch master updated (dff5c2f2e9c -> d4c58159925)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from dff5c2f2e9c [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly
     add d4c58159925 [SPARK-39635][SQL] Support driver metrics in DS v2 custom metric API

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/connector/read/Scan.java  | 10
 .../execution/datasources/v2/BatchScanExec.scala   |  4 +-
 .../datasources/v2/ContinuousScanExec.scala        |  4 +-
 .../datasources/v2/DataSourceV2ScanExecBase.scala  | 14 -
 .../datasources/v2/MicroBatchScanExec.scala        |  5 +-
 .../execution/ui/SQLAppStatusListenerSuite.scala   | 70 ++
 6 files changed, 103 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new d7af1d20f06 [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly
d7af1d20f06 is described below

commit d7af1d20f06412f80798c53d8588356ee1490afe
Author: Angerszh
AuthorDate: Fri Aug 12 10:52:33 2022 +0800

    [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly

    ### What changes were proposed in this pull request?
    `ArrayIntersect` misjudges whether null is contained in the right expression's hash set.
    ```
    >>> a = [1, 2, 3]
    >>> b = [3, None, 5]
    >>> df = spark.sparkContext.parallelize(data).toDF(["a","b"])
    >>> df.show()
    +---------+------------+
    |        a|           b|
    +---------+------------+
    |[1, 2, 3]|[3, null, 5]|
    +---------+------------+
    >>> df.selectExpr("array_intersect(a,b)").show()
    +---------------------+
    |array_intersect(a, b)|
    +---------------------+
    |                  [3]|
    +---------------------+
    >>> df.selectExpr("array_intersect(b,a)").show()
    +---------------------+
    |array_intersect(b, a)|
    +---------------------+
    |            [3, null]|
    +---------------------+
    ```
    In the original codegen path, `ArrayIntersect` handles array1 with the code below:
    ```
    def withArray1NullAssignment(body: String) =
      if (left.dataType.asInstanceOf[ArrayType].containsNull) {
        if (right.dataType.asInstanceOf[ArrayType].containsNull) {
          s"""
             |if ($array1.isNullAt($i)) {
             |  if ($foundNullElement) {
             |    $nullElementIndex = $size;
             |    $foundNullElement = false;
             |    $size++;
             |    $builder.$$plus$$eq($nullValueHolder);
             |  }
             |} else {
             |  $body
             |}
           """.stripMargin
        } else {
          s"""
             |if (!$array1.isNullAt($i)) {
             |  $body
             |}
           """.stripMargin
        }
      } else {
        body
      }
    ```
    The flag `foundNullElement` indicates whether array2 really contains a null value. But when implementing https://issues.apache.org/jira/browse/SPARK-36829, the meaning of `ArrayType.containsNull` was misunderstood, so `SQLOpenHashSet.withNullCheckCode()` was implemented as:
    ```
    def withNullCheckCode(
        arrayContainsNull: Boolean,
        setContainsNull: Boolean,
        array: String,
        index: String,
        hashSet: String,
        handleNotNull: (String, String) => String,
        handleNull: String): String = {
      if (arrayContainsNull) {
        if (setContainsNull) {
          s"""
             |if ($array.isNullAt($index)) {
             |  if (!$hashSet.containsNull()) {
             |    $hashSet.addNull();
             |    $handleNull
             |  }
             |} else {
             |  ${handleNotNull(array, index)}
             |}
           """.stripMargin
        } else {
          s"""
             |if (!$array.isNullAt($index)) {
             |  ${handleNotNull(array, index)}
             |}
           """.stripMargin
        }
      } else {
        handleNotNull(array, index)
      }
    }
    ```
    The `if (arrayContainsNull && setContainsNull)` path wrongly assumes that the array's open hash set really contains a null value. This PR adds a new parameter `additionalCondition` to complement the previous `foundNullElement` implementation; it also renames the method's parameters.

    ### Why are the changes needed?
    Fix a data correctness issue.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Added UT

    Closes #37436 from AngersZh/SPARK-39776-FOLLOW_UP.
Lead-authored-by: Angerszh Co-authored-by: AngersZh Signed-off-by: Wenchen Fan (cherry picked from commit dff5c2f2e9ce233e270e0e5cde0a40f682ba9534) Signed-off-by: Wenchen Fan --- .../expressions/collectionOperations.scala | 8 +++-- .../org/apache/spark/sql/util/SQLOpenHashSet.scala | 8 ++--- .../expressions/CollectionExpressionsSuite.scala | 34 ++ 3 files changed, 43 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index f38beb480e6..650cfc7bca8
[spark] branch master updated: [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new dff5c2f2e9c [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly
dff5c2f2e9c is described below

commit dff5c2f2e9ce233e270e0e5cde0a40f682ba9534
Author: Angerszh
AuthorDate: Fri Aug 12 10:52:33 2022 +0800

    [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly

    ### What changes were proposed in this pull request?
    `ArrayIntersect` misjudges whether null is contained in the right expression's hash set.
    ```
    >>> a = [1, 2, 3]
    >>> b = [3, None, 5]
    >>> df = spark.sparkContext.parallelize(data).toDF(["a","b"])
    >>> df.show()
    +---------+------------+
    |        a|           b|
    +---------+------------+
    |[1, 2, 3]|[3, null, 5]|
    +---------+------------+
    >>> df.selectExpr("array_intersect(a,b)").show()
    +---------------------+
    |array_intersect(a, b)|
    +---------------------+
    |                  [3]|
    +---------------------+
    >>> df.selectExpr("array_intersect(b,a)").show()
    +---------------------+
    |array_intersect(b, a)|
    +---------------------+
    |            [3, null]|
    +---------------------+
    ```
    In the original codegen path, `ArrayIntersect` handles array1 with the code below:
    ```
    def withArray1NullAssignment(body: String) =
      if (left.dataType.asInstanceOf[ArrayType].containsNull) {
        if (right.dataType.asInstanceOf[ArrayType].containsNull) {
          s"""
             |if ($array1.isNullAt($i)) {
             |  if ($foundNullElement) {
             |    $nullElementIndex = $size;
             |    $foundNullElement = false;
             |    $size++;
             |    $builder.$$plus$$eq($nullValueHolder);
             |  }
             |} else {
             |  $body
             |}
           """.stripMargin
        } else {
          s"""
             |if (!$array1.isNullAt($i)) {
             |  $body
             |}
           """.stripMargin
        }
      } else {
        body
      }
    ```
    The flag `foundNullElement` indicates whether array2 really contains a null value. But when implementing https://issues.apache.org/jira/browse/SPARK-36829, the meaning of `ArrayType.containsNull` was misunderstood, so `SQLOpenHashSet.withNullCheckCode()` was implemented as:
    ```
    def withNullCheckCode(
        arrayContainsNull: Boolean,
        setContainsNull: Boolean,
        array: String,
        index: String,
        hashSet: String,
        handleNotNull: (String, String) => String,
        handleNull: String): String = {
      if (arrayContainsNull) {
        if (setContainsNull) {
          s"""
             |if ($array.isNullAt($index)) {
             |  if (!$hashSet.containsNull()) {
             |    $hashSet.addNull();
             |    $handleNull
             |  }
             |} else {
             |  ${handleNotNull(array, index)}
             |}
           """.stripMargin
        } else {
          s"""
             |if (!$array.isNullAt($index)) {
             |  ${handleNotNull(array, index)}
             |}
           """.stripMargin
        }
      } else {
        handleNotNull(array, index)
      }
    }
    ```
    The `if (arrayContainsNull && setContainsNull)` path wrongly assumes that the array's open hash set really contains a null value. This PR adds a new parameter `additionalCondition` to complement the previous `foundNullElement` implementation; it also renames the method's parameters.

    ### Why are the changes needed?
    Fix a data correctness issue.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Added UT

    Closes #37436 from AngersZh/SPARK-39776-FOLLOW_UP.
Lead-authored-by: Angerszh Co-authored-by: AngersZh Signed-off-by: Wenchen Fan --- .../expressions/collectionOperations.scala | 8 +++-- .../org/apache/spark/sql/util/SQLOpenHashSet.scala | 8 ++--- .../expressions/CollectionExpressionsSuite.scala | 34 ++ 3 files changed, 43 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index ae23775b62d..d6a9601f884 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
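For reference, a minimal sketch of the behavior SPARK-39976 targets, using standard `array_intersect` semantics (null belongs in the result only when both input arrays contain it); the DataFrame below is illustrative and is not taken from the patch's test suite. With the fix, swapping the arguments should no longer leak a spurious null:

```
>>> df = spark.createDataFrame([([1, 2, 3], [3, None, 5])], ["a", "b"])
>>> df.selectExpr("array_intersect(a, b)", "array_intersect(b, a)").show()
+---------------------+---------------------+
|array_intersect(a, b)|array_intersect(b, a)|
+---------------------+---------------------+
|                  [3]|                  [3]|
+---------------------+---------------------+
```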
[spark] branch branch-3.3 updated (221fee8973c -> 21d9db39b7a)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git from 221fee8973c [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit` add 21d9db39b7a [SPARK-39887][SQL][3.3] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/optimizer/Optimizer.scala | 26 +- .../RemoveRedundantAliasAndProjectSuite.scala | 2 +- .../approved-plans-v1_4/q14a.sf100/explain.txt | 270 ++--- .../approved-plans-v1_4/q14a/explain.txt | 254 +-- .../org/apache/spark/sql/DataFrameSuite.scala | 61 + .../sql/execution/metric/SQLMetricsSuite.scala | 5 +- 6 files changed, 347 insertions(+), 271 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39985][SQL] Enable implicit DEFAULT column values in inserts from DataFrames
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 31a132bcaee [SPARK-39985][SQL] Enable implicit DEFAULT column values in inserts from DataFrames 31a132bcaee is described below commit 31a132bcaeea2340d57515dd0f99406b06528cb4 Author: Daniel Tenedorio AuthorDate: Thu Aug 11 19:03:36 2022 -0700 [SPARK-39985][SQL] Enable implicit DEFAULT column values in inserts from DataFrames ### What changes were proposed in this pull request? Enable implicit DEFAULT column values in inserts from DataFrames. This mostly already worked since the DataFrame inserts already converted to LogicalPlans. I added testing and a small analysis change since the operators are resolved one-by-one instead of all at once. Note that explicit column "default" references are not supported in write operations from DataFrames: since the operators are resolved one-by-one, any `.select` referring to "default" generates a "column not found" error before any following `.insertInto`. ### Why are the changes needed? This makes inserts from DataFrames produce the same results as those from SQL commands, for consistency and correctness. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Extended the `InsertSuite` in this PR. Closes #37423 from dtenedor/defaults-in-dataframes. Authored-by: Daniel Tenedorio Signed-off-by: Gengliang Wang --- .../catalyst/analysis/ResolveDefaultColumns.scala | 39 ++- .../org/apache/spark/sql/sources/InsertSuite.scala | 311 +++-- 2 files changed, 259 insertions(+), 91 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala index c340359f2ca..b7c7f0d3772 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala @@ -83,6 +83,9 @@ case class ResolveDefaultColumns(catalog: SessionCatalog) extends Rule[LogicalPl case u: UnresolvedInlineTable if u.rows.nonEmpty && u.rows.forall(_.size == u.rows(0).size) => true + case r: LocalRelation +if r.data.nonEmpty && r.data.forall(_.numFields == r.data(0).numFields) => +true case _ => false } @@ -99,14 +102,13 @@ case class ResolveDefaultColumns(catalog: SessionCatalog) extends Rule[LogicalPl children.append(node) node = node.children(0) } -val table = node.asInstanceOf[UnresolvedInlineTable] val insertTableSchemaWithoutPartitionColumns: Option[StructType] = getInsertTableSchemaWithoutPartitionColumns(i) insertTableSchemaWithoutPartitionColumns.map { schema: StructType => val regenerated: InsertIntoStatement = regenerateUserSpecifiedCols(i, schema) - val expanded: UnresolvedInlineTable = -addMissingDefaultValuesForInsertFromInlineTable(table, schema) + val expanded: LogicalPlan = +addMissingDefaultValuesForInsertFromInlineTable(node, schema) val replaced: Option[LogicalPlan] = replaceExplicitDefaultValuesForInputOfInsertInto(schema, expanded) replaced.map { r: LogicalPlan => @@ -262,16 +264,33 @@ case class ResolveDefaultColumns(catalog: SessionCatalog) extends Rule[LogicalPl * Updates an inline table to generate missing default column values. 
*/ private def addMissingDefaultValuesForInsertFromInlineTable( - table: UnresolvedInlineTable, - insertTableSchemaWithoutPartitionColumns: StructType): UnresolvedInlineTable = { -val numQueryOutputs: Int = table.rows(0).size + node: LogicalPlan, + insertTableSchemaWithoutPartitionColumns: StructType): LogicalPlan = { +val numQueryOutputs: Int = node match { + case table: UnresolvedInlineTable => table.rows(0).size + case local: LocalRelation => local.data(0).numFields +} val schema = insertTableSchemaWithoutPartitionColumns val newDefaultExpressions: Seq[Expression] = getDefaultExpressionsForInsert(numQueryOutputs, schema) val newNames: Seq[String] = schema.fields.drop(numQueryOutputs).map { _.name } -table.copy( - names = table.names ++ newNames, - rows = table.rows.map { row => row ++ newDefaultExpressions }) +node match { + case _ if newDefaultExpressions.isEmpty => node + case table: UnresolvedInlineTable => +table.copy( + names = table.names ++ newNames, + rows = table.rows.map { row => row ++ newDefaultExpressions }) + case local:
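As a rough illustration of the user-visible effect of SPARK-39985 (a sketch only: the table name, the `USING parquet` source, and the assumption that DEFAULT column support is enabled are illustrative, not taken from the patch), inserting a DataFrame with fewer columns than the target table should now fill the missing trailing columns with their declared defaults, matching the equivalent SQL INSERT:

```
>>> spark.sql("CREATE TABLE t (a INT, b INT DEFAULT 42) USING parquet")  # hypothetical table
DataFrame[]
>>> spark.createDataFrame([(1,), (2,)], ["a"]).write.insertInto("t")
>>> spark.sql("SELECT * FROM t").show()
+---+---+
|  a|  b|
+---+---+
|  1| 42|
|  2| 42|
+---+---+
```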
[spark] branch master updated (7f3baa77acb -> dd49a775d54)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 7f3baa77acb [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit` add dd49a775d54 [SPARK-40029][PYTHON][DOC] Make pyspark.sql.types examples self-contained No new revisions were added by this update. Summary of changes: python/pyspark/sql/types.py | 54 - 1 file changed, 48 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
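The change above is documentation-only; for context, a small self-contained schema example in the same spirit as the reworked docstrings (the field names here are made up for illustration):

```
>>> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>>> schema = StructType([
...     StructField("name", StringType(), True),
...     StructField("age", IntegerType(), True),
... ])
>>> spark.createDataFrame([("Alice", 2), ("Bob", 5)], schema).printSchema()
root
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)
```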
[spark] branch branch-3.2 updated: [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 45d42e17199 [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`
45d42e17199 is described below

commit 45d42e1719933667981e7e490a2a21623501e6dd
Author: yangjie01
AuthorDate: Thu Aug 11 15:10:42 2022 -0700

    [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`

    ### What changes were proposed in this pull request?
    This PR excludes `xalan` from `htmlunit` to clean up the CVE-2022-34169 warning:
    ```
    Provides transitive vulnerable dependency xalan:xalan:2.7.2
    CVE-2022-34169 7.5 Integer Coercion Error vulnerability with medium severity found
    Results powered by Checkmarx(c)
    ```
    `xalan:xalan:2.7.2` is the latest version and its code base has not been updated for 5 years, so this cannot be solved by upgrading `xalan`.

    ### Why are the changes needed?
    The vulnerability is described in [CVE-2022-34169](https://github.com/advisories/GHSA-9339-86wc-4qgf); it is better to exclude it even though it is only a test dependency for Spark.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Pass GitHub Actions
    - Manual test: run `mvn dependency:tree -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive | grep xalan` to check that `xalan` is no longer matched after this PR

    Closes #37481 from LuciferYang/exclude-xalan.

    Authored-by: yangjie01
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 7f3baa77acbf7747963a95d0f24e3b8868c7b16a)
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/pom.xml b/pom.xml
index 9165d62fd18..81c5d5f6370 100644
--- a/pom.xml
+++ b/pom.xml
@@ -670,6 +670,12 @@
         <groupId>net.sourceforge.htmlunit</groupId>
         <artifactId>htmlunit</artifactId>
         <version>${htmlunit.version}</version>
+        <exclusions>
+          <exclusion>
+            <groupId>xalan</groupId>
+            <artifactId>xalan</artifactId>
+          </exclusion>
+        </exclusions>
         <scope>test</scope>

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 221fee8973c [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`
221fee8973c is described below

commit 221fee8973ce438b089fae769dd054c47f6774ed
Author: yangjie01
AuthorDate: Thu Aug 11 15:10:42 2022 -0700

    [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`

    ### What changes were proposed in this pull request?
    This PR excludes `xalan` from `htmlunit` to clean up the CVE-2022-34169 warning:
    ```
    Provides transitive vulnerable dependency xalan:xalan:2.7.2
    CVE-2022-34169 7.5 Integer Coercion Error vulnerability with medium severity found
    Results powered by Checkmarx(c)
    ```
    `xalan:xalan:2.7.2` is the latest version and its code base has not been updated for 5 years, so this cannot be solved by upgrading `xalan`.

    ### Why are the changes needed?
    The vulnerability is described in [CVE-2022-34169](https://github.com/advisories/GHSA-9339-86wc-4qgf); it is better to exclude it even though it is only a test dependency for Spark.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Pass GitHub Actions
    - Manual test: run `mvn dependency:tree -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive | grep xalan` to check that `xalan` is no longer matched after this PR

    Closes #37481 from LuciferYang/exclude-xalan.

    Authored-by: yangjie01
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 7f3baa77acbf7747963a95d0f24e3b8868c7b16a)
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/pom.xml b/pom.xml
index 206cad9eb98..f639e5e5444 100644
--- a/pom.xml
+++ b/pom.xml
@@ -709,6 +709,12 @@
         <groupId>net.sourceforge.htmlunit</groupId>
         <artifactId>htmlunit</artifactId>
         <version>${htmlunit.version}</version>
+        <exclusions>
+          <exclusion>
+            <groupId>xalan</groupId>
+            <artifactId>xalan</artifactId>
+          </exclusion>
+        </exclusions>
         <scope>test</scope>

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7f3baa77acb [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`
7f3baa77acb is described below

commit 7f3baa77acbf7747963a95d0f24e3b8868c7b16a
Author: yangjie01
AuthorDate: Thu Aug 11 15:10:42 2022 -0700

    [SPARK-40047][TEST] Exclude unused `xalan` transitive dependency from `htmlunit`

    ### What changes were proposed in this pull request?
    This PR excludes `xalan` from `htmlunit` to clean up the CVE-2022-34169 warning:
    ```
    Provides transitive vulnerable dependency xalan:xalan:2.7.2
    CVE-2022-34169 7.5 Integer Coercion Error vulnerability with medium severity found
    Results powered by Checkmarx(c)
    ```
    `xalan:xalan:2.7.2` is the latest version and its code base has not been updated for 5 years, so this cannot be solved by upgrading `xalan`.

    ### Why are the changes needed?
    The vulnerability is described in [CVE-2022-34169](https://github.com/advisories/GHSA-9339-86wc-4qgf); it is better to exclude it even though it is only a test dependency for Spark.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Pass GitHub Actions
    - Manual test: run `mvn dependency:tree -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive | grep xalan` to check that `xalan` is no longer matched after this PR

    Closes #37481 from LuciferYang/exclude-xalan.

    Authored-by: yangjie01
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/pom.xml b/pom.xml
index c197987cd53..b6bbfef1854 100644
--- a/pom.xml
+++ b/pom.xml
@@ -712,6 +712,12 @@
         <groupId>net.sourceforge.htmlunit</groupId>
         <artifactId>htmlunit</artifactId>
         <version>${htmlunit.version}</version>
+        <exclusions>
+          <exclusion>
+            <groupId>xalan</groupId>
+            <artifactId>xalan</artifactId>
+          </exclusion>
+        </exclusions>
         <scope>test</scope>

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39927][BUILD] Upgrade to Avro 1.11.1
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4394e244bbd [SPARK-39927][BUILD] Upgrade to Avro 1.11.1 4394e244bbd is described below commit 4394e244bbd50d0b625b373351d38508f4debf41 Author: Ismaël Mejía AuthorDate: Thu Aug 11 15:05:41 2022 -0700 [SPARK-39927][BUILD] Upgrade to Avro 1.11.1 ### What changes were proposed in this pull request? Update the Avro version to 1.11.1 ### Why are the changes needed? To stay up to date with upstream ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests Closes #37352 from iemejia/SPARK-39927-avro-1.11.1. Authored-by: Ismaël Mejía Signed-off-by: Dongjoon Hyun --- .../avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala | 4 ++-- .../avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala | 5 ++--- dev/deps/spark-deps-hadoop-2-hive-2.3 | 6 +++--- dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++--- docs/sql-data-sources-avro.md | 4 ++-- pom.xml | 2 +- project/SparkBuild.scala| 2 +- .../scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala| 2 +- 8 files changed, 15 insertions(+), 16 deletions(-) diff --git a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala index 3c68cbd537a..540420974f5 100644 --- a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala +++ b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala @@ -79,14 +79,14 @@ private[sql] class AvroOptions( /** * Top level record name in write result, which is required in Avro spec. - * See https://avro.apache.org/docs/1.11.0/spec.html#schema_record . + * See https://avro.apache.org/docs/1.11.1/spec.html#schema_record . * Default value is "topLevelRecord" */ val recordName: String = parameters.getOrElse("recordName", "topLevelRecord") /** * Record namespace in write result. Default value is "". - * See Avro spec for details: https://avro.apache.org/docs/1.11.0/spec.html#schema_record . + * See Avro spec for details: https://avro.apache.org/docs/1.11.1/spec.html#schema_record . 
*/ val recordNamespace: String = parameters.getOrElse("recordNamespace", "") diff --git a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala index 8a088a43579..4a1749533ab 100644 --- a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala +++ b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala @@ -875,7 +875,7 @@ abstract class AvroSuite dfWithNull.write.format("avro") .option("avroSchema", avroSchema).save(s"$tempDir/${UUID.randomUUID()}") } - assertExceptionMsg[AvroTypeException](e1, "Not an enum: null") + assertExceptionMsg[AvroTypeException](e1, "value null is not a SuitEnumType") // Writing df containing data not in the enum will throw an exception val e2 = intercept[SparkException] { @@ -1075,8 +1075,7 @@ abstract class AvroSuite .save(s"$tempDir/${UUID.randomUUID()}") }.getCause.getMessage assert(message.contains("Caused by: java.lang.NullPointerException: ")) - assert(message.contains( -"null of string in string in field Name of test_schema in test_schema")) + assert(message.contains("null in string in field Name")) } } diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3 index a86bbc52431..13e5e56ab0e 100644 --- a/dev/deps/spark-deps-hadoop-2-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-2-hive-2.3 @@ -23,9 +23,9 @@ arrow-memory-netty/9.0.0//arrow-memory-netty-9.0.0.jar arrow-vector/9.0.0//arrow-vector-9.0.0.jar audience-annotations/0.5.0//audience-annotations-0.5.0.jar automaton/1.11-8//automaton-1.11-8.jar -avro-ipc/1.11.0//avro-ipc-1.11.0.jar -avro-mapred/1.11.0//avro-mapred-1.11.0.jar -avro/1.11.0//avro-1.11.0.jar +avro-ipc/1.11.1//avro-ipc-1.11.1.jar +avro-mapred/1.11.1//avro-mapred-1.11.1.jar +avro/1.11.1//avro-1.11.1.jar azure-storage/2.0.0//azure-storage-2.0.0.jar blas/2.2.1//blas-2.2.1.jar bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 854919a9af6..c221e092806 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -22,9 +22,9 @@
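For context on the `recordName` and `recordNamespace` options whose documentation links are touched in the `AvroOptions.scala` hunk above, a brief usage sketch (the output path and namespace are illustrative; assumes the `spark-avro` connector is available as the `avro` format):

```
>>> df = spark.createDataFrame([(1, "a")], ["id", "value"])
>>> (df.write.format("avro")
...     .option("recordName", "topLevelRecord")    # top-level record name in the written Avro schema
...     .option("recordNamespace", "com.example")  # record namespace; defaults to ""
...     .save("/tmp/avro-out"))
```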
[spark] branch master updated (a49f66fe49d -> 126870eddc5)
This is an automated email from the ASF dual-hosted git repository. mridulm80 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from a49f66fe49d [SPARK-38921][K8S][TESTS] Use k8s-client to create queue resource in Volcano IT add 126870eddc5 [SPARK-39955][CORE] Improve LaunchTask process to avoid Stage failures caused by fail-to-send LaunchTask messages No new revisions were added by this update. Summary of changes: .../org/apache/spark/scheduler/TaskInfo.scala | 6 +++ .../apache/spark/scheduler/TaskSchedulerImpl.scala | 3 ++ .../apache/spark/scheduler/TaskSetManager.scala| 7 +++- .../spark/scheduler/TaskSchedulerImplSuite.scala | 47 ++ .../spark/scheduler/TaskSetManagerSuite.scala | 12 ++ 5 files changed, 73 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-38921][K8S][TESTS] Use k8s-client to create queue resource in Volcano IT
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a49f66fe49d [SPARK-38921][K8S][TESTS] Use k8s-client to create queue resource in Volcano IT a49f66fe49d is described below commit a49f66fe49d4d4bbfb41da2e5bbb5af4bd64d1da Author: Yikun Jiang AuthorDate: Thu Aug 11 08:28:57 2022 -0700 [SPARK-38921][K8S][TESTS] Use k8s-client to create queue resource in Volcano IT ### What changes were proposed in this pull request? Use fabric8io/k8s-client to create queue resource in Volcano IT. ### Why are the changes needed? Use k8s-client to create volcano queue to - Make code easy to understand - Enable abity to set queue capacity dynamically. This will help to support running Volcano test in a resource limited env (such as github action). ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Volcano IT passed Closes #36219 from Yikun/SPARK-38921. Authored-by: Yikun Jiang Signed-off-by: Dongjoon Hyun --- .../src/test/resources/volcano/disable-queue.yml | 24 --- .../volcano/disable-queue0-enable-queue1.yml | 31 - .../volcano/driver-podgroup-template-cpu-2u.yml| 2 +- .../volcano/driver-podgroup-template-memory-3g.yml | 2 +- .../src/test/resources/volcano/enable-queue.yml| 24 --- .../volcano/enable-queue0-enable-queue1.yml| 29 - .../src/test/resources/volcano/queue-2u-3g.yml | 25 .../k8s/integrationtest/VolcanoTestsSuite.scala| 74 +++--- 8 files changed, 52 insertions(+), 159 deletions(-) diff --git a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue.yml b/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue.yml deleted file mode 100644 index d9f8c36471e..000 --- a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue.yml +++ /dev/null @@ -1,24 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -apiVersion: scheduling.volcano.sh/v1beta1 -kind: Queue -metadata: - name: queue -spec: - weight: 1 - capability: -cpu: "0.001" diff --git a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue0-enable-queue1.yml b/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue0-enable-queue1.yml deleted file mode 100644 index 82e479478cc..000 --- a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue0-enable-queue1.yml +++ /dev/null @@ -1,31 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. 
See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -apiVersion: scheduling.volcano.sh/v1beta1 -kind: Queue -metadata: - name: queue0 -spec: - weight: 1 - capability: -cpu: "0.001" -apiVersion: scheduling.volcano.sh/v1beta1 -kind: Queue -metadata: - name: queue1 -spec: - weight: 1 diff --git a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/driver-podgroup-template-cpu-2u.yml b/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/driver-podgroup-template-cpu-2u.yml index e6d53ddc8b5..4a784f0f864 100644 ---
[spark] branch master updated (71792411083 -> 9dff034bdef)
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 71792411083 [SPARK-40027][PYTHON][SS][DOCS] Add self-contained examples for pyspark.sql.streaming.readwriter add 9dff034bdef [SPARK-39966][SQL] Use V2 Filter in SupportsDelete No new revisions were added by this update. Summary of changes: .../sql/connector/catalog/SupportsDelete.java | 14 +- .../{SupportsDelete.java => SupportsDeleteV2.java} | 32 +- .../connector/read/SupportsRuntimeFiltering.java | 19 +- .../catalyst/analysis/RewriteDeleteFromTable.scala | 6 +- .../sql/catalyst/plans/logical/v2Commands.scala| 5 +- .../spark/sql/errors/QueryCompilationErrors.scala | 3 +- .../datasources/v2/DataSourceV2Implicits.scala | 6 +- .../sql/internal/connector/PredicateUtils.scala| 8 +- ...InMemoryTable.scala => InMemoryBaseTable.scala} | 38 +- .../sql/connector/catalog/InMemoryTable.scala | 646 + .../catalog/InMemoryTableWithV2Filter.scala| 81 ++- .../datasources/v2/DataSourceV2Strategy.scala | 8 +- .../datasources/v2/DeleteFromTableExec.scala | 8 +- .../v2/OptimizeMetadataOnlyDeleteFromTable.scala | 12 +- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 89 ++- .../spark/sql/connector/DatasourceV2SQLBase.scala | 4 +- .../spark/sql/connector/V1WriteFallbackSuite.scala | 4 +- 17 files changed, 265 insertions(+), 718 deletions(-) copy sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/{SupportsDelete.java => SupportsDeleteV2.java} (74%) copy sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/{InMemoryTable.scala => InMemoryBaseTable.scala} (94%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-40027][PYTHON][SS][DOCS] Add self-contained examples for pyspark.sql.streaming.readwriter
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 71792411083 [SPARK-40027][PYTHON][SS][DOCS] Add self-contained examples for pyspark.sql.streaming.readwriter 71792411083 is described below commit 71792411083a71bcfd7a0d94ddf754bf09a27054 Author: Hyukjin Kwon AuthorDate: Thu Aug 11 20:19:24 2022 +0900 [SPARK-40027][PYTHON][SS][DOCS] Add self-contained examples for pyspark.sql.streaming.readwriter ### What changes were proposed in this pull request? This PR proposes to improve the examples in `pyspark.sql.streaming.readwriter` by making each example self-contained with a brief explanation and a bit more realistic example. ### Why are the changes needed? To make the documentation more readable and able to copy and paste directly in PySpark shell. ### Does this PR introduce _any_ user-facing change? Yes, it changes the documentation ### How was this patch tested? Manually ran each doctest. Closes #37461 from HyukjinKwon/SPARK-40027. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- python/pyspark/sql/streaming/readwriter.py | 441 +++-- 1 file changed, 357 insertions(+), 84 deletions(-) diff --git a/python/pyspark/sql/streaming/readwriter.py b/python/pyspark/sql/streaming/readwriter.py index 74b89dbe46c..ef3b7e525e3 100644 --- a/python/pyspark/sql/streaming/readwriter.py +++ b/python/pyspark/sql/streaming/readwriter.py @@ -24,7 +24,7 @@ from py4j.java_gateway import java_import, JavaObject from pyspark.sql.column import _to_seq from pyspark.sql.readwriter import OptionUtils, to_str from pyspark.sql.streaming.query import StreamingQuery -from pyspark.sql.types import Row, StructType, StructField, StringType +from pyspark.sql.types import Row, StructType from pyspark.sql.utils import ForeachBatchFunction if TYPE_CHECKING: @@ -46,6 +46,22 @@ class DataStreamReader(OptionUtils): Notes - This API is evolving. + +Examples + +>>> spark.readStream + + +The example below uses Rate source that generates rows continously. +After that, we operate a modulo by 3, and then writes the stream out to the console. +The streaming query stops in 3 seconds. + +>>> import time +>>> df = spark.readStream.format("rate").load() +>>> df = df.selectExpr("value % 3 as v") +>>> q = df.writeStream.format("console").start() +>>> time.sleep(3) +>>> q.stop() """ def __init__(self, spark: "SparkSession") -> None: @@ -73,7 +89,23 @@ class DataStreamReader(OptionUtils): Examples ->>> s = spark.readStream.format("text") +>>> spark.readStream.format("text") + + +This API allows to configure other sources to read. The example below writes a small text +file, and reads it back via Text source. + +>>> import tempfile +>>> import time +>>> with tempfile.TemporaryDirectory() as d: +... # Write a temporary text file to read it. +... spark.createDataFrame( +... [("hello",), ("this",)]).write.mode("overwrite").format("text").save(d) +... +... # Start a streaming query to read the text file. +... q = spark.readStream.format("text").load(d).writeStream.format("console").start() +... time.sleep(3) +... 
q.stop() """ self._jreader = self._jreader.format(source) return self @@ -99,8 +131,22 @@ class DataStreamReader(OptionUtils): Examples ->>> s = spark.readStream.schema(sdf_schema) ->>> s = spark.readStream.schema("col0 INT, col1 DOUBLE") +>>> from pyspark.sql.types import StructField, StructType, StringType +>>> spark.readStream.schema(StructType([StructField("data", StringType(), True)])) + +>>> spark.readStream.schema("col0 INT, col1 DOUBLE") + + +The example below specifies a different schema to CSV file. + +>>> import tempfile +>>> import time +>>> with tempfile.TemporaryDirectory() as d: +... # Start a streaming query to read the CSV file. +... spark.readStream.schema("col0 INT, col1 STRING").format("csv").load(d).printSchema() +root + |-- col0: integer (nullable = true) + |-- col1: string (nullable = true) """ from pyspark.sql import SparkSession @@ -125,7 +171,17 @@ class DataStreamReader(OptionUtils): Examples ->>> s = spark.readStream.option("x", 1) +>>> spark.readStream.option("x", 1) + + +The example below specifies 'rowsPerSecond' option to Rate source
[spark] branch branch-3.1 updated: [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new 624277640ff [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table 624277640ff is described below commit 624277640ffcf0ce6bff76179126ccb8f9340ca2 Author: Hyukjin Kwon AuthorDate: Thu Aug 11 15:01:05 2022 +0900 [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/30835 that adds `DataStreamWriter.toTable` and `DataStreamReader.table` into PySpark documentation. ### Why are the changes needed? To document both features. ### Does this PR introduce _any_ user-facing change? Yes, both API will be shown in PySpark reference documentation. ### How was this patch tested? Manually built the documentation and checked. Closes #37477 from HyukjinKwon/SPARK-40043. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 447003324d2cf9f2bfa799ef3a1e744a5bc9277d) Signed-off-by: Hyukjin Kwon --- python/docs/source/reference/pyspark.ss.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/python/docs/source/reference/pyspark.ss.rst b/python/docs/source/reference/pyspark.ss.rst index a7936a4f2a5..c3e532646ac 100644 --- a/python/docs/source/reference/pyspark.ss.rst +++ b/python/docs/source/reference/pyspark.ss.rst @@ -52,6 +52,7 @@ Input and Output DataStreamReader.orc DataStreamReader.parquet DataStreamReader.schema +DataStreamReader.table DataStreamReader.text DataStreamWriter.foreach DataStreamWriter.foreachBatch @@ -62,6 +63,7 @@ Input and Output DataStreamWriter.partitionBy DataStreamWriter.queryName DataStreamWriter.start +DataStreamWriter.toTable DataStreamWriter.trigger Query Management - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated: [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 0f3107d5f5b [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table 0f3107d5f5b is described below commit 0f3107d5f5bdcc3887aa2df05e99c121a76bd07e Author: Hyukjin Kwon AuthorDate: Thu Aug 11 15:01:05 2022 +0900 [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/30835 that adds `DataStreamWriter.toTable` and `DataStreamReader.table` into PySpark documentation. ### Why are the changes needed? To document both features. ### Does this PR introduce _any_ user-facing change? Yes, both API will be shown in PySpark reference documentation. ### How was this patch tested? Manually built the documentation and checked. Closes #37477 from HyukjinKwon/SPARK-40043. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 447003324d2cf9f2bfa799ef3a1e744a5bc9277d) Signed-off-by: Hyukjin Kwon --- python/docs/source/reference/pyspark.ss.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/python/docs/source/reference/pyspark.ss.rst b/python/docs/source/reference/pyspark.ss.rst index a7936a4f2a5..c3e532646ac 100644 --- a/python/docs/source/reference/pyspark.ss.rst +++ b/python/docs/source/reference/pyspark.ss.rst @@ -52,6 +52,7 @@ Input and Output DataStreamReader.orc DataStreamReader.parquet DataStreamReader.schema +DataStreamReader.table DataStreamReader.text DataStreamWriter.foreach DataStreamWriter.foreachBatch @@ -62,6 +63,7 @@ Input and Output DataStreamWriter.partitionBy DataStreamWriter.queryName DataStreamWriter.start +DataStreamWriter.toTable DataStreamWriter.trigger Query Management - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 248e8b49b11 [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table 248e8b49b11 is described below commit 248e8b49b114d725e7a94bc8193f371b89270af7 Author: Hyukjin Kwon AuthorDate: Thu Aug 11 15:01:05 2022 +0900 [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/30835 that adds `DataStreamWriter.toTable` and `DataStreamReader.table` into PySpark documentation. ### Why are the changes needed? To document both features. ### Does this PR introduce _any_ user-facing change? Yes, both API will be shown in PySpark reference documentation. ### How was this patch tested? Manually built the documentation and checked. Closes #37477 from HyukjinKwon/SPARK-40043. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 447003324d2cf9f2bfa799ef3a1e744a5bc9277d) Signed-off-by: Hyukjin Kwon --- python/docs/source/reference/pyspark.ss/io.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/python/docs/source/reference/pyspark.ss/io.rst b/python/docs/source/reference/pyspark.ss/io.rst index da476fb6fac..7a20777fdc7 100644 --- a/python/docs/source/reference/pyspark.ss/io.rst +++ b/python/docs/source/reference/pyspark.ss/io.rst @@ -34,6 +34,7 @@ Input/Output DataStreamReader.orc DataStreamReader.parquet DataStreamReader.schema +DataStreamReader.table DataStreamReader.text DataStreamWriter.foreach DataStreamWriter.foreachBatch @@ -44,4 +45,5 @@ Input/Output DataStreamWriter.partitionBy DataStreamWriter.queryName DataStreamWriter.start +DataStreamWriter.toTable DataStreamWriter.trigger - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 447003324d2 [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table 447003324d2 is described below commit 447003324d2cf9f2bfa799ef3a1e744a5bc9277d Author: Hyukjin Kwon AuthorDate: Thu Aug 11 15:01:05 2022 +0900 [SPARK-40043][PYTHON][SS][DOCS] Document DataStreamWriter.toTable and DataStreamReader.table ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/30835 that adds `DataStreamWriter.toTable` and `DataStreamReader.table` into PySpark documentation. ### Why are the changes needed? To document both features. ### Does this PR introduce _any_ user-facing change? Yes, both API will be shown in PySpark reference documentation. ### How was this patch tested? Manually built the documentation and checked. Closes #37477 from HyukjinKwon/SPARK-40043. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- python/docs/source/reference/pyspark.ss/io.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/python/docs/source/reference/pyspark.ss/io.rst b/python/docs/source/reference/pyspark.ss/io.rst index da476fb6fac..7a20777fdc7 100644 --- a/python/docs/source/reference/pyspark.ss/io.rst +++ b/python/docs/source/reference/pyspark.ss/io.rst @@ -34,6 +34,7 @@ Input/Output DataStreamReader.orc DataStreamReader.parquet DataStreamReader.schema +DataStreamReader.table DataStreamReader.text DataStreamWriter.foreach DataStreamWriter.foreachBatch @@ -44,4 +45,5 @@ Input/Output DataStreamWriter.partitionBy DataStreamWriter.queryName DataStreamWriter.start +DataStreamWriter.toTable DataStreamWriter.trigger - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
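The entries above only add the two APIs to the reference index; a short usage sketch may help (the Rate source, table name, and checkpoint handling are illustrative assumptions, not part of the patch):

```
>>> import tempfile, time
>>> df = spark.readStream.format("rate").load()
>>> with tempfile.TemporaryDirectory() as d:
...     # DataStreamWriter.toTable: write the stream out as a table.
...     q = df.writeStream.option("checkpointLocation", d).toTable("rate_sink")
...     time.sleep(3)
...     q.stop()
>>> # DataStreamReader.table: read the same table back as a streaming DataFrame.
>>> spark.readStream.table("rate_sink").isStreaming
True
```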