[spark] branch master updated (2dc0527 -> e56f865)

2022-02-24 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2dc0527  [SPARK-38322][SQL] Support query stage show runtime 
statistics in formatted explain mode
 add e56f865  [SPARK-38316][SQL][TESTS] Fix 
SQLViewSuite/TriggerAvailableNowSuite/Unwrap*Suite under ANSI mode

No new revisions were added by this update.

Summary of changes:
 .../UnwrapCastInBinaryComparisonSuite.scala        |  4 +++-
 .../sql/UnwrapCastInComparisonEndToEndSuite.scala  | 27 --
 .../apache/spark/sql/execution/SQLViewSuite.scala  |  8 +--
 .../spark/sql/execution/SQLViewTestSuite.scala | 12 ++
 .../sql/streaming/TriggerAvailableNowSuite.scala   |  2 +-
 5 files changed, 32 insertions(+), 21 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-38322][SQL] Support query stage show runtime statistics in formatted explain mode

2022-02-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2dc0527  [SPARK-38322][SQL] Support query stage show runtime 
statistics in formatted explain mode
2dc0527 is described below

commit 2dc0527fb6462b6849d3c53c6d83392a8e37cdcc
Author: ulysses-you 
AuthorDate: Fri Feb 25 14:59:10 2022 +0800

[SPARK-38322][SQL] Support query stage show runtime statistics in formatted 
explain mode

### What changes were proposed in this pull request?

Add query stage statistics information in formatted explain mode.

### Why are the changes needed?

The formatted explain mode is the most powerful explain mode for showing the
details of a query plan. In AQE, a query stage knows its runtime statistics once
it has materialized, so surfacing them makes it quick to check how the plan was
converted, e.g. the join selection.

A simple example:
```sql
SELECT * FROM t JOIN t2 ON t.c = t2.c;
```
```
== Physical Plan ==
AdaptiveSparkPlan (21)
+- == Final Plan ==
   * SortMergeJoin Inner (13)
   :- * Sort (6)
   :  +- AQEShuffleRead (5)
   :     +- ShuffleQueryStage (4), Statistics(sizeInBytes=16.0 B, rowCount=1)
   :        +- Exchange (3)
   :           +- * Filter (2)
   :              +- Scan hive default.t (1)
   +- * Sort (12)
      +- AQEShuffleRead (11)
         +- ShuffleQueryStage (10), Statistics(sizeInBytes=16.0 B, rowCount=1)
            +- Exchange (9)
               +- * Filter (8)
                  +- Scan hive default.t2 (7)
+- == Initial Plan ==
   SortMergeJoin Inner (20)
   :- Sort (16)
   :  +- Exchange (15)
   :     +- Filter (14)
   :        +- Scan hive default.t (1)
   +- Sort (19)
      +- Exchange (18)
         +- Filter (17)
            +- Scan hive default.t2 (7)
```
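
For illustration, a minimal spark-shell sketch that reproduces this behavior (an editor's sketch, not part of the patch; it mirrors the new test added below and assumes AQE is enabled, which is the default since Spark 3.2):

```scala
import spark.implicits._  // `spark` is the session provided by spark-shell

val df = Seq(1, 2).toDF("c").distinct()  // distinct() forces a shuffle query stage

// Before execution the query stages have not materialized,
// so no Statistics(...) appear in the formatted plan.
df.explain("formatted")

// After execution the AQE plan is final and each materialized
// ShuffleQueryStage reports its runtime statistics, e.g.
// Statistics(sizeInBytes=32.0 B, rowCount=2).
df.collect()
df.explain("formatted")
```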

### Does this PR introduce _any_ user-facing change?

No, it only changes the output of explain under AQE.

### How was this patch tested?

Added a test.

Closes #35658 from ulysses-you/exchange-statistics.

Authored-by: ulysses-you 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/execution/adaptive/QueryStageExec.scala  |  4 ++++
 .../test/scala/org/apache/spark/sql/ExplainSuite.scala | 16 ++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
index e2f763e..ac1968d 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
@@ -124,6 +124,10 @@ abstract class QueryStageExec extends LeafExecNode {
 
   protected override def stringArgs: Iterator[Any] = Iterator.single(id)
 
+  override def simpleStringWithNodeId(): String = {
+    super.simpleStringWithNodeId() + computeStats().map(", " + _.toString).getOrElse("")
+  }
+
   override def generateTreeString(
       depth: Int,
       lastChildren: Seq[Boolean],
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
index 67240c5..a5403ec 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
@@ -735,6 +735,22 @@ class ExplainSuiteAE extends ExplainSuiteHelper with EnableAdaptiveExecutionSuite
   }
 }
   }
+
+  test("SPARK-38322: Support query stage show runtime statistics in formatted explain mode") {
+    val df = Seq(1, 2).toDF("c").distinct()
+    val statistics = "Statistics(sizeInBytes=32.0 B, rowCount=2)"
+
+    checkKeywordsNotExistsInExplain(
+      df,
+      FormattedMode,
+      statistics)
+
+    df.collect()
+    checkKeywordsExistsInExplain(
+      df,
+      FormattedMode,
+      statistics)
+  }
 }
 
 case class ExplainSingleData(id: Int)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (6a79539 -> 95f06f3)

2022-02-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6a79539  [SPARK-38298][SQL][TESTS] Fix DataExpressionSuite, 
NullExpressionsSuite, StringExpressionsSuite, complexTypesSuite, CastSuite 
under ANSI mode
 add 95f06f3  [SPARK-37614][SQL] Support ANSI Aggregate Function: regr_avgx 
& regr_avgy

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/FunctionRegistry.scala   |   2 +
 .../catalyst/expressions/aggregate/RegrCount.scala |  47 -
 .../expressions/aggregate/linearRegression.scala   | 107 +
 .../sql-functions/sql-expression-schema.md |  14 +--
 .../test/resources/sql-tests/inputs/group-by.sql   |   6 ++
 .../inputs/postgreSQL/aggregates_part1.sql |   2 +-
 .../inputs/udf/postgreSQL/udf-aggregates_part1.sql |   2 +-
 .../resources/sql-tests/results/group-by.sql.out   |  36 ++-
 .../results/postgreSQL/aggregates_part1.sql.out|  10 +-
 .../udf/postgreSQL/udf-aggregates_part1.sql.out|  10 +-
 10 files changed, 178 insertions(+), 58 deletions(-)
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/RegrCount.scala
 create mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/linearRegression.scala
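
As a quick reference, a hedged spark-shell sketch of the new aggregates (an editor's illustration, not from the commit; it assumes the standard ANSI semantics, where both functions skip rows with a null in either argument):

```scala
import spark.implicits._  // `spark` is the session provided by spark-shell

// Illustrative column names: x is the independent variable, y the dependent one.
val points = Seq((1.0, 10.0), (2.0, 20.0), (3.0, 30.0)).toDF("x", "y")
points.createOrReplaceTempView("points")

// regr_avgx(y, x) averages x and regr_avgy(y, x) averages y,
// both over the rows where neither argument is null.
spark.sql("SELECT regr_avgx(y, x) AS avg_x, regr_avgy(y, x) AS avg_y FROM points").show()
// Expected: avg_x = 2.0, avg_y = 20.0
```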

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (860f44f -> 6a79539)

2022-02-24 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 860f44f  [SPARK-38311][SQL] Fix 
DynamicPartitionPruning/BucketedReadSuite/ExpressionInfoSuite under ANSI mode
 add 6a79539  [SPARK-38298][SQL][TESTS] Fix DataExpressionSuite, 
NullExpressionsSuite, StringExpressionsSuite, complexTypesSuite, CastSuite 
under ANSI mode

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/CastSuite.scala  |  9 +
 .../catalyst/expressions/DateExpressionsSuite.scala |  4 +++-
 .../catalyst/expressions/NullExpressionsSuite.scala | 21 ++---
 .../expressions/StringExpressionsSuite.scala|  9 ++---
 .../sql/catalyst/optimizer/complexTypesSuite.scala  |  8 ++--
 5 files changed, 38 insertions(+), 13 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b8b1fbc -> 860f44f)

2022-02-24 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b8b1fbc  [SPARK-38275][SS] Include the writeBatch's memory usage as 
the total memory usage of RocksDB state store
 add 860f44f  [SPARK-38311][SQL] Fix 
DynamicPartitionPruning/BucketedReadSuite/ExpressionInfoSuite under ANSI mode

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/collectionOperations.scala   |  2 --
 .../catalyst/expressions/datetimeExpressions.scala| 19 ++-
 .../spark/sql/catalyst/expressions/predicates.scala   |  2 +-
 .../spark/sql/DynamicPartitionPruningSuite.scala  |  3 ++-
 .../apache/spark/sql/sources/BucketedReadSuite.scala  |  2 +-
 5 files changed, 10 insertions(+), 18 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (9758d55 -> b8b1fbc)

2022-02-24 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9758d55  [SPARK-38303][BUILD] Upgrade `ansi-regex` from 5.0.0 to 5.0.1 
in /dev
 add b8b1fbc  [SPARK-38275][SS] Include the writeBatch's memory usage as 
the total memory usage of RocksDB state store

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/streaming/state/RocksDB.scala  | 8 ++--
 .../sql/execution/streaming/state/RocksDBStateStoreProvider.scala | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-38303][BUILD] Upgrade `ansi-regex` from 5.0.0 to 5.0.1 in /dev

2022-02-24 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 637a69f  [SPARK-38303][BUILD] Upgrade `ansi-regex` from 5.0.0 to 5.0.1 
in /dev
637a69f is described below

commit 637a69f349d01199db8af7331a22d2b9154cb50e
Author: bjornjorgensen 
AuthorDate: Fri Feb 25 11:43:36 2022 +0900

[SPARK-38303][BUILD] Upgrade `ansi-regex` from 5.0.0 to 5.0.1 in /dev

### What changes were proposed in this pull request?
Upgrade ansi-regex from 5.0.0 to 5.0.1 in /dev

### Why are the changes needed?

[CVE-2021-3807](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3807)

[Releases notes at github](https://github.com/chalk/ansi-regex/releases)

Upgrading ansi-regex from 5.0.0 to 5.0.1 resolves this issue.

### Does this PR introduce _any_ user-facing change?
Some users run remote security scanners, and this is one of the issues that
comes up. Whether it can actually do any damage to Spark is highly uncertain,
but let's remove that uncertainty for any user who may have it.

### How was this patch tested?
All tests must pass.

Closes #35628 from bjornjorgensen/ansi-regex-from-5.0.0-to-5.0.1.

Authored-by: bjornjorgensen 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit 9758d55918dfec236e8ac9f1655a9ff0acd7156e)
Signed-off-by: Kousuke Saruta 
---
 dev/package-lock.json | 3189 ++---
 dev/package.json  |3 +-
 2 files changed, 2229 insertions(+), 963 deletions(-)

diff --git a/dev/package-lock.json b/dev/package-lock.json
index a57f45b..c2a61b3 100644
--- a/dev/package-lock.json
+++ b/dev/package-lock.json
@@ -1,979 +1,2244 @@
 {
-  "requires": true,
-  "lockfileVersion": 1,
-  "dependencies": {
-    "@babel/code-frame": {
-      "version": "7.12.11",
-      "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.12.11.tgz",
-      "integrity": "sha512-Zt1yodBx1UcyiePMSkWnU4hPqhwq7hGi2nFL1LeA3EUl+q2LQx16MISgJ0+z7dnmgvP9QtIleuETGOiOH1RcIw==",
-      "dev": true,
-      "requires": {
-        "@babel/highlight": "^7.10.4"
-      }
-    },
-    "@babel/helper-validator-identifier": {
-      "version": "7.14.0",
-      "resolved": "https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.14.0.tgz",
-      "integrity": "sha512-V3ts7zMSu5lfiwWDVWzRDGIN+lnCEUdaXgtVHJgLb1rGaA6jMrtB9EmE7L18foXJIE8Un/A/h6NJfGQp/e1J4A==",
-      "dev": true
-    },
-    "@babel/highlight": {
-      "version": "7.14.0",
-      "resolved": "https://registry.npmjs.org/@babel/highlight/-/highlight-7.14.0.tgz",
-      "integrity": "sha512-YSCOwxvTYEIMSGaBQb5kDDsCopDdiUGsqpatp3fOlI4+2HQSkTmEVWnVuySdAC5EWCqSWWTv0ib63RjR7dTBdg==",
-      "dev": true,
-      "requires": {
-        "@babel/helper-validator-identifier": "^7.14.0",
-        "chalk": "^2.0.0",
-        "js-tokens": "^4.0.0"
-      },
-      "dependencies": {
-        "chalk": {
-          "version": "2.4.2",
-          "resolved": "https://registry.npmjs.org/chalk/-/chalk-2.4.2.tgz",
-          "integrity": "sha512-Mti+f9lpJNcwF4tWV8/OrTTtF1gZi+f8FqlyAdouralcFWFQWF2+NgCHShjkCb+IFBLq9buZwE1xckQU4peSuQ==",
-          "dev": true,
-          "requires": {
-            "ansi-styles": "^3.2.1",
-            "escape-string-regexp": "^1.0.5",
-            "supports-color": "^5.3.0"
-          }
-        }
-      }
-    },
-    "@eslint/eslintrc": {
-      "version": "0.4.0",
-      "resolved": "https://registry.npmjs.org/@eslint/eslintrc/-/eslintrc-0.4.0.tgz",
-      "integrity": "sha512-2ZPCc+uNbjV5ERJr+aKSPRwZgKd2z11x0EgLvb1PURmUrn9QNRXFqje0Ldq454PfAVyaJYyrDvvIKSFP4NnBog==",
-      "dev": true,
-      "requires": {
-        "ajv": "^6.12.4",
-        "debug": "^4.1.1",
-        "espree": "^7.3.0",
-        "globals": "^12.1.0",
-        "ignore": "^4.0.6",
-        "import-fresh": "^3.2.1",
-        "js-yaml": "^3.13.1",
-        "minimatch": "^3.0.4",
-        "strip-json-comments": "^3.1.1"
-      },
-      "dependencies": {
-        "globals": {
-          "version": "12.4.0",
-          "resolved": "https://registry.npmjs.org/globals/-/globals-12.4.0.tgz",
-          "integrity": "sha512-BWICuzzDvDoH54NHKCseDanAhE3CeDorgDL5MT6LMXXj2WCnd9UC2szdk4AWLfjdgNBCXLUanXYcpBBKOSWGwg==",
-          "dev": true,
-          "requires": {
-

[spark] branch master updated: [SPARK-38303][BUILD] Upgrade `ansi-regex` from 5.0.0 to 5.0.1 in /dev

2022-02-24 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9758d55  [SPARK-38303][BUILD] Upgrade `ansi-regex` from 5.0.0 to 5.0.1 
in /dev
9758d55 is described below

commit 9758d55918dfec236e8ac9f1655a9ff0acd7156e
Author: bjornjorgensen 
AuthorDate: Fri Feb 25 11:43:36 2022 +0900

[SPARK-38303][BUILD] Upgrade `ansi-regex` from 5.0.0 to 5.0.1 in /dev

### What changes were proposed in this pull request?
Upgrade ansi-regex from 5.0.0 to 5.0.1 in /dev

### Why are the changes needed?

[CVE-2021-3807](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3807)

[Releases notes at github](https://github.com/chalk/ansi-regex/releases)

Upgrading ansi-regex from 5.0.0 to 5.0.1 resolves this issue.

### Does this PR introduce _any_ user-facing change?
Some users run remote security scanners, and this is one of the issues that
comes up. Whether it can actually do any damage to Spark is highly uncertain,
but let's remove that uncertainty for any user who may have it.

### How was this patch tested?
All tests must pass.

Closes #35628 from bjornjorgensen/ansi-regex-from-5.0.0-to-5.0.1.

Authored-by: bjornjorgensen 
Signed-off-by: Kousuke Saruta 
---
 dev/package-lock.json | 3189 ++---
 dev/package.json  |3 +-
 2 files changed, 2229 insertions(+), 963 deletions(-)

diff --git a/dev/package-lock.json b/dev/package-lock.json
index a57f45b..c2a61b3 100644
--- a/dev/package-lock.json
+++ b/dev/package-lock.json
@@ -1,979 +1,2244 @@
 {
-  "requires": true,
-  "lockfileVersion": 1,
-  "dependencies": {
-    "@babel/code-frame": {
-      "version": "7.12.11",
-      "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.12.11.tgz",
-      "integrity": "sha512-Zt1yodBx1UcyiePMSkWnU4hPqhwq7hGi2nFL1LeA3EUl+q2LQx16MISgJ0+z7dnmgvP9QtIleuETGOiOH1RcIw==",
-      "dev": true,
-      "requires": {
-        "@babel/highlight": "^7.10.4"
-      }
-    },
-    "@babel/helper-validator-identifier": {
-      "version": "7.14.0",
-      "resolved": "https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.14.0.tgz",
-      "integrity": "sha512-V3ts7zMSu5lfiwWDVWzRDGIN+lnCEUdaXgtVHJgLb1rGaA6jMrtB9EmE7L18foXJIE8Un/A/h6NJfGQp/e1J4A==",
-      "dev": true
-    },
-    "@babel/highlight": {
-      "version": "7.14.0",
-      "resolved": "https://registry.npmjs.org/@babel/highlight/-/highlight-7.14.0.tgz",
-      "integrity": "sha512-YSCOwxvTYEIMSGaBQb5kDDsCopDdiUGsqpatp3fOlI4+2HQSkTmEVWnVuySdAC5EWCqSWWTv0ib63RjR7dTBdg==",
-      "dev": true,
-      "requires": {
-        "@babel/helper-validator-identifier": "^7.14.0",
-        "chalk": "^2.0.0",
-        "js-tokens": "^4.0.0"
-      },
-      "dependencies": {
-        "chalk": {
-          "version": "2.4.2",
-          "resolved": "https://registry.npmjs.org/chalk/-/chalk-2.4.2.tgz",
-          "integrity": "sha512-Mti+f9lpJNcwF4tWV8/OrTTtF1gZi+f8FqlyAdouralcFWFQWF2+NgCHShjkCb+IFBLq9buZwE1xckQU4peSuQ==",
-          "dev": true,
-          "requires": {
-            "ansi-styles": "^3.2.1",
-            "escape-string-regexp": "^1.0.5",
-            "supports-color": "^5.3.0"
-          }
-        }
-      }
-    },
-    "@eslint/eslintrc": {
-      "version": "0.4.0",
-      "resolved": "https://registry.npmjs.org/@eslint/eslintrc/-/eslintrc-0.4.0.tgz",
-      "integrity": "sha512-2ZPCc+uNbjV5ERJr+aKSPRwZgKd2z11x0EgLvb1PURmUrn9QNRXFqje0Ldq454PfAVyaJYyrDvvIKSFP4NnBog==",
-      "dev": true,
-      "requires": {
-        "ajv": "^6.12.4",
-        "debug": "^4.1.1",
-        "espree": "^7.3.0",
-        "globals": "^12.1.0",
-        "ignore": "^4.0.6",
-        "import-fresh": "^3.2.1",
-        "js-yaml": "^3.13.1",
-        "minimatch": "^3.0.4",
-        "strip-json-comments": "^3.1.1"
-      },
-      "dependencies": {
-        "globals": {
-          "version": "12.4.0",
-          "resolved": "https://registry.npmjs.org/globals/-/globals-12.4.0.tgz",
-          "integrity": "sha512-BWICuzzDvDoH54NHKCseDanAhE3CeDorgDL5MT6LMXXj2WCnd9UC2szdk4AWLfjdgNBCXLUanXYcpBBKOSWGwg==",
-          "dev": true,
-          "requires": {
-            "type-fest": "^0.8.1"
-          }
-        }
-      }
-    },
-

[spark] branch master updated (43c89dc -> e58872d)

2022-02-24 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 43c89dc  [SPARK-38273][SQL] `decodeUnsafeRows`'s iterators should 
close underlying input streams
 add e58872d  [SPARK-38191][CORE] The staging directory of write job only 
needs to be initialized once in HadoopMapReduceCommitProtocol

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-38273][SQL] `decodeUnsafeRows`'s iterators should close underlying input streams

2022-02-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 3d1317f  [SPARK-38273][SQL] `decodeUnsafeRows`'s iterators should 
close underlying input streams
3d1317f is described below

commit 3d1317f8657beddfc6e8a5e49dbbbaaefdff1a5c
Author: Kevin Sewell 
AuthorDate: Thu Feb 24 08:14:07 2022 -0800

[SPARK-38273][SQL] `decodeUnsafeRows`'s iterators should close underlying 
input streams

### What changes were proposed in this pull request?
Wrap the DataInputStream in the SparkPlan.decodeUnsafeRows method with a
NextIterator instead of a plain Iterator, which allows us to close the
DataInputStream properly. This happens on the Spark driver only.

### Why are the changes needed?
SPARK-34647 replaced ZstdInputStream with ZstdInputStreamNoFinalizer. As a
result, every usage of `CompressionCodec.compressedInputStream` must close the
stream explicitly, since closing is no longer handled by the finalizer
mechanism.

In SparkPlan, however, the result of `CompressionCodec.compressedInputStream`
is wrapped in an Iterator that never calls close.
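
For illustration, a self-contained sketch of the pattern (Spark's actual `org.apache.spark.util.NextIterator` is internal to Spark; this simplified stand-in only shows how an iterator that owns its stream can close it as soon as the input is exhausted):

```scala
import java.io.{ByteArrayInputStream, DataInputStream, DataOutputStream, EOFException}

// Simplified stand-in for the NextIterator pattern: the iterator owns the
// stream and closes it eagerly when the last record has been read, instead
// of relying on a finalizer that ZstdInputStreamNoFinalizer no longer has.
class ClosingRowIterator(in: DataInputStream) extends Iterator[Array[Byte]] {
  private var nextRow: Array[Byte] = _
  private var finished = false

  private def fetchNext(): Unit = {
    val size = try in.readInt() catch { case _: EOFException => -1 }
    if (size < 0) {
      finished = true
      in.close() // the point of the fix: release the stream deterministically
    } else {
      val buf = new Array[Byte](size)
      in.readFully(buf)
      nextRow = buf
    }
  }

  override def hasNext: Boolean = {
    if (!finished && nextRow == null) fetchNext()
    !finished
  }

  override def next(): Array[Byte] = {
    if (!hasNext) throw new NoSuchElementException("end of stream")
    val row = nextRow
    nextRow = null
    row
  }
}

// Tiny demo with two length-prefixed records.
val bytes = {
  val bos = new java.io.ByteArrayOutputStream()
  val dos = new DataOutputStream(bos)
  Seq("a", "bb").foreach { s => dos.writeInt(s.length); dos.write(s.getBytes("UTF-8")) }
  bos.toByteArray
}
new ClosingRowIterator(new DataInputStream(new ByteArrayInputStream(bytes)))
  .foreach(r => println(new String(r, "UTF-8")))
```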

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

#### Spark Shell Configuration
```bash
$> export SPARK_SUBMIT_OPTS="-XX:+AlwaysPreTouch -Xms1g"
$> $SPARK_HOME/bin/spark-shell --conf spark.io.compression.codec=zstd
```

#### Test Script
```scala
import java.sql.Timestamp
import java.time.Instant
import spark.implicits._

case class Record(timestamp: Timestamp, batch: Long, value: Long)

(1 to 300).foreach { batch =>
  sc.parallelize(1 to 100).map(Record(Timestamp.from(Instant.now()), batch, _)).toDS.write.parquet(s"test_data/batch_$batch")
}

(1 to 300).foreach(batch => spark.read.parquet(s"test_data/batch_$batch").as[Record].repartition().collect())

```

#### Memory Monitor
```shell
$> while true; do echo \"$(date +%Y-%m-%d' '%H:%M:%S)\",$(pmap -x <pid> | grep "total kB" | awk '{print $4}'); sleep 10; done;
```

#### Results

##### Before
```
"2022-02-22 11:55:23",1400016
"2022-02-22 11:55:33",1522024
"2022-02-22 11:55:43",1587812
"2022-02-22 11:55:53",1631868
"2022-02-22 11:56:03",1657252
"2022-02-22 11:56:13",1659728
"2022-02-22 11:56:23",1664640
"2022-02-22 11:56:33",1674152
"2022-02-22 11:56:43",1697320
"2022-02-22 11:56:53",1689636
"2022-02-22 11:57:03",1783888
"2022-02-22 11:57:13",1896920
"2022-02-22 11:57:23",1950492
"2022-02-22 11:57:33",2010968
"2022-02-22 11:57:44",2066560
"2022-02-22 11:57:54",2108232
"2022-02-22 11:58:04",2158188
"2022-02-22 11:58:14",2211344
"2022-02-22 11:58:24",2260180
"2022-02-22 11:58:34",2316352
"2022-02-22 11:58:44",2367412
"2022-02-22 11:58:54",2420916
"2022-02-22 11:59:04",2472132
"2022-02-22 11:59:14",2519888
"2022-02-22 11:59:24",2571372
"2022-02-22 11:59:34",2621992
"2022-02-22 11:59:44",2672400
"2022-02-22 11:59:54",2728924
"2022-02-22 12:00:04",212
"2022-02-22 12:00:14",2834272
"2022-02-22 12:00:24",2881344
"2022-02-22 12:00:34",2935552
"2022-02-22 12:00:44",2984896
"2022-02-22 12:00:54",3034116
"2022-02-22 12:01:04",3087092
"2022-02-22 12:01:14",3134432
"2022-02-22 12:01:25",3198316
"2022-02-22 12:01:35",3193484
"2022-02-22 12:01:45",3193212
"2022-02-22 12:01:55",3192872
"2022-02-22 12:02:05",3191772
"2022-02-22 12:02:15",3187780
"2022-02-22 12:02:25",3177084
"2022-02-22 12:02:35",3173292
"2022-02-22 12:02:45",3173292
"2022-02-22 12:02:55",3173292
```

##### After
```
"2022-02-22 12:05:03",1377124
"2022-02-22 12:05:13",1425132
"2022-02-22 12:05:23",1564060
"2022-02-22 12:05:33",1616116
"2022-02-22 12:05:43",1637448
"2022-02-22 12:05:53",1637700
"2022-02-22 12:06:03",1653912
"2022-02-22 12:06:13",1659532
"2022-02-22 12:06:23",1673368
"2022-02-22 12:06:33",1687580
"2022-02-22 12:06:43",1711076
"2022-02-22 12:06:53",1849752
"2022-02-22 12:07:03",1861528
"2022-02-22 12:07:13",1871200
"2022-02-22 12:07:24",1878860
"2022-02-22 12:07:34",1879332
"2022-02-22 12:07:44",1886552
"2022-02-22 12:07:54",1884160
"2022-02-22 12:08:04",1880924
"2022-02-22 12:08:14",1876084
"2022-02-22 12:08:24",1878800
"2022-02-22 12:08:34",1879068
"2022-02-22 12:08:44",1880088
"2022-02-22 12:08:54",1880160
"2022-02-22 12:09:04",1880496
"2022-02-22 12:09:14",1891672
"2022-02-22 12:09:24",1878552
"2022-02-22 12:09:34",1876136
"2022-02-22 12:09:44",1890056
"2022-02-22 12:09:54",1878076
"2022-02-22 12:10:04",18

[spark] branch master updated (5190048 -> 43c89dc)

2022-02-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5190048  [SPARK-38300][SQL] Use `ByteStreams.toByteArray` to simplify 
`fileToString` and `resourceToBytes` in catalyst.util
 add 43c89dc  [SPARK-38273][SQL] `decodeUnsafeRows`'s iterators should 
close underlying input streams

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/SparkPlan.scala | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c4b013f -> 5190048)

2022-02-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c4b013f  [SPARK-38229][FOLLOWUP][SQL] Clean up unnecessary code for 
code simplification
 add 5190048  [SPARK-38300][SQL] Use `ByteStreams.toByteArray` to simplify 
`fileToString` and `resourceToBytes` in catalyst.util

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/util/package.scala   | 30 +-
 1 file changed, 6 insertions(+), 24 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org