[spark] branch master updated (0880989 -> 3b2ff16)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 0880989  [SPARK-22798][PYTHON][ML][FOLLOWUP] Add labelsArray to PySpark StringIndexer
     add 3b2ff16  [SPARK-33636][PYTHON][ML][FOLLOWUP] Update since tag of labelsArray in StringIndexer

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/feature.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (6f4587a -> 13ca88c)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 6f4587a  [SPARK-33631][DOCS][TEST] Clean up spark.core.connection.ack.wait.timeout from configuration.md
     add 13ca88c  [SPARK-33636][PYTHON][ML][3.0] Add labelsArray to PySpark StringIndexer

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/feature.py            | 9 +
 python/pyspark/ml/tests/test_feature.py | 1 +
 2 files changed, 10 insertions(+)
[spark] branch master updated (878cc0e -> 0880989)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 878cc0e  [SPARK-32896][SS][FOLLOW-UP] Rename the API to `toTable`
     add 0880989  [SPARK-22798][PYTHON][ML][FOLLOWUP] Add labelsArray to PySpark StringIndexer

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/feature.py            | 12 
 python/pyspark/ml/tests/test_feature.py |  1 +
 2 files changed, 13 insertions(+)
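The commits above add `labelsArray` to PySpark's `StringIndexer` model, exposing one array of labels per output column in the multi-column case, with a label's position in the array being its index. As a rough illustration of the default `frequencyDesc` ordering, here is a pure-Python sketch — not the PySpark implementation, and the helper name `fit_labels_array` is made up:

```python
from collections import Counter

def fit_labels_array(rows, input_cols):
    """Sketch of StringIndexer's default label ordering: for each input
    column, labels are sorted by descending frequency, with frequency
    ties broken alphabetically. A label's array position is its index."""
    labels_array = []
    for col in input_cols:
        counts = Counter(row[col] for row in rows)
        # most frequent first; alphabetical order breaks frequency ties
        labels = sorted(counts, key=lambda v: (-counts[v], v))
        labels_array.append(labels)
    return labels_array

rows = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "M"},
    {"color": "blue", "size": "M"},
]
print(fit_labels_array(rows, ["color", "size"]))  # → [['red', 'blue'], ['M', 'S']]
```

In the real API, `StringIndexerModel.labelsArray` returns this per-column structure after `fit`, which is what the follow-up commits wire through to Python.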
[spark] branch master updated: [SPARK-32896][SS][FOLLOW-UP] Rename the API to `toTable`
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 878cc0e  [SPARK-32896][SS][FOLLOW-UP] Rename the API to `toTable`
878cc0e is described below

commit 878cc0e6e95f300a0a58c742654f53a28b30b174
Author: Yuanjian Li
AuthorDate: Wed Dec 2 17:36:25 2020 -0800

    [SPARK-32896][SS][FOLLOW-UP] Rename the API to `toTable`

    ### What changes were proposed in this pull request?
    As discussed in https://github.com/apache/spark/pull/30521#discussion_r531463427, rename the API to `toTable`.

    ### Why are the changes needed?
    Rename the API for further extension and accuracy.

    ### Does this PR introduce _any_ user-facing change?
    Yes, it is an API change, but the new API has not been released yet.

    ### How was this patch tested?
    Existing UT.

    Closes #30571 from xuanyuanking/SPARK-32896-follow.

    Authored-by: Yuanjian Li
    Signed-off-by: Shixiong Zhu
---
 .../main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala  | 2 +-
 .../org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
index d67e175..9e35997 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
@@ -304,7 +304,7 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
    * @since 3.1.0
    */
  @throws[TimeoutException]
-  def saveAsTable(tableName: String): StreamingQuery = {
+  def toTable(tableName: String): StreamingQuery = {
    this.source = SOURCE_NAME_TABLE
    this.tableName = tableName
    startInternal(None)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala
index 062b106..bf85043 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala
@@ -291,7 +291,7 @@ class DataStreamTableAPISuite extends StreamTest with BeforeAndAfter {
       val query = inputDF
         .writeStream
         .option("checkpointLocation", checkpointDir.getAbsolutePath)
-        .saveAsTable(tableIdentifier)
+        .toTable(tableIdentifier)

       inputData.addData(newInputs: _*)
[spark] branch master updated (4f96670 -> 90d4d7d)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 4f96670  [SPARK-31953][SS] Add Spark Structured Streaming History Server Support
     add 90d4d7d  [SPARK-33610][ML] Imputer transform skip duplicate head() job

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/ml/feature/Imputer.scala | 29 +-
 1 file changed, 17 insertions(+), 12 deletions(-)
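The Imputer change above avoids re-running the job that collects the per-column surrogate values when `transform` is called. The idea, sketched in plain Python with a made-up `SimpleImputer` class (not Spark's implementation): compute every column's surrogate once during `fit`, cache the results, and have `transform` read only the cache instead of triggering a second pass over the data.

```python
class SimpleImputer:
    """Illustrative mean-imputer: fit() computes all surrogates in one
    pass (analogous to collecting a single head() row for all columns at
    once), and transform() reuses them without recomputation."""

    def fit(self, columns):
        # columns: dict of column name -> list of values (None = missing)
        self.surrogates_ = {}
        for name, values in columns.items():
            present = [v for v in values if v is not None]
            self.surrogates_[name] = sum(present) / len(present)
        return self

    def transform(self, columns):
        # no second pass over the data to recompute surrogates here
        return {
            name: [self.surrogates_[name] if v is None else v for v in values]
            for name, values in columns.items()
        }

imp = SimpleImputer().fit({"a": [1.0, None, 3.0]})
print(imp.transform({"a": [1.0, None, 3.0]}))  # → {'a': [1.0, 2.0, 3.0]}
```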
[spark] branch master updated: [SPARK-32896][SS][FOLLOW-UP] Rename the API to `toTable`
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6fa797e  [SPARK-32896][SS][FOLLOW-UP] Rename the API to `toTable`
6fa797e is described below

commit 6fa797e977412d071dd4dc079053ec64a21b3041
Author: Yuanjian Li
AuthorDate: Wed Dec 2 17:31:10 2020 -0800

    [SPARK-32896][SS][FOLLOW-UP] Rename the API to `toTable`

    ### What changes were proposed in this pull request?
    As discussed in https://github.com/apache/spark/pull/30521#discussion_r531463427, rename the API to `toTable`.

    ### Why are the changes needed?
    Rename the API for further extension and accuracy.

    ### Does this PR introduce _any_ user-facing change?
    Yes, it is an API change, but the new API has not been released yet.

    ### How was this patch tested?
    Existing UT.

    Closes #30571 from xuanyuanking/SPARK-32896-follow.

    Authored-by: Yuanjian Li
    Signed-off-by: Shixiong Zhu
---
 .../main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala  | 2 +-
 .../org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
index d67e175..9e35997 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
@@ -304,7 +304,7 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
    * @since 3.1.0
    */
  @throws[TimeoutException]
-  def saveAsTable(tableName: String): StreamingQuery = {
+  def toTable(tableName: String): StreamingQuery = {
    this.source = SOURCE_NAME_TABLE
    this.tableName = tableName
    startInternal(None)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala
index 062b106..bf85043 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala
@@ -291,7 +291,7 @@ class DataStreamTableAPISuite extends StreamTest with BeforeAndAfter {
       val query = inputDF
         .writeStream
         .option("checkpointLocation", checkpointDir.getAbsolutePath)
-        .saveAsTable(tableIdentifier)
+        .toTable(tableIdentifier)

       inputData.addData(newInputs: _*)
[spark] branch master updated: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4f96670  [SPARK-31953][SS] Add Spark Structured Streaming History Server Support
4f96670 is described below

commit 4f9667035886a67e6c9a4e8fad2efa390e87ca68
Author: uncleGen
AuthorDate: Wed Dec 2 17:11:51 2020 -0800

    [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

    ### What changes were proposed in this pull request?
    Add Spark Structured Streaming History Server Support.

    ### Why are the changes needed?
    Add a streaming query history server plugin.

    ![image](https://user-images.githubusercontent.com/7402327/84248291-d26cfe80-ab3b-11ea-86d2-98205fa2bcc4.png)
    ![image](https://user-images.githubusercontent.com/7402327/84248347-e44ea180-ab3b-11ea-81de-eefe207656f2.png)
    ![image](https://user-images.githubusercontent.com/7402327/84248396-f0d2fa00-ab3b-11ea-9b0d-e410115471b0.png)

    - Follow-ups
      - Query duration should not update in history UI.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Update UT.

    Closes #28781 from uncleGen/SPARK-31953.

    Lead-authored-by: uncleGen
    Co-authored-by: Genmao Yu
    Co-authored-by: Yuanjian Li
    Signed-off-by: Shixiong Zhu
---
 dev/.rat-excludes                                  |   1 +
 .../org.apache.spark.status.AppHistoryServerPlugin |   1 +
 .../streaming/StreamingQueryListenerBus.scala      |  26 +++-
 .../ui/StreamingQueryHistoryServerPlugin.scala     |  43 ++
 .../execution/ui/StreamingQueryStatusStore.scala   |  53 +++
 .../apache/spark/sql/internal/SharedState.scala    |   8 +-
 .../sql/streaming/StreamingQueryManager.scala      |   3 +-
 .../sql/streaming/ui/StreamingQueryPage.scala      |  44 +++---
 .../ui/StreamingQueryStatisticsPage.scala          |  27 ++--
 .../ui/StreamingQueryStatusListener.scala          | 166 +
 .../spark/sql/streaming/ui/StreamingQueryTab.scala |   3 +-
 .../apache/spark/sql/streaming/ui/UIUtils.scala    |  12 +-
 .../resources/spark-events/local-1596020211915     | 160 
 .../org/apache/spark/deploy/history/Utils.scala}   |  39 ++---
 .../streaming/ui/StreamingQueryHistorySuite.scala  |  63 
 .../sql/streaming/ui/StreamingQueryPageSuite.scala |  42 +++---
 .../ui/StreamingQueryStatusListenerSuite.scala     | 159 
 17 files changed, 673 insertions(+), 177 deletions(-)

diff --git a/dev/.rat-excludes b/dev/.rat-excludes
index 7da330d..167cf22 100644
--- a/dev/.rat-excludes
+++ b/dev/.rat-excludes
@@ -123,6 +123,7 @@ SessionHandler.java
 GangliaReporter.java
 application_1578436911597_0052
 config.properties
+local-1596020211915
 app-20200706201101-0003
 py.typed
 _metadata

diff --git a/sql/core/src/main/resources/META-INF/services/org.apache.spark.status.AppHistoryServerPlugin b/sql/core/src/main/resources/META-INF/services/org.apache.spark.status.AppHistoryServerPlugin
index 0bba2f8..6771eef 100644
--- a/sql/core/src/main/resources/META-INF/services/org.apache.spark.status.AppHistoryServerPlugin
+++ b/sql/core/src/main/resources/META-INF/services/org.apache.spark.status.AppHistoryServerPlugin
@@ -1 +1,2 @@
 org.apache.spark.sql.execution.ui.SQLHistoryServerPlugin
+org.apache.spark.sql.execution.ui.StreamingQueryHistoryServerPlugin

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
index 1b8d69f..4b98acd 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
@@ -31,16 +31,21 @@ import org.apache.spark.util.ListenerBus
  * Spark listener bus, so that it can receive [[StreamingQueryListener.Event]]s and dispatch them
  * to StreamingQueryListeners.
  *
- * Note that each bus and its registered listeners are associated with a single SparkSession
+ * Note 1: Each bus and its registered listeners are associated with a single SparkSession
  * and StreamingQueryManager. So this bus will dispatch events to registered listeners for only
  * those queries that were started in the associated SparkSession.
+ *
+ * Note 2: To rebuild Structured Streaming UI in SHS, this bus will be registered into
+ * [[org.apache.spark.scheduler.ReplayListenerBus]]. We check `sparkListenerBus` defined or not to
+ * determine how to process [[StreamingQueryListener.Event]]. If false, it means this bus is used to
+ * replay all streaming query event from eventLog.
  */
-class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
+class St
[spark] branch master updated (92bfbcb -> f94cb53)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 92bfbcb  [SPARK-33631][DOCS][TEST] Clean up spark.core.connection.ack.wait.timeout from configuration.md
     add f94cb53  [MINOR][INFRA] Use the latest image for GitHub Action jobs

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch branch-2.4 updated: [SPARK-33631][DOCS][TEST] Clean up spark.core.connection.ack.wait.timeout from configuration.md
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 7a8af18  [SPARK-33631][DOCS][TEST] Clean up spark.core.connection.ack.wait.timeout from configuration.md
7a8af18 is described below

commit 7a8af183a7f277e3d5f75eb5d27db46ccb9ecd93
Author: yangjie01
AuthorDate: Wed Dec 2 12:58:41 2020 -0800

    [SPARK-33631][DOCS][TEST] Clean up spark.core.connection.ack.wait.timeout from configuration.md

    SPARK-9767 removed `ConnectionManager` and related files. The configuration
    `spark.core.connection.ack.wait.timeout`, previously used by `ConnectionManager`, is no longer
    used by other Spark code, but it still exists in `configuration.md`. So this PR cleans up the
    unused configuration item `spark.core.connection.ack.wait.timeout` from `configuration.md`.

    Clean up useless configuration from `configuration.md`.

    No

    Pass the Jenkins or GitHub Action

    Closes #30569 from LuciferYang/SPARK-33631.

    Authored-by: yangjie01
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 92bfbcb2e372e8fecfe65bc582c779d9df4036bb)
    Signed-off-by: Dongjoon Hyun
---
 .../apache/spark/storage/BlockManagerReplicationSuite.scala | 2 --
 docs/configuration.md                                       | 10 --
 2 files changed, 12 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala b/core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala
index 3962bdc..bfff101 100644
--- a/core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala
+++ b/core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala
@@ -93,8 +93,6 @@ trait BlockManagerReplicationBehavior extends SparkFunSuite
     conf.set("spark.storage.unrollFraction", "0.4")
     conf.set("spark.storage.unrollMemoryThreshold", "512")

-    // to make a replication attempt to inactive store fail fast
-    conf.set("spark.core.connection.ack.wait.timeout", "1s")
     // to make cached peers refresh frequently
     conf.set("spark.storage.cachedPeersTtl", "10")

diff --git a/docs/configuration.md b/docs/configuration.md
index 6bb1bda..a6aef5d 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1473,7 +1473,6 @@ Apart from these, the following properties are also available, and may be useful
   120s
   Default timeout for all network interactions. This config will be used in place of
-  spark.core.connection.ack.wait.timeout,
   spark.storage.blockManagerSlaveTimeoutMs,
   spark.shuffle.io.connectionTimeout, spark.rpc.askTimeout or
   spark.rpc.lookupTimeout if they are not configured.
@@ -1519,15 +1518,6 @@ Apart from these, the following properties are also available, and may be useful
   Duration for an RPC remote endpoint lookup operation to wait before timing out.

-  spark.core.connection.ack.wait.timeout
-  spark.network.timeout
-  How long for the connection to wait for ack to occur before timing
-  out and giving up. To avoid unwilling timeout caused by long pause like GC,
-  you can set larger value.

 ### Scheduling
[spark] branch branch-3.0 updated: [SPARK-33631][DOCS][TEST] Clean up spark.core.connection.ack.wait.timeout from configuration.md
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 6f4587a  [SPARK-33631][DOCS][TEST] Clean up spark.core.connection.ack.wait.timeout from configuration.md
6f4587a is described below

commit 6f4587a10d553969a2f753d30d86ae3b1077e1f6
Author: yangjie01
AuthorDate: Wed Dec 2 12:58:41 2020 -0800

    [SPARK-33631][DOCS][TEST] Clean up spark.core.connection.ack.wait.timeout from configuration.md

    SPARK-9767 removed `ConnectionManager` and related files. The configuration
    `spark.core.connection.ack.wait.timeout`, previously used by `ConnectionManager`, is no longer
    used by other Spark code, but it still exists in `configuration.md`. So this PR cleans up the
    unused configuration item `spark.core.connection.ack.wait.timeout` from `configuration.md`.

    Clean up useless configuration from `configuration.md`.

    No

    Pass the Jenkins or GitHub Action

    Closes #30569 from LuciferYang/SPARK-33631.

    Authored-by: yangjie01
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 92bfbcb2e372e8fecfe65bc582c779d9df4036bb)
    Signed-off-by: Dongjoon Hyun
---
 .../apache/spark/storage/BlockManagerReplicationSuite.scala | 2 --
 docs/configuration.md                                       | 11 ---
 2 files changed, 13 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala b/core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala
index 59ace85..6238d86 100644
--- a/core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala
+++ b/core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala
@@ -92,8 +92,6 @@ trait BlockManagerReplicationBehavior extends SparkFunSuite
     conf.set(MEMORY_STORAGE_FRACTION, 0.999)
     conf.set(STORAGE_UNROLL_MEMORY_THRESHOLD, 512L)

-    // to make a replication attempt to inactive store fail fast
-    conf.set("spark.core.connection.ack.wait.timeout", "1s")
     // to make cached peers refresh frequently
     conf.set(STORAGE_CACHED_PEERS_TTL, 10)

diff --git a/docs/configuration.md b/docs/configuration.md
index 5a2c6b3..ebfee12 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1887,7 +1887,6 @@ Apart from these, the following properties are also available, and may be useful
   120s
   Default timeout for all network interactions. This config will be used in place of
-  spark.core.connection.ack.wait.timeout,
   spark.storage.blockManagerSlaveTimeoutMs,
   spark.shuffle.io.connectionTimeout, spark.rpc.askTimeout or
   spark.rpc.lookupTimeout if they are not configured.
@@ -1951,16 +1950,6 @@ Apart from these, the following properties are also available, and may be useful
   1.4.0

-  spark.core.connection.ack.wait.timeout
-  spark.network.timeout
-  How long for the connection to wait for ack to occur before timing
-  out and giving up. To avoid unwilling timeout caused by long pause like GC,
-  you can set larger value.
-  1.1.1

   spark.network.maxRemoteBlockSizeFetchToMem
   200m
[spark] branch master updated (b76c6b7 -> 92bfbcb)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from b76c6b7  [SPARK-33627][SQL] Add new function UNIX_SECONDS, UNIX_MILLIS and UNIX_MICROS
     add 92bfbcb  [SPARK-33631][DOCS][TEST] Clean up spark.core.connection.ack.wait.timeout from configuration.md

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/storage/BlockManagerReplicationSuite.scala | 2 --
 docs/configuration.md                                       | 11 ---
 2 files changed, 13 deletions(-)
[spark] branch master updated (a082f460 -> b76c6b7)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from a082f460  [SPARK-33071][SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin
     add b76c6b7   [SPARK-33627][SQL] Add new function UNIX_SECONDS, UNIX_MILLIS and UNIX_MICROS

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/FunctionRegistry.scala   |  3 +
 .../catalyst/expressions/datetimeExpressions.scala | 73 ++
 .../expressions/DateExpressionsSuite.scala         | 45 +
 .../sql-functions/sql-expression-schema.md         |  5 +-
 .../test/resources/sql-tests/inputs/datetime.sql   |  4 ++
 .../sql-tests/results/ansi/datetime.sql.out        | 26 +++-
 .../sql-tests/results/datetime-legacy.sql.out      | 26 +++-
 .../resources/sql-tests/results/datetime.sql.out   | 26 +++-
 8 files changed, 204 insertions(+), 4 deletions(-)
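The `UNIX_SECONDS`, `UNIX_MILLIS`, and `UNIX_MICROS` functions added by SPARK-33627 return the number of seconds, milliseconds, and microseconds since `1970-01-01 00:00:00 UTC` for a timestamp. A pure-Python sketch of the intended semantics (not Spark's implementation; note the floor division, which matters for pre-epoch timestamps):

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def unix_micros(ts: datetime) -> int:
    # microseconds elapsed since the Unix epoch
    delta = ts - EPOCH
    return (delta.days * 86_400_000_000
            + delta.seconds * 1_000_000
            + delta.microseconds)

def unix_millis(ts: datetime) -> int:
    # floor division, so pre-epoch values round toward minus infinity
    return unix_micros(ts) // 1_000

def unix_seconds(ts: datetime) -> int:
    return unix_micros(ts) // 1_000_000

ts = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
print(unix_seconds(ts), unix_millis(ts), unix_micros(ts))  # → 1 1000 1000000
```

In Spark SQL the equivalent would be `SELECT UNIX_SECONDS(TIMESTAMP'1970-01-01 00:00:01Z')`, and so on for the other two functions.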
[spark] branch branch-3.0 updated: Revert "[SPARK-33504][CORE] The application log in the Spark history server contains sensitive attributes should be redacted"
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 3fb9f6f  Revert "[SPARK-33504][CORE] The application log in the Spark history server contains sensitive attributes should be redacted"
3fb9f6f is described below

commit 3fb9f6f670328d31bf24fcb6b805715f5828ce06
Author: Thomas Graves
AuthorDate: Wed Dec 2 14:38:19 2020 -0600

    Revert "[SPARK-33504][CORE] The application log in the Spark history server contains sensitive attributes should be redacted"

    ### What changes were proposed in this pull request?
    Revert SPARK-33504 on branch-3.0 due to a compilation error. Original PR:
    https://github.com/apache/spark/pull/30446

    This reverts commit e59179b7326112f526e4c000e21146df283d861c.

    ### Why are the changes needed?

    ### Does this PR introduce _any_ user-facing change?

    ### How was this patch tested?

    Closes #30576 from tgravescs/revert33504.

    Authored-by: Thomas Graves
    Signed-off-by: Thomas Graves
---
 .../spark/scheduler/EventLoggingListener.scala     | 24 +---
 .../scheduler/EventLoggingListenerSuite.scala      | 64 +-
 2 files changed, 3 insertions(+), 85 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala
index 5673c02..24e2a5e 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala
@@ -18,9 +18,7 @@ package org.apache.spark.scheduler

 import java.net.URI
-import java.util.Properties

-import scala.collection.JavaConverters._
 import scala.collection.mutable

 import org.apache.hadoop.conf.Configuration
@@ -105,7 +103,7 @@ private[spark] class EventLoggingListener(
   // Events that do not trigger a flush
   override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = {
-    logEvent(event.copy(properties = redactProperties(event.properties)))
+    logEvent(event)
     if (shouldLogStageExecutorMetrics) {
       // record the peak metrics for the new stage
       liveStageExecutorMetrics.put((event.stageInfo.stageId, event.stageInfo.attemptNumber()),
@@ -158,9 +156,7 @@ private[spark] class EventLoggingListener(
     logEvent(event, flushLogger = true)
   }

-  override def onJobStart(event: SparkListenerJobStart): Unit = {
-    logEvent(event.copy(properties = redactProperties(event.properties)), flushLogger = true)
-  }
+  override def onJobStart(event: SparkListenerJobStart): Unit = logEvent(event, flushLogger = true)

   override def onJobEnd(event: SparkListenerJobEnd): Unit = logEvent(event, flushLogger = true)

@@ -250,22 +246,6 @@ private[spark] class EventLoggingListener(
     logWriter.stop()
   }

-  private def redactProperties(properties: Properties): Properties = {
-    if (properties == null) {
-      return properties
-    }
-    val redactedProperties = new Properties
-    // properties may contain some custom local properties such as stage/job description
-    // only properties in sparkConf need to be redacted.
-    val (globalProperties, localProperties) = properties.asScala.toSeq.partition {
-      case (key, _) => sparkConf.contains(key)
-    }
-    (Utils.redact(sparkConf, globalProperties) ++ localProperties).foreach {
-      case (key, value) => redactedProperties.setProperty(key, value)
-    }
-    redactedProperties
-  }
-
   private[spark] def redactEvent(
       event: SparkListenerEnvironmentUpdate): SparkListenerEnvironmentUpdate = {
     // environmentDetails maps a string descriptor to a set of properties

diff --git a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
index e0e6406..046564d 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
@@ -18,7 +18,7 @@ package org.apache.spark.scheduler

 import java.io.{File, InputStream}
-import java.util.{Arrays, Properties}
+import java.util.Arrays

 import scala.collection.immutable.Map
 import scala.collection.mutable
@@ -96,68 +96,6 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit
     assert(redactedProps(key) == "*(redacted)")
   }

-  test("Spark-33504 sensitive attributes redaction in properties") {
-    val (secretKey, secretPassword) = ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD",
-      "secret_password")
-    val (customKey, customValue) = ("parse_token", "secret_password")
-
-    val conf = getLoggingConf(testDirPath, None).set(secretKey, secretPassword)
-
-    val properties
[spark] branch master updated (91182d6 -> a082f460)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 91182d6   [SPARK-33626][K8S][TEST] Allow k8s integration tests to assert both driver and executor logs for expected log(s)
     add a082f460  [SPARK-33071][SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/AliasHelper.scala     |  3 +-
 .../catalyst/expressions/namedExpressions.scala    | 15 ++---
 .../main/scala/org/apache/spark/sql/Column.scala   |  5 ++-
 .../main/scala/org/apache/spark/sql/Dataset.scala  | 39 +-
 .../apache/spark/sql/DataFrameSelfJoinSuite.scala  | 29 
 .../spark/sql/SparkSessionExtensionSuite.scala     |  7 ++--
 6 files changed, 73 insertions(+), 25 deletions(-)
[spark] branch master updated (58583f7 -> 91182d6)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 58583f7  [SPARK-33619][SQL] Fix GetMapValueUtil code generation error
     add 91182d6  [SPARK-33626][K8S][TEST] Allow k8s integration tests to assert both driver and executor logs for expected log(s)

No new revisions were added by this update.

Summary of changes:
 .../k8s/integrationtest/DecommissionSuite.scala    |  6 ++--
 .../k8s/integrationtest/DepsTestsSuite.scala       |  2 +-
 .../k8s/integrationtest/KubernetesSuite.scala      | 32 +++---
 .../k8s/integrationtest/PythonTestsSuite.scala     |  6 ++--
 .../deploy/k8s/integrationtest/RTestsSuite.scala   |  2 +-
 .../integrationtest/SparkConfPropagateSuite.scala  | 22 +++
 6 files changed, 47 insertions(+), 23 deletions(-)
[spark] branch master updated (df8d3f1 -> 58583f7)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from df8d3f1  [SPARK-33544][SQL][FOLLOW-UP] Rename NoSideEffect to NoThrow and clarify the documentation more
     add 58583f7  [SPARK-33619][SQL] Fix GetMapValueUtil code generation error

No new revisions were added by this update.

Summary of changes:
 .../expressions/complexTypeExtractors.scala        |  2 +-
 .../catalyst/expressions/datetimeExpressions.scala |  7 +++-
 .../catalyst/expressions/intervalExpressions.scala | 14 +++
 .../expressions/ExpressionEvalHelper.scala         | 49 +++---
 .../expressions/ExpressionEvalHelperSuite.scala    | 25 ++-
 .../expressions/IntervalExpressionsSuite.scala     | 36 
 .../expressions/MathExpressionsSuite.scala         |  5 +--
 7 files changed, 70 insertions(+), 68 deletions(-)
[spark] branch master updated (28dad1b -> df8d3f1)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 28dad1b  [SPARK-33504][CORE] The application log in the Spark history server contains sensitive attributes should be redacted
     add df8d3f1  [SPARK-33544][SQL][FOLLOW-UP] Rename NoSideEffect to NoThrow and clarify the documentation more

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/complexTypeCreator.scala | 18 --
 .../spark/sql/catalyst/optimizer/expressions.scala    |  2 +-
 .../sql/catalyst/optimizer/ConstantFoldingSuite.scala |  2 +-
 .../optimizer/InferFiltersFromGenerateSuite.scala     |  6 +++---
 4 files changed, 17 insertions(+), 11 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-33504][CORE] The application log in the Spark history server contains sensitive attributes should be redacted
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new e59179b  [SPARK-33504][CORE] The application log in the Spark history server contains sensitive attributes should be redacted

e59179b is described below

commit e59179b7326112f526e4c000e21146df283d861c
Author: neko
AuthorDate: Wed Dec 2 09:24:19 2020 -0600

    [SPARK-33504][CORE] The application log in the Spark history server contains sensitive attributes should be redacted

    ### What changes were proposed in this pull request?
    Make sure that sensitive attributes are redacted in the history server log.

    ### Why are the changes needed?
    We found that secure attributes such as passwords in SparkListenerJobStart and SparkListenerStageSubmitted events were not redacted, so sensitive attributes could be viewed directly. A screenshot can be viewed in the attachment of JIRA SPARK-33504.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Manual tests work well; a unit test case was also added.

    Closes #30446 from akiyamaneko/eventlog_unredact.

Authored-by: neko
Signed-off-by: Thomas Graves
(cherry picked from commit 28dad1ba770e5b7f7cf542da1ae3f05975a969c6)
Signed-off-by: Thomas Graves
---
 .../spark/scheduler/EventLoggingListener.scala | 24 +++-
 .../scheduler/EventLoggingListenerSuite.scala  | 64 +-
 2 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala
index 24e2a5e..5673c02 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala
@@ -18,7 +18,9 @@ package org.apache.spark.scheduler

 import java.net.URI
+import java.util.Properties

+import scala.collection.JavaConverters._
 import scala.collection.mutable

 import org.apache.hadoop.conf.Configuration
@@ -103,7 +105,7 @@ private[spark] class EventLoggingListener(
   // Events that do not trigger a flush
   override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = {
-    logEvent(event)
+    logEvent(event.copy(properties = redactProperties(event.properties)))
     if (shouldLogStageExecutorMetrics) {
       // record the peak metrics for the new stage
       liveStageExecutorMetrics.put((event.stageInfo.stageId, event.stageInfo.attemptNumber()),
@@ -156,7 +158,9 @@ private[spark] class EventLoggingListener(
     logEvent(event, flushLogger = true)
   }

-  override def onJobStart(event: SparkListenerJobStart): Unit = logEvent(event, flushLogger = true)
+  override def onJobStart(event: SparkListenerJobStart): Unit = {
+    logEvent(event.copy(properties = redactProperties(event.properties)), flushLogger = true)
+  }

   override def onJobEnd(event: SparkListenerJobEnd): Unit = logEvent(event, flushLogger = true)
@@ -246,6 +250,22 @@ private[spark] class EventLoggingListener(
     logWriter.stop()
   }

+  private def redactProperties(properties: Properties): Properties = {
+    if (properties == null) {
+      return properties
+    }
+    val redactedProperties = new Properties
+    // properties may contain some custom local properties such as stage/job description
+    // only properties in sparkConf need to be redacted.
+    val (globalProperties, localProperties) = properties.asScala.toSeq.partition {
+      case (key, _) => sparkConf.contains(key)
+    }
+    (Utils.redact(sparkConf, globalProperties) ++ localProperties).foreach {
+      case (key, value) => redactedProperties.setProperty(key, value)
+    }
+    redactedProperties
+  }
+
   private[spark] def redactEvent(
       event: SparkListenerEnvironmentUpdate): SparkListenerEnvironmentUpdate = {
     // environmentDetails maps a string descriptor to a set of properties

diff --git a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
index 046564d..e0e6406 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
@@ -18,7 +18,7 @@ package org.apache.spark.scheduler

 import java.io.{File, InputStream}
-import java.util.Arrays
+import java.util.{Arrays, Properties}

 import scala.collection.immutable.Map
 import scala.collection.mutable
@@ -96,6 +96,68 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit
     assert(redactedProps(key) == "*(redacted)")
   }

+  test("Spark-33504 sensitive attributes redaction
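The `redactProperties` helper in the patch above partitions the event's properties into conf-backed ("global") entries and custom local ones (such as a stage/job description), and redacts only the former. A rough Python analogue of that logic follows; the `SECRET_PATTERN` regex and the `"*(redacted)"` replacement string are placeholders standing in for Spark's `Utils.redact` machinery, not its actual defaults:

```python
import re

# Placeholder for Spark's configurable redaction regex (spark.redaction.regex).
SECRET_PATTERN = re.compile(r"(?i)secret|password")
REDACTED = "*(redacted)"

def redact_properties(properties, spark_conf_keys):
    """Redact only entries that come from SparkConf; custom local
    properties pass through untouched, mirroring the Scala helper."""
    if properties is None:
        return properties
    # Partition on "is this key present in SparkConf?", like the
    # partition { case (key, _) => sparkConf.contains(key) } step.
    global_props = {k: v for k, v in properties.items() if k in spark_conf_keys}
    local_props = {k: v for k, v in properties.items() if k not in spark_conf_keys}
    # Mask values whose keys look sensitive, then merge local entries back.
    redacted = {k: (REDACTED if SECRET_PATTERN.search(k) else v)
                for k, v in global_props.items()}
    return {**redacted, **local_props}
```

For example, a conf-backed credstore password is masked while a custom job description survives, which is exactly the split the comment in the patch calls out.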
[spark] branch master updated: [SPARK-33504][CORE] The application log in the Spark history server contains sensitive attributes should be redacted
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 28dad1b  [SPARK-33504][CORE] The application log in the Spark history server contains sensitive attributes should be redacted

28dad1b is described below

commit 28dad1ba770e5b7f7cf542da1ae3f05975a969c6
Author: neko
AuthorDate: Wed Dec 2 09:24:19 2020 -0600

    [SPARK-33504][CORE] The application log in the Spark history server contains sensitive attributes should be redacted

    ### What changes were proposed in this pull request?
    Make sure that sensitive attributes are redacted in the history server log.

    ### Why are the changes needed?
    We found that secure attributes such as passwords in SparkListenerJobStart and SparkListenerStageSubmitted events were not redacted, so sensitive attributes could be viewed directly. A screenshot can be viewed in the attachment of JIRA SPARK-33504.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Manual tests work well; a unit test case was also added.

    Closes #30446 from akiyamaneko/eventlog_unredact.

Authored-by: neko
Signed-off-by: Thomas Graves
---
 .../spark/scheduler/EventLoggingListener.scala | 24 +++-
 .../scheduler/EventLoggingListenerSuite.scala  | 64 +-
 2 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala
index 1fda03f..d4e22d7 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala
@@ -18,7 +18,9 @@ package org.apache.spark.scheduler

 import java.net.URI
+import java.util.Properties

+import scala.collection.JavaConverters._
 import scala.collection.mutable

 import org.apache.hadoop.conf.Configuration
@@ -103,7 +105,7 @@ private[spark] class EventLoggingListener(
   // Events that do not trigger a flush
   override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = {
-    logEvent(event)
+    logEvent(event.copy(properties = redactProperties(event.properties)))
     if (shouldLogStageExecutorMetrics) {
       // record the peak metrics for the new stage
       liveStageExecutorMetrics.put((event.stageInfo.stageId, event.stageInfo.attemptNumber()),
@@ -156,7 +158,9 @@ private[spark] class EventLoggingListener(
     logEvent(event, flushLogger = true)
   }

-  override def onJobStart(event: SparkListenerJobStart): Unit = logEvent(event, flushLogger = true)
+  override def onJobStart(event: SparkListenerJobStart): Unit = {
+    logEvent(event.copy(properties = redactProperties(event.properties)), flushLogger = true)
+  }

   override def onJobEnd(event: SparkListenerJobEnd): Unit = logEvent(event, flushLogger = true)
@@ -276,6 +280,22 @@ private[spark] class EventLoggingListener(
     logWriter.stop()
   }

+  private def redactProperties(properties: Properties): Properties = {
+    if (properties == null) {
+      return properties
+    }
+    val redactedProperties = new Properties
+    // properties may contain some custom local properties such as stage/job description
+    // only properties in sparkConf need to be redacted.
+    val (globalProperties, localProperties) = properties.asScala.toSeq.partition {
+      case (key, _) => sparkConf.contains(key)
+    }
+    (Utils.redact(sparkConf, globalProperties) ++ localProperties).foreach {
+      case (key, value) => redactedProperties.setProperty(key, value)
+    }
+    redactedProperties
+  }
+
   private[spark] def redactEvent(
       event: SparkListenerEnvironmentUpdate): SparkListenerEnvironmentUpdate = {
     // environmentDetails maps a string descriptor to a set of properties

diff --git a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
index c4a8bcb..7acb845 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
@@ -18,7 +18,7 @@ package org.apache.spark.scheduler

 import java.io.{File, InputStream}
-import java.util.Arrays
+import java.util.{Arrays, Properties}

 import scala.collection.immutable.Map
 import scala.collection.mutable
@@ -98,6 +98,68 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit
     assert(redactedProps(key) == "*(redacted)")
   }

+  test("Spark-33504 sensitive attributes redaction in properties") {
+    val (secretKey, secretPassword) = ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD",
+      "
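A detail worth noting in the patch above is that both listener hooks log `event.copy(properties = ...)` rather than mutating the event: other listeners on the bus still see the original, unredacted properties. A minimal Python sketch of that copy-before-logging pattern (the event class, `redact` helper, and log list here are illustrative, not Spark API):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class JobStartEvent:
    """Stand-in for SparkListenerJobStart: an id plus its properties."""
    job_id: int
    properties: dict

def redact(properties):
    # Placeholder redaction: mask any value whose key mentions "password".
    return {k: ("*(redacted)" if "password" in k.lower() else v)
            for k, v in properties.items()}

event_log = []

def on_job_start(event):
    # Log a redacted *copy*; the original event object is left untouched,
    # mirroring logEvent(event.copy(properties = redactProperties(...))).
    event_log.append(replace(event, properties=redact(event.properties)))

ev = JobStartEvent(1, {"spark.ssl.keyPassword": "hunter2", "desc": "etl"})
on_job_start(ev)
```

After the call, the logged copy is masked while `ev` itself still carries the raw value, which is why only the event log (and hence the history server) sees redacted data.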
[spark] branch master updated (290aa02 -> 084d38b)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 290aa02  [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work
 add 084d38b  [SPARK-33557][CORE][MESOS][TEST] Ensure the relationship between STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT and NETWORK_TIMEOUT

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala         | 4 +++-
 core/src/main/scala/org/apache/spark/internal/config/package.scala   | 2 +-
 .../test/scala/org/apache/spark/repl/ExecutorClassLoaderSuite.scala  | 1 +
 .../scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala | 5 -
 4 files changed, 9 insertions(+), 3 deletions(-)
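The SPARK-33557 change above ties STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT to NETWORK_TIMEOUT; this email only lists the touched files, but the general fallback-conf pattern it relies on (an unset config entry inherits another entry's value, so the two can never silently diverge) can be sketched as follows. The function name and default table are ours, not Spark's config API:

```python
# Illustrative defaults; "120s" is spark.network.timeout's documented default.
DEFAULTS = {"spark.network.timeout": "120s"}

def get_with_fallback(conf, key, fallback_key):
    """Fallback-conf lookup: return conf[key] if set, otherwise the
    value (or default) of fallback_key."""
    if key in conf:
        return conf[key]
    return conf.get(fallback_key, DEFAULTS.get(fallback_key))
```

With this scheme, raising the network timeout alone also raises the block-manager heartbeat timeout unless the latter was set explicitly.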
[spark] branch master updated (a4788ee -> 290aa02)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from a4788ee  [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite
 add 290aa02  [SPARK-33618][CORE] Use hadoop-client instead of hadoop-client-api to make hadoop-aws work

No new revisions were added by this update.

Summary of changes:
 common/network-yarn/pom.xml                        |  8 +--
 core/pom.xml                                       | 16 +-
 dev/deps/spark-deps-hadoop-2.7-hive-2.3            |  3 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3            | 52 -
 external/kafka-0-10-assembly/pom.xml               |  8 +--
 external/kafka-0-10-sql/pom.xml                    |  4 --
 external/kafka-0-10-token-provider/pom.xml         |  5 --
 external/kinesis-asl-assembly/pom.xml              |  8 +--
 hadoop-cloud/pom.xml                               |  7 +--
 launcher/pom.xml                                   |  9 +--
 pom.xml                                            | 57 --
 resource-managers/kubernetes/core/pom.xml          |  9 +++
 resource-managers/yarn/pom.xml                     | 67 --
 .../spark/deploy/yarn/ApplicationMaster.scala      |  6 +-
 .../spark/deploy/yarn/BaseYarnClusterSuite.scala   | 10
 sql/catalyst/pom.xml                               |  4 --
 sql/hive/pom.xml                                   |  5 --
 .../sql/hive/client/IsolatedClientLoader.scala     | 19 +-
 18 files changed, 107 insertions(+), 190 deletions(-)