spark git commit: [SPARK-20425][SQL] Support a vertical display mode for Dataset.show
Repository: spark Updated Branches: refs/heads/master 66636ef0b -> b4724db19 [SPARK-20425][SQL] Support a vertical display mode for Dataset.show ## What changes were proposed in this pull request? This PR adds a new display mode for `Dataset.show` that prints output rows vertically (one line per column value). In the current master, output for a Dataset with many columns is hard to read, for example:
```
scala> val df = spark.range(100).selectExpr((0 until 100).map(i => s"rand() AS c$i"): _*)
scala> df.show(3, 0)
+--+--+--+---+--+--+---+ ... +---+--+---+
|c0|c1|c2|c3 |c4|c5|c6 | ... |c97|c98 |c99|
+--+--+--+---+--+--+---+ ...
```
(The horizontal output spans columns c0 through c99 and is far too wide to read; it is truncated in this digest.)
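A minimal sketch of the new mode, assuming the overload added by this PR exposes a boolean `vertical` flag on `Dataset.show` (the exact signature is not shown in this digest):

```scala
// In spark-shell, where `spark` is the provided SparkSession.
val df = spark.range(100).selectExpr((0 until 100).map(i => s"rand() AS c$i"): _*)

// Vertical mode prints one line per column value instead of one very wide row, e.g.:
// -RECORD 0------------------
//  c0  | 0.6663821229749803
//  c1  | 0.2517279821493754
//  ...
df.show(3, 0, vertical = true)  // assumed overload: show(numRows, truncate, vertical)
```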
[1/2] spark git commit: Preparing Spark release v2.2.0-rc1
Repository: spark Updated Branches: refs/heads/branch-2.2 d6efda512 -> 75544c019 Preparing Spark release v2.2.0-rc1 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8ccb4a57 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8ccb4a57 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8ccb4a57 Branch: refs/heads/branch-2.2 Commit: 8ccb4a57c82146c1a8f8966c7e64010cf5632cb6 Parents: d6efda5 Author: Patrick Wendell Authored: Wed Apr 26 17:32:19 2017 -0700 Committer: Patrick Wendell Committed: Wed Apr 26 17:32:19 2017 -0700 -- assembly/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 2 +- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 37 files changed, 37 insertions(+), 37 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8ccb4a57/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 9d8607d..3a7003f 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.2.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/8ccb4a57/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 8657af7..5e9ffd1 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.2.0 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/8ccb4a57/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 24c10fb..c3e10d1 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.2.0 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/8ccb4a57/common/network-yarn/pom.xml -- diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 5e5a80b..10ea657 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.2.0 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/8ccb4a57/common/sketch/pom.xml -- diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 1356c47..1a1f652 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.2.0 ../../pom.xml 
http://git-wip-us.apache.org/repos/asf/spark/blob/8ccb4a57/common/tags/pom.xml ---
[2/2] spark git commit: Preparing development version 2.2.0-SNAPSHOT
Preparing development version 2.2.0-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/75544c01 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/75544c01 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/75544c01 Branch: refs/heads/branch-2.2 Commit: 75544c01939297cc39b4c3095fce435a22b833c0 Parents: 8ccb4a5 Author: Patrick Wendell Authored: Wed Apr 26 17:32:23 2017 -0700 Committer: Patrick Wendell Committed: Wed Apr 26 17:32:23 2017 -0700 -- assembly/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 2 +- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 37 files changed, 37 insertions(+), 37 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/75544c01/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 3a7003f..9d8607d 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.2.0 +2.2.0-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/75544c01/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 5e9ffd1..8657af7 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0 +2.2.0-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/75544c01/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index c3e10d1..24c10fb 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0 +2.2.0-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/75544c01/common/network-yarn/pom.xml -- diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 10ea657..5e5a80b 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0 +2.2.0-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/75544c01/common/sketch/pom.xml -- diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 1a1f652..1356c47 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0 +2.2.0-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/75544c01/common/tags/pom.xml -- diff --git a/common/tags/pom.xml 
b/common/tags/pom.xml index 525ece5..9
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.2.0-rc1 [created] 8ccb4a57c - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20435][CORE] More thorough redaction of sensitive information
Repository: spark Updated Branches: refs/heads/branch-2.2 b48bb3ab2 -> d6efda512 [SPARK-20435][CORE] More thorough redaction of sensitive information This change does a more thorough redaction of sensitive information from logs and the UI, and adds unit tests that ensure no regressions leak sensitive information to the logs. The motivation for this change was the appearance of a password like the following in `SparkListenerEnvironmentUpdate` in event logs under some JVM configurations: `"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ..." ` Previously, the redaction logic only checked whether the key matched the secret regex pattern; if it did, its value was redacted. That worked for most cases. However, in the above case the key (sun.java.command) doesn't reveal much, so the value needs to be searched as well. This PR expands the check to cover values too. ## How was this patch tested? New unit tests were added that ensure no sensitive information is present in the event logs or the YARN logs. An existing unit test in UtilsSuite was modified because it asserted that a non-sensitive property's value is not redacted; however, that non-sensitive value contained the literal "secret", which caused it to be redacted. Updating the non-sensitive property's value to another arbitrary value (one without "secret" in it) fixed it. Author: Mark Grover Closes #17725 from markgrover/spark-20435. (cherry picked from commit 66636ef0b046e5d1f340c3b8153d7213fa9d19c7) Signed-off-by: Marcelo Vanzin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d6efda51 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d6efda51 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d6efda51 Branch: refs/heads/branch-2.2 Commit: d6efda512e9d40e0a51c03675477bfb20c6bc7ae Parents: b48bb3a Author: Mark Grover Authored: Wed Apr 26 17:06:21 2017 -0700 Committer: Marcelo Vanzin Committed: Wed Apr 26 17:06:30 2017 -0700 -- .../apache/spark/internal/config/package.scala | 4 +-- .../spark/scheduler/EventLoggingListener.scala | 16 ++--- .../scala/org/apache/spark/util/Utils.scala | 22 ++--- .../apache/spark/deploy/SparkSubmitSuite.scala | 34 .../org/apache/spark/util/UtilsSuite.scala | 10 -- docs/configuration.md | 4 +-- .../spark/deploy/yarn/YarnClusterSuite.scala| 32 ++ 7 files changed, 100 insertions(+), 22 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d6efda51/core/src/main/scala/org/apache/spark/internal/config/package.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index 89aeea4..2f0a306 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -244,8 +244,8 @@ package object config { ConfigBuilder("spark.redaction.regex") .doc("Regex to decide which Spark configuration properties and environment variables in " + "driver and executor environments contain sensitive information. 
When this regex matches " + -"a property, its value is redacted from the environment UI and various logs like YARN " + -"and event logs.") +"a property key or value, the value is redacted from the environment UI and various logs " + +"like YARN and event logs.") .regexConf .createWithDefault("(?i)secret|password".r) http://git-wip-us.apache.org/repos/asf/spark/blob/d6efda51/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala -- diff --git a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala index aecb3a9..a7dbf87 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala @@ -252,11 +252,17 @@ private[spark] class EventLoggingListener( private[spark] def redactEvent( event: SparkListenerEnvironmentUpdate): SparkListenerEnvironmentUpdate = { -// "Spark Properties" entry will always exist because the map is always populated with it. -val redactedProps = Utils.redact(sparkConf, event.environmentDetails("Spark Properties")) -val redactedEnvironmentDetails = event.environmentDetails + - ("Spark Properties" -> redactedProps) -SparkListenerEnvironmentUpdate(redactedEnvironmentDet
spark git commit: [SPARK-20435][CORE] More thorough redaction of sensitive information
Repository: spark Updated Branches: refs/heads/master 2ba1eba37 -> 66636ef0b [SPARK-20435][CORE] More thorough redaction of sensitive information This change does a more thorough redaction of sensitive information from logs and the UI, and adds unit tests that ensure no regressions leak sensitive information to the logs. The motivation for this change was the appearance of a password like the following in `SparkListenerEnvironmentUpdate` in event logs under some JVM configurations: `"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ..." ` Previously, the redaction logic only checked whether the key matched the secret regex pattern; if it did, its value was redacted. That worked for most cases. However, in the above case the key (sun.java.command) doesn't reveal much, so the value needs to be searched as well. This PR expands the check to cover values too. ## How was this patch tested? New unit tests were added that ensure no sensitive information is present in the event logs or the YARN logs. An existing unit test in UtilsSuite was modified because it asserted that a non-sensitive property's value is not redacted; however, that non-sensitive value contained the literal "secret", which caused it to be redacted. Updating the non-sensitive property's value to another arbitrary value (one without "secret" in it) fixed it. Author: Mark Grover Closes #17725 from markgrover/spark-20435. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/66636ef0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/66636ef0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/66636ef0 Branch: refs/heads/master Commit: 66636ef0b046e5d1f340c3b8153d7213fa9d19c7 Parents: 2ba1eba Author: Mark Grover Authored: Wed Apr 26 17:06:21 2017 -0700 Committer: Marcelo Vanzin Committed: Wed Apr 26 17:06:21 2017 -0700 -- .../apache/spark/internal/config/package.scala | 4 +-- .../spark/scheduler/EventLoggingListener.scala | 16 ++--- .../scala/org/apache/spark/util/Utils.scala | 22 ++--- .../apache/spark/deploy/SparkSubmitSuite.scala | 34 .../org/apache/spark/util/UtilsSuite.scala | 10 -- docs/configuration.md | 4 +-- .../spark/deploy/yarn/YarnClusterSuite.scala| 32 ++ 7 files changed, 100 insertions(+), 22 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/66636ef0/core/src/main/scala/org/apache/spark/internal/config/package.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index 89aeea4..2f0a306 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -244,8 +244,8 @@ package object config { ConfigBuilder("spark.redaction.regex") .doc("Regex to decide which Spark configuration properties and environment variables in " + "driver and executor environments contain sensitive information. 
When this regex matches " + -"a property, its value is redacted from the environment UI and various logs like YARN " + -"and event logs.") +"a property key or value, the value is redacted from the environment UI and various logs " + +"like YARN and event logs.") .regexConf .createWithDefault("(?i)secret|password".r) http://git-wip-us.apache.org/repos/asf/spark/blob/66636ef0/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala -- diff --git a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala index aecb3a9..a7dbf87 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala @@ -252,11 +252,17 @@ private[spark] class EventLoggingListener( private[spark] def redactEvent( event: SparkListenerEnvironmentUpdate): SparkListenerEnvironmentUpdate = { -// "Spark Properties" entry will always exist because the map is always populated with it. -val redactedProps = Utils.redact(sparkConf, event.environmentDetails("Spark Properties")) -val redactedEnvironmentDetails = event.environmentDetails + - ("Spark Properties" -> redactedProps) -SparkListenerEnvironmentUpdate(redactedEnvironmentDetails) +// environmentDetails maps a string descriptor to a set of properties +// Similar to: +//
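To make the behaviour change concrete, here is a hedged sketch of key-or-value redaction (the helper below is illustrative and not the exact `Utils.redact` signature): the regex is now matched against both the key and the value, so a harmless-looking key whose value embeds a credential is still redacted.

```scala
// Illustrative only; the default regex is taken from spark.redaction.regex above.
val redactionRegex = "(?i)secret|password".r

def redactProps(kvs: Seq[(String, String)]): Seq[(String, String)] =
  kvs.map { case (key, value) =>
    val sensitive = redactionRegex.findFirstIn(key).isDefined ||
      redactionRegex.findFirstIn(value).isDefined
    if (sensitive) (key, "*********(redacted)") else (key, value)
  }

// "sun.java.command" is not a sensitive key, but its value embeds a password,
// so the value is redacted anyway:
redactProps(Seq("sun.java.command" ->
  "org.apache.spark.deploy.SparkSubmit --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password"))
```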
spark git commit: [SPARK-12868][SQL] Allow adding jars from hdfs
Repository: spark Updated Branches: refs/heads/master a277ae80a -> 2ba1eba37 [SPARK-12868][SQL] Allow adding jars from hdfs ## What changes were proposed in this pull request? Spark 2.2 is about to be cut; it would be great if SPARK-12868 could be resolved before that. There have been several PRs for this, such as [PR#16324](https://github.com/apache/spark/pull/16324), but all of them have been inactive for a long time or have been closed. This PR adds a SparkUrlStreamHandlerFactory, which relies on the URL's protocol to choose the appropriate URLStreamHandlerFactory (such as FsUrlStreamHandlerFactory) to create the URLStreamHandler. ## How was this patch tested? 1. Added a new unit test. 2. Checked manually. Before: an exception was thrown with "failed unknown protocol: hdfs" (screenshot: https://cloud.githubusercontent.com/assets/8546874/24075277/5abe0a7c-0bd5-11e7-900e-ec3d3105da0b.png). After (screenshot: https://cloud.githubusercontent.com/assets/8546874/24075283/69382a60-0bd5-11e7-8d30-d9405c3aaaba.png). Author: Weiqing Yang Closes #17342 from weiqingy/SPARK-18910. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2ba1eba3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2ba1eba3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2ba1eba3 Branch: refs/heads/master Commit: 2ba1eba371213d1ac3d1fa1552e5906e043c2ee4 Parents: a277ae8 Author: Weiqing Yang Authored: Wed Apr 26 13:54:40 2017 -0700 Committer: Marcelo Vanzin Committed: Wed Apr 26 13:54:40 2017 -0700 -- .../org/apache/spark/sql/internal/SharedState.scala| 10 +- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 13 + 2 files changed, 22 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2ba1eba3/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala index f834569..a93b701 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala @@ -17,12 +17,14 @@ package org.apache.spark.sql.internal +import java.net.URL import java.util.Locale import scala.reflect.ClassTag import scala.util.control.NonFatal import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.FsUrlStreamHandlerFactory import org.apache.spark.{SparkConf, SparkContext, SparkException} import org.apache.spark.internal.Logging @@ -154,7 +156,13 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging { } } -object SharedState { +object SharedState extends Logging { + try { +URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory()) + } catch { +case e: Error => + logWarning("URL.setURLStreamHandlerFactory failed to set FsUrlStreamHandlerFactory") + } private val HIVE_EXTERNAL_CATALOG_CLASS_NAME = "org.apache.spark.sql.hive.HiveExternalCatalog" http://git-wip-us.apache.org/repos/asf/spark/blob/2ba1eba3/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index 0dd9296..3ecbf96 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -19,6 +19,7 @@ package org.apache.spark.sql import java.io.File import java.math.MathContext +import 
java.net.{MalformedURLException, URL} import java.sql.Timestamp import java.util.concurrent.atomic.AtomicBoolean @@ -2606,4 +2607,16 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { case ae: AnalysisException => assert(ae.plan == null && ae.getMessage == ae.getSimpleMessage) } } + + test("SPARK-12868: Allow adding jars from hdfs ") { +val jarFromHdfs = "hdfs://doesnotmatter/test.jar" +val jarFromInvalidFs = "fffs://doesnotmatter/test.jar" + +// if 'hdfs' is not supported, MalformedURLException will be thrown +new URL(jarFromHdfs) + +intercept[MalformedURLException] { + new URL(jarFromInvalidFs) +} + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
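A minimal sketch of what the registration enables (the namenode address and jar path below are illustrative): once `FsUrlStreamHandlerFactory` is installed, `java.net.URL` understands the hdfs scheme, so adding jars by HDFS path no longer fails with "unknown protocol: hdfs".

```scala
import java.net.URL
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory

// May be set at most once per JVM; in the patch this happens when SharedState is loaded.
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())

// Would have thrown MalformedURLException ("unknown protocol: hdfs") before the change.
val jar = new URL("hdfs://namenode:8020/libs/example-udfs.jar")
```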
spark git commit: [SPARK-12868][SQL] Allow adding jars from hdfs
Repository: spark Updated Branches: refs/heads/branch-2.2 e278876ba -> b48bb3ab2 [SPARK-12868][SQL] Allow adding jars from hdfs ## What changes were proposed in this pull request? Spark 2.2 is about to be cut; it would be great if SPARK-12868 could be resolved before that. There have been several PRs for this, such as [PR#16324](https://github.com/apache/spark/pull/16324), but all of them have been inactive for a long time or have been closed. This PR adds a SparkUrlStreamHandlerFactory, which relies on the URL's protocol to choose the appropriate URLStreamHandlerFactory (such as FsUrlStreamHandlerFactory) to create the URLStreamHandler. ## How was this patch tested? 1. Added a new unit test. 2. Checked manually. Before: an exception was thrown with "failed unknown protocol: hdfs" (screenshot: https://cloud.githubusercontent.com/assets/8546874/24075277/5abe0a7c-0bd5-11e7-900e-ec3d3105da0b.png). After (screenshot: https://cloud.githubusercontent.com/assets/8546874/24075283/69382a60-0bd5-11e7-8d30-d9405c3aaaba.png). Author: Weiqing Yang Closes #17342 from weiqingy/SPARK-18910. (cherry picked from commit 2ba1eba371213d1ac3d1fa1552e5906e043c2ee4) Signed-off-by: Marcelo Vanzin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b48bb3ab Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b48bb3ab Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b48bb3ab Branch: refs/heads/branch-2.2 Commit: b48bb3ab2c8134f6b533af29a241dce114076720 Parents: e278876 Author: Weiqing Yang Authored: Wed Apr 26 13:54:40 2017 -0700 Committer: Marcelo Vanzin Committed: Wed Apr 26 13:54:49 2017 -0700 -- .../org/apache/spark/sql/internal/SharedState.scala| 10 +- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 13 + 2 files changed, 22 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b48bb3ab/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala index f834569..a93b701 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala @@ -17,12 +17,14 @@ package org.apache.spark.sql.internal +import java.net.URL import java.util.Locale import scala.reflect.ClassTag import scala.util.control.NonFatal import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.FsUrlStreamHandlerFactory import org.apache.spark.{SparkConf, SparkContext, SparkException} import org.apache.spark.internal.Logging @@ -154,7 +156,13 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging { } } -object SharedState { +object SharedState extends Logging { + try { +URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory()) + } catch { +case e: Error => + logWarning("URL.setURLStreamHandlerFactory failed to set FsUrlStreamHandlerFactory") + } private val HIVE_EXTERNAL_CATALOG_CLASS_NAME = "org.apache.spark.sql.hive.HiveExternalCatalog" http://git-wip-us.apache.org/repos/asf/spark/blob/b48bb3ab/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index 0dd9296..3ecbf96 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -19,6 +19,7 
@@ package org.apache.spark.sql import java.io.File import java.math.MathContext +import java.net.{MalformedURLException, URL} import java.sql.Timestamp import java.util.concurrent.atomic.AtomicBoolean @@ -2606,4 +2607,16 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { case ae: AnalysisException => assert(ae.plan == null && ae.getMessage == ae.getSimpleMessage) } } + + test("SPARK-12868: Allow adding jars from hdfs ") { +val jarFromHdfs = "hdfs://doesnotmatter/test.jar" +val jarFromInvalidFs = "fffs://doesnotmatter/test.jar" + +// if 'hdfs' is not supported, MalformedURLException will be thrown +new URL(jarFromHdfs) + +intercept[MalformedURLException] { + new URL(jarFromInvalidFs) +} + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20474] Fixing OnHeapColumnVector reallocation
Repository: spark Updated Branches: refs/heads/branch-2.2 6709bcf6e -> e278876ba [SPARK-20474] Fixing OnHeapColumnVector reallocation ## What changes were proposed in this pull request? OnHeapColumnVector reallocation copies to the new storage data up to 'elementsAppended'. This variable is only updated when using the ColumnVector.appendX API, while ColumnVector.putX is more commonly used. ## How was this patch tested? Tested using existing unit tests. Author: Michal Szafranski Closes #17773 from michal-databricks/spark-20474. (cherry picked from commit a277ae80a2836e6533b338d2b9c4e59ed8a1daae) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e278876b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e278876b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e278876b Branch: refs/heads/branch-2.2 Commit: e278876ba3d66d3fb249df59c3de8d78ca25c5f0 Parents: 6709bcf Author: Michal Szafranski Authored: Wed Apr 26 12:47:37 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 26 12:47:50 2017 -0700 -- .../vectorized/OnHeapColumnVector.java | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e278876b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java index 9b410ba..94ed322 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java @@ -410,53 +410,53 @@ public final class OnHeapColumnVector extends ColumnVector { int[] newLengths = new int[newCapacity]; int[] newOffsets = new int[newCapacity]; if (this.arrayLengths != null) { -System.arraycopy(this.arrayLengths, 0, newLengths, 0, elementsAppended); -System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, elementsAppended); +System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity); +System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, capacity); } arrayLengths = newLengths; arrayOffsets = newOffsets; } else if (type instanceof BooleanType) { if (byteData == null || byteData.length < newCapacity) { byte[] newData = new byte[newCapacity]; -if (byteData != null) System.arraycopy(byteData, 0, newData, 0, elementsAppended); +if (byteData != null) System.arraycopy(byteData, 0, newData, 0, capacity); byteData = newData; } } else if (type instanceof ByteType) { if (byteData == null || byteData.length < newCapacity) { byte[] newData = new byte[newCapacity]; -if (byteData != null) System.arraycopy(byteData, 0, newData, 0, elementsAppended); +if (byteData != null) System.arraycopy(byteData, 0, newData, 0, capacity); byteData = newData; } } else if (type instanceof ShortType) { if (shortData == null || shortData.length < newCapacity) { short[] newData = new short[newCapacity]; -if (shortData != null) System.arraycopy(shortData, 0, newData, 0, elementsAppended); +if (shortData != null) System.arraycopy(shortData, 0, newData, 0, capacity); shortData = newData; } } else if (type instanceof IntegerType || type instanceof DateType || DecimalType.is32BitDecimalType(type)) { if (intData == null || intData.length < newCapacity) { int[] newData = new int[newCapacity]; -if (intData != null) System.arraycopy(intData, 0, newData, 0, 
elementsAppended); +if (intData != null) System.arraycopy(intData, 0, newData, 0, capacity); intData = newData; } } else if (type instanceof LongType || type instanceof TimestampType || DecimalType.is64BitDecimalType(type)) { if (longData == null || longData.length < newCapacity) { long[] newData = new long[newCapacity]; -if (longData != null) System.arraycopy(longData, 0, newData, 0, elementsAppended); +if (longData != null) System.arraycopy(longData, 0, newData, 0, capacity); longData = newData; } } else if (type instanceof FloatType) { if (floatData == null || floatData.length < newCapacity) { float[] newData = new float[newCapacity]; -if (floatData != null) System.arraycopy(floatData, 0, newData, 0, elementsAppended); +if (floatData != null) System.arraycopy(floatData, 0
spark git commit: [SPARK-20474] Fixing OnHeapColumnVector reallocation
Repository: spark Updated Branches: refs/heads/master 99c6cf9ef -> a277ae80a [SPARK-20474] Fixing OnHeapColumnVector reallocation ## What changes were proposed in this pull request? OnHeapColumnVector reallocation copies to the new storage data up to 'elementsAppended'. This variable is only updated when using the ColumnVector.appendX API, while ColumnVector.putX is more commonly used. ## How was this patch tested? Tested using existing unit tests. Author: Michal Szafranski Closes #17773 from michal-databricks/spark-20474. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a277ae80 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a277ae80 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a277ae80 Branch: refs/heads/master Commit: a277ae80a2836e6533b338d2b9c4e59ed8a1daae Parents: 99c6cf9 Author: Michal Szafranski Authored: Wed Apr 26 12:47:37 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 26 12:47:37 2017 -0700 -- .../vectorized/OnHeapColumnVector.java | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a277ae80/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java index 9b410ba..94ed322 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java @@ -410,53 +410,53 @@ public final class OnHeapColumnVector extends ColumnVector { int[] newLengths = new int[newCapacity]; int[] newOffsets = new int[newCapacity]; if (this.arrayLengths != null) { -System.arraycopy(this.arrayLengths, 0, newLengths, 0, elementsAppended); -System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, elementsAppended); +System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity); +System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, capacity); } arrayLengths = newLengths; arrayOffsets = newOffsets; } else if (type instanceof BooleanType) { if (byteData == null || byteData.length < newCapacity) { byte[] newData = new byte[newCapacity]; -if (byteData != null) System.arraycopy(byteData, 0, newData, 0, elementsAppended); +if (byteData != null) System.arraycopy(byteData, 0, newData, 0, capacity); byteData = newData; } } else if (type instanceof ByteType) { if (byteData == null || byteData.length < newCapacity) { byte[] newData = new byte[newCapacity]; -if (byteData != null) System.arraycopy(byteData, 0, newData, 0, elementsAppended); +if (byteData != null) System.arraycopy(byteData, 0, newData, 0, capacity); byteData = newData; } } else if (type instanceof ShortType) { if (shortData == null || shortData.length < newCapacity) { short[] newData = new short[newCapacity]; -if (shortData != null) System.arraycopy(shortData, 0, newData, 0, elementsAppended); +if (shortData != null) System.arraycopy(shortData, 0, newData, 0, capacity); shortData = newData; } } else if (type instanceof IntegerType || type instanceof DateType || DecimalType.is32BitDecimalType(type)) { if (intData == null || intData.length < newCapacity) { int[] newData = new int[newCapacity]; -if (intData != null) System.arraycopy(intData, 0, newData, 0, elementsAppended); +if (intData != null) System.arraycopy(intData, 0, newData, 0, capacity); intData = 
newData; } } else if (type instanceof LongType || type instanceof TimestampType || DecimalType.is64BitDecimalType(type)) { if (longData == null || longData.length < newCapacity) { long[] newData = new long[newCapacity]; -if (longData != null) System.arraycopy(longData, 0, newData, 0, elementsAppended); +if (longData != null) System.arraycopy(longData, 0, newData, 0, capacity); longData = newData; } } else if (type instanceof FloatType) { if (floatData == null || floatData.length < newCapacity) { float[] newData = new float[newCapacity]; -if (floatData != null) System.arraycopy(floatData, 0, newData, 0, elementsAppended); +if (floatData != null) System.arraycopy(floatData, 0, newData, 0, capacity); floatData = newData; } } else if (type instanceof DoubleTyp
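As a toy illustration in Scala (not the actual Java OnHeapColumnVector code) of why the reallocation has to copy up to the old `capacity` rather than `elementsAppended`: `putX`-style writes never advance `elementsAppended`, so copying only that many entries would silently drop rows written through the put API when the vector grows.

```scala
// Toy vector; field names mirror the Spark ones but this is a sketch only.
class ToyIntVector(initialCapacity: Int) {
  private var capacity = initialCapacity
  private var data = new Array[Int](initialCapacity)
  private var elementsAppended = 0

  def putInt(rowId: Int, value: Int): Unit = data(rowId) = value   // does NOT bump elementsAppended
  def appendInt(value: Int): Unit = { data(elementsAppended) = value; elementsAppended += 1 }
  def getInt(rowId: Int): Int = data(rowId)

  def reserve(newCapacity: Int): Unit = {
    val newData = new Array[Int](newCapacity)
    // The buggy version copied only `elementsAppended` entries; copying `capacity`
    // preserves rows written via putInt as well.
    System.arraycopy(data, 0, newData, 0, capacity)
    data = newData
    capacity = newCapacity
  }
}
```

With the old copy length, a `putInt(0, 42)` followed by `reserve(...)` would have lost the value, which is exactly the class of bug this commit fixes.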
spark git commit: [SPARK-20473] Enabling missing types in ColumnVector.Array
Repository: spark Updated Branches: refs/heads/branch-2.2 b65858bb3 -> 6709bcf6e [SPARK-20473] Enabling missing types in ColumnVector.Array ## What changes were proposed in this pull request? ColumnVector implementations originally did not support some Catalyst types (float, short, and boolean). Now that they do, those types should be also added to the ColumnVector.Array. ## How was this patch tested? Tested using existing unit tests. Author: Michal Szafranski Closes #17772 from michal-databricks/spark-20473. (cherry picked from commit 99c6cf9ef16bf8fae6edb23a62e46546a16bca80) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6709bcf6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6709bcf6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6709bcf6 Branch: refs/heads/branch-2.2 Commit: 6709bcf6e66e99e17ba2a3b1482df2dba1a15716 Parents: b65858b Author: Michal Szafranski Authored: Wed Apr 26 11:21:25 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 26 11:21:57 2017 -0700 -- .../apache/spark/sql/execution/vectorized/ColumnVector.java| 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6709bcf6/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java index 354c878..b105e60 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java @@ -180,7 +180,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public boolean getBoolean(int ordinal) { - throw new UnsupportedOperationException(); + return data.getBoolean(offset + ordinal); } @Override @@ -188,7 +188,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public short getShort(int ordinal) { - throw new UnsupportedOperationException(); + return data.getShort(offset + ordinal); } @Override @@ -199,7 +199,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public float getFloat(int ordinal) { - throw new UnsupportedOperationException(); + return data.getFloat(offset + ordinal); } @Override - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20473] Enabling missing types in ColumnVector.Array
Repository: spark Updated Branches: refs/heads/master 66dd5b83f -> 99c6cf9ef [SPARK-20473] Enabling missing types in ColumnVector.Array ## What changes were proposed in this pull request? ColumnVector implementations originally did not support some Catalyst types (float, short, and boolean). Now that they do, those types should be also added to the ColumnVector.Array. ## How was this patch tested? Tested using existing unit tests. Author: Michal Szafranski Closes #17772 from michal-databricks/spark-20473. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/99c6cf9e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/99c6cf9e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/99c6cf9e Branch: refs/heads/master Commit: 99c6cf9ef16bf8fae6edb23a62e46546a16bca80 Parents: 66dd5b8 Author: Michal Szafranski Authored: Wed Apr 26 11:21:25 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 26 11:21:25 2017 -0700 -- .../apache/spark/sql/execution/vectorized/ColumnVector.java| 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/99c6cf9e/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java index 354c878..b105e60 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java @@ -180,7 +180,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public boolean getBoolean(int ordinal) { - throw new UnsupportedOperationException(); + return data.getBoolean(offset + ordinal); } @Override @@ -188,7 +188,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public short getShort(int ordinal) { - throw new UnsupportedOperationException(); + return data.getShort(offset + ordinal); } @Override @@ -199,7 +199,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public float getFloat(int ordinal) { - throw new UnsupportedOperationException(); + return data.getFloat(offset + ordinal); } @Override - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20391][CORE] Rename memory related fields in ExecutorSummary
Repository: spark Updated Branches: refs/heads/master dbb06c689 -> 66dd5b83f [SPARK-20391][CORE] Rename memory related fields in ExecutorSummay ## What changes were proposed in this pull request? This is a follow-up of #14617 to make the name of memory related fields more meaningful. Here for the backward compatibility, I didn't change `maxMemory` and `memoryUsed` fields. ## How was this patch tested? Existing UT and local verification. CC squito and tgravescs . Author: jerryshao Closes #17700 from jerryshao/SPARK-20391. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/66dd5b83 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/66dd5b83 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/66dd5b83 Branch: refs/heads/master Commit: 66dd5b83ff95d5f91f37dcdf6aac89faa0b871c5 Parents: dbb06c6 Author: jerryshao Authored: Wed Apr 26 09:01:50 2017 -0500 Committer: Imran Rashid Committed: Wed Apr 26 09:01:50 2017 -0500 -- .../org/apache/spark/ui/static/executorspage.js | 48 +- .../org/apache/spark/status/api/v1/api.scala| 11 +++-- .../apache/spark/ui/exec/ExecutorsPage.scala| 21 .../executor_memory_usage_expectation.json | 51 .../executor_node_blacklisting_expectation.json | 51 5 files changed, 105 insertions(+), 77 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/66dd5b83/core/src/main/resources/org/apache/spark/ui/static/executorspage.js -- diff --git a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js index 930a069..cb9922d 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js +++ b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js @@ -253,10 +253,14 @@ $(document).ready(function () { var deadTotalBlacklisted = 0; response.forEach(function (exec) { -exec.onHeapMemoryUsed = exec.hasOwnProperty('onHeapMemoryUsed') ? exec.onHeapMemoryUsed : 0; -exec.maxOnHeapMemory = exec.hasOwnProperty('maxOnHeapMemory') ? exec.maxOnHeapMemory : 0; -exec.offHeapMemoryUsed = exec.hasOwnProperty('offHeapMemoryUsed') ? exec.offHeapMemoryUsed : 0; -exec.maxOffHeapMemory = exec.hasOwnProperty('maxOffHeapMemory') ? exec.maxOffHeapMemory : 0; +var memoryMetrics = { +usedOnHeapStorageMemory: 0, +usedOffHeapStorageMemory: 0, +totalOnHeapStorageMemory: 0, +totalOffHeapStorageMemory: 0 +}; + +exec.memoryMetrics = exec.hasOwnProperty('memoryMetrics') ? 
exec.memoryMetrics : memoryMetrics; }); response.forEach(function (exec) { @@ -264,10 +268,10 @@ $(document).ready(function () { allRDDBlocks += exec.rddBlocks; allMemoryUsed += exec.memoryUsed; allMaxMemory += exec.maxMemory; -allOnHeapMemoryUsed += exec.onHeapMemoryUsed; -allOnHeapMaxMemory += exec.maxOnHeapMemory; -allOffHeapMemoryUsed += exec.offHeapMemoryUsed; -allOffHeapMaxMemory += exec.maxOffHeapMemory; +allOnHeapMemoryUsed += exec.memoryMetrics.usedOnHeapStorageMemory; +allOnHeapMaxMemory += exec.memoryMetrics.totalOnHeapStorageMemory; +allOffHeapMemoryUsed += exec.memoryMetrics.usedOffHeapStorageMemory; +allOffHeapMaxMemory += exec.memoryMetrics.totalOffHeapStorageMemory; allDiskUsed += exec.diskUsed; allTotalCores += exec.totalCores; allMaxTasks += exec.maxTasks; @@ -286,10 +290,10 @@ $(document).ready(function () { activeRDDBlocks += exec.rddBlocks; activeMemoryUsed += exec.memoryUsed; activeMaxMemory += exec.maxMemory; -activeOnHeapMemoryUsed += exec.onHeapMemoryUsed; -activeOnHeapMaxMemory += exec.maxOnHeapMemory; -activeOffHeapMemoryUsed += exec.offHeapMemoryUsed; -activeOffHeapMaxMemory += exec.maxOffHeapMemory; +activeOnHeapMemoryUsed += exec.memoryMetrics.usedOnHeapStorageMemory; +activeOnHeapMaxMemory += exec.memoryMetrics.totalOnHeapStorageMemory; +activeOffHeapMemoryUsed += exec.memoryMetrics.usedOffHeapStorageMemory; +activeOffHeapMaxMemory += exec.memoryMetrics.totalOffHeapStorageMemory; activeDiskUsed += exec.diskUse
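For reference, a hedged sketch of how the renamed fields surface in the executor REST payload, inferred from the `executorspage.js` changes above (the exact case class in `api.scala` is not shown in this digest, and the real `ExecutorSummary` carries many more fields):

```scala
// Assumed shape only; field names are taken from the JavaScript diff above.
case class MemoryMetrics(
    usedOnHeapStorageMemory: Long,
    usedOffHeapStorageMemory: Long,
    totalOnHeapStorageMemory: Long,
    totalOffHeapStorageMemory: Long)

case class ExecutorSummary(
    id: String,
    memoryUsed: Long,                      // kept for backward compatibility
    maxMemory: Long,                       // kept for backward compatibility
    // many other executor fields omitted in this sketch
    memoryMetrics: Option[MemoryMetrics])  // new nested, storage-specific metrics
```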
spark git commit: [SPARK-20391][CORE] Rename memory related fields in ExecutorSummary
Repository: spark Updated Branches: refs/heads/branch-2.2 34dec68d7 -> b65858bb3 [SPARK-20391][CORE] Rename memory related fields in ExecutorSummay ## What changes were proposed in this pull request? This is a follow-up of #14617 to make the name of memory related fields more meaningful. Here for the backward compatibility, I didn't change `maxMemory` and `memoryUsed` fields. ## How was this patch tested? Existing UT and local verification. CC squito and tgravescs . Author: jerryshao Closes #17700 from jerryshao/SPARK-20391. (cherry picked from commit 66dd5b83ff95d5f91f37dcdf6aac89faa0b871c5) Signed-off-by: Imran Rashid Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b65858bb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b65858bb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b65858bb Branch: refs/heads/branch-2.2 Commit: b65858bb3cb8e69b1f73f5f2c76a7cd335120695 Parents: 34dec68 Author: jerryshao Authored: Wed Apr 26 09:01:50 2017 -0500 Committer: Imran Rashid Committed: Wed Apr 26 09:02:13 2017 -0500 -- .../org/apache/spark/ui/static/executorspage.js | 48 +- .../org/apache/spark/status/api/v1/api.scala| 11 +++-- .../apache/spark/ui/exec/ExecutorsPage.scala| 21 .../executor_memory_usage_expectation.json | 51 .../executor_node_blacklisting_expectation.json | 51 5 files changed, 105 insertions(+), 77 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b65858bb/core/src/main/resources/org/apache/spark/ui/static/executorspage.js -- diff --git a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js index 930a069..cb9922d 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js +++ b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js @@ -253,10 +253,14 @@ $(document).ready(function () { var deadTotalBlacklisted = 0; response.forEach(function (exec) { -exec.onHeapMemoryUsed = exec.hasOwnProperty('onHeapMemoryUsed') ? exec.onHeapMemoryUsed : 0; -exec.maxOnHeapMemory = exec.hasOwnProperty('maxOnHeapMemory') ? exec.maxOnHeapMemory : 0; -exec.offHeapMemoryUsed = exec.hasOwnProperty('offHeapMemoryUsed') ? exec.offHeapMemoryUsed : 0; -exec.maxOffHeapMemory = exec.hasOwnProperty('maxOffHeapMemory') ? exec.maxOffHeapMemory : 0; +var memoryMetrics = { +usedOnHeapStorageMemory: 0, +usedOffHeapStorageMemory: 0, +totalOnHeapStorageMemory: 0, +totalOffHeapStorageMemory: 0 +}; + +exec.memoryMetrics = exec.hasOwnProperty('memoryMetrics') ? 
exec.memoryMetrics : memoryMetrics; }); response.forEach(function (exec) { @@ -264,10 +268,10 @@ $(document).ready(function () { allRDDBlocks += exec.rddBlocks; allMemoryUsed += exec.memoryUsed; allMaxMemory += exec.maxMemory; -allOnHeapMemoryUsed += exec.onHeapMemoryUsed; -allOnHeapMaxMemory += exec.maxOnHeapMemory; -allOffHeapMemoryUsed += exec.offHeapMemoryUsed; -allOffHeapMaxMemory += exec.maxOffHeapMemory; +allOnHeapMemoryUsed += exec.memoryMetrics.usedOnHeapStorageMemory; +allOnHeapMaxMemory += exec.memoryMetrics.totalOnHeapStorageMemory; +allOffHeapMemoryUsed += exec.memoryMetrics.usedOffHeapStorageMemory; +allOffHeapMaxMemory += exec.memoryMetrics.totalOffHeapStorageMemory; allDiskUsed += exec.diskUsed; allTotalCores += exec.totalCores; allMaxTasks += exec.maxTasks; @@ -286,10 +290,10 @@ $(document).ready(function () { activeRDDBlocks += exec.rddBlocks; activeMemoryUsed += exec.memoryUsed; activeMaxMemory += exec.maxMemory; -activeOnHeapMemoryUsed += exec.onHeapMemoryUsed; -activeOnHeapMaxMemory += exec.maxOnHeapMemory; -activeOffHeapMemoryUsed += exec.offHeapMemoryUsed; -activeOffHeapMaxMemory += exec.maxOffHeapMemory; +activeOnHeapMemoryUsed += exec.memoryMetrics.usedOnHeapStorageMemory; +activeOnHeapMaxMemory += exec.memoryMetrics.totalOnHeapStorageMemory; +activeOffHeapMemoryUsed += exec.memoryMetrics.usedOffHeapStorageMemory; +activeOffHeapMaxM
spark git commit: [MINOR][ML] Fix some PySpark & SparkR flaky tests
Repository: spark Updated Branches: refs/heads/branch-2.2 612952251 -> 34dec68d7 [MINOR][ML] Fix some PySpark & SparkR flaky tests ## What changes were proposed in this pull request? Some PySpark & SparkR tests run with a tiny dataset and a tiny `maxIter`, which means they have not converged. I don't think checking intermediate results during iteration makes sense; these intermediate results may be fragile and unstable, so we should switch to checking the converged result. We hit this issue in #17746 when we upgraded breeze to 0.13.1. ## How was this patch tested? Existing tests. Author: Yanbo Liang Closes #17757 from yanboliang/flaky-test. (cherry picked from commit dbb06c689c157502cb081421baecce411832aad8) Signed-off-by: Yanbo Liang Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/34dec68d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/34dec68d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/34dec68d Branch: refs/heads/branch-2.2 Commit: 34dec68d7eb647d997fdb27fe65d579c74b39e58 Parents: 6129522 Author: Yanbo Liang Authored: Wed Apr 26 21:34:18 2017 +0800 Committer: Yanbo Liang Committed: Wed Apr 26 21:34:35 2017 +0800 -- .../tests/testthat/test_mllib_classification.R | 17 + python/pyspark/ml/classification.py | 71 ++-- 2 files changed, 38 insertions(+), 50 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/34dec68d/R/pkg/inst/tests/testthat/test_mllib_classification.R -- diff --git a/R/pkg/inst/tests/testthat/test_mllib_classification.R b/R/pkg/inst/tests/testthat/test_mllib_classification.R index af7cbdc..cbc7087 100644 --- a/R/pkg/inst/tests/testthat/test_mllib_classification.R +++ b/R/pkg/inst/tests/testthat/test_mllib_classification.R @@ -284,22 +284,11 @@ test_that("spark.mlp", { c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) # test initialWeights - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = + model <- spark.mlp(df, label ~ features, layers = c(4, 3), initialWeights = c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9)) mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "2.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = -c(0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 9.0)) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "2.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0")) + c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) # Test formula works well df <- suppressWarnings(createDataFrame(iris)) @@ -310,8 +299,6 @@ test_that("spark.mlp", { expect_equal(summary$numOfOutputs, 3) expect_equal(summary$layers, c(4, 3)) expect_equal(length(summary$weights), 15) - expect_equal(head(summary$weights, 5), list(-0.5793153, -4.652961, 6.216155, -6.649478, - -10.51147), tolerance = 1e-3) }) test_that("spark.naiveBayes", { http://git-wip-us.apache.org/repos/asf/spark/blob/34dec68d/python/pyspark/ml/classification.py -- diff 
--git a/python/pyspark/ml/classification.py b/python/pyspark/ml/classification.py index 8649683..a9756ea 100644 --- a/python/pyspark/ml/classification.py +++ b/python/pyspark/ml/classification.py @@ -185,34 +185,33 @@ class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredicti >>> from pyspark.sql import Row >>> from pyspark.ml.linalg import Vectors >>> bdf = sc.parallelize([ -... Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)), -... Row(label=0.0, weight=2.0, features=Vectors.sparse(1, [], []))]).toDF() ->>> blor = LogisticRegression(maxIter=5, regParam=0.01, weightCol="weight") +... Row(label=1.0, weight=1.0, features=Vectors.dense(0.0, 5.0)), +... Row(label=0.0, weight=2.0, features=Vectors.dense(1.0, 2.0)), +... Row(label=1.0, weight=3.0, features=Vectors.dense(2.0, 1.0)), +... Row(label=0.0, weight=4.0, features=Vectors.dense(3.0, 3.0))]).toDF() +
spark git commit: [MINOR][ML] Fix some PySpark & SparkR flaky tests
Repository: spark Updated Branches: refs/heads/master 7fecf5130 -> dbb06c689 [MINOR][ML] Fix some PySpark & SparkR flaky tests ## What changes were proposed in this pull request? Some PySpark & SparkR tests run with a tiny dataset and a tiny `maxIter`, which means they have not converged. I don't think checking intermediate results during iteration makes sense; these intermediate results may be fragile and unstable, so we should switch to checking the converged result. We hit this issue in #17746 when we upgraded breeze to 0.13.1. ## How was this patch tested? Existing tests. Author: Yanbo Liang Closes #17757 from yanboliang/flaky-test. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dbb06c68 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dbb06c68 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dbb06c68 Branch: refs/heads/master Commit: dbb06c689c157502cb081421baecce411832aad8 Parents: 7fecf51 Author: Yanbo Liang Authored: Wed Apr 26 21:34:18 2017 +0800 Committer: Yanbo Liang Committed: Wed Apr 26 21:34:18 2017 +0800 -- .../tests/testthat/test_mllib_classification.R | 17 + python/pyspark/ml/classification.py | 71 ++-- 2 files changed, 38 insertions(+), 50 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dbb06c68/R/pkg/inst/tests/testthat/test_mllib_classification.R -- diff --git a/R/pkg/inst/tests/testthat/test_mllib_classification.R b/R/pkg/inst/tests/testthat/test_mllib_classification.R index af7cbdc..cbc7087 100644 --- a/R/pkg/inst/tests/testthat/test_mllib_classification.R +++ b/R/pkg/inst/tests/testthat/test_mllib_classification.R @@ -284,22 +284,11 @@ test_that("spark.mlp", { c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) # test initialWeights - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = + model <- spark.mlp(df, label ~ features, layers = c(4, 3), initialWeights = c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9)) mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "2.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = -c(0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 9.0)) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "2.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0")) + c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) # Test formula works well df <- suppressWarnings(createDataFrame(iris)) @@ -310,8 +299,6 @@ test_that("spark.mlp", { expect_equal(summary$numOfOutputs, 3) expect_equal(summary$layers, c(4, 3)) expect_equal(length(summary$weights), 15) - expect_equal(head(summary$weights, 5), list(-0.5793153, -4.652961, 6.216155, -6.649478, - -10.51147), tolerance = 1e-3) }) test_that("spark.naiveBayes", { http://git-wip-us.apache.org/repos/asf/spark/blob/dbb06c68/python/pyspark/ml/classification.py -- diff --git a/python/pyspark/ml/classification.py b/python/pyspark/ml/classification.py index 8649683..a9756ea 
100644 --- a/python/pyspark/ml/classification.py +++ b/python/pyspark/ml/classification.py @@ -185,34 +185,33 @@ class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredicti >>> from pyspark.sql import Row >>> from pyspark.ml.linalg import Vectors >>> bdf = sc.parallelize([ -... Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)), -... Row(label=0.0, weight=2.0, features=Vectors.sparse(1, [], []))]).toDF() ->>> blor = LogisticRegression(maxIter=5, regParam=0.01, weightCol="weight") +... Row(label=1.0, weight=1.0, features=Vectors.dense(0.0, 5.0)), +... Row(label=0.0, weight=2.0, features=Vectors.dense(1.0, 2.0)), +... Row(label=1.0, weight=3.0, features=Vectors.dense(2.0, 1.0)), +... Row(label=0.0, weight=4.0, features=Vectors.dense(3.0, 3.0))]).toDF() +>>> blor = LogisticRegression(regParam=0.01, weightCol="weight") >>> blorModel = blor.fit(bdf)
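A minimal Scala sketch of the pattern this commit moves the tests toward: fit until convergence and assert on the converged model, instead of capping ```maxIter``` at a tiny value and asserting on unstable intermediate coefficients. The toy data and object name are illustrative only, not the test code touched by this patch:
```
// Sketch only: let the solver run to convergence (default maxIter = 100) so the
// values a test pins down stay stable across solver/breeze upgrades.
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object ConvergedCheckSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("converged-check").getOrCreate()
    import spark.implicits._

    // Tiny, hypothetical dataset; mirrors the shape of the doctest above.
    val df = Seq(
      (1.0, 1.0, Vectors.dense(0.0, 5.0)),
      (0.0, 2.0, Vectors.dense(1.0, 2.0)),
      (1.0, 3.0, Vectors.dense(2.0, 1.0)),
      (0.0, 4.0, Vectors.dense(3.0, 3.0))
    ).toDF("label", "weight", "features")

    // No tiny maxIter: the model is allowed to converge before we look at it.
    val lr = new LogisticRegression().setRegParam(0.01).setWeightCol("weight")
    val model = lr.fit(df)

    assert(model.summary.totalIterations < lr.getMaxIter,
      "expected convergence before hitting maxIter")
    println(model.coefficients)  // converged coefficients are what a test should check
    spark.stop()
  }
}
```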
spark git commit: [SPARK-19812] YARN shuffle service fails to relocate recovery DB acro…
Repository: spark Updated Branches: refs/heads/master 7a365257e -> 7fecf5130 [SPARK-19812] YARN shuffle service fails to relocate recovery DB acro… …ss NFS directories ## What changes were proposed in this pull request? Change from using java Files.move to using Hadoop filesystem operations to move the directories. The java Files.move does not work when moving directories across NFS mounts, and in fact its documentation says that if the directory has entries you should do a recursive move. We are already using the Hadoop filesystem here, so just use the local filesystem from there as it handles this properly. Note that the DB here is actually a directory of files and not just a single file, hence the change in the name of the local var. ## How was this patch tested? Ran YarnShuffleServiceSuite unit tests. Unfortunately couldn't easily add one here since it involves NFS. Ran manual tests to verify that the DB directories were properly moved across NFS mounted directories. Have been running this internally for weeks. Author: Tom Graves Closes #17748 from tgravescs/SPARK-19812. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7fecf513 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7fecf513 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7fecf513 Branch: refs/heads/master Commit: 7fecf5130163df9c204a2764d121a7011d007f4e Parents: 7a36525 Author: Tom Graves Authored: Wed Apr 26 08:23:31 2017 -0500 Committer: Tom Graves Committed: Wed Apr 26 08:23:31 2017 -0500 -- .../spark/network/yarn/YarnShuffleService.java | 23 +++- 1 file changed, 13 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7fecf513/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java -- diff --git a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java index c7620d0..4acc203 100644 --- a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java +++ b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java @@ -21,7 +21,6 @@ import java.io.File; import java.io.IOException; import java.nio.charset.StandardCharsets; import java.nio.ByteBuffer; -import java.nio.file.Files; import java.util.List; import java.util.Map; @@ -340,9 +339,9 @@ public class YarnShuffleService extends AuxiliaryService { * when it previously was not. If YARN NM recovery is enabled it uses that path, otherwise * it will uses a YARN local dir. 
*/ - protected File initRecoveryDb(String dbFileName) { + protected File initRecoveryDb(String dbName) { if (_recoveryPath != null) { -File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbFileName); +File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbName); if (recoveryFile.exists()) { return recoveryFile; } @@ -350,7 +349,7 @@ public class YarnShuffleService extends AuxiliaryService { // db doesn't exist in recovery path go check local dirs for it String[] localDirs = _conf.getTrimmedStrings("yarn.nodemanager.local-dirs"); for (String dir : localDirs) { - File f = new File(new Path(dir).toUri().getPath(), dbFileName); + File f = new File(new Path(dir).toUri().getPath(), dbName); if (f.exists()) { if (_recoveryPath == null) { // If NM recovery is not enabled, we should specify the recovery path using NM local @@ -363,17 +362,21 @@ public class YarnShuffleService extends AuxiliaryService { // make sure to move all DBs to the recovery path from the old NM local dirs. // If another DB was initialized first just make sure all the DBs are in the same // location. - File newLoc = new File(_recoveryPath.toUri().getPath(), dbFileName); - if (!newLoc.equals(f)) { + Path newLoc = new Path(_recoveryPath, dbName); + Path copyFrom = new Path(f.toURI()); + if (!newLoc.equals(copyFrom)) { +logger.info("Moving " + copyFrom + " to: " + newLoc); try { - Files.move(f.toPath(), newLoc.toPath()); + // The move here needs to handle moving non-empty directories across NFS mounts + FileSystem fs = FileSystem.getLocal(_conf); + fs.rename(copyFrom, newLoc); } catch (Exception e) { // Fail to move recovery file to new path, just continue on with new DB location logger.error("Failed to move recovery file {} to the path {}", -dbFileName,
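For readers scanning the flattened diff, the core of the change is: stop using java.nio.file.Files.move, which fails with an IOException when moving a non-empty directory would require copying its entries (as happens between NFS and local mounts), and rename through Hadoop's local FileSystem instead. A hedged Scala sketch of that idea; the paths, DB name, and helper are illustrative, not the service's actual fields:
```
// Sketch under stated assumptions: move a recovery DB *directory* between mounts
// via Hadoop's local FileSystem, whose rename can fall back to copy-and-delete,
// unlike java.nio.file.Files.move on a non-empty directory across mounts.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object RecoveryDbMoveSketch {
  def moveRecoveryDb(conf: Configuration, from: Path, to: Path): Boolean = {
    val localFs = FileSystem.getLocal(conf)  // local filesystem, as the patch uses
    if (from == to) true else localFs.rename(from, to)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical locations; the real service derives these from the NM local dirs
    // and the YARN recovery path.
    val conf = new Configuration()
    val from = new Path("/tmp/nm-local-dir/registeredExecutors.ldb")
    val to   = new Path("/tmp/nm-recovery/registeredExecutors.ldb")
    println(s"moved: ${moveRecoveryDb(conf, from, to)}")
  }
}
```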
spark git commit: [SPARK-19812] YARN shuffle service fails to relocate recovery DB acro…
Repository: spark Updated Branches: refs/heads/branch-2.2 a2f5ced32 -> 612952251 [SPARK-19812] YARN shuffle service fails to relocate recovery DB acro… …ss NFS directories ## What changes were proposed in this pull request? Change from using java Files.move to using Hadoop filesystem operations to move the directories. The java Files.move does not work when moving directories across NFS mounts, and in fact its documentation says that if the directory has entries you should do a recursive move. We are already using the Hadoop filesystem here, so just use the local filesystem from there as it handles this properly. Note that the DB here is actually a directory of files and not just a single file, hence the change in the name of the local var. ## How was this patch tested? Ran YarnShuffleServiceSuite unit tests. Unfortunately couldn't easily add one here since it involves NFS. Ran manual tests to verify that the DB directories were properly moved across NFS mounted directories. Have been running this internally for weeks. Author: Tom Graves Closes #17748 from tgravescs/SPARK-19812. (cherry picked from commit 7fecf5130163df9c204a2764d121a7011d007f4e) Signed-off-by: Tom Graves Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/61295225 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/61295225 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/61295225 Branch: refs/heads/branch-2.2 Commit: 612952251c5ac626e256bc2ab9414faf1662dde9 Parents: a2f5ced Author: Tom Graves Authored: Wed Apr 26 08:23:31 2017 -0500 Committer: Tom Graves Committed: Wed Apr 26 08:24:12 2017 -0500 -- .../spark/network/yarn/YarnShuffleService.java | 23 +++- 1 file changed, 13 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/61295225/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java -- diff --git a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java index c7620d0..4acc203 100644 --- a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java +++ b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java @@ -21,7 +21,6 @@ import java.io.File; import java.io.IOException; import java.nio.charset.StandardCharsets; import java.nio.ByteBuffer; -import java.nio.file.Files; import java.util.List; import java.util.Map; @@ -340,9 +339,9 @@ public class YarnShuffleService extends AuxiliaryService { * when it previously was not. If YARN NM recovery is enabled it uses that path, otherwise * it will uses a YARN local dir. 
*/ - protected File initRecoveryDb(String dbFileName) { + protected File initRecoveryDb(String dbName) { if (_recoveryPath != null) { -File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbFileName); +File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbName); if (recoveryFile.exists()) { return recoveryFile; } @@ -350,7 +349,7 @@ public class YarnShuffleService extends AuxiliaryService { // db doesn't exist in recovery path go check local dirs for it String[] localDirs = _conf.getTrimmedStrings("yarn.nodemanager.local-dirs"); for (String dir : localDirs) { - File f = new File(new Path(dir).toUri().getPath(), dbFileName); + File f = new File(new Path(dir).toUri().getPath(), dbName); if (f.exists()) { if (_recoveryPath == null) { // If NM recovery is not enabled, we should specify the recovery path using NM local @@ -363,17 +362,21 @@ public class YarnShuffleService extends AuxiliaryService { // make sure to move all DBs to the recovery path from the old NM local dirs. // If another DB was initialized first just make sure all the DBs are in the same // location. - File newLoc = new File(_recoveryPath.toUri().getPath(), dbFileName); - if (!newLoc.equals(f)) { + Path newLoc = new Path(_recoveryPath, dbName); + Path copyFrom = new Path(f.toURI()); + if (!newLoc.equals(copyFrom)) { +logger.info("Moving " + copyFrom + " to: " + newLoc); try { - Files.move(f.toPath(), newLoc.toPath()); + // The move here needs to handle moving non-empty directories across NFS mounts + FileSystem fs = FileSystem.getLocal(_conf); + fs.rename(copyFrom, newLoc); } catch (Exception e) { // Fail to move recovery file to new path, just continue on with new DB location
spark git commit: [SPARK-20400][DOCS] Remove References to 3rd Party Vendor Tools
Repository: spark Updated Branches: refs/heads/branch-2.2 c8803c068 -> a2f5ced32 [SPARK-20400][DOCS] Remove References to 3rd Party Vendor Tools ## What changes were proposed in this pull request? Simple documentation change to remove explicit vendor references. ## How was this patch tested? NA Please review http://spark.apache.org/contributing.html before opening a pull request. Author: anabranch Closes #17695 from anabranch/remove-vendor. (cherry picked from commit 7a365257e934e838bd90f6a0c50362bf47202b0e) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a2f5ced3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a2f5ced3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a2f5ced3 Branch: refs/heads/branch-2.2 Commit: a2f5ced3236db665bb33adc1bf1f90553997f46b Parents: c8803c0 Author: anabranch Authored: Wed Apr 26 09:49:05 2017 +0100 Committer: Sean Owen Committed: Wed Apr 26 09:49:13 2017 +0100 -- docs/configuration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a2f5ced3/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index 87b7632..8b53e92 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -2270,8 +2270,8 @@ should be included on Spark's classpath: * `hdfs-site.xml`, which provides default behaviors for the HDFS client. * `core-site.xml`, which sets the default filesystem name. -The location of these configuration files varies across CDH and HDP versions, but -a common location is inside of `/etc/hadoop/conf`. Some tools, such as Cloudera Manager, create +The location of these configuration files varies across Hadoop versions, but -a common location is inside of `/etc/hadoop/conf`. Some tools create configurations on-the-fly, but offer a mechanisms to download copies of them. To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh`
spark git commit: [SPARK-20400][DOCS] Remove References to 3rd Party Vendor Tools
Repository: spark Updated Branches: refs/heads/master df58a95a3 -> 7a365257e [SPARK-20400][DOCS] Remove References to 3rd Party Vendor Tools ## What changes were proposed in this pull request? Simple documentation change to remove explicit vendor references. ## How was this patch tested? NA Please review http://spark.apache.org/contributing.html before opening a pull request. Author: anabranch Closes #17695 from anabranch/remove-vendor. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7a365257 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7a365257 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7a365257 Branch: refs/heads/master Commit: 7a365257e934e838bd90f6a0c50362bf47202b0e Parents: df58a95 Author: anabranch Authored: Wed Apr 26 09:49:05 2017 +0100 Committer: Sean Owen Committed: Wed Apr 26 09:49:05 2017 +0100 -- docs/configuration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7a365257/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index 87b7632..8b53e92 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -2270,8 +2270,8 @@ should be included on Spark's classpath: * `hdfs-site.xml`, which provides default behaviors for the HDFS client. * `core-site.xml`, which sets the default filesystem name. -The location of these configuration files varies across CDH and HDP versions, but -a common location is inside of `/etc/hadoop/conf`. Some tools, such as Cloudera Manager, create +The location of these configuration files varies across Hadoop versions, but +a common location is inside of `/etc/hadoop/conf`. Some tools create configurations on-the-fly, but offer a mechanisms to download copies of them. To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh`
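The doc text touched by this patch describes a plain configuration recipe: export `HADOOP_CONF_DIR` (commonly `/etc/hadoop/conf`) in `$SPARK_HOME/conf/spark-env.sh` so that the client-side `hdfs-site.xml` and `core-site.xml` become visible to Spark. A small, hedged Scala sketch to confirm they were picked up; the property names are standard Hadoop ones and nothing here is part of this docs-only patch:
```
// Prints two standard Hadoop client settings from the configuration Spark actually
// uses; if HADOOP_CONF_DIR is exported in spark-env.sh, these come from
// core-site.xml and hdfs-site.xml rather than Hadoop's built-in defaults.
import org.apache.spark.sql.SparkSession

object HadoopConfCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("hadoop-conf-check").getOrCreate()
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    println("fs.defaultFS    = " + hadoopConf.get("fs.defaultFS"))     // from core-site.xml
    println("dfs.replication = " + hadoopConf.get("dfs.replication"))  // from hdfs-site.xml
    spark.stop()
  }
}
```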