spark-website git commit: Update committer page
Repository: spark-website Updated Branches: refs/heads/asf-site 114925632 -> f524d4f53 Update committer page Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/f524d4f5 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/f524d4f5 Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/f524d4f5 Branch: refs/heads/asf-site Commit: f524d4f53dde007b6283eb7e7511620273b6262b Parents: 1149256 Author: hyukjinkwon Authored: Mon Apr 2 17:14:03 2018 +0800 Committer: hyukjinkwon Committed: Tue Apr 3 12:39:41 2018 +0800 -- committers.md| 2 +- site/committers.html | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/f524d4f5/committers.md -- diff --git a/committers.md b/committers.md index 299a160..3456a43 100644 --- a/committers.md +++ b/committers.md @@ -36,7 +36,7 @@ navigation: |Holden Karau|IBM| |Cody Koeninger|Nexstar Digital| |Andy Konwinski|Databricks| -|Hyukjin Kwon|Mobigen| +|Hyukjin Kwon|Hortonworks| |Ryan LeCompte|Quantifind| |Haoyuan Li|Alluxio, UC Berkeley| |Xiao Li|Databricks| http://git-wip-us.apache.org/repos/asf/spark-website/blob/f524d4f5/site/committers.html -- diff --git a/site/committers.html b/site/committers.html index 7996091..ffca33e 100644 --- a/site/committers.html +++ b/site/committers.html @@ -311,7 +311,7 @@ Hyukjin Kwon - Mobigen + Hortonworks Ryan LeCompte
spark-website git commit: add committer
Repository: spark-website Updated Branches: refs/heads/asf-site a1d84bcbf -> 114925632 add committer Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/11492563 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/11492563 Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/11492563 Branch: refs/heads/asf-site Commit: 114925632af194d6dd7f2ca253c547e79aeb9364 Parents: a1d84bc Author: Zhenhua Wang Authored: Mon Apr 2 23:10:31 2018 +0800 Committer: hyukjinkwon Committed: Tue Apr 3 12:34:10 2018 +0800 -- committers.md| 1 + site/committers.html | 4 2 files changed, 5 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/11492563/committers.md -- diff --git a/committers.md b/committers.md index 38fb3b0..299a160 100644 --- a/committers.md +++ b/committers.md @@ -64,6 +64,7 @@ navigation: |Takuya Ueshin|Databricks| |Marcelo Vanzin|Cloudera| |Shivaram Venkataraman|UC Berkeley| +|Zhenhua Wang|Huawei| |Patrick Wendell|Databricks| |Andrew Xia|Alibaba| |Reynold Xin|Databricks| http://git-wip-us.apache.org/repos/asf/spark-website/blob/11492563/site/committers.html -- diff --git a/site/committers.html b/site/committers.html index 044ad80..7996091 100644 --- a/site/committers.html +++ b/site/committers.html @@ -422,6 +422,10 @@ UC Berkeley + Zhenhua Wang + Huawei + + Patrick Wendell Databricks
spark git commit: [MINOR][DOC] Fix a few markdown typos
Repository: spark Updated Branches: refs/heads/master 441d0d076 -> 8020f66fc [MINOR][DOC] Fix a few markdown typos ## What changes were proposed in this pull request? Easy fix in the markdown. ## How was this patch tested? jekyll build test manually. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: lemonjing <932191...@qq.com> Closes #20897 from Lemonjing/master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8020f66f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8020f66f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8020f66f Branch: refs/heads/master Commit: 8020f66fc47140a1b5f843fb18c34ec80541d5ca Parents: 441d0d0 Author: lemonjing <932191...@qq.com> Authored: Tue Apr 3 09:36:44 2018 +0800 Committer: hyukjinkwon Committed: Tue Apr 3 09:36:44 2018 +0800 -- docs/ml-guide.md | 2 +- docs/mllib-feature-extraction.md | 4 ++-- docs/mllib-pmml-model-export.md | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8020f66f/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index 702bcf7..aea07be 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -111,7 +111,7 @@ and the migration guide below will explain all changes between releases. * The class and trait hierarchy for logistic regression model summaries was changed to be cleaner and better accommodate the addition of the multi-class summary. This is a breaking change for user code that casts a `LogisticRegressionTrainingSummary` to a -` BinaryLogisticRegressionTrainingSummary`. Users should instead use the `model.binarySummary` +`BinaryLogisticRegressionTrainingSummary`. Users should instead use the `model.binarySummary` method. See [SPARK-17139](https://issues.apache.org/jira/browse/SPARK-17139) for more detail (_note_ this is an `Experimental` API). This _does not_ affect the Python `summary` method, which will still work correctly for both multinomial and binary cases. http://git-wip-us.apache.org/repos/asf/spark/blob/8020f66f/docs/mllib-feature-extraction.md -- diff --git a/docs/mllib-feature-extraction.md b/docs/mllib-feature-extraction.md index 75aea70..8b89296 100644 --- a/docs/mllib-feature-extraction.md +++ b/docs/mllib-feature-extraction.md @@ -278,8 +278,8 @@ for details on the API. multiplication. In other words, it scales each column of the dataset by a scalar multiplier. This represents the [Hadamard product](https://en.wikipedia.org/wiki/Hadamard_product_%28matrices%29) between the input vector, `v` and transforming vector, `scalingVec`, to yield a result vector. -Qu8T948*1# -Denoting the `scalingVec` as "`w`," this transformation may be written as: + +Denoting the `scalingVec` as "`w`", this transformation may be written as: `\[ \begin{pmatrix} v_1 \\ http://git-wip-us.apache.org/repos/asf/spark/blob/8020f66f/docs/mllib-pmml-model-export.md -- diff --git a/docs/mllib-pmml-model-export.md b/docs/mllib-pmml-model-export.md index d353090..f567565 100644 --- a/docs/mllib-pmml-model-export.md +++ b/docs/mllib-pmml-model-export.md @@ -7,7 +7,7 @@ displayTitle: PMML model export - RDD-based API * Table of contents {:toc} -## `spark.mllib` supported models +## spark.mllib supported models `spark.mllib` supports model export to Predictive Model Markup Language ([PMML](http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language)). 
@@ -15,7 +15,7 @@ The table below outlines the `spark.mllib` models that can be exported to PMML a -`spark.mllib` modelPMML model +spark.mllib modelPMML model
spark git commit: [MINOR][DOC] Fix a few markdown typos
Repository: spark Updated Branches: refs/heads/branch-2.3 6ca6483c1 -> ce1565115 [MINOR][DOC] Fix a few markdown typos ## What changes were proposed in this pull request? Easy fix in the markdown. ## How was this patch tested? jekyll build test manually. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: lemonjing <932191...@qq.com> Closes #20897 from Lemonjing/master. (cherry picked from commit 8020f66fc47140a1b5f843fb18c34ec80541d5ca) Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ce156511 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ce156511 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ce156511 Branch: refs/heads/branch-2.3 Commit: ce1565115481343af9043ecc4080d6d97eee698c Parents: 6ca6483 Author: lemonjing <932191...@qq.com> Authored: Tue Apr 3 09:36:44 2018 +0800 Committer: hyukjinkwon Committed: Tue Apr 3 09:36:59 2018 +0800 -- docs/ml-guide.md | 2 +- docs/mllib-feature-extraction.md | 4 ++-- docs/mllib-pmml-model-export.md | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ce156511/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index 702bcf7..aea07be 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -111,7 +111,7 @@ and the migration guide below will explain all changes between releases. * The class and trait hierarchy for logistic regression model summaries was changed to be cleaner and better accommodate the addition of the multi-class summary. This is a breaking change for user code that casts a `LogisticRegressionTrainingSummary` to a -` BinaryLogisticRegressionTrainingSummary`. Users should instead use the `model.binarySummary` +`BinaryLogisticRegressionTrainingSummary`. Users should instead use the `model.binarySummary` method. See [SPARK-17139](https://issues.apache.org/jira/browse/SPARK-17139) for more detail (_note_ this is an `Experimental` API). This _does not_ affect the Python `summary` method, which will still work correctly for both multinomial and binary cases. http://git-wip-us.apache.org/repos/asf/spark/blob/ce156511/docs/mllib-feature-extraction.md -- diff --git a/docs/mllib-feature-extraction.md b/docs/mllib-feature-extraction.md index 75aea70..8b89296 100644 --- a/docs/mllib-feature-extraction.md +++ b/docs/mllib-feature-extraction.md @@ -278,8 +278,8 @@ for details on the API. multiplication. In other words, it scales each column of the dataset by a scalar multiplier. This represents the [Hadamard product](https://en.wikipedia.org/wiki/Hadamard_product_%28matrices%29) between the input vector, `v` and transforming vector, `scalingVec`, to yield a result vector. 
-Qu8T948*1# -Denoting the `scalingVec` as "`w`," this transformation may be written as: + +Denoting the `scalingVec` as "`w`", this transformation may be written as: `\[ \begin{pmatrix} v_1 \\ http://git-wip-us.apache.org/repos/asf/spark/blob/ce156511/docs/mllib-pmml-model-export.md -- diff --git a/docs/mllib-pmml-model-export.md b/docs/mllib-pmml-model-export.md index d353090..f567565 100644 --- a/docs/mllib-pmml-model-export.md +++ b/docs/mllib-pmml-model-export.md @@ -7,7 +7,7 @@ displayTitle: PMML model export - RDD-based API * Table of contents {:toc} -## `spark.mllib` supported models +## spark.mllib supported models `spark.mllib` supports model export to Predictive Model Markup Language ([PMML](http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language)). @@ -15,7 +15,7 @@ The table below outlines the `spark.mllib` models that can be exported to PMML a -`spark.mllib` modelPMML model +spark.mllib modelPMML model
spark git commit: [SPARK-19964][CORE] Avoid reading from remote repos in SparkSubmitSuite.
Repository: spark Updated Branches: refs/heads/branch-2.3 f1f10da2b -> 6ca6483c1 [SPARK-19964][CORE] Avoid reading from remote repos in SparkSubmitSuite. These tests can fail with a timeout if the remote repos are not responding, or slow. The tests don't need anything from those repos, so use an empty ivy config file to avoid setting up the defaults. The tests are passing reliably for me locally now, and failing more often than not today without this change since http://dl.bintray.com/spark-packages/maven doesn't seem to be loading from my machine. Author: Marcelo Vanzin Closes #20916 from vanzin/SPARK-19964. (cherry picked from commit 441d0d0766e9a6ac4c6ff79680394999ff7191fd) Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6ca6483c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6ca6483c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6ca6483c Branch: refs/heads/branch-2.3 Commit: 6ca6483c122baa40d69c1781bb34a3cd9e1361c0 Parents: f1f10da Author: Marcelo Vanzin Authored: Tue Apr 3 09:31:47 2018 +0800 Committer: hyukjinkwon Committed: Tue Apr 3 09:32:03 2018 +0800 -- .../org/apache/spark/deploy/DependencyUtils.scala | 13 - .../scala/org/apache/spark/deploy/SparkSubmit.scala| 3 ++- .../org/apache/spark/deploy/SparkSubmitArguments.scala | 2 ++ .../org/apache/spark/deploy/worker/DriverWrapper.scala | 13 + .../org/apache/spark/deploy/SparkSubmitSuite.scala | 9 ++--- 5 files changed, 27 insertions(+), 13 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6ca6483c/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala b/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala index ab319c8..fac834a 100644 --- a/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala +++ b/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala @@ -33,7 +33,8 @@ private[deploy] object DependencyUtils { packagesExclusions: String, packages: String, repositories: String, - ivyRepoPath: String): String = { + ivyRepoPath: String, + ivySettingsPath: Option[String]): String = { val exclusions: Seq[String] = if (!StringUtils.isBlank(packagesExclusions)) { packagesExclusions.split(",") @@ -41,10 +42,12 @@ private[deploy] object DependencyUtils { Nil } // Create the IvySettings, either load from file or build defaults -val ivySettings = sys.props.get("spark.jars.ivySettings").map { ivySettingsFile => - SparkSubmitUtils.loadIvySettings(ivySettingsFile, Option(repositories), Option(ivyRepoPath)) -}.getOrElse { - SparkSubmitUtils.buildIvySettings(Option(repositories), Option(ivyRepoPath)) +val ivySettings = ivySettingsPath match { + case Some(path) => +SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath)) + + case None => +SparkSubmitUtils.buildIvySettings(Option(repositories), Option(ivyRepoPath)) } SparkSubmitUtils.resolveMavenCoordinates(packages, ivySettings, exclusions = exclusions) http://git-wip-us.apache.org/repos/asf/spark/blob/6ca6483c/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala index b44c880..deb52a4 100644 --- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala +++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala @@ -361,7 +361,8 @@ object SparkSubmit 
extends CommandLineUtils with Logging { // Resolve maven dependencies if there are any and add classpath to jars. Add them to py-files // too for packages that include Python code val resolvedMavenCoordinates = DependencyUtils.resolveMavenDependencies( -args.packagesExclusions, args.packages, args.repositories, args.ivyRepoPath) +args.packagesExclusions, args.packages, args.repositories, args.ivyRepoPath, +args.ivySettingsPath) if (!StringUtils.isBlank(resolvedMavenCoordinates)) { args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates) http://git-wip-us.apache.org/repos/asf/spark/blob/6ca6483c/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit
spark git commit: [SPARK-19964][CORE] Avoid reading from remote repos in SparkSubmitSuite.
Repository: spark Updated Branches: refs/heads/master a1351828d -> 441d0d076 [SPARK-19964][CORE] Avoid reading from remote repos in SparkSubmitSuite. These tests can fail with a timeout if the remote repos are not responding, or slow. The tests don't need anything from those repos, so use an empty ivy config file to avoid setting up the defaults. The tests are passing reliably for me locally now, and failing more often than not today without this change since http://dl.bintray.com/spark-packages/maven doesn't seem to be loading from my machine. Author: Marcelo Vanzin Closes #20916 from vanzin/SPARK-19964. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/441d0d07 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/441d0d07 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/441d0d07 Branch: refs/heads/master Commit: 441d0d0766e9a6ac4c6ff79680394999ff7191fd Parents: a135182 Author: Marcelo Vanzin Authored: Tue Apr 3 09:31:47 2018 +0800 Committer: hyukjinkwon Committed: Tue Apr 3 09:31:47 2018 +0800 -- .../org/apache/spark/deploy/DependencyUtils.scala | 13 - .../scala/org/apache/spark/deploy/SparkSubmit.scala| 3 ++- .../org/apache/spark/deploy/SparkSubmitArguments.scala | 2 ++ .../org/apache/spark/deploy/worker/DriverWrapper.scala | 13 + .../org/apache/spark/deploy/SparkSubmitSuite.scala | 9 ++--- 5 files changed, 27 insertions(+), 13 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/441d0d07/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala b/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala index ab319c8..fac834a 100644 --- a/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala +++ b/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala @@ -33,7 +33,8 @@ private[deploy] object DependencyUtils { packagesExclusions: String, packages: String, repositories: String, - ivyRepoPath: String): String = { + ivyRepoPath: String, + ivySettingsPath: Option[String]): String = { val exclusions: Seq[String] = if (!StringUtils.isBlank(packagesExclusions)) { packagesExclusions.split(",") @@ -41,10 +42,12 @@ private[deploy] object DependencyUtils { Nil } // Create the IvySettings, either load from file or build defaults -val ivySettings = sys.props.get("spark.jars.ivySettings").map { ivySettingsFile => - SparkSubmitUtils.loadIvySettings(ivySettingsFile, Option(repositories), Option(ivyRepoPath)) -}.getOrElse { - SparkSubmitUtils.buildIvySettings(Option(repositories), Option(ivyRepoPath)) +val ivySettings = ivySettingsPath match { + case Some(path) => +SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath)) + + case None => +SparkSubmitUtils.buildIvySettings(Option(repositories), Option(ivyRepoPath)) } SparkSubmitUtils.resolveMavenCoordinates(packages, ivySettings, exclusions = exclusions) http://git-wip-us.apache.org/repos/asf/spark/blob/441d0d07/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala index 3965f17..eddbede 100644 --- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala +++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala @@ -359,7 +359,8 @@ object SparkSubmit extends CommandLineUtils with Logging { // Resolve maven dependencies if there are any and add classpath to 
jars. Add them to py-files // too for packages that include Python code val resolvedMavenCoordinates = DependencyUtils.resolveMavenDependencies( -args.packagesExclusions, args.packages, args.repositories, args.ivyRepoPath) +args.packagesExclusions, args.packages, args.repositories, args.ivyRepoPath, +args.ivySettingsPath) if (!StringUtils.isBlank(resolvedMavenCoordinates)) { args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates) http://git-wip-us.apache.org/repos/asf/spark/blob/441d0d07/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala index e7796d4..8e70705 100644 --- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmi
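Outside the suite, the same network isolation can be reproduced by handing spark-submit an Ivy settings file that defines no resolvers. Below is a minimal sketch; the file path, cache directory, and object name are illustrative, and only the `spark.jars.ivySettings` property comes from the code above.

```scala
import java.nio.file.{Files, Paths}

object EmptyIvySettings {
  def main(args: Array[String]): Unit = {
    // No resolvers are defined, so Ivy cannot reach any remote repository.
    val xml =
      """<ivysettings>
        |  <caches defaultCacheDir="/tmp/dummy-ivy-cache"/>
        |</ivysettings>""".stripMargin
    val path = Paths.get("/tmp/empty-ivysettings.xml")
    Files.write(path, xml.getBytes("UTF-8"))
    // spark-submit is then pointed at the file, e.g.:
    //   --conf spark.jars.ivySettings=/tmp/empty-ivysettings.xml
    println(s"wrote $path")
  }
}
```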
spark git commit: [SPARK-23690][ML] Add handleInvalid to VectorAssembler
Repository: spark Updated Branches: refs/heads/master 28ea4e314 -> a1351828d [SPARK-23690][ML] Add handleinvalid to VectorAssembler ## What changes were proposed in this pull request? Introduce `handleInvalid` parameter in `VectorAssembler` that can take in `"keep", "skip", "error"` options. "error" throws an error on seeing a row containing a `null`, "skip" filters out all such rows, and "keep" adds relevant number of NaN. "keep" figures out an example to find out what this number of NaN s should be added and throws an error when no such number could be found. ## How was this patch tested? Unit tests are added to check the behavior of `assemble` on specific rows and the transformer is called on `DataFrame`s of different configurations to test different corner cases. Author: Yogesh Garg Author: Bago Amirbekian Author: Yogesh Garg <1059168+yoge...@users.noreply.github.com> Closes #20829 from yogeshg/rformula_handleinvalid. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a1351828 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a1351828 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a1351828 Branch: refs/heads/master Commit: a1351828d376a01e5ee0959cf608f767d756dd86 Parents: 28ea4e3 Author: Yogesh Garg Authored: Mon Apr 2 16:41:26 2018 -0700 Committer: Joseph K. Bradley Committed: Mon Apr 2 16:41:26 2018 -0700 -- .../apache/spark/ml/feature/StringIndexer.scala | 2 +- .../spark/ml/feature/VectorAssembler.scala | 198 +++ .../spark/ml/feature/VectorAssemblerSuite.scala | 131 ++-- 3 files changed, 284 insertions(+), 47 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a1351828/mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala index 1cdcdfc..67cdb09 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala @@ -234,7 +234,7 @@ class StringIndexerModel ( val metadata = NominalAttribute.defaultAttr .withName($(outputCol)).withValues(filteredLabels).toMetadata() // If we are skipping invalid records, filter them out. 
-val (filteredDataset, keepInvalid) = getHandleInvalid match { +val (filteredDataset, keepInvalid) = $(handleInvalid) match { case StringIndexer.SKIP_INVALID => val filterer = udf { label: String => labelToIndex.contains(label) http://git-wip-us.apache.org/repos/asf/spark/blob/a1351828/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala index b373ae9..6bf4aa3 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala @@ -17,14 +17,17 @@ package org.apache.spark.ml.feature -import scala.collection.mutable.ArrayBuilder +import java.util.NoSuchElementException + +import scala.collection.mutable +import scala.language.existentials import org.apache.spark.SparkException import org.apache.spark.annotation.Since import org.apache.spark.ml.Transformer import org.apache.spark.ml.attribute.{Attribute, AttributeGroup, NumericAttribute, UnresolvedAttribute} import org.apache.spark.ml.linalg.{Vector, Vectors, VectorUDT} -import org.apache.spark.ml.param.ParamMap +import org.apache.spark.ml.param.{Param, ParamMap, ParamValidators} import org.apache.spark.ml.param.shared._ import org.apache.spark.ml.util._ import org.apache.spark.sql.{DataFrame, Dataset, Row} @@ -33,10 +36,14 @@ import org.apache.spark.sql.types._ /** * A feature transformer that merges multiple columns into a vector column. + * + * This requires one pass over the entire dataset. In case we need to infer column lengths from the + * data we require an additional call to the 'first' Dataset method, see 'handleInvalid' parameter. */ @Since("1.4.0") class VectorAssembler @Since("1.4.0") (@Since("1.4.0") override val uid: String) - extends Transformer with HasInputCols with HasOutputCol with DefaultParamsWritable { + extends Transformer with HasInputCols with HasOutputCol with HasHandleInvalid +with DefaultParamsWritable { @Since("1.4.0") def this() = this(Identifiable.randomUID("vecAssembler")) @@ -49,32 +56,63 @@ class VectorAssembler @Since("1.4.0") (@Since("1.4.0") override val uid:
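A minimal usage sketch of the new parameter, runnable in spark-shell; the column names and data are invented for illustration, and the option strings are the three listed in the description above.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// One clean row and one row whose first feature is null.
val df = Seq((0, Some(1.0), 2.0), (1, None, 3.0)).toDF("id", "a", "b")

val assembler = new VectorAssembler()
  .setInputCols(Array("a", "b"))
  .setOutputCol("features")
  .setHandleInvalid("keep") // "error" throws on nulls, "skip" drops such rows,
                            // "keep" fills the missing entries with NaN

assembler.transform(df).show(false)
```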
spark git commit: [SPARK-23834][TEST] Wait for connection before disconnect in LauncherServer test.
Repository: spark Updated Branches: refs/heads/master a7c19d9c2 -> 28ea4e314 [SPARK-23834][TEST] Wait for connection before disconnect in LauncherServer test. It was possible that the disconnect() was called on the handle before the server had received the handshake messages, so no connection was yet attached to the handle. The fix waits until we're sure the handle has been mapped to a client connection. Author: Marcelo Vanzin Closes #20950 from vanzin/SPARK-23834. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/28ea4e31 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/28ea4e31 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/28ea4e31 Branch: refs/heads/master Commit: 28ea4e3142b88eb396aa8dd5daf7b02b556204ba Parents: a7c19d9 Author: Marcelo Vanzin Authored: Mon Apr 2 14:35:07 2018 -0700 Committer: Marcelo Vanzin Committed: Mon Apr 2 14:35:07 2018 -0700 -- .../java/org/apache/spark/launcher/LauncherServerSuite.java | 8 1 file changed, 8 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/28ea4e31/launcher/src/test/java/org/apache/spark/launcher/LauncherServerSuite.java -- diff --git a/launcher/src/test/java/org/apache/spark/launcher/LauncherServerSuite.java b/launcher/src/test/java/org/apache/spark/launcher/LauncherServerSuite.java index 5413d3a..f8dc0ec 100644 --- a/launcher/src/test/java/org/apache/spark/launcher/LauncherServerSuite.java +++ b/launcher/src/test/java/org/apache/spark/launcher/LauncherServerSuite.java @@ -196,6 +196,14 @@ public class LauncherServerSuite extends BaseSuite { Socket s = new Socket(InetAddress.getLoopbackAddress(), server.getPort()); client = new TestClient(s); client.send(new Hello(secret, "1.4.0")); + client.send(new SetAppId("someId")); + + // Wait until we know the server has received the messages and matched the handle to the + // connection before disconnecting. + eventually(Duration.ofSeconds(1), Duration.ofMillis(10), () -> { +assertEquals("someId", handle.getAppId()); + }); + handle.disconnect(); waitForError(client, secret); } finally {
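The `eventually` call above comes from the launcher test utilities; as a rough sketch (not the actual BaseSuite helper), the polling pattern it implements looks like this in Scala:

```scala
import scala.concurrent.duration._

// Retry a check until it stops throwing, or rethrow once the timeout elapses.
def eventually(timeout: FiniteDuration, interval: FiniteDuration)(check: => Unit): Unit = {
  val deadline = timeout.fromNow
  while (true) {
    try {
      check
      return
    } catch {
      case e: AssertionError =>
        if (deadline.isOverdue()) throw e
        Thread.sleep(interval.toMillis)
    }
  }
}

// Mirrors the fix: poll until the server has mapped the handle to the
// client connection, after which it is safe to call disconnect().
// eventually(1.second, 10.millis) { assert(handle.getAppId == "someId") }
```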
spark git commit: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder classes
Repository: spark Updated Branches: refs/heads/master fe2b7a456 -> a7c19d9c2 [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder classes ## What changes were proposed in this pull request? This PR implemented the following cleanups related to `UnsafeWriter` class: - Remove code duplication between `UnsafeRowWriter` and `UnsafeArrayWriter` - Make `BufferHolder` class internal by delegating its accessor methods to `UnsafeWriter` - Replace `UnsafeRow.setTotalSize(...)` with `UnsafeRowWriter.setTotalSize()` ## How was this patch tested? Tested by existing UTs Author: Kazuaki Ishizaki Closes #20850 from kiszk/SPARK-23713. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a7c19d9c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a7c19d9c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a7c19d9c Branch: refs/heads/master Commit: a7c19d9c21d59fd0109a7078c80b33d3da03fafd Parents: fe2b7a4 Author: Kazuaki Ishizaki Authored: Mon Apr 2 21:48:44 2018 +0200 Committer: Herman van Hovell Committed: Mon Apr 2 21:48:44 2018 +0200 -- .../sql/kafka010/KafkaContinuousReader.scala| 3 - .../KafkaRecordToUnsafeRowConverter.scala | 11 +- .../expressions/codegen/BufferHolder.java | 32 ++-- .../expressions/codegen/UnsafeArrayWriter.java | 133 +++-- .../expressions/codegen/UnsafeRowWriter.java| 189 +++ .../expressions/codegen/UnsafeWriter.java | 157 ++- .../InterpretedUnsafeProjection.scala | 90 - .../codegen/GenerateUnsafeProjection.scala | 124 ++-- .../expressions/RowBasedKeyValueBatchSuite.java | 28 +-- .../aggregate/RowBasedHashMapGenerator.scala| 12 +- .../columnar/GenerateColumnAccessor.scala | 9 +- .../datasources/text/TextFileFormat.scala | 11 +- 12 files changed, 391 insertions(+), 408 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a7c19d9c/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala -- diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala index e7e2787..f26c134 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala @@ -27,13 +27,10 @@ import org.apache.spark.TaskContext import org.apache.spark.internal.Logging import org.apache.spark.sql.SparkSession import org.apache.spark.sql.catalyst.expressions.UnsafeRow -import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeRowWriter} -import org.apache.spark.sql.catalyst.util.DateTimeUtils import org.apache.spark.sql.kafka010.KafkaSourceProvider.{INSTRUCTION_FOR_FAIL_ON_DATA_LOSS_FALSE, INSTRUCTION_FOR_FAIL_ON_DATA_LOSS_TRUE} import org.apache.spark.sql.sources.v2.reader._ import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousDataReader, ContinuousReader, Offset, PartitionOffset} import org.apache.spark.sql.types.StructType -import org.apache.spark.unsafe.types.UTF8String /** * A [[ContinuousReader]] for data from kafka. 
http://git-wip-us.apache.org/repos/asf/spark/blob/a7c19d9c/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRecordToUnsafeRowConverter.scala -- diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRecordToUnsafeRowConverter.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRecordToUnsafeRowConverter.scala index 1acdd56..f35a143 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRecordToUnsafeRowConverter.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRecordToUnsafeRowConverter.scala @@ -20,18 +20,16 @@ package org.apache.spark.sql.kafka010 import org.apache.kafka.clients.consumer.ConsumerRecord import org.apache.spark.sql.catalyst.expressions.UnsafeRow -import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeRowWriter} +import org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter import org.apache.spark.sql.catalyst.util.DateTimeUtils import org.apache.spark.unsafe.types.UTF8String /** A simple class for converting Kafka ConsumerRecord to UnsafeRow */ private[kafka010] class KafkaRecordToUnsafeRowConverter { - private val sharedRow = new UnsafeRow(7) - private val bufferHolder = new BufferHolder(sharedRow) - privat
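To make the new shape concrete, here is a hedged sketch of how a caller builds a row after this cleanup: `BufferHolder` is no longer constructed directly, and the writer tracks the buffer and total size itself. These are internal catalyst APIs, and the field count and values are illustrative.

```scala
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
import org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter
import org.apache.spark.unsafe.types.UTF8String

val rowWriter = new UnsafeRowWriter(2)           // the writer now owns its BufferHolder
rowWriter.reset()                                // rewind the buffer for each record
rowWriter.write(0, UTF8String.fromString("key")) // variable-length field
rowWriter.write(1, 42L)                          // fixed-length field
val row: UnsafeRow = rowWriter.getRow            // total size is set internally now
```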
spark git commit: [SPARK-23285][K8S] Add a config property for specifying physical executor cores
Repository: spark Updated Branches: refs/heads/master 6151f29f9 -> fe2b7a456 [SPARK-23285][K8S] Add a config property for specifying physical executor cores ## What changes were proposed in this pull request? As mentioned in SPARK-23285, this PR introduces a new configuration property `spark.kubernetes.executor.cores` for specifying the physical CPU cores requested for each executor pod. This is to avoid changing the semantics of `spark.executor.cores` and `spark.task.cpus` and their role in task scheduling, task parallelism, dynamic resource allocation, etc. The new configuration property only determines the physical CPU cores available to an executor. An executor can still run multiple tasks simultaneously by using appropriate values for `spark.executor.cores` and `spark.task.cpus`. ## How was this patch tested? Unit tests. felixcheung srowen jiangxb1987 jerryshao mccheah foxish Author: Yinan Li Author: Yinan Li Closes #20553 from liyinan926/master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fe2b7a45 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fe2b7a45 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fe2b7a45 Branch: refs/heads/master Commit: fe2b7a4568d65a62da6e6eb00fff05f248b4332c Parents: 6151f29 Author: Yinan Li Authored: Mon Apr 2 12:20:55 2018 -0700 Committer: Anirudh Ramanathan Committed: Mon Apr 2 12:20:55 2018 -0700 -- docs/running-on-kubernetes.md | 15 --- .../org/apache/spark/deploy/k8s/Config.scala| 6 + .../cluster/k8s/ExecutorPodFactory.scala| 12 ++--- .../cluster/k8s/ExecutorPodFactorySuite.scala | 27 4 files changed, 53 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fe2b7a45/docs/running-on-kubernetes.md -- diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 975b28d..9c46449 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -549,14 +549,23 @@ specific to Spark on Kubernetes. spark.kubernetes.driver.limit.cores (none) -Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for the driver pod. +Specify a hard cpu [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for the driver pod. + spark.kubernetes.executor.request.cores + (none) + +Specify the cpu request for each executor pod. Values conform to the Kubernetes [convention](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu). +Example values include 0.1, 500m, 1.5, 5, etc., with the definition of cpu units documented in [CPU units](https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#cpu-units). +This is distinct from spark.executor.cores: it is only used and takes precedence over spark.executor.cores for specifying the executor pod cpu request if set. Task +parallelism, e.g., number of tasks an executor can run concurrently is not affected by this. + + spark.kubernetes.executor.limit.cores (none) -Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for each executor pod launched for the Spark Application. 
+Specify a hard cpu [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for each executor pod launched for the Spark Application. @@ -593,4 +602,4 @@ specific to Spark on Kubernetes. spark.kubernetes.executor.secrets.spark-secret=/etc/secrets. - \ No newline at end of file + http://git-wip-us.apache.org/repos/asf/spark/blob/fe2b7a45/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala -- diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala index da34a7e..405ea47 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala @@ -91,6 +91,12 @@ private[spark] object Config extends Logging { .stringConf .createOptional + val KUBERNETES_EXECUTOR_REQUEST_CORES = +ConfigB
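A short illustration of how the new property composes with the existing scheduling knobs; the values are examples only, based on the documentation added above.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.cores", "2")                       // task slots per executor
  .set("spark.task.cpus", "1")                            // slots consumed per task
  .set("spark.kubernetes.executor.request.cores", "500m") // physical cpu requested per pod
  .set("spark.kubernetes.executor.limit.cores", "1")      // hard cpu limit per pod
// Each executor still runs up to two tasks concurrently, while its pod
// requests only half a cpu from Kubernetes.
```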
svn commit: r26088 - in /dev/spark/2.4.0-SNAPSHOT-2018_04_02_12_01-6151f29-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Mon Apr 2 19:17:05 2018 New Revision: 26088 Log: Apache Spark 2.4.0-SNAPSHOT-2018_04_02_12_01-6151f29 docs [This commit notification would consist of 1452 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-23825][K8S] Requesting memory + memory overhead for pod memory
Repository: spark Updated Branches: refs/heads/master 44a9f8e6e -> 6151f29f9 [SPARK-23825][K8S] Requesting memory + memory overhead for pod memory ## What changes were proposed in this pull request? Kubernetes driver and executor pods should request `memory + memoryOverhead` as their resources instead of just `memory`, see https://issues.apache.org/jira/browse/SPARK-23825 ## How was this patch tested? Existing unit tests were adapted. Author: David Vogelbacher Closes #20943 from dvogelbacher/spark-23825. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6151f29f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6151f29f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6151f29f Branch: refs/heads/master Commit: 6151f29f9f589301159482044fc32717f430db6e Parents: 44a9f8e Author: David Vogelbacher Authored: Mon Apr 2 12:00:37 2018 -0700 Committer: mcheah Committed: Mon Apr 2 12:00:37 2018 -0700 -- .../deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala | 5 + .../spark/scheduler/cluster/k8s/ExecutorPodFactory.scala | 5 + .../k8s/submit/steps/BasicDriverConfigurationStepSuite.scala | 2 +- .../spark/scheduler/cluster/k8s/ExecutorPodFactorySuite.scala | 6 -- 4 files changed, 7 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6151f29f/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala -- diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala index 347c4d2..b811db3 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala @@ -93,9 +93,6 @@ private[spark] class BasicDriverConfigurationStep( .withAmount(driverCpuCores) .build() val driverMemoryQuantity = new QuantityBuilder(false) - .withAmount(s"${driverMemoryMiB}Mi") - .build() -val driverMemoryLimitQuantity = new QuantityBuilder(false) .withAmount(s"${driverMemoryWithOverheadMiB}Mi") .build() val maybeCpuLimitQuantity = driverLimitCores.map { limitCores => @@ -117,7 +114,7 @@ private[spark] class BasicDriverConfigurationStep( .withNewResources() .addToRequests("cpu", driverCpuQuantity) .addToRequests("memory", driverMemoryQuantity) -.addToLimits("memory", driverMemoryLimitQuantity) +.addToLimits("memory", driverMemoryQuantity) .addToLimits(maybeCpuLimitQuantity.toMap.asJava) .endResources() .addToArgs("driver") http://git-wip-us.apache.org/repos/asf/spark/blob/6151f29f/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala -- diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala index 98cbd56..ac42385 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala @@ -108,9 +108,6 @@ private[spark] class ExecutorPodFactory( 
SPARK_ROLE_LABEL -> SPARK_POD_EXECUTOR_ROLE) ++ executorLabels val executorMemoryQuantity = new QuantityBuilder(false) - .withAmount(s"${executorMemoryMiB}Mi") - .build() -val executorMemoryLimitQuantity = new QuantityBuilder(false) .withAmount(s"${executorMemoryWithOverhead}Mi") .build() val executorCpuQuantity = new QuantityBuilder(false) @@ -167,7 +164,7 @@ private[spark] class ExecutorPodFactory( .withImagePullPolicy(imagePullPolicy) .withNewResources() .addToRequests("memory", executorMemoryQuantity) -.addToLimits("memory", executorMemoryLimitQuantity) +.addToLimits("memory", executorMemoryQuantity) .addToRequests("cpu", executorCpuQuantity) .endResources() .addAllToEnv(executorEnv.asJava) http://git-wip-us.apache.org/repos/asf/spark/blob/6151f29f/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep
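As a back-of-envelope check of what a pod now asks for, the sketch below applies Spark's usual overhead rule; the 0.10 factor and 384 MiB floor are the commonly documented defaults and are assumed here, not read from this diff.

```scala
// spark.executor.memory = 4g
val executorMemoryMiB = 4096L
val overheadMiB = math.max((0.10 * executorMemoryMiB).toLong, 384L) // 409
val podMemoryMiB = executorMemoryMiB + overheadMiB                  // 4505
// After this patch, both the pod's memory request and its limit are
// s"${podMemoryMiB}Mi" instead of just the 4096 MiB heap.
println(s"request = limit = ${podMemoryMiB}Mi")
```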
spark git commit: [SPARK-15009][PYTHON][FOLLOWUP] Add default param checks for CountVectorizerModel
Repository: spark Updated Branches: refs/heads/master 529f84710 -> 44a9f8e6e [SPARK-15009][PYTHON][FOLLOWUP] Add default param checks for CountVectorizerModel ## What changes were proposed in this pull request? Adding test for default params for `CountVectorizerModel` constructed from vocabulary. This required that the param `maxDF` be added, which was done in SPARK-23615. ## How was this patch tested? Added an explicit test for CountVectorizerModel in DefaultValuesTests. Author: Bryan Cutler Closes #20942 from BryanCutler/pyspark-CountVectorizerModel-default-param-test-SPARK-15009. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/44a9f8e6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/44a9f8e6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/44a9f8e6 Branch: refs/heads/master Commit: 44a9f8e6e82c300dc61ca18515aee16f17f27501 Parents: 529f847 Author: Bryan Cutler Authored: Mon Apr 2 09:53:37 2018 -0700 Committer: Bryan Cutler Committed: Mon Apr 2 09:53:37 2018 -0700 -- python/pyspark/ml/tests.py | 5 + 1 file changed, 5 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/44a9f8e6/python/pyspark/ml/tests.py -- diff --git a/python/pyspark/ml/tests.py b/python/pyspark/ml/tests.py index 6b4376c..c2c4861 100755 --- a/python/pyspark/ml/tests.py +++ b/python/pyspark/ml/tests.py @@ -2096,6 +2096,11 @@ class DefaultValuesTests(PySparkTestCase): # NOTE: disable check_params_exist until there is parity with Scala API ParamTests.check_params(self, cls(), check_params_exist=False) +# Additional classes that need explicit construction +from pyspark.ml.feature import CountVectorizerModel +ParamTests.check_params(self, CountVectorizerModel.from_vocabulary(['a'], 'input'), +check_params_exist=False) + def _squared_distance(a, b): if isinstance(a, Vector):
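For reference, a rough Scala counterpart of the code path this Python test exercises: constructing a `CountVectorizerModel` directly from a vocabulary instead of fitting a `CountVectorizer`. The vocabulary and column names are illustrative.

```scala
import org.apache.spark.ml.feature.CountVectorizerModel

val model = new CountVectorizerModel(Array("a", "b", "c"))
  .setInputCol("input")
  .setOutputCol("features")
// Even without a fit() step, params such as minTF, binary and the newly
// added maxDF carry defaults on this model, which is what the new Python
// test asserts via check_params.
```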