spark git commit: [MINOR][BUILD] Add modernizr MIT license; specify "2014 and onwards" in license copyright
Repository: spark Updated Branches: refs/heads/branch-2.0 729730159 -> ed1e20207 [MINOR][BUILD] Add modernizr MIT license; specify "2014 and onwards" in license copyright ## What changes were proposed in this pull request? Per conversation on dev list, add missing modernizr license. Specify "2014 and onwards" in copyright statement. ## How was this patch tested? (none required) Author: Sean Owen Closes #13510 from srowen/ModernizrLicense. (cherry picked from commit 681387b2dc9a094cfba84188a1dd1ac9192bb99c) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ed1e2020 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ed1e2020 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ed1e2020 Branch: refs/heads/branch-2.0 Commit: ed1e20207c1c2e503a22d5ad2cdf505ef6ecbcad Parents: 7297301 Author: Sean Owen Authored: Sat Jun 4 21:41:27 2016 +0100 Committer: Sean Owen Committed: Sat Jun 4 21:41:35 2016 +0100 -- LICENSE| 1 + NOTICE | 2 +- licenses/LICENSE-modernizr.txt | 21 + 3 files changed, 23 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ed1e2020/LICENSE -- diff --git a/LICENSE b/LICENSE index f403640..94fd46f 100644 --- a/LICENSE +++ b/LICENSE @@ -296,3 +296,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt. (MIT License) blockUI (http://jquery.malsup.com/block/) (MIT License) RowsGroup (http://datatables.net/license/mit) (MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html) + (MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE) http://git-wip-us.apache.org/repos/asf/spark/blob/ed1e2020/NOTICE -- diff --git a/NOTICE b/NOTICE index f4b1260..69b513e 100644 --- a/NOTICE +++ b/NOTICE @@ -1,5 +1,5 @@ Apache Spark -Copyright 2014 The Apache Software Foundation. +Copyright 2014 and onwards The Apache Software Foundation. This product includes software developed at The Apache Software Foundation (http://www.apache.org/). http://git-wip-us.apache.org/repos/asf/spark/blob/ed1e2020/licenses/LICENSE-modernizr.txt -- diff --git a/licenses/LICENSE-modernizr.txt b/licenses/LICENSE-modernizr.txt new file mode 100644 index 000..2bf24b9 --- /dev/null +++ b/licenses/LICENSE-modernizr.txt @@ -0,0 +1,21 @@ +The MIT License (MIT) + +Copyright (c) + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
\ No newline at end of file
spark git commit: [SPARK-15707][SQL] Make Code Neat - Use map instead of if check.
Repository: spark Updated Branches: refs/heads/master 091f81e1f -> 0f307db5e [SPARK-15707][SQL] Make Code Neat - Use map instead of if check. ## What changes were proposed in this pull request? In forType function of object RandomDataGenerator, the code following: if (maybeSqlTypeGenerator.isDefined){ Some(generator) } else{ None } will be changed. Instead, maybeSqlTypeGenerator.map will be used. ## How was this patch tested? All of the current unit tests passed. Author: Weiqing Yang Closes #13448 from Sherry302/master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0f307db5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0f307db5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0f307db5 Branch: refs/heads/master Commit: 0f307db5e17e1e8a655cfa751218ac4ed88717a7 Parents: 091f81e Author: Weiqing Yang Authored: Sat Jun 4 22:44:03 2016 +0100 Committer: Sean Owen Committed: Sat Jun 4 22:44:03 2016 +0100 -- .../scala/org/apache/spark/sql/RandomDataGenerator.scala | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0f307db5/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala index 711e870..8508697 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala @@ -236,9 +236,8 @@ object RandomDataGenerator { // convert it to catalyst value to call udt's deserialize. val toCatalystType = CatalystTypeConverters.createToCatalystConverter(udt.sqlType) -if (maybeSqlTypeGenerator.isDefined) { - val sqlTypeGenerator = maybeSqlTypeGenerator.get - val generator = () => { +maybeSqlTypeGenerator.map { sqlTypeGenerator => + () => { val generatedScalaValue = sqlTypeGenerator.apply() if (generatedScalaValue == null) { null @@ -246,9 +245,6 @@ object RandomDataGenerator { udt.deserialize(toCatalystType(generatedScalaValue)) } } - Some(generator) -} else { - None } case unsupportedType => None } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
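For readers unfamiliar with the idiom, here is a minimal standalone Scala sketch of the same refactor outside Spark: Option.map replaces the isDefined/get/Some/None pattern while still preserving the None case. The makeGenerator name and the Int/String types are illustrative only and are not part of RandomDataGenerator.

object OptionMapSketch extends App {
  // Stand-in for maybeSqlTypeGenerator: an optional base generator.
  // Option.map applies the body only when a value is present and
  // returns None otherwise, so the explicit if/else branches disappear.
  def makeGenerator(maybeBase: Option[() => Int]): Option[() => String] =
    maybeBase.map { base =>
      () => {
        val v = base.apply()
        if (v == 0) null else v.toString  // mirrors the null pass-through in the patch
      }
    }

  println(makeGenerator(Some(() => 42)).map(_.apply()))  // Some(42)
  println(makeGenerator(None))                           // None
}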
spark git commit: [SPARK-15707][SQL] Make Code Neat - Use map instead of if check.
Repository: spark Updated Branches: refs/heads/branch-2.0 7e4c9dd55 -> 32a64d8fc [SPARK-15707][SQL] Make Code Neat - Use map instead of if check. ## What changes were proposed in this pull request? In forType function of object RandomDataGenerator, the code following: if (maybeSqlTypeGenerator.isDefined){ Some(generator) } else{ None } will be changed. Instead, maybeSqlTypeGenerator.map will be used. ## How was this patch tested? All of the current unit tests passed. Author: Weiqing Yang Closes #13448 from Sherry302/master. (cherry picked from commit 0f307db5e17e1e8a655cfa751218ac4ed88717a7) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/32a64d8f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/32a64d8f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/32a64d8f Branch: refs/heads/branch-2.0 Commit: 32a64d8fc9e7ddaf993bdd7e679113dc605a69a7 Parents: 7e4c9dd Author: Weiqing Yang Authored: Sat Jun 4 22:44:03 2016 +0100 Committer: Sean Owen Committed: Sat Jun 4 22:44:12 2016 +0100 -- .../scala/org/apache/spark/sql/RandomDataGenerator.scala | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/32a64d8f/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala index 711e870..8508697 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala @@ -236,9 +236,8 @@ object RandomDataGenerator { // convert it to catalyst value to call udt's deserialize. val toCatalystType = CatalystTypeConverters.createToCatalystConverter(udt.sqlType) -if (maybeSqlTypeGenerator.isDefined) { - val sqlTypeGenerator = maybeSqlTypeGenerator.get - val generator = () => { +maybeSqlTypeGenerator.map { sqlTypeGenerator => + () => { val generatedScalaValue = sqlTypeGenerator.apply() if (generatedScalaValue == null) { null @@ -246,9 +245,6 @@ object RandomDataGenerator { udt.deserialize(toCatalystType(generatedScalaValue)) } } - Some(generator) -} else { - None } case unsupportedType => None } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15723] Fixed local-timezone-brittle test where short-timezone form "EST" is …
Repository: spark Updated Branches: refs/heads/master 0f307db5e -> 4e767d0f9 [SPARK-15723] Fixed local-timezone-brittle test where short-timezone form "EST" is ⦠## What changes were proposed in this pull request? Stop using the abbreviated and ambiguous timezone "EST" in a test, since it is machine-local default timezone dependent, and fails in different timezones. Fixed [SPARK-15723](https://issues.apache.org/jira/browse/SPARK-15723). ## How was this patch tested? Note that to reproduce this problem in any locale/timezone, you can modify the scalatest-maven-plugin argLine to add a timezone: -ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=${CodeCacheSize} -Duser.timezone="Australia/Sydney" and run $ mvn test -DwildcardSuites=org.apache.spark.status.api.v1.SimpleDateParamSuite -Dtest=none. Equally this will fix it in an effected timezone: -ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=${CodeCacheSize} -Duser.timezone="America/New_York" To test the fix, apply the above change to `pom.xml` to set test TZ to `Australia/Sydney`, and confirm the test now passes. Author: Brett Randall Closes #13462 from javabrett/SPARK-15723-SimpleDateParamSuite. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4e767d0f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4e767d0f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4e767d0f Branch: refs/heads/master Commit: 4e767d0f9042bfea6074c2637438859699ec4dc3 Parents: 0f307db Author: Brett Randall Authored: Sun Jun 5 15:31:56 2016 +0100 Committer: Sean Owen Committed: Sun Jun 5 15:31:56 2016 +0100 -- .../org/apache/spark/status/api/v1/SimpleDateParamSuite.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4e767d0f/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala b/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala index 63b0e77..18baeb1 100644 --- a/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala +++ b/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala @@ -26,7 +26,8 @@ class SimpleDateParamSuite extends SparkFunSuite with Matchers { test("date parsing") { new SimpleDateParam("2015-02-20T23:21:17.190GMT").timestamp should be (1424474477190L) -new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be (1424470877190L) +// don't use EST, it is ambiguous, use -0500 instead, see SPARK-15723 +new SimpleDateParam("2015-02-20T17:21:17.190-0500").timestamp should be (1424470877190L) new SimpleDateParam("2015-02-20").timestamp should be (142439040L) // GMT intercept[WebApplicationException] { new SimpleDateParam("invalid date") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
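To see why the numeric offset form is stable, here is a hedged standalone Scala sketch; it does not reproduce SimpleDateParam's internal date format, only the property the patched test relies on: a timestamp carrying an explicit RFC 822 offset parses to the same instant regardless of the JVM's default time zone.

import java.text.SimpleDateFormat
import java.util.TimeZone

object OffsetParsingSketch extends App {
  for (zone <- Seq("Australia/Sydney", "America/New_York")) {
    TimeZone.setDefault(TimeZone.getTimeZone(zone))
    // 'Z' parses a numeric RFC 822 offset such as -0500, so the default
    // zone set above has no effect on the resulting instant.
    val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
    val millis = fmt.parse("2015-02-20T17:21:17.190-0500").getTime
    println(s"$zone -> $millis")  // 1424470877190 in both iterations, matching the patched test
  }
}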
spark git commit: [SPARK-15723] Fixed local-timezone-brittle test where short-timezone form "EST" is …
Repository: spark Updated Branches: refs/heads/branch-2.0 32a64d8fc -> 8c0ec85e6 [SPARK-15723] Fixed local-timezone-brittle test where short-timezone form "EST" is ⦠## What changes were proposed in this pull request? Stop using the abbreviated and ambiguous timezone "EST" in a test, since it is machine-local default timezone dependent, and fails in different timezones. Fixed [SPARK-15723](https://issues.apache.org/jira/browse/SPARK-15723). ## How was this patch tested? Note that to reproduce this problem in any locale/timezone, you can modify the scalatest-maven-plugin argLine to add a timezone: -ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=${CodeCacheSize} -Duser.timezone="Australia/Sydney" and run $ mvn test -DwildcardSuites=org.apache.spark.status.api.v1.SimpleDateParamSuite -Dtest=none. Equally this will fix it in an effected timezone: -ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=${CodeCacheSize} -Duser.timezone="America/New_York" To test the fix, apply the above change to `pom.xml` to set test TZ to `Australia/Sydney`, and confirm the test now passes. Author: Brett Randall Closes #13462 from javabrett/SPARK-15723-SimpleDateParamSuite. (cherry picked from commit 4e767d0f9042bfea6074c2637438859699ec4dc3) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8c0ec85e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8c0ec85e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8c0ec85e Branch: refs/heads/branch-2.0 Commit: 8c0ec85e62f762c11e0686d1c35d1dfec05df9de Parents: 32a64d8 Author: Brett Randall Authored: Sun Jun 5 15:31:56 2016 +0100 Committer: Sean Owen Committed: Sun Jun 5 16:12:24 2016 +0100 -- .../org/apache/spark/status/api/v1/SimpleDateParamSuite.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8c0ec85e/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala b/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala index 63b0e77..18baeb1 100644 --- a/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala +++ b/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala @@ -26,7 +26,8 @@ class SimpleDateParamSuite extends SparkFunSuite with Matchers { test("date parsing") { new SimpleDateParam("2015-02-20T23:21:17.190GMT").timestamp should be (1424474477190L) -new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be (1424470877190L) +// don't use EST, it is ambiguous, use -0500 instead, see SPARK-15723 +new SimpleDateParam("2015-02-20T17:21:17.190-0500").timestamp should be (1424470877190L) new SimpleDateParam("2015-02-20").timestamp should be (142439040L) // GMT intercept[WebApplicationException] { new SimpleDateParam("invalid date") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15723] Fixed local-timezone-brittle test where short-timezone form "EST" is …
Repository: spark Updated Branches: refs/heads/branch-1.6 a0cf7d0b2 -> 6a9f19dd5 [SPARK-15723] Fixed local-timezone-brittle test where short-timezone form "EST" is ⦠## What changes were proposed in this pull request? Stop using the abbreviated and ambiguous timezone "EST" in a test, since it is machine-local default timezone dependent, and fails in different timezones. Fixed [SPARK-15723](https://issues.apache.org/jira/browse/SPARK-15723). ## How was this patch tested? Note that to reproduce this problem in any locale/timezone, you can modify the scalatest-maven-plugin argLine to add a timezone: -ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=${CodeCacheSize} -Duser.timezone="Australia/Sydney" and run $ mvn test -DwildcardSuites=org.apache.spark.status.api.v1.SimpleDateParamSuite -Dtest=none. Equally this will fix it in an effected timezone: -ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=${CodeCacheSize} -Duser.timezone="America/New_York" To test the fix, apply the above change to `pom.xml` to set test TZ to `Australia/Sydney`, and confirm the test now passes. Author: Brett Randall Closes #13462 from javabrett/SPARK-15723-SimpleDateParamSuite. (cherry picked from commit 4e767d0f9042bfea6074c2637438859699ec4dc3) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a9f19dd Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a9f19dd Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a9f19dd Branch: refs/heads/branch-1.6 Commit: 6a9f19dd57dadb80bccc328cf1d099bed04f7f18 Parents: a0cf7d0 Author: Brett Randall Authored: Sun Jun 5 15:31:56 2016 +0100 Committer: Sean Owen Committed: Sun Jun 5 16:12:49 2016 +0100 -- .../org/apache/spark/status/api/v1/SimpleDateParamSuite.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6a9f19dd/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala b/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala index 63b0e77..18baeb1 100644 --- a/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala +++ b/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala @@ -26,7 +26,8 @@ class SimpleDateParamSuite extends SparkFunSuite with Matchers { test("date parsing") { new SimpleDateParam("2015-02-20T23:21:17.190GMT").timestamp should be (1424474477190L) -new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be (1424470877190L) +// don't use EST, it is ambiguous, use -0500 instead, see SPARK-15723 +new SimpleDateParam("2015-02-20T17:21:17.190-0500").timestamp should be (1424470877190L) new SimpleDateParam("2015-02-20").timestamp should be (142439040L) // GMT intercept[WebApplicationException] { new SimpleDateParam("invalid date") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR] Fix Typos 'an -> a'
Repository: spark Updated Branches: refs/heads/master 32f2f95db -> fd8af3971 [MINOR] Fix Typos 'an -> a' ## What changes were proposed in this pull request? `an -> a` Use cmds like `find . -name '*.R' | xargs -i sh -c "grep -in ' an [^aeiou]' {} && echo {}"` to generate candidates, and review them one by one. ## How was this patch tested? manual tests Author: Zheng RuiFeng Closes #13515 from zhengruifeng/an_a. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fd8af397 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fd8af397 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fd8af397 Branch: refs/heads/master Commit: fd8af397132fa1415a4c19d7f5cb5a41aa6ddb27 Parents: 32f2f95 Author: Zheng RuiFeng Authored: Mon Jun 6 09:35:47 2016 +0100 Committer: Sean Owen Committed: Mon Jun 6 09:35:47 2016 +0100 -- R/pkg/R/utils.R | 2 +- .../src/main/scala/org/apache/spark/Accumulable.scala | 2 +- .../org/apache/spark/api/java/JavaSparkContext.scala | 2 +- .../scala/org/apache/spark/api/python/PythonRDD.scala | 2 +- .../scala/org/apache/spark/deploy/SparkSubmit.scala | 6 +++--- .../src/main/scala/org/apache/spark/rdd/JdbcRDD.scala | 6 +++--- .../main/scala/org/apache/spark/scheduler/Pool.scala | 2 +- .../org/apache/spark/broadcast/BroadcastSuite.scala | 2 +- .../spark/deploy/rest/StandaloneRestSubmitSuite.scala | 2 +- .../test/scala/org/apache/spark/rpc/RpcEnvSuite.scala | 2 +- .../apache/spark/scheduler/DAGSchedulerSuite.scala| 4 ++-- .../org/apache/spark/util/JsonProtocolSuite.scala | 2 +- .../spark/streaming/flume/FlumeBatchFetcher.scala | 2 +- .../spark/graphx/impl/VertexPartitionBaseOps.scala| 2 +- .../scala/org/apache/spark/ml/linalg/Vectors.scala| 2 +- .../src/main/scala/org/apache/spark/ml/Pipeline.scala | 2 +- .../spark/ml/classification/LogisticRegression.scala | 4 ++-- .../org/apache/spark/ml/tree/impl/RandomForest.scala | 2 +- .../mllib/classification/LogisticRegression.scala | 2 +- .../org/apache/spark/mllib/classification/SVM.scala | 2 +- .../spark/mllib/feature/VectorTransformer.scala | 2 +- .../scala/org/apache/spark/mllib/linalg/Vectors.scala | 2 +- .../mllib/linalg/distributed/CoordinateMatrix.scala | 2 +- .../apache/spark/mllib/rdd/MLPairRDDFunctions.scala | 2 +- python/pyspark/ml/classification.py | 4 ++-- python/pyspark/ml/pipeline.py | 2 +- python/pyspark/mllib/classification.py| 2 +- python/pyspark/mllib/common.py| 2 +- python/pyspark/rdd.py | 4 ++-- python/pyspark/sql/session.py | 2 +- python/pyspark/sql/streaming.py | 2 +- python/pyspark/sql/types.py | 2 +- python/pyspark/streaming/dstream.py | 4 ++-- .../src/main/scala/org/apache/spark/sql/Row.scala | 2 +- .../apache/spark/sql/catalyst/analysis/Analyzer.scala | 4 ++-- .../sql/catalyst/analysis/FunctionRegistry.scala | 2 +- .../sql/catalyst/analysis/MultiInstanceRelation.scala | 2 +- .../spark/sql/catalyst/catalog/SessionCatalog.scala | 6 +++--- .../sql/catalyst/catalog/functionResources.scala | 2 +- .../sql/catalyst/expressions/ExpectsInputTypes.scala | 2 +- .../spark/sql/catalyst/expressions/Projection.scala | 4 ++-- .../sql/catalyst/expressions/complexTypeCreator.scala | 2 +- .../org/apache/spark/sql/types/AbstractDataType.scala | 2 +- .../scala/org/apache/spark/sql/DataFrameReader.scala | 2 +- .../main/scala/org/apache/spark/sql/SQLContext.scala | 14 +++--- .../scala/org/apache/spark/sql/SQLImplicits.scala | 2 +- .../scala/org/apache/spark/sql/SparkSession.scala | 14 +++--- .../org/apache/spark/sql/catalyst/SQLBuilder.scala| 2 +- 
.../aggregate/SortBasedAggregationIterator.scala | 2 +- .../apache/spark/sql/execution/aggregate/udaf.scala | 2 +- .../execution/columnar/GenerateColumnAccessor.scala | 2 +- .../execution/datasources/FileSourceStrategy.scala| 2 +- .../execution/datasources/json/JacksonParser.scala| 2 +- .../datasources/parquet/CatalystRowConverter.scala| 2 +- .../sql/execution/exchange/ExchangeCoordinator.scala | 10 +- .../spark/sql/execution/joins/SortMergeJoinExec.scala | 2 +- .../spark/sql/execution/r/MapPartitionsRWrapper.scala | 2 +- .../scala/org/apache/spark/sql/expressions/udaf.scala | 2 +- .../org/apache/spark/sql/internal/SharedState.scala | 2 +- .../apache/spark/sql/streaming/ContinuousQuery.scala | 2 +- .../org/apache/spark/sql/hive/client/HiveClient.scala | 2 +- .../apache/spark/sql/hive/orc/OrcFileOperator.scala | 2 +- .../spark/sql/hive/execution/HiveComparisonTest.scala |
spark git commit: [SPARK-15771][ML][EXAMPLES] Use 'accuracy' rather than 'precision' in many ML examples
Repository: spark Updated Branches: refs/heads/master fd8af3971 -> a95252823 [SPARK-15771][ML][EXAMPLES] Use 'accuracy' rather than 'precision' in many ML examples ## What changes were proposed in this pull request? Since [SPARK-15617](https://issues.apache.org/jira/browse/SPARK-15617) deprecated ```precision``` in ```MulticlassClassificationEvaluator```, many ML examples broken. ```python pyspark.sql.utils.IllegalArgumentException: u'MulticlassClassificationEvaluator_4c3bb1d73d8cc0cedae6 parameter metricName given invalid value precision.' ``` We should use ```accuracy``` to replace ```precision``` in these examples. ## How was this patch tested? Offline tests. Author: Yanbo Liang Closes #13519 from yanboliang/spark-15771. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a9525282 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a9525282 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a9525282 Branch: refs/heads/master Commit: a95252823e09939b654dd425db38dadc4100bc87 Parents: fd8af39 Author: Yanbo Liang Authored: Mon Jun 6 09:36:34 2016 +0100 Committer: Sean Owen Committed: Mon Jun 6 09:36:34 2016 +0100 -- .../examples/ml/JavaDecisionTreeClassificationExample.java | 2 +- .../examples/ml/JavaGradientBoostedTreeClassifierExample.java | 2 +- .../examples/ml/JavaMultilayerPerceptronClassifierExample.java | 6 +++--- .../org/apache/spark/examples/ml/JavaNaiveBayesExample.java| 6 +++--- .../org/apache/spark/examples/ml/JavaOneVsRestExample.java | 6 +++--- .../spark/examples/ml/JavaRandomForestClassifierExample.java | 2 +- .../src/main/python/ml/decision_tree_classification_example.py | 2 +- .../main/python/ml/gradient_boosted_tree_classifier_example.py | 2 +- .../src/main/python/ml/multilayer_perceptron_classification.py | 6 +++--- examples/src/main/python/ml/naive_bayes_example.py | 6 +++--- examples/src/main/python/ml/one_vs_rest_example.py | 6 +++--- .../src/main/python/ml/random_forest_classifier_example.py | 2 +- .../spark/examples/ml/DecisionTreeClassificationExample.scala | 2 +- .../examples/ml/GradientBoostedTreeClassifierExample.scala | 2 +- .../examples/ml/MultilayerPerceptronClassifierExample.scala| 6 +++--- .../scala/org/apache/spark/examples/ml/NaiveBayesExample.scala | 6 +++--- .../scala/org/apache/spark/examples/ml/OneVsRestExample.scala | 6 +++--- .../spark/examples/ml/RandomForestClassifierExample.scala | 2 +- python/pyspark/ml/evaluation.py| 2 +- 19 files changed, 37 insertions(+), 37 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a9525282/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java index bdb76f0..a9c6e7f 100644 --- a/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java +++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java @@ -90,7 +90,7 @@ public class JavaDecisionTreeClassificationExample { MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator() .setLabelCol("indexedLabel") .setPredictionCol("prediction") - .setMetricName("precision"); + .setMetricName("accuracy"); double accuracy = evaluator.evaluate(predictions); System.out.println("Test Error = " + (1.0 - accuracy)); 
http://git-wip-us.apache.org/repos/asf/spark/blob/a9525282/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java index 5c2e03e..3e9eb99 100644 --- a/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java +++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java @@ -92,7 +92,7 @@ public class JavaGradientBoostedTreeClassifierExample { MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator() .setLabelCol("indexedLabel") .setPredictionCol("prediction") - .setMetricName("precision"); + .setMetricName("accuracy"); double accuracy = evaluator.evaluate(predictions); System.out.println("Test Error = " + (1.
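The change itself is mechanical, but for reference here is a hedged Scala sketch of the evaluator usage the examples now share. The tiny hand-built predictions DataFrame is an illustrative stand-in for the output of a fitted pipeline's model.transform(testData); the "indexedLabel" and "prediction" column names come from the examples above.

import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.sql.SparkSession

object AccuracyEvaluatorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("AccuracyEvaluatorSketch").getOrCreate()
    // Stand-in for real predictions: (indexedLabel, prediction) pairs.
    val predictions = spark.createDataFrame(Seq(
      (0.0, 0.0), (1.0, 1.0), (1.0, 0.0)
    )).toDF("indexedLabel", "prediction")

    val evaluator = new MulticlassClassificationEvaluator()
      .setLabelCol("indexedLabel")
      .setPredictionCol("prediction")
      .setMetricName("accuracy")  // "precision" is rejected here after SPARK-15617

    val accuracy = evaluator.evaluate(predictions)  // 2 of 3 correct, so ~0.667
    println("Test Error = " + (1.0 - accuracy))
    spark.stop()
  }
}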
spark git commit: [MINOR] Fix Typos 'an -> a'
Repository: spark Updated Branches: refs/heads/branch-2.0 7d10e4bdd -> 90e94b826 [MINOR] Fix Typos 'an -> a' ## What changes were proposed in this pull request? `an -> a` Use cmds like `find . -name '*.R' | xargs -i sh -c "grep -in ' an [^aeiou]' {} && echo {}"` to generate candidates, and review them one by one. ## How was this patch tested? manual tests Author: Zheng RuiFeng Closes #13515 from zhengruifeng/an_a. (cherry picked from commit fd8af397132fa1415a4c19d7f5cb5a41aa6ddb27) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/90e94b82 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/90e94b82 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/90e94b82 Branch: refs/heads/branch-2.0 Commit: 90e94b82649d9816cd4065549678b82751238552 Parents: 7d10e4b Author: Zheng RuiFeng Authored: Mon Jun 6 09:35:47 2016 +0100 Committer: Sean Owen Committed: Mon Jun 6 09:35:57 2016 +0100 -- R/pkg/R/utils.R | 2 +- .../src/main/scala/org/apache/spark/Accumulable.scala | 2 +- .../org/apache/spark/api/java/JavaSparkContext.scala | 2 +- .../scala/org/apache/spark/api/python/PythonRDD.scala | 2 +- .../scala/org/apache/spark/deploy/SparkSubmit.scala | 6 +++--- .../src/main/scala/org/apache/spark/rdd/JdbcRDD.scala | 6 +++--- .../main/scala/org/apache/spark/scheduler/Pool.scala | 2 +- .../org/apache/spark/broadcast/BroadcastSuite.scala | 2 +- .../spark/deploy/rest/StandaloneRestSubmitSuite.scala | 2 +- .../test/scala/org/apache/spark/rpc/RpcEnvSuite.scala | 2 +- .../apache/spark/scheduler/DAGSchedulerSuite.scala| 4 ++-- .../org/apache/spark/util/JsonProtocolSuite.scala | 2 +- .../spark/streaming/flume/FlumeBatchFetcher.scala | 2 +- .../spark/graphx/impl/VertexPartitionBaseOps.scala| 2 +- .../scala/org/apache/spark/ml/linalg/Vectors.scala| 2 +- .../src/main/scala/org/apache/spark/ml/Pipeline.scala | 2 +- .../spark/ml/classification/LogisticRegression.scala | 4 ++-- .../org/apache/spark/ml/tree/impl/RandomForest.scala | 2 +- .../mllib/classification/LogisticRegression.scala | 2 +- .../org/apache/spark/mllib/classification/SVM.scala | 2 +- .../spark/mllib/feature/VectorTransformer.scala | 2 +- .../scala/org/apache/spark/mllib/linalg/Vectors.scala | 2 +- .../mllib/linalg/distributed/CoordinateMatrix.scala | 2 +- .../apache/spark/mllib/rdd/MLPairRDDFunctions.scala | 2 +- python/pyspark/ml/classification.py | 4 ++-- python/pyspark/ml/pipeline.py | 2 +- python/pyspark/mllib/classification.py| 2 +- python/pyspark/mllib/common.py| 2 +- python/pyspark/rdd.py | 4 ++-- python/pyspark/sql/session.py | 2 +- python/pyspark/sql/streaming.py | 2 +- python/pyspark/sql/types.py | 2 +- python/pyspark/streaming/dstream.py | 4 ++-- .../src/main/scala/org/apache/spark/sql/Row.scala | 2 +- .../apache/spark/sql/catalyst/analysis/Analyzer.scala | 4 ++-- .../sql/catalyst/analysis/FunctionRegistry.scala | 2 +- .../sql/catalyst/analysis/MultiInstanceRelation.scala | 2 +- .../spark/sql/catalyst/catalog/SessionCatalog.scala | 6 +++--- .../sql/catalyst/catalog/functionResources.scala | 2 +- .../sql/catalyst/expressions/ExpectsInputTypes.scala | 2 +- .../spark/sql/catalyst/expressions/Projection.scala | 4 ++-- .../sql/catalyst/expressions/complexTypeCreator.scala | 2 +- .../org/apache/spark/sql/types/AbstractDataType.scala | 2 +- .../scala/org/apache/spark/sql/DataFrameReader.scala | 2 +- .../main/scala/org/apache/spark/sql/SQLContext.scala | 14 +++--- .../scala/org/apache/spark/sql/SQLImplicits.scala | 2 +- 
.../scala/org/apache/spark/sql/SparkSession.scala | 14 +++--- .../org/apache/spark/sql/catalyst/SQLBuilder.scala| 2 +- .../aggregate/SortBasedAggregationIterator.scala | 2 +- .../apache/spark/sql/execution/aggregate/udaf.scala | 2 +- .../execution/columnar/GenerateColumnAccessor.scala | 2 +- .../execution/datasources/FileSourceStrategy.scala| 2 +- .../execution/datasources/json/JacksonParser.scala| 2 +- .../datasources/parquet/CatalystRowConverter.scala| 2 +- .../sql/execution/exchange/ExchangeCoordinator.scala | 10 +- .../spark/sql/execution/joins/SortMergeJoinExec.scala | 2 +- .../spark/sql/execution/r/MapPartitionsRWrapper.scala | 2 +- .../scala/org/apache/spark/sql/expressions/udaf.scala | 2 +- .../org/apache/spark/sql/internal/SharedState.scala | 2 +- .../apache/spark/sql/streaming/ContinuousQuery.scala | 2 +- .../org/apache/spark/sql/hive/client/HiveClient.scala | 2 +- .../apache/spark
spark git commit: [SPARK-15771][ML][EXAMPLES] Use 'accuracy' rather than 'precision' in many ML examples
Repository: spark Updated Branches: refs/heads/branch-2.0 90e94b826 -> 86a35a229 [SPARK-15771][ML][EXAMPLES] Use 'accuracy' rather than 'precision' in many ML examples ## What changes were proposed in this pull request? Since [SPARK-15617](https://issues.apache.org/jira/browse/SPARK-15617) deprecated ```precision``` in ```MulticlassClassificationEvaluator```, many ML examples broken. ```python pyspark.sql.utils.IllegalArgumentException: u'MulticlassClassificationEvaluator_4c3bb1d73d8cc0cedae6 parameter metricName given invalid value precision.' ``` We should use ```accuracy``` to replace ```precision``` in these examples. ## How was this patch tested? Offline tests. Author: Yanbo Liang Closes #13519 from yanboliang/spark-15771. (cherry picked from commit a95252823e09939b654dd425db38dadc4100bc87) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/86a35a22 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/86a35a22 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/86a35a22 Branch: refs/heads/branch-2.0 Commit: 86a35a22985b9e592744e6ef31453995f2322a31 Parents: 90e94b8 Author: Yanbo Liang Authored: Mon Jun 6 09:36:34 2016 +0100 Committer: Sean Owen Committed: Mon Jun 6 09:36:43 2016 +0100 -- .../examples/ml/JavaDecisionTreeClassificationExample.java | 2 +- .../examples/ml/JavaGradientBoostedTreeClassifierExample.java | 2 +- .../examples/ml/JavaMultilayerPerceptronClassifierExample.java | 6 +++--- .../org/apache/spark/examples/ml/JavaNaiveBayesExample.java| 6 +++--- .../org/apache/spark/examples/ml/JavaOneVsRestExample.java | 6 +++--- .../spark/examples/ml/JavaRandomForestClassifierExample.java | 2 +- .../src/main/python/ml/decision_tree_classification_example.py | 2 +- .../main/python/ml/gradient_boosted_tree_classifier_example.py | 2 +- .../src/main/python/ml/multilayer_perceptron_classification.py | 6 +++--- examples/src/main/python/ml/naive_bayes_example.py | 6 +++--- examples/src/main/python/ml/one_vs_rest_example.py | 6 +++--- .../src/main/python/ml/random_forest_classifier_example.py | 2 +- .../spark/examples/ml/DecisionTreeClassificationExample.scala | 2 +- .../examples/ml/GradientBoostedTreeClassifierExample.scala | 2 +- .../examples/ml/MultilayerPerceptronClassifierExample.scala| 6 +++--- .../scala/org/apache/spark/examples/ml/NaiveBayesExample.scala | 6 +++--- .../scala/org/apache/spark/examples/ml/OneVsRestExample.scala | 6 +++--- .../spark/examples/ml/RandomForestClassifierExample.scala | 2 +- python/pyspark/ml/evaluation.py| 2 +- 19 files changed, 37 insertions(+), 37 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/86a35a22/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java index bdb76f0..a9c6e7f 100644 --- a/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java +++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java @@ -90,7 +90,7 @@ public class JavaDecisionTreeClassificationExample { MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator() .setLabelCol("indexedLabel") .setPredictionCol("prediction") - .setMetricName("precision"); + .setMetricName("accuracy"); double accuracy = 
evaluator.evaluate(predictions); System.out.println("Test Error = " + (1.0 - accuracy)); http://git-wip-us.apache.org/repos/asf/spark/blob/86a35a22/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java index 5c2e03e..3e9eb99 100644 --- a/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java +++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java @@ -92,7 +92,7 @@ public class JavaGradientBoostedTreeClassifierExample { MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator() .setLabelCol("indexedLabel") .setPredictionCol("prediction") - .setMetricName("precision"); + .setMetricName("accuracy"
spark git commit: [SPARK-14900][ML][PYSPARK] Add accuracy and deprecate precison, recall, f1
Repository: spark Updated Branches: refs/heads/master a95252823 -> 00ad4f054 [SPARK-14900][ML][PYSPARK] Add accuracy and deprecate precison,recall,f1 ## What changes were proposed in this pull request? 1, add accuracy for MulticlassMetrics 2, deprecate overall precision,recall,f1 and recommend accuracy usage ## How was this patch tested? manual tests in pyspark shell Author: Zheng RuiFeng Closes #13511 from zhengruifeng/deprecate_py_precisonrecall. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/00ad4f05 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/00ad4f05 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/00ad4f05 Branch: refs/heads/master Commit: 00ad4f054cd044e17d29b7c2c62efd8616462619 Parents: a952528 Author: Zheng RuiFeng Authored: Mon Jun 6 15:19:22 2016 +0100 Committer: Sean Owen Committed: Mon Jun 6 15:19:22 2016 +0100 -- python/pyspark/mllib/evaluation.py | 18 ++ 1 file changed, 18 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/00ad4f05/python/pyspark/mllib/evaluation.py -- diff --git a/python/pyspark/mllib/evaluation.py b/python/pyspark/mllib/evaluation.py index 5f32f09..2eaac87 100644 --- a/python/pyspark/mllib/evaluation.py +++ b/python/pyspark/mllib/evaluation.py @@ -15,6 +15,8 @@ # limitations under the License. # +import warnings + from pyspark import since from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc from pyspark.sql import SQLContext @@ -181,6 +183,8 @@ class MulticlassMetrics(JavaModelWrapper): 0.66... >>> metrics.recall() 0.66... +>>> metrics.accuracy() +0.66... >>> metrics.weightedFalsePositiveRate 0.19... >>> metrics.weightedPrecision @@ -233,6 +237,8 @@ class MulticlassMetrics(JavaModelWrapper): Returns precision or precision for a given label (category) if specified. """ if label is None: +# note:: Deprecated in 2.0.0. Use accuracy. +warnings.warn("Deprecated in 2.0.0. Use accuracy.") return self.call("precision") else: return self.call("precision", float(label)) @@ -243,6 +249,8 @@ class MulticlassMetrics(JavaModelWrapper): Returns recall or recall for a given label (category) if specified. """ if label is None: +# note:: Deprecated in 2.0.0. Use accuracy. +warnings.warn("Deprecated in 2.0.0. Use accuracy.") return self.call("recall") else: return self.call("recall", float(label)) @@ -254,6 +262,8 @@ class MulticlassMetrics(JavaModelWrapper): """ if beta is None: if label is None: +# note:: Deprecated in 2.0.0. Use accuracy. +warnings.warn("Deprecated in 2.0.0. Use accuracy.") return self.call("fMeasure") else: return self.call("fMeasure", label) @@ -263,6 +273,14 @@ class MulticlassMetrics(JavaModelWrapper): else: return self.call("fMeasure", label, beta) +@since('2.0.0') +def accuracy(self): +""" +Returns accuracy (equals to the total number of correctly classified instances +out of the total number of instances). +""" +return self.call("accuracy") + @property @since('1.4.0') def weightedTruePositiveRate(self): - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-14900][ML][PYSPARK] Add accuracy and deprecate precison, recall, f1
Repository: spark Updated Branches: refs/heads/branch-2.0 86a35a229 -> e38ff70e6 [SPARK-14900][ML][PYSPARK] Add accuracy and deprecate precison,recall,f1 ## What changes were proposed in this pull request? 1, add accuracy for MulticlassMetrics 2, deprecate overall precision,recall,f1 and recommend accuracy usage ## How was this patch tested? manual tests in pyspark shell Author: Zheng RuiFeng Closes #13511 from zhengruifeng/deprecate_py_precisonrecall. (cherry picked from commit 00ad4f054cd044e17d29b7c2c62efd8616462619) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e38ff70e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e38ff70e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e38ff70e Branch: refs/heads/branch-2.0 Commit: e38ff70e6bacf1c85edc390d28f8a8d5ecc6cbc3 Parents: 86a35a2 Author: Zheng RuiFeng Authored: Mon Jun 6 15:19:22 2016 +0100 Committer: Sean Owen Committed: Mon Jun 6 15:19:38 2016 +0100 -- python/pyspark/mllib/evaluation.py | 18 ++ 1 file changed, 18 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e38ff70e/python/pyspark/mllib/evaluation.py -- diff --git a/python/pyspark/mllib/evaluation.py b/python/pyspark/mllib/evaluation.py index 5f32f09..2eaac87 100644 --- a/python/pyspark/mllib/evaluation.py +++ b/python/pyspark/mllib/evaluation.py @@ -15,6 +15,8 @@ # limitations under the License. # +import warnings + from pyspark import since from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc from pyspark.sql import SQLContext @@ -181,6 +183,8 @@ class MulticlassMetrics(JavaModelWrapper): 0.66... >>> metrics.recall() 0.66... +>>> metrics.accuracy() +0.66... >>> metrics.weightedFalsePositiveRate 0.19... >>> metrics.weightedPrecision @@ -233,6 +237,8 @@ class MulticlassMetrics(JavaModelWrapper): Returns precision or precision for a given label (category) if specified. """ if label is None: +# note:: Deprecated in 2.0.0. Use accuracy. +warnings.warn("Deprecated in 2.0.0. Use accuracy.") return self.call("precision") else: return self.call("precision", float(label)) @@ -243,6 +249,8 @@ class MulticlassMetrics(JavaModelWrapper): Returns recall or recall for a given label (category) if specified. """ if label is None: +# note:: Deprecated in 2.0.0. Use accuracy. +warnings.warn("Deprecated in 2.0.0. Use accuracy.") return self.call("recall") else: return self.call("recall", float(label)) @@ -254,6 +262,8 @@ class MulticlassMetrics(JavaModelWrapper): """ if beta is None: if label is None: +# note:: Deprecated in 2.0.0. Use accuracy. +warnings.warn("Deprecated in 2.0.0. Use accuracy.") return self.call("fMeasure") else: return self.call("fMeasure", label) @@ -263,6 +273,14 @@ class MulticlassMetrics(JavaModelWrapper): else: return self.call("fMeasure", label, beta) +@since('2.0.0') +def accuracy(self): +""" +Returns accuracy (equals to the total number of correctly classified instances +out of the total number of instances). +""" +return self.call("accuracy") + @property @since('1.4.0') def weightedTruePositiveRate(self): - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
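The new Python accuracy() delegates via self.call to the JVM-side MulticlassMetrics. A hedged Scala sketch of that underlying metric, with a toy (prediction, label) RDD standing in for real model output:

import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.{SparkConf, SparkContext}

object MulticlassAccuracySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("MulticlassAccuracySketch"))
    // (prediction, label) pairs; two of the three predictions are correct.
    val predictionAndLabels = sc.parallelize(Seq((0.0, 0.0), (1.0, 1.0), (1.0, 0.0)))
    val metrics = new MulticlassMetrics(predictionAndLabels)
    println(metrics.accuracy)  // ~0.667: correctly classified instances over all instances
    sc.stop()
  }
}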
svn commit: r1747061 - in /spark: downloads.md js/downloads.js site/downloads.html site/js/downloads.js
Author: srowen Date: Mon Jun 6 19:56:07 2016 New Revision: 1747061 URL: http://svn.apache.org/viewvc?rev=1747061&view=rev Log: SPARK-15778 add spark-2.0.0-preview release to options and other minor related updates Modified: spark/downloads.md spark/js/downloads.js spark/site/downloads.html spark/site/js/downloads.js Modified: spark/downloads.md URL: http://svn.apache.org/viewvc/spark/downloads.md?rev=1747061&r1=1747060&r2=1747061&view=diff == --- spark/downloads.md (original) +++ spark/downloads.md Mon Jun 6 19:56:07 2016 @@ -16,7 +16,7 @@ $(document).ready(function() { ## Download Apache Spark™ -Our latest version is Apache Spark 1.6.1, released on March 9, 2016 +Our latest stable version is Apache Spark 1.6.1, released on March 9, 2016 (release notes) https://github.com/apache/spark/releases/tag/v1.6.1";>(git tag) @@ -36,6 +36,17 @@ Our latest version is Apache Spark 1.6.1 _Note: Scala 2.11 users should download the Spark source package and build [with Scala 2.11 support](http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211)._ +### Latest Preview Release + +Preview releases, as the name suggests, are releases for previewing upcoming features. +Unlike nightly packages, preview releases have been audited by the project's management committee +to satisfy the legal requirements of Apache Software Foundation's release policy. +Preview releases are not meant to be functional, i.e. they can and highly likely will contain +critical bugs or documentation errors. + +The latest preview release is Spark 2.0.0-preview, published on May 24, 2016. +You can select and download it above. + ### Link with Spark Spark artifacts are [hosted in Maven Central](http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.spark%22). You can add a Maven dependency with the following coordinates: @@ -54,14 +65,9 @@ If you are interested in working with th Once you've downloaded Spark, you can find instructions for installing and building it on the documentation page. -Stable Releases - - -### Latest Preview Release (Spark 2.0.0-preview) -Preview releases, as the name suggests, are releases for previewing upcoming features. Unlike nightly packages, preview releases have been audited by the project's management committee to satisfy the legal requirements of Apache Software Foundation's release policy.Preview releases are not meant to be functional, i.e. they can and highly likely will contain critical bugs or documentation errors. - -The latest preview release is Spark 2.0.0-preview, published on May 24, 2016. You can https://dist.apache.org/repos/dist/release/spark/spark-2.0.0-preview/";>download it here. +### Release Notes for Stable Releases + ### Nightly Packages and Artifacts For developers, Spark maintains nightly builds and SNAPSHOT artifacts. More information is available on the [Spark developer Wiki](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-NightlyBuilds). 
Modified: spark/js/downloads.js URL: http://svn.apache.org/viewvc/spark/js/downloads.js?rev=1747061&r1=1747060&r2=1747061&view=diff == --- spark/js/downloads.js (original) +++ spark/js/downloads.js Mon Jun 6 19:56:07 2016 @@ -3,8 +3,8 @@ releases = {}; -function addRelease(version, releaseDate, packages, downloadable) { - releases[version] = {released: releaseDate, packages: packages, downloadable: downloadable}; +function addRelease(version, releaseDate, packages, downloadable, stable) { + releases[version] = {released: releaseDate, packages: packages, downloadable: downloadable, stable: stable}; } var sources = {pretty: "Source Code [can build several Hadoop versions]", tag: "sources"}; @@ -13,8 +13,9 @@ var hadoop1 = {pretty: "Pre-built for Ha var cdh4 = {pretty: "Pre-built for CDH 4", tag: "cdh4"}; var hadoop2 = {pretty: "Pre-built for Hadoop 2.2", tag: "hadoop2"}; var hadoop2p3 = {pretty: "Pre-built for Hadoop 2.3", tag: "hadoop2.3"}; -var hadoop2p4 = {pretty: "Pre-built for Hadoop 2.4 and later", tag: "hadoop2.4"}; -var hadoop2p6 = {pretty: "Pre-built for Hadoop 2.6 and later", tag: "hadoop2.6"}; +var hadoop2p4 = {pretty: "Pre-built for Hadoop 2.4", tag: "hadoop2.4"}; +var hadoop2p6 = {pretty: "Pre-built for Hadoop 2.6", tag: "hadoop2.6"}; +var hadoop2p7 = {pretty: "Pre-built for Hadoop 2.7 and later", tag: "hadoop2.7"}; var mapr3 = {pretty: "Pre-built for MapR 3.X", tag: "mapr3"}; var mapr4 = {pretty: "Pre-built for MapR 4.X", tag: "mapr4"}; @@ -31,32 +3
svn commit: r1747076 - in /spark: js/downloads.js site/js/downloads.js
Author: srowen Date: Mon Jun 6 20:59:54 2016 New Revision: 1747076 URL: http://svn.apache.org/viewvc?rev=1747076&view=rev Log: SPARK-15778 part 2: group preview/stable releases in download version dropdown Modified: spark/js/downloads.js spark/site/js/downloads.js Modified: spark/js/downloads.js URL: http://svn.apache.org/viewvc/spark/js/downloads.js?rev=1747076&r1=1747075&r2=1747076&view=diff == --- spark/js/downloads.js (original) +++ spark/js/downloads.js Mon Jun 6 20:59:54 2016 @@ -53,18 +53,18 @@ addRelease("1.1.0", new Date("9/11/2014" addRelease("1.0.2", new Date("8/5/2014"), sources.concat(packagesV3), true, true); addRelease("1.0.1", new Date("7/11/2014"), sources.concat(packagesV3), false, true); addRelease("1.0.0", new Date("5/30/2014"), sources.concat(packagesV2), false, true); -addRelease("0.9.2", new Date("7/23/2014"), sources.concat(packagesV2), true, false); -addRelease("0.9.1", new Date("4/9/2014"), sources.concat(packagesV2), false, false); -addRelease("0.9.0-incubating", new Date("2/2/2014"), sources.concat(packagesV2), false, false); -addRelease("0.8.1-incubating", new Date("12/19/2013"), sources.concat(packagesV2), true, false); -addRelease("0.8.0-incubating", new Date("9/25/2013"), sources.concat(packagesV1), true, false); -addRelease("0.7.3", new Date("7/16/2013"), sources.concat(packagesV1), true, false); -addRelease("0.7.2", new Date("2/6/2013"), sources.concat(packagesV1), false, false); -addRelease("0.7.0", new Date("2/27/2013"), sources, false, false); +addRelease("0.9.2", new Date("7/23/2014"), sources.concat(packagesV2), true, true); +addRelease("0.9.1", new Date("4/9/2014"), sources.concat(packagesV2), false, true); +addRelease("0.9.0-incubating", new Date("2/2/2014"), sources.concat(packagesV2), false, true); +addRelease("0.8.1-incubating", new Date("12/19/2013"), sources.concat(packagesV2), true, true); +addRelease("0.8.0-incubating", new Date("9/25/2013"), sources.concat(packagesV1), true, true); +addRelease("0.7.3", new Date("7/16/2013"), sources.concat(packagesV1), true, true); +addRelease("0.7.2", new Date("2/6/2013"), sources.concat(packagesV1), false, true); +addRelease("0.7.0", new Date("2/27/2013"), sources, false, true); function append(el, contents) { - el.innerHTML = el.innerHTML + contents; -}; + el.innerHTML += contents; +} function empty(el) { el.innerHTML = ""; @@ -79,27 +79,25 @@ function versionShort(version) { return function initDownloads() { var versionSelect = document.getElementById("sparkVersionSelect"); - // Populate versions - var markedDefault = false; + // Populate stable versions + append(versionSelect, ""); for (var version in releases) { +if (!releases[version].downloadable || !releases[version].stable) { continue; } var releaseDate = releases[version].released; -var downloadable = releases[version].downloadable; -var stable = releases[version].stable; - -if (!downloadable) { continue; } - -var selected = false; -if (!markedDefault && stable) { - selected = true; - markedDefault = true; -} +var title = versionShort(version) + " (" + releaseDate.toDateString().slice(4) + ")"; +append(versionSelect, "" + title + ""); + } + append(versionSelect, ""); -// Don't display incubation status here + // Populate other versions + append(versionSelect, ""); + for (var version in releases) { +if (!releases[version].downloadable || releases[version].stable) { continue; } +var releaseDate = releases[version].released; var title = versionShort(version) + " (" + releaseDate.toDateString().slice(4) + ")"; -append(versionSelect, - "" + - title + 
""); +append(versionSelect, "" + title + ""); } + append(versionSelect, ""); // Populate packages and (transitively) releases onVersionSelect(); Modified: spark/site/js/downloads.js URL: http://svn.apache.org/viewvc/spark/site/js/downloads.js?rev=1747076&r1=1747075&r2=1747076&view=diff == --- spark/site/js/downloads.js (original) +++ spark/site/js/downloads.js Mon Jun 6 20:59:54 2016 @@ -53,18 +53,18 @@ addRelease("1.1.0", new Date("9/11/2014" addRelease("1.0.2", new Date("8/5
spark git commit: [SPARK-12655][GRAPHX] GraphX does not unpersist RDDs
Repository: spark Updated Branches: refs/heads/branch-1.6 6a9f19dd5 -> 5830828ef [SPARK-12655][GRAPHX] GraphX does not unpersist RDDs Some VertexRDD and EdgeRDD are created during the intermediate step of g.connectedComponents() but unnecessarily left cached after the method is done. The fix is to unpersist these RDDs once they are no longer in use. A test case is added to confirm the fix for the reported bug. Author: Jason Lee Closes #10713 from jasoncl/SPARK-12655. (cherry picked from commit d0a5c32bd05841f411a342a80c5da9f73f30d69a) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5830828e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5830828e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5830828e Branch: refs/heads/branch-1.6 Commit: 5830828efbf863df510a2b5b17d76214863ff48f Parents: 6a9f19d Author: Jason Lee Authored: Fri Jan 15 12:04:05 2016 + Committer: Sean Owen Committed: Tue Jun 7 09:25:04 2016 +0100 -- .../scala/org/apache/spark/graphx/Pregel.scala | 2 +- .../spark/graphx/lib/ConnectedComponents.scala | 4 +++- .../scala/org/apache/spark/graphx/GraphSuite.scala | 17 + 3 files changed, 21 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5830828e/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala -- diff --git a/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala b/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala index 2ca60d5..8a89295 100644 --- a/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala +++ b/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala @@ -151,7 +151,7 @@ object Pregel extends Logging { // count the iteration i += 1 } - +messages.unpersist(blocking = false) g } // end of apply http://git-wip-us.apache.org/repos/asf/spark/blob/5830828e/graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala -- diff --git a/graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala b/graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala index 859f896..f72cbb1 100644 --- a/graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala +++ b/graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala @@ -47,9 +47,11 @@ object ConnectedComponents { } } val initialMessage = Long.MaxValue -Pregel(ccGraph, initialMessage, activeDirection = EdgeDirection.Either)( +val pregelGraph = Pregel(ccGraph, initialMessage, activeDirection = EdgeDirection.Either)( vprog = (id, attr, msg) => math.min(attr, msg), sendMsg = sendMessage, mergeMsg = (a, b) => math.min(a, b)) +ccGraph.unpersist() +pregelGraph } // end of connectedComponents } http://git-wip-us.apache.org/repos/asf/spark/blob/5830828e/graphx/src/test/scala/org/apache/spark/graphx/GraphSuite.scala -- diff --git a/graphx/src/test/scala/org/apache/spark/graphx/GraphSuite.scala b/graphx/src/test/scala/org/apache/spark/graphx/GraphSuite.scala index 9acbd79..a46c5da 100644 --- a/graphx/src/test/scala/org/apache/spark/graphx/GraphSuite.scala +++ b/graphx/src/test/scala/org/apache/spark/graphx/GraphSuite.scala @@ -428,6 +428,23 @@ class GraphSuite extends SparkFunSuite with LocalSparkContext { } } + test("unpersist graph RDD") { +withSpark { sc => + val vert = sc.parallelize(List((1L, "a"), (2L, "b"), (3L, "c")), 1) + val edges = sc.parallelize(List(Edge[Long](1L, 2L), Edge[Long](1L, 3L)), 1) + val g0 = Graph(vert, edges) + val g = 
g0.partitionBy(PartitionStrategy.EdgePartition2D, 2) + val cc = g.connectedComponents() + assert(sc.getPersistentRDDs.nonEmpty) + cc.unpersist() + g.unpersist() + g0.unpersist() + vert.unpersist() + edges.unpersist() + assert(sc.getPersistentRDDs.isEmpty) +} + } + test("SPARK-14219: pickRandomVertex") { withSpark { sc => val vert = sc.parallelize(List((1L, "a")), 1) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
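The new GraphSuite test above captures the user-visible behaviour; below is a hedged standalone Scala sketch of the same check, simplified (no partitionBy step) and assuming a local master, not a verbatim copy of the suite.

import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.{SparkConf, SparkContext}

object UnpersistGraphSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("UnpersistGraphSketch"))
    val vert  = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges = sc.parallelize(Seq(Edge[Long](1L, 2L), Edge[Long](1L, 3L)))
    val g  = Graph(vert, edges)
    val cc = g.connectedComponents()
    println(sc.getPersistentRDDs.size)  // > 0 while the graphs are cached
    cc.unpersist()
    g.unpersist()
    vert.unpersist()
    edges.unpersist()
    // With the fix, Pregel's message RDD and the intermediate ccGraph are
    // unpersisted internally, so nothing should remain cached at this point.
    println(sc.getPersistentRDDs.size)  // 0 once the leak is fixed
    sc.stop()
  }
}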
spark git commit: [MINOR] fix typo in documents
Repository: spark Updated Branches: refs/heads/branch-2.0 57dd4efcd -> a7e9e60df [MINOR] fix typo in documents ## What changes were proposed in this pull request? I use spell check tools checks typo in spark documents and fix them. ## How was this patch tested? N/A Author: WeichenXu Closes #13538 from WeichenXu123/fix_doc_typo. (cherry picked from commit 1e2c9311871968426e019164b129652fd6d0037f) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a7e9e60d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a7e9e60d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a7e9e60d Branch: refs/heads/branch-2.0 Commit: a7e9e60df5c10a90c06883ea3203ec895b9b1f82 Parents: 57dd4ef Author: WeichenXu Authored: Tue Jun 7 13:29:27 2016 +0100 Committer: Sean Owen Committed: Tue Jun 7 13:29:36 2016 +0100 -- docs/graphx-programming-guide.md| 2 +- docs/hardware-provisioning.md | 2 +- docs/streaming-programming-guide.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a7e9e60d/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index 9dea9b5..81cf174 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -132,7 +132,7 @@ var graph: Graph[VertexProperty, String] = null Like RDDs, property graphs are immutable, distributed, and fault-tolerant. Changes to the values or structure of the graph are accomplished by producing a new graph with the desired changes. Note -that substantial parts of the original graph (i.e., unaffected structure, attributes, and indicies) +that substantial parts of the original graph (i.e., unaffected structure, attributes, and indices) are reused in the new graph reducing the cost of this inherently functional data structure. The graph is partitioned across the executors using a range of vertex partitioning heuristics. As with RDDs, each partition of the graph can be recreated on a different machine in the event of a failure. http://git-wip-us.apache.org/repos/asf/spark/blob/a7e9e60d/docs/hardware-provisioning.md -- diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md index 60ecb4f..bb6f616 100644 --- a/docs/hardware-provisioning.md +++ b/docs/hardware-provisioning.md @@ -22,7 +22,7 @@ Hadoop and Spark on a common cluster manager like [Mesos](running-on-mesos.html) * If this is not possible, run Spark on different nodes in the same local-area network as HDFS. -* For low-latency data stores like HBase, it may be preferrable to run computing jobs on different +* For low-latency data stores like HBase, it may be preferable to run computing jobs on different nodes than the storage system to avoid interference. # Local Disks http://git-wip-us.apache.org/repos/asf/spark/blob/a7e9e60d/docs/streaming-programming-guide.md -- diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md index 78ae6a7..0a6a039 100644 --- a/docs/streaming-programming-guide.md +++ b/docs/streaming-programming-guide.md @@ -1259,7 +1259,7 @@ dstream.foreachRDD(sendRecord) This is incorrect as this requires the connection object to be serialized and sent from the -driver to the worker. Such connection objects are rarely transferrable across machines. This +driver to the worker. Such connection objects are rarely transferable across machines. 
This error may manifest as serialization errors (connection object not serializable), initialization errors (connection object needs to be initialized at the workers), etc. The correct solution is to create the connection object at the worker. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR] fix typo in documents
Repository: spark Updated Branches: refs/heads/master 5f731d685 -> 1e2c93118 [MINOR] fix typo in documents ## What changes were proposed in this pull request? I use spell check tools checks typo in spark documents and fix them. ## How was this patch tested? N/A Author: WeichenXu Closes #13538 from WeichenXu123/fix_doc_typo. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1e2c9311 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1e2c9311 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1e2c9311 Branch: refs/heads/master Commit: 1e2c9311871968426e019164b129652fd6d0037f Parents: 5f731d6 Author: WeichenXu Authored: Tue Jun 7 13:29:27 2016 +0100 Committer: Sean Owen Committed: Tue Jun 7 13:29:27 2016 +0100 -- docs/graphx-programming-guide.md| 2 +- docs/hardware-provisioning.md | 2 +- docs/streaming-programming-guide.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1e2c9311/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index 9dea9b5..81cf174 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -132,7 +132,7 @@ var graph: Graph[VertexProperty, String] = null Like RDDs, property graphs are immutable, distributed, and fault-tolerant. Changes to the values or structure of the graph are accomplished by producing a new graph with the desired changes. Note -that substantial parts of the original graph (i.e., unaffected structure, attributes, and indicies) +that substantial parts of the original graph (i.e., unaffected structure, attributes, and indices) are reused in the new graph reducing the cost of this inherently functional data structure. The graph is partitioned across the executors using a range of vertex partitioning heuristics. As with RDDs, each partition of the graph can be recreated on a different machine in the event of a failure. http://git-wip-us.apache.org/repos/asf/spark/blob/1e2c9311/docs/hardware-provisioning.md -- diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md index 60ecb4f..bb6f616 100644 --- a/docs/hardware-provisioning.md +++ b/docs/hardware-provisioning.md @@ -22,7 +22,7 @@ Hadoop and Spark on a common cluster manager like [Mesos](running-on-mesos.html) * If this is not possible, run Spark on different nodes in the same local-area network as HDFS. -* For low-latency data stores like HBase, it may be preferrable to run computing jobs on different +* For low-latency data stores like HBase, it may be preferable to run computing jobs on different nodes than the storage system to avoid interference. # Local Disks http://git-wip-us.apache.org/repos/asf/spark/blob/1e2c9311/docs/streaming-programming-guide.md -- diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md index 78ae6a7..0a6a039 100644 --- a/docs/streaming-programming-guide.md +++ b/docs/streaming-programming-guide.md @@ -1259,7 +1259,7 @@ dstream.foreachRDD(sendRecord) This is incorrect as this requires the connection object to be serialized and sent from the -driver to the worker. Such connection objects are rarely transferrable across machines. This +driver to the worker. Such connection objects are rarely transferable across machines. This error may manifest as serialization errors (connection object not serializable), initialization errors (connection object needs to be initialized at the workers), etc. 
The correct solution is to create the connection object at the worker. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
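For reference, the pattern the streaming-guide passage above recommends looks roughly like the sketch below. This is not code from the patch: `dstream` stands for an existing DStream, and `createNewConnection()` is a hypothetical application-side helper that opens a connection to the external system.

```scala
// dstream: an existing DStream, e.g. DStream[String], created elsewhere.
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // The connection is created at the worker, once per partition, so the
    // (usually non-serializable) object never has to be shipped from the driver.
    val connection = createNewConnection()  // hypothetical helper
    partitionOfRecords.foreach(record => connection.send(record))
    connection.close()
  }
}
```

Creating one connection per partition, rather than per record, also keeps connection-setup overhead low.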
spark git commit: [SPARK-15793][ML] Add maxSentenceLength for ml.Word2Vec
Repository: spark Updated Branches: refs/heads/master 91fbc880b -> 87706eb66 [SPARK-15793][ML] Add maxSentenceLength for ml.Word2Vec ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-15793 Word2vec in ML package should have maxSentenceLength method for feature parity. ## How was this patch tested? Tested with Spark unit test. Author: yinxusen Closes #13536 from yinxusen/SPARK-15793. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/87706eb6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/87706eb6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/87706eb6 Branch: refs/heads/master Commit: 87706eb66cd1370862a1f8ea447484c80969e45f Parents: 91fbc88 Author: yinxusen Authored: Wed Jun 8 09:18:04 2016 +0100 Committer: Sean Owen Committed: Wed Jun 8 09:18:04 2016 +0100 -- .../org/apache/spark/ml/feature/Word2Vec.scala | 19 +++ .../apache/spark/ml/feature/Word2VecSuite.scala | 1 + 2 files changed, 20 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/87706eb6/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala index 2d89eb0..33515b2 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala @@ -87,6 +87,21 @@ private[feature] trait Word2VecBase extends Params /** @group getParam */ def getMinCount: Int = $(minCount) + /** + * Sets the maximum length (in words) of each sentence in the input data. + * Any sentence longer than this threshold will be divided into chunks of + * up to `maxSentenceLength` size. + * Default: 1000 + * @group param + */ + final val maxSentenceLength = new IntParam(this, "maxSentenceLength", "Maximum length " + +"(in words) of each sentence in the input data. 
Any sentence longer than this threshold will " + +"be divided into chunks up to the size.") + setDefault(maxSentenceLength -> 1000) + + /** @group getParam */ + def getMaxSentenceLength: Int = $(maxSentenceLength) + setDefault(stepSize -> 0.025) setDefault(maxIter -> 1) @@ -137,6 +152,9 @@ final class Word2Vec(override val uid: String) extends Estimator[Word2VecModel] /** @group setParam */ def setMinCount(value: Int): this.type = set(minCount, value) + /** @group setParam */ + def setMaxSentenceLength(value: Int): this.type = set(maxSentenceLength, value) + @Since("2.0.0") override def fit(dataset: Dataset[_]): Word2VecModel = { transformSchema(dataset.schema, logging = true) @@ -149,6 +167,7 @@ final class Word2Vec(override val uid: String) extends Estimator[Word2VecModel] .setSeed($(seed)) .setVectorSize($(vectorSize)) .setWindowSize($(windowSize)) + .setMaxSentenceLength($(maxSentenceLength)) .fit(input) copyValues(new Word2VecModel(uid, wordVectors).setParent(this)) } http://git-wip-us.apache.org/repos/asf/spark/blob/87706eb6/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala -- diff --git a/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala index 280a36f..16c74f6 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala @@ -191,6 +191,7 @@ class Word2VecSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul .setSeed(42L) .setStepSize(0.01) .setVectorSize(100) + .setMaxSentenceLength(500) testDefaultReadWrite(t) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
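For context, the new setter slots into the usual `ml.Word2Vec` builder chain. A minimal sketch, assuming an existing DataFrame `docDF` with a `Seq[String]` column named "text" (both are placeholders, not part of the patch):

```scala
import org.apache.spark.ml.feature.Word2Vec

// docDF: DataFrame with a Seq[String] column "text", assumed to exist.
val word2Vec = new Word2Vec()
  .setInputCol("text")
  .setOutputCol("result")
  .setVectorSize(100)
  .setMinCount(0)
  .setMaxSentenceLength(500)  // setter added by this patch; default remains 1000
val model = word2Vec.fit(docDF)
```

With this setting, any input sentence longer than 500 words is split into chunks of at most 500 words before training, matching the existing behavior of `mllib.feature.Word2Vec`.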
spark git commit: [SPARK-15793][ML] Add maxSentenceLength for ml.Word2Vec
Repository: spark Updated Branches: refs/heads/branch-2.0 141e910af -> a790ac579 [SPARK-15793][ML] Add maxSentenceLength for ml.Word2Vec ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-15793 Word2vec in ML package should have maxSentenceLength method for feature parity. ## How was this patch tested? Tested with Spark unit test. Author: yinxusen Closes #13536 from yinxusen/SPARK-15793. (cherry picked from commit 87706eb66cd1370862a1f8ea447484c80969e45f) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a790ac57 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a790ac57 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a790ac57 Branch: refs/heads/branch-2.0 Commit: a790ac5793e1988895341fa878f947b09b275926 Parents: 141e910 Author: yinxusen Authored: Wed Jun 8 09:18:04 2016 +0100 Committer: Sean Owen Committed: Wed Jun 8 09:18:17 2016 +0100 -- .../org/apache/spark/ml/feature/Word2Vec.scala | 19 +++ .../apache/spark/ml/feature/Word2VecSuite.scala | 1 + 2 files changed, 20 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a790ac57/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala index 2d89eb0..33515b2 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala @@ -87,6 +87,21 @@ private[feature] trait Word2VecBase extends Params /** @group getParam */ def getMinCount: Int = $(minCount) + /** + * Sets the maximum length (in words) of each sentence in the input data. + * Any sentence longer than this threshold will be divided into chunks of + * up to `maxSentenceLength` size. + * Default: 1000 + * @group param + */ + final val maxSentenceLength = new IntParam(this, "maxSentenceLength", "Maximum length " + +"(in words) of each sentence in the input data. 
Any sentence longer than this threshold will " + +"be divided into chunks up to the size.") + setDefault(maxSentenceLength -> 1000) + + /** @group getParam */ + def getMaxSentenceLength: Int = $(maxSentenceLength) + setDefault(stepSize -> 0.025) setDefault(maxIter -> 1) @@ -137,6 +152,9 @@ final class Word2Vec(override val uid: String) extends Estimator[Word2VecModel] /** @group setParam */ def setMinCount(value: Int): this.type = set(minCount, value) + /** @group setParam */ + def setMaxSentenceLength(value: Int): this.type = set(maxSentenceLength, value) + @Since("2.0.0") override def fit(dataset: Dataset[_]): Word2VecModel = { transformSchema(dataset.schema, logging = true) @@ -149,6 +167,7 @@ final class Word2Vec(override val uid: String) extends Estimator[Word2VecModel] .setSeed($(seed)) .setVectorSize($(vectorSize)) .setWindowSize($(windowSize)) + .setMaxSentenceLength($(maxSentenceLength)) .fit(input) copyValues(new Word2VecModel(uid, wordVectors).setParent(this)) } http://git-wip-us.apache.org/repos/asf/spark/blob/a790ac57/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala -- diff --git a/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala index 280a36f..16c74f6 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala @@ -191,6 +191,7 @@ class Word2VecSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul .setSeed(42L) .setStepSize(0.01) .setVectorSize(100) + .setMaxSentenceLength(500) testDefaultReadWrite(t) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r1747385 - in /spark: ./ site/ site/docs/ site/docs/2.0.0-preview/ site/docs/2.0.0-preview/api/ site/docs/2.0.0-preview/api/R/ site/docs/2.0.0-preview/api/java/ site/docs/2.0.0-preview/api
Author: srowen Date: Wed Jun 8 12:04:28 2016 New Revision: 1747385 URL: http://svn.apache.org/viewvc?rev=1747385&view=rev Log: Uploaded Spark 2.0.0 preview docs and added preview docs section on site [This commit notification would consist of 1214 parts, which exceeds the limit of 50 parts, so it was shortened to this summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR] Fix Java Lint errors introduced by #13286 and #13280
Repository: spark Updated Branches: refs/heads/branch-2.0 a790ac579 -> 5e9a8e715 [MINOR] Fix Java Lint errors introduced by #13286 and #13280 ## What changes were proposed in this pull request? revived #13464 Fix Java Lint errors introduced by #13286 and #13280 Before: ``` Using `mvn` from path: /Users/pichu/Project/spark/build/apache-maven-3.3.9/bin/mvn Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0 Checkstyle checks failed at following occurrences: [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[340,5] (whitespace) FileTabCharacter: Line contains a tab character. [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[341,5] (whitespace) FileTabCharacter: Line contains a tab character. [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[342,5] (whitespace) FileTabCharacter: Line contains a tab character. [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[343,5] (whitespace) FileTabCharacter: Line contains a tab character. [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[41,28] (naming) MethodName: Method name 'Append' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[52,28] (naming) MethodName: Method name 'Complete' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[61,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.PrimitiveType. [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[62,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.Type. ``` ## How was this patch tested? ran `dev/lint-java` locally Author: Sandeep Singh Closes #13559 from techaddict/minor-3. 
(cherry picked from commit f958c1c3e292aba98d283637606890f353a9836c) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5e9a8e71 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5e9a8e71 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5e9a8e71 Branch: refs/heads/branch-2.0 Commit: 5e9a8e715953feadaa16ecd0f8e1818272b9c952 Parents: a790ac5 Author: Sandeep Singh Authored: Wed Jun 8 14:51:00 2016 +0100 Committer: Sean Owen Committed: Wed Jun 8 14:51:10 2016 +0100 -- dev/checkstyle-suppressions.xml | 2 ++ .../main/java/org/apache/spark/launcher/LauncherServer.java | 8 2 files changed, 6 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5e9a8e71/dev/checkstyle-suppressions.xml -- diff --git a/dev/checkstyle-suppressions.xml b/dev/checkstyle-suppressions.xml index bfc2e73..31656ca 100644 --- a/dev/checkstyle-suppressions.xml +++ b/dev/checkstyle-suppressions.xml @@ -42,4 +42,6 @@ files="src/main/java/org/apache/hive/service/auth/PasswdAuthenticationProvider.java"/> + http://git-wip-us.apache.org/repos/asf/spark/blob/5e9a8e71/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java -- diff --git a/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java b/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java index 28e9420..ae43f56 100644 --- a/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java +++ b/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java @@ -337,10 +337,10 @@ class LauncherServer implements Closeable { } super.close(); if (handle != null) { - if (!handle.getState().isFinal()) { - LOG.log(Level.WARNING, "Lost connection to spark application."); - handle.setState(SparkAppHandle.State.LOST); - } +if (!handle.getState().isFinal()) { + LOG.log(Level.WARNING, "Lost connection to spark application."); + handle.setState(SparkAppHandle.State.LOST); +} handle.disconnect(); } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR] Fix Java Lint errors introduced by #13286 and #13280
Repository: spark Updated Branches: refs/heads/master 87706eb66 -> f958c1c3e [MINOR] Fix Java Lint errors introduced by #13286 and #13280 ## What changes were proposed in this pull request? revived #13464 Fix Java Lint errors introduced by #13286 and #13280 Before: ``` Using `mvn` from path: /Users/pichu/Project/spark/build/apache-maven-3.3.9/bin/mvn Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0 Checkstyle checks failed at following occurrences: [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[340,5] (whitespace) FileTabCharacter: Line contains a tab character. [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[341,5] (whitespace) FileTabCharacter: Line contains a tab character. [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[342,5] (whitespace) FileTabCharacter: Line contains a tab character. [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[343,5] (whitespace) FileTabCharacter: Line contains a tab character. [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[41,28] (naming) MethodName: Method name 'Append' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[52,28] (naming) MethodName: Method name 'Complete' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[61,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.PrimitiveType. [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[62,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.Type. ``` ## How was this patch tested? ran `dev/lint-java` locally Author: Sandeep Singh Closes #13559 from techaddict/minor-3. 
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f958c1c3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f958c1c3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f958c1c3 Branch: refs/heads/master Commit: f958c1c3e292aba98d283637606890f353a9836c Parents: 87706eb Author: Sandeep Singh Authored: Wed Jun 8 14:51:00 2016 +0100 Committer: Sean Owen Committed: Wed Jun 8 14:51:00 2016 +0100 -- dev/checkstyle-suppressions.xml | 2 ++ .../main/java/org/apache/spark/launcher/LauncherServer.java | 8 .../datasources/parquet/SpecificParquetRecordReaderBase.java | 2 -- 3 files changed, 6 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f958c1c3/dev/checkstyle-suppressions.xml -- diff --git a/dev/checkstyle-suppressions.xml b/dev/checkstyle-suppressions.xml index bfc2e73..31656ca 100644 --- a/dev/checkstyle-suppressions.xml +++ b/dev/checkstyle-suppressions.xml @@ -42,4 +42,6 @@ files="src/main/java/org/apache/hive/service/auth/PasswdAuthenticationProvider.java"/> + http://git-wip-us.apache.org/repos/asf/spark/blob/f958c1c3/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java -- diff --git a/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java b/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java index 28e9420..ae43f56 100644 --- a/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java +++ b/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java @@ -337,10 +337,10 @@ class LauncherServer implements Closeable { } super.close(); if (handle != null) { - if (!handle.getState().isFinal()) { - LOG.log(Level.WARNING, "Lost connection to spark application."); - handle.setState(SparkAppHandle.State.LOST); - } +if (!handle.getState().isFinal()) { + LOG.log(Level.WARNING, "Lost connection to spark application."); + handle.setState(SparkAppHandle.State.LOST); +} handle.disconnect(); } } http://git-wip-us.apache.org/repos/asf/spark/blob/f958c1c3/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java index 3f7a872..14626e5 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecor
spark git commit: [DOCUMENTATION] Fixed target JAR path
Repository: spark Updated Branches: refs/heads/master f958c1c3e -> ca70ab27c [DOCUMENTATION] Fixed target JAR path ## What changes were proposed in this pull request? Mentioned Scala version in the sbt configuration file is 2.11, so the path of the target JAR should be `/target/scala-2.11/simple-project_2.11-1.0.jar` ## How was this patch tested? n/a Author: prabs Author: Prabeesh K Closes #13554 from prabeesh/master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ca70ab27 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ca70ab27 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ca70ab27 Branch: refs/heads/master Commit: ca70ab27cc73f6ea7fce5d179ca8f13459c8ba95 Parents: f958c1c Author: prabs Authored: Wed Jun 8 17:22:55 2016 +0100 Committer: Sean Owen Committed: Wed Jun 8 17:22:55 2016 +0100 -- docs/quick-start.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ca70ab27/docs/quick-start.md -- diff --git a/docs/quick-start.md b/docs/quick-start.md index 72372a6..1b961fd 100644 --- a/docs/quick-start.md +++ b/docs/quick-start.md @@ -289,13 +289,13 @@ $ find . # Package a jar containing your application $ sbt package ... -[info] Packaging {..}/{..}/target/scala-2.10/simple-project_2.10-1.0.jar +[info] Packaging {..}/{..}/target/scala-{{site.SCALA_BINARY_VERSION}}/simple-project_{{site.SCALA_BINARY_VERSION}}-1.0.jar # Use spark-submit to run your application $ YOUR_SPARK_HOME/bin/spark-submit \ --class "SimpleApp" \ --master local[4] \ - target/scala-2.10/simple-project_2.10-1.0.jar + target/scala-{{site.SCALA_BINARY_VERSION}}/simple-project_{{site.SCALA_BINARY_VERSION}}-1.0.jar ... Lines with a: 46, Lines with b: 23 {% endhighlight %} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [DOCUMENTATION] Fixed target JAR path
Repository: spark Updated Branches: refs/heads/branch-2.0 5e9a8e715 -> b2778c8bb [DOCUMENTATION] Fixed target JAR path ## What changes were proposed in this pull request? Mentioned Scala version in the sbt configuration file is 2.11, so the path of the target JAR should be `/target/scala-2.11/simple-project_2.11-1.0.jar` ## How was this patch tested? n/a Author: prabs Author: Prabeesh K Closes #13554 from prabeesh/master. (cherry picked from commit ca70ab27cc73f6ea7fce5d179ca8f13459c8ba95) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b2778c8b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b2778c8b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b2778c8b Branch: refs/heads/branch-2.0 Commit: b2778c8bbdf3b3a2e650b17346f87f2568f88295 Parents: 5e9a8e7 Author: prabs Authored: Wed Jun 8 17:22:55 2016 +0100 Committer: Sean Owen Committed: Wed Jun 8 17:23:03 2016 +0100 -- docs/quick-start.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b2778c8b/docs/quick-start.md -- diff --git a/docs/quick-start.md b/docs/quick-start.md index 72372a6..1b961fd 100644 --- a/docs/quick-start.md +++ b/docs/quick-start.md @@ -289,13 +289,13 @@ $ find . # Package a jar containing your application $ sbt package ... -[info] Packaging {..}/{..}/target/scala-2.10/simple-project_2.10-1.0.jar +[info] Packaging {..}/{..}/target/scala-{{site.SCALA_BINARY_VERSION}}/simple-project_{{site.SCALA_BINARY_VERSION}}-1.0.jar # Use spark-submit to run your application $ YOUR_SPARK_HOME/bin/spark-submit \ --class "SimpleApp" \ --master local[4] \ - target/scala-2.10/simple-project_2.10-1.0.jar + target/scala-{{site.SCALA_BINARY_VERSION}}/simple-project_{{site.SCALA_BINARY_VERSION}}-1.0.jar ... Lines with a: 46, Lines with b: 23 {% endhighlight %} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
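As a concrete illustration of why the hard-coded 2.10 path was wrong: the quick-start sbt build declares a Scala 2.11 binary version, so `sbt package` writes the artifact under `target/scala-2.11/`. A minimal `simple.sbt` along the lines of the guide — the exact Scala and Spark versions below are illustrative, not taken from the patch:

```scala
name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
```

With a build definition like this, `sbt package` produces `target/scala-2.11/simple-project_2.11-1.0.jar`, which is what the templated `{{site.SCALA_BINARY_VERSION}}` path now renders to.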
spark git commit: [SPARK-15818][BUILD] Upgrade to Hadoop 2.7.2
Repository: spark Updated Branches: refs/heads/master 921fa40b1 -> 147c02082 [SPARK-15818][BUILD] Upgrade to Hadoop 2.7.2 ## What changes were proposed in this pull request? Updating the Hadoop version from 2.7.0 to 2.7.2 if we use the Hadoop-2.7 build profile ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Existing tests (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) I'd like us to use Hadoop 2.7.2 owing to the Hadoop release notes stating Hadoop 2.7.0 is not ready for production use https://hadoop.apache.org/docs/r2.7.0/ states "Apache Hadoop 2.7.0 is a minor release in the 2.x.y release line, building upon the previous stable release 2.6.0. This release is not yet ready for production use. Production users should use 2.7.1 release and beyond." Hadoop 2.7.1 release notes: "Apache Hadoop 2.7.1 is a minor release in the 2.x.y release line, building upon the previous release 2.7.0. This is the next stable release after Apache Hadoop 2.6.x." And then Hadoop 2.7.2 release notes: "Apache Hadoop 2.7.2 is a minor release in the 2.x.y release line, building upon the previous stable release 2.7.1." I've tested this is OK with Intel hardware and IBM Java 8 so let's test it with OpenJDK, ideally this will be pushed to branch-2.0 and master. Author: Adam Roberts Closes #13556 from a-roberts/patch-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/147c0208 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/147c0208 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/147c0208 Branch: refs/heads/master Commit: 147c020823080c60b495f7950629d8134bf895db Parents: 921fa40 Author: Adam Roberts Authored: Thu Jun 9 10:34:01 2016 +0100 Committer: Sean Owen Committed: Thu Jun 9 10:34:01 2016 +0100 -- dev/deps/spark-deps-hadoop-2.4 | 30 +++--- dev/deps/spark-deps-hadoop-2.6 | 30 +++--- dev/deps/spark-deps-hadoop-2.7 | 30 +++--- pom.xml| 6 +++--- 4 files changed, 48 insertions(+), 48 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/147c0208/dev/deps/spark-deps-hadoop-2.4 -- diff --git a/dev/deps/spark-deps-hadoop-2.4 b/dev/deps/spark-deps-hadoop-2.4 index f0491ec..501bf58 100644 --- a/dev/deps/spark-deps-hadoop-2.4 +++ b/dev/deps/spark-deps-hadoop-2.4 @@ -53,21 +53,21 @@ eigenbase-properties-1.1.5.jar guava-14.0.1.jar guice-3.0.jar guice-servlet-3.0.jar -hadoop-annotations-2.4.0.jar -hadoop-auth-2.4.0.jar -hadoop-client-2.4.0.jar -hadoop-common-2.4.0.jar -hadoop-hdfs-2.4.0.jar -hadoop-mapreduce-client-app-2.4.0.jar -hadoop-mapreduce-client-common-2.4.0.jar -hadoop-mapreduce-client-core-2.4.0.jar -hadoop-mapreduce-client-jobclient-2.4.0.jar -hadoop-mapreduce-client-shuffle-2.4.0.jar -hadoop-yarn-api-2.4.0.jar -hadoop-yarn-client-2.4.0.jar -hadoop-yarn-common-2.4.0.jar -hadoop-yarn-server-common-2.4.0.jar -hadoop-yarn-server-web-proxy-2.4.0.jar +hadoop-annotations-2.4.1.jar +hadoop-auth-2.4.1.jar +hadoop-client-2.4.1.jar +hadoop-common-2.4.1.jar +hadoop-hdfs-2.4.1.jar +hadoop-mapreduce-client-app-2.4.1.jar +hadoop-mapreduce-client-common-2.4.1.jar +hadoop-mapreduce-client-core-2.4.1.jar +hadoop-mapreduce-client-jobclient-2.4.1.jar +hadoop-mapreduce-client-shuffle-2.4.1.jar +hadoop-yarn-api-2.4.1.jar +hadoop-yarn-client-2.4.1.jar +hadoop-yarn-common-2.4.1.jar +hadoop-yarn-server-common-2.4.1.jar +hadoop-yarn-server-web-proxy-2.4.1.jar hk2-api-2.4.0-b34.jar hk2-locator-2.4.0-b34.jar hk2-utils-2.4.0-b34.jar 
http://git-wip-us.apache.org/repos/asf/spark/blob/147c0208/dev/deps/spark-deps-hadoop-2.6 -- diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/deps/spark-deps-hadoop-2.6 index b3dced6..b915727 100644 --- a/dev/deps/spark-deps-hadoop-2.6 +++ b/dev/deps/spark-deps-hadoop-2.6 @@ -58,21 +58,21 @@ gson-2.2.4.jar guava-14.0.1.jar guice-3.0.jar guice-servlet-3.0.jar -hadoop-annotations-2.6.0.jar -hadoop-auth-2.6.0.jar -hadoop-client-2.6.0.jar -hadoop-common-2.6.0.jar -hadoop-hdfs-2.6.0.jar -hadoop-mapreduce-client-app-2.6.0.jar -hadoop-mapreduce-client-common-2.6.0.jar -hadoop-mapreduce-client-core-2.6.0.jar -hadoop-mapreduce-client-jobclient-2.6.0.jar -hadoop-mapreduce-client-shuffle-2.6.0.jar -hadoop-yarn-api-2.6.0.jar -hadoop-yarn-client-2.6.0.jar -hadoop-yarn-common-2.6.0.jar -hadoop-yarn-server-common-2.6.0.jar -hadoop-yarn-server-web-proxy-2.6.0.jar +hadoop-annotations-2.6.4.jar +hadoop-auth-2.6.4.jar +hadoop-client-2.6.4.jar +hadoop-common-2.6.4.jar +hadoop-hdfs-2.6.4.jar +hadoop-mapreduce-client-app-2.6.4.jar +h
spark git commit: [SPARK-15818][BUILD] Upgrade to Hadoop 2.7.2
Repository: spark Updated Branches: refs/heads/branch-2.0 8ee93eed9 -> 77c08d224 [SPARK-15818][BUILD] Upgrade to Hadoop 2.7.2 ## What changes were proposed in this pull request? Updating the Hadoop version from 2.7.0 to 2.7.2 if we use the Hadoop-2.7 build profile ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Existing tests (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) I'd like us to use Hadoop 2.7.2 owing to the Hadoop release notes stating Hadoop 2.7.0 is not ready for production use https://hadoop.apache.org/docs/r2.7.0/ states "Apache Hadoop 2.7.0 is a minor release in the 2.x.y release line, building upon the previous stable release 2.6.0. This release is not yet ready for production use. Production users should use 2.7.1 release and beyond." Hadoop 2.7.1 release notes: "Apache Hadoop 2.7.1 is a minor release in the 2.x.y release line, building upon the previous release 2.7.0. This is the next stable release after Apache Hadoop 2.6.x." And then Hadoop 2.7.2 release notes: "Apache Hadoop 2.7.2 is a minor release in the 2.x.y release line, building upon the previous stable release 2.7.1." I've tested this is OK with Intel hardware and IBM Java 8 so let's test it with OpenJDK, ideally this will be pushed to branch-2.0 and master. Author: Adam Roberts Closes #13556 from a-roberts/patch-2. (cherry picked from commit 147c020823080c60b495f7950629d8134bf895db) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/77c08d22 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/77c08d22 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/77c08d22 Branch: refs/heads/branch-2.0 Commit: 77c08d2240bef7d814fc6e4dd0a53fbdf1e2f795 Parents: 8ee93ee Author: Adam Roberts Authored: Thu Jun 9 10:34:01 2016 +0100 Committer: Sean Owen Committed: Thu Jun 9 10:34:15 2016 +0100 -- dev/deps/spark-deps-hadoop-2.4 | 30 +++--- dev/deps/spark-deps-hadoop-2.6 | 30 +++--- dev/deps/spark-deps-hadoop-2.7 | 30 +++--- pom.xml| 6 +++--- 4 files changed, 48 insertions(+), 48 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/77c08d22/dev/deps/spark-deps-hadoop-2.4 -- diff --git a/dev/deps/spark-deps-hadoop-2.4 b/dev/deps/spark-deps-hadoop-2.4 index 77d5266..3df292e 100644 --- a/dev/deps/spark-deps-hadoop-2.4 +++ b/dev/deps/spark-deps-hadoop-2.4 @@ -53,21 +53,21 @@ eigenbase-properties-1.1.5.jar guava-14.0.1.jar guice-3.0.jar guice-servlet-3.0.jar -hadoop-annotations-2.4.0.jar -hadoop-auth-2.4.0.jar -hadoop-client-2.4.0.jar -hadoop-common-2.4.0.jar -hadoop-hdfs-2.4.0.jar -hadoop-mapreduce-client-app-2.4.0.jar -hadoop-mapreduce-client-common-2.4.0.jar -hadoop-mapreduce-client-core-2.4.0.jar -hadoop-mapreduce-client-jobclient-2.4.0.jar -hadoop-mapreduce-client-shuffle-2.4.0.jar -hadoop-yarn-api-2.4.0.jar -hadoop-yarn-client-2.4.0.jar -hadoop-yarn-common-2.4.0.jar -hadoop-yarn-server-common-2.4.0.jar -hadoop-yarn-server-web-proxy-2.4.0.jar +hadoop-annotations-2.4.1.jar +hadoop-auth-2.4.1.jar +hadoop-client-2.4.1.jar +hadoop-common-2.4.1.jar +hadoop-hdfs-2.4.1.jar +hadoop-mapreduce-client-app-2.4.1.jar +hadoop-mapreduce-client-common-2.4.1.jar +hadoop-mapreduce-client-core-2.4.1.jar +hadoop-mapreduce-client-jobclient-2.4.1.jar +hadoop-mapreduce-client-shuffle-2.4.1.jar +hadoop-yarn-api-2.4.1.jar +hadoop-yarn-client-2.4.1.jar +hadoop-yarn-common-2.4.1.jar +hadoop-yarn-server-common-2.4.1.jar 
+hadoop-yarn-server-web-proxy-2.4.1.jar hk2-api-2.4.0-b34.jar hk2-locator-2.4.0-b34.jar hk2-utils-2.4.0-b34.jar http://git-wip-us.apache.org/repos/asf/spark/blob/77c08d22/dev/deps/spark-deps-hadoop-2.6 -- diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/deps/spark-deps-hadoop-2.6 index 9afe50f..9540f58 100644 --- a/dev/deps/spark-deps-hadoop-2.6 +++ b/dev/deps/spark-deps-hadoop-2.6 @@ -58,21 +58,21 @@ gson-2.2.4.jar guava-14.0.1.jar guice-3.0.jar guice-servlet-3.0.jar -hadoop-annotations-2.6.0.jar -hadoop-auth-2.6.0.jar -hadoop-client-2.6.0.jar -hadoop-common-2.6.0.jar -hadoop-hdfs-2.6.0.jar -hadoop-mapreduce-client-app-2.6.0.jar -hadoop-mapreduce-client-common-2.6.0.jar -hadoop-mapreduce-client-core-2.6.0.jar -hadoop-mapreduce-client-jobclient-2.6.0.jar -hadoop-mapreduce-client-shuffle-2.6.0.jar -hadoop-yarn-api-2.6.0.jar -hadoop-yarn-client-2.6.0.jar -hadoop-yarn-common-2.6.0.jar -hadoop-yarn-server-common-2.6.0.jar -hadoop-yarn-server-web-proxy-2.6.0.jar +hadoop-annotations-2.6.4.jar +hadoop-auth-2.6.4.jar +hadoop-cl
spark git commit: [SPARK-15823][PYSPARK][ML] Add @property for 'accuracy' in MulticlassMetrics
Repository: spark Updated Branches: refs/heads/master 675a73715 -> 16ca32eac [SPARK-15823][PYSPARK][ML] Add @property for 'accuracy' in MulticlassMetrics ## What changes were proposed in this pull request? `accuracy` should be decorated with `property` to keep step with other methods in `pyspark.MulticlassMetrics`, like `weightedPrecision`, `weightedRecall`, etc ## How was this patch tested? manual tests Author: Zheng RuiFeng Closes #13560 from zhengruifeng/add_accuracy_property. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/16ca32ea Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/16ca32ea Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/16ca32ea Branch: refs/heads/master Commit: 16ca32eace39c423224b0ec25922038fd45c501a Parents: 675a737 Author: Zheng RuiFeng Authored: Fri Jun 10 10:09:19 2016 +0100 Committer: Sean Owen Committed: Fri Jun 10 10:09:19 2016 +0100 -- python/pyspark/mllib/evaluation.py | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/16ca32ea/python/pyspark/mllib/evaluation.py -- diff --git a/python/pyspark/mllib/evaluation.py b/python/pyspark/mllib/evaluation.py index 2eaac87..fc2a0b3 100644 --- a/python/pyspark/mllib/evaluation.py +++ b/python/pyspark/mllib/evaluation.py @@ -179,11 +179,7 @@ class MulticlassMetrics(JavaModelWrapper): 1.0... >>> metrics.fMeasure(0.0, 2.0) 0.52... ->>> metrics.precision() -0.66... ->>> metrics.recall() -0.66... ->>> metrics.accuracy() +>>> metrics.accuracy 0.66... >>> metrics.weightedFalsePositiveRate 0.19... @@ -273,6 +269,7 @@ class MulticlassMetrics(JavaModelWrapper): else: return self.call("fMeasure", label, beta) +@property @since('2.0.0') def accuracy(self): """ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15823][PYSPARK][ML] Add @property for 'accuracy' in MulticlassMetrics
Repository: spark Updated Branches: refs/heads/branch-2.0 84a8421e5 -> 6709ce1ae [SPARK-15823][PYSPARK][ML] Add @property for 'accuracy' in MulticlassMetrics ## What changes were proposed in this pull request? `accuracy` should be decorated with `property` to keep step with other methods in `pyspark.MulticlassMetrics`, like `weightedPrecision`, `weightedRecall`, etc ## How was this patch tested? manual tests Author: Zheng RuiFeng Closes #13560 from zhengruifeng/add_accuracy_property. (cherry picked from commit 16ca32eace39c423224b0ec25922038fd45c501a) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6709ce1a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6709ce1a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6709ce1a Branch: refs/heads/branch-2.0 Commit: 6709ce1aea4a8d7438722f48fd7f2ed0fc7fa5be Parents: 84a8421 Author: Zheng RuiFeng Authored: Fri Jun 10 10:09:19 2016 +0100 Committer: Sean Owen Committed: Fri Jun 10 10:09:29 2016 +0100 -- python/pyspark/mllib/evaluation.py | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6709ce1a/python/pyspark/mllib/evaluation.py -- diff --git a/python/pyspark/mllib/evaluation.py b/python/pyspark/mllib/evaluation.py index 2eaac87..fc2a0b3 100644 --- a/python/pyspark/mllib/evaluation.py +++ b/python/pyspark/mllib/evaluation.py @@ -179,11 +179,7 @@ class MulticlassMetrics(JavaModelWrapper): 1.0... >>> metrics.fMeasure(0.0, 2.0) 0.52... ->>> metrics.precision() -0.66... ->>> metrics.recall() -0.66... ->>> metrics.accuracy() +>>> metrics.accuracy 0.66... >>> metrics.weightedFalsePositiveRate 0.19... @@ -273,6 +269,7 @@ class MulticlassMetrics(JavaModelWrapper): else: return self.call("fMeasure", label, beta) +@property @since('2.0.0') def accuracy(self): """ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
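On the JVM side, `accuracy` is exposed as a value on `mllib.evaluation.MulticlassMetrics`, which is what the new Python property delegates to. A rough Scala sketch, assuming (prediction, label) pairs computed elsewhere:

```scala
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.rdd.RDD

// predictionAndLabels: (prediction, label) pairs, assumed computed elsewhere.
def summarize(predictionAndLabels: RDD[(Double, Double)]): Unit = {
  val metrics = new MulticlassMetrics(predictionAndLabels)
  println(s"accuracy           = ${metrics.accuracy}")  // value access, like the new Python @property
  println(s"weighted precision = ${metrics.weightedPrecision}")
  println(s"weighted recall    = ${metrics.weightedRecall}")
}
```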
spark git commit: [SPARK-15837][ML][PYSPARK] Word2vec python add maxsentence parameter
Repository: spark Updated Branches: refs/heads/master 16ca32eac -> cdd7f5a57 [SPARK-15837][ML][PYSPARK] Word2vec python add maxsentence parameter ## What changes were proposed in this pull request? Word2vec python add maxsentence parameter. ## How was this patch tested? Existing test. Author: WeichenXu Closes #13578 from WeichenXu123/word2vec_python_add_maxsentence. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cdd7f5a5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cdd7f5a5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cdd7f5a5 Branch: refs/heads/master Commit: cdd7f5a57a21d4a8f93456d149f65859c96190cf Parents: 16ca32e Author: WeichenXu Authored: Fri Jun 10 12:26:53 2016 +0100 Committer: Sean Owen Committed: Fri Jun 10 12:26:53 2016 +0100 -- python/pyspark/ml/feature.py | 29 - 1 file changed, 24 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cdd7f5a5/python/pyspark/ml/feature.py -- diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py index ebe1300..bfb2fb7 100755 --- a/python/pyspark/ml/feature.py +++ b/python/pyspark/ml/feature.py @@ -2244,28 +2244,33 @@ class Word2Vec(JavaEstimator, HasStepSize, HasMaxIter, HasSeed, HasInputCol, Has windowSize = Param(Params._dummy(), "windowSize", "the window size (context words from [-window, window]). Default value is 5", typeConverter=TypeConverters.toInt) +maxSentenceLength = Param(Params._dummy(), "maxSentenceLength", + "Maximum length (in words) of each sentence in the input data. " + + "Any sentence longer than this threshold will " + + "be divided into chunks up to the size.", + typeConverter=TypeConverters.toInt) @keyword_only def __init__(self, vectorSize=100, minCount=5, numPartitions=1, stepSize=0.025, maxIter=1, - seed=None, inputCol=None, outputCol=None, windowSize=5): + seed=None, inputCol=None, outputCol=None, windowSize=5, maxSentenceLength=1000): """ __init__(self, vectorSize=100, minCount=5, numPartitions=1, stepSize=0.025, maxIter=1, \ - seed=None, inputCol=None, outputCol=None, windowSize=5) + seed=None, inputCol=None, outputCol=None, windowSize=5, maxSentenceLength=1000) """ super(Word2Vec, self).__init__() self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.Word2Vec", self.uid) self._setDefault(vectorSize=100, minCount=5, numPartitions=1, stepSize=0.025, maxIter=1, - seed=None, windowSize=5) + seed=None, windowSize=5, maxSentenceLength=1000) kwargs = self.__init__._input_kwargs self.setParams(**kwargs) @keyword_only @since("1.4.0") def setParams(self, vectorSize=100, minCount=5, numPartitions=1, stepSize=0.025, maxIter=1, - seed=None, inputCol=None, outputCol=None, windowSize=5): + seed=None, inputCol=None, outputCol=None, windowSize=5, maxSentenceLength=1000): """ setParams(self, minCount=5, numPartitions=1, stepSize=0.025, maxIter=1, seed=None, \ - inputCol=None, outputCol=None, windowSize=5) + inputCol=None, outputCol=None, windowSize=5, maxSentenceLength=1000) Sets params for this Word2Vec. """ kwargs = self.setParams._input_kwargs @@ -2327,6 +2332,20 @@ class Word2Vec(JavaEstimator, HasStepSize, HasMaxIter, HasSeed, HasInputCol, Has """ return self.getOrDefault(self.windowSize) +@since("2.0.0") +def setMaxSentenceLength(self, value): +""" +Sets the value of :py:attr:`maxSentenceLength`. +""" +return self._set(maxSentenceLength=value) + +@since("2.0.0") +def getMaxSentenceLength(self): +""" +Gets the value of maxSentenceLength or its default value. 
+""" +return self.getOrDefault(self.maxSentenceLength) + def _create_model(self, java_model): return Word2VecModel(java_model) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15837][ML][PYSPARK] Word2vec python add maxsentence parameter
Repository: spark Updated Branches: refs/heads/branch-2.0 6709ce1ae -> 54b4763d2 [SPARK-15837][ML][PYSPARK] Word2vec python add maxsentence parameter ## What changes were proposed in this pull request? Word2vec python add maxsentence parameter. ## How was this patch tested? Existing test. Author: WeichenXu Closes #13578 from WeichenXu123/word2vec_python_add_maxsentence. (cherry picked from commit cdd7f5a57a21d4a8f93456d149f65859c96190cf) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/54b4763d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/54b4763d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/54b4763d Branch: refs/heads/branch-2.0 Commit: 54b4763d295d6aeab6105d0430470343dd4ca3a3 Parents: 6709ce1 Author: WeichenXu Authored: Fri Jun 10 12:26:53 2016 +0100 Committer: Sean Owen Committed: Fri Jun 10 12:27:04 2016 +0100 -- python/pyspark/ml/feature.py | 29 - 1 file changed, 24 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/54b4763d/python/pyspark/ml/feature.py -- diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py index ebe1300..bfb2fb7 100755 --- a/python/pyspark/ml/feature.py +++ b/python/pyspark/ml/feature.py @@ -2244,28 +2244,33 @@ class Word2Vec(JavaEstimator, HasStepSize, HasMaxIter, HasSeed, HasInputCol, Has windowSize = Param(Params._dummy(), "windowSize", "the window size (context words from [-window, window]). Default value is 5", typeConverter=TypeConverters.toInt) +maxSentenceLength = Param(Params._dummy(), "maxSentenceLength", + "Maximum length (in words) of each sentence in the input data. " + + "Any sentence longer than this threshold will " + + "be divided into chunks up to the size.", + typeConverter=TypeConverters.toInt) @keyword_only def __init__(self, vectorSize=100, minCount=5, numPartitions=1, stepSize=0.025, maxIter=1, - seed=None, inputCol=None, outputCol=None, windowSize=5): + seed=None, inputCol=None, outputCol=None, windowSize=5, maxSentenceLength=1000): """ __init__(self, vectorSize=100, minCount=5, numPartitions=1, stepSize=0.025, maxIter=1, \ - seed=None, inputCol=None, outputCol=None, windowSize=5) + seed=None, inputCol=None, outputCol=None, windowSize=5, maxSentenceLength=1000) """ super(Word2Vec, self).__init__() self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.Word2Vec", self.uid) self._setDefault(vectorSize=100, minCount=5, numPartitions=1, stepSize=0.025, maxIter=1, - seed=None, windowSize=5) + seed=None, windowSize=5, maxSentenceLength=1000) kwargs = self.__init__._input_kwargs self.setParams(**kwargs) @keyword_only @since("1.4.0") def setParams(self, vectorSize=100, minCount=5, numPartitions=1, stepSize=0.025, maxIter=1, - seed=None, inputCol=None, outputCol=None, windowSize=5): + seed=None, inputCol=None, outputCol=None, windowSize=5, maxSentenceLength=1000): """ setParams(self, minCount=5, numPartitions=1, stepSize=0.025, maxIter=1, seed=None, \ - inputCol=None, outputCol=None, windowSize=5) + inputCol=None, outputCol=None, windowSize=5, maxSentenceLength=1000) Sets params for this Word2Vec. """ kwargs = self.setParams._input_kwargs @@ -2327,6 +2332,20 @@ class Word2Vec(JavaEstimator, HasStepSize, HasMaxIter, HasSeed, HasInputCol, Has """ return self.getOrDefault(self.windowSize) +@since("2.0.0") +def setMaxSentenceLength(self, value): +""" +Sets the value of :py:attr:`maxSentenceLength`. 
+""" +return self._set(maxSentenceLength=value) + +@since("2.0.0") +def getMaxSentenceLength(self): +""" +Gets the value of maxSentenceLength or its default value. +""" +return self.getOrDefault(self.maxSentenceLength) + def _create_model(self, java_model): return Word2VecModel(java_model) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"
Repository: spark Updated Branches: refs/heads/master 7504bc73f -> 3761330dd [SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache" ## What changes were proposed in this pull request? Use new Spark logo including "Apache" (now, with crushed PNGs). Remove old unreferenced logo files. ## How was this patch tested? Manual check of generated HTML site and Spark UI. I searched for references to the deleted files to make sure they were not used. Author: Sean Owen Closes #13609 from srowen/SPARK-15879. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3761330d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3761330d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3761330d Branch: refs/heads/master Commit: 3761330dd0151d7369d7fba4d4c344e9863990ef Parents: 7504bc7 Author: Sean Owen Authored: Sat Jun 11 12:46:07 2016 +0100 Committer: Sean Owen Committed: Sat Jun 11 12:46:07 2016 +0100 -- .../spark/ui/static/spark-logo-77x50px-hd.png | Bin 3536 -> 4182 bytes .../org/apache/spark/ui/static/spark_logo.png | Bin 14233 -> 0 bytes docs/img/incubator-logo.png | Bin 11651 -> 0 bytes docs/img/spark-logo-100x40px.png| Bin 3635 -> 0 bytes docs/img/spark-logo-77x40px-hd.png | Bin 1904 -> 0 bytes docs/img/spark-logo-77x50px-hd.png | Bin 3536 -> 0 bytes docs/img/spark-logo-hd.png | Bin 13512 -> 16418 bytes 7 files changed, 0 insertions(+), 0 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png -- diff --git a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png index 6c5f099..ffe2550 100644 Binary files a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png and b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png differ http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png -- diff --git a/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png b/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png deleted file mode 100644 index 4b18734..000 Binary files a/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/incubator-logo.png -- diff --git a/docs/img/incubator-logo.png b/docs/img/incubator-logo.png deleted file mode 100644 index 33ca7f6..000 Binary files a/docs/img/incubator-logo.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/spark-logo-100x40px.png -- diff --git a/docs/img/spark-logo-100x40px.png b/docs/img/spark-logo-100x40px.png deleted file mode 100644 index 54c3187..000 Binary files a/docs/img/spark-logo-100x40px.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/spark-logo-77x40px-hd.png -- diff --git a/docs/img/spark-logo-77x40px-hd.png b/docs/img/spark-logo-77x40px-hd.png deleted file mode 100644 index 270402f..000 Binary files a/docs/img/spark-logo-77x40px-hd.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/spark-logo-77x50px-hd.png -- diff --git a/docs/img/spark-logo-77x50px-hd.png b/docs/img/spark-logo-77x50px-hd.png deleted file mode 100644 index 6c5f099..000 Binary files a/docs/img/spark-logo-77x50px-hd.png and /dev/null differ 
http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/spark-logo-hd.png -- diff --git a/docs/img/spark-logo-hd.png b/docs/img/spark-logo-hd.png index 1381e30..e4508e7 100644 Binary files a/docs/img/spark-logo-hd.png and b/docs/img/spark-logo-hd.png differ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"
Repository: spark Updated Branches: refs/heads/branch-2.0 f0fa0a894 -> 4c29c55f2 [SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache" ## What changes were proposed in this pull request? Use new Spark logo including "Apache" (now, with crushed PNGs). Remove old unreferenced logo files. ## How was this patch tested? Manual check of generated HTML site and Spark UI. I searched for references to the deleted files to make sure they were not used. Author: Sean Owen Closes #13609 from srowen/SPARK-15879. (cherry picked from commit 3761330dd0151d7369d7fba4d4c344e9863990ef) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4c29c55f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4c29c55f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4c29c55f Branch: refs/heads/branch-2.0 Commit: 4c29c55f22d57c5fbadd0b759155fbab4b07a70a Parents: f0fa0a8 Author: Sean Owen Authored: Sat Jun 11 12:46:07 2016 +0100 Committer: Sean Owen Committed: Sat Jun 11 12:46:21 2016 +0100 -- .../spark/ui/static/spark-logo-77x50px-hd.png | Bin 3536 -> 4182 bytes .../org/apache/spark/ui/static/spark_logo.png | Bin 14233 -> 0 bytes docs/img/incubator-logo.png | Bin 11651 -> 0 bytes docs/img/spark-logo-100x40px.png| Bin 3635 -> 0 bytes docs/img/spark-logo-77x40px-hd.png | Bin 1904 -> 0 bytes docs/img/spark-logo-77x50px-hd.png | Bin 3536 -> 0 bytes docs/img/spark-logo-hd.png | Bin 13512 -> 16418 bytes 7 files changed, 0 insertions(+), 0 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png -- diff --git a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png index 6c5f099..ffe2550 100644 Binary files a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png and b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png differ http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png -- diff --git a/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png b/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png deleted file mode 100644 index 4b18734..000 Binary files a/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/incubator-logo.png -- diff --git a/docs/img/incubator-logo.png b/docs/img/incubator-logo.png deleted file mode 100644 index 33ca7f6..000 Binary files a/docs/img/incubator-logo.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/spark-logo-100x40px.png -- diff --git a/docs/img/spark-logo-100x40px.png b/docs/img/spark-logo-100x40px.png deleted file mode 100644 index 54c3187..000 Binary files a/docs/img/spark-logo-100x40px.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/spark-logo-77x40px-hd.png -- diff --git a/docs/img/spark-logo-77x40px-hd.png b/docs/img/spark-logo-77x40px-hd.png deleted file mode 100644 index 270402f..000 Binary files a/docs/img/spark-logo-77x40px-hd.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/spark-logo-77x50px-hd.png -- diff --git a/docs/img/spark-logo-77x50px-hd.png b/docs/img/spark-logo-77x50px-hd.png deleted file mode 
100644 index 6c5f099..000 Binary files a/docs/img/spark-logo-77x50px-hd.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/spark-logo-hd.png -- diff --git a/docs/img/spark-logo-hd.png b/docs/img/spark-logo-hd.png index 1381e30..e4508e7 100644 Binary files a/docs/img/spark-logo-hd.png and b/docs/img/spark-logo-hd.png differ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents
Repository: spark Updated Branches: refs/heads/master 3761330dd -> ad102af16 [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents ## What changes were proposed in this pull request? This issue fixes all broken links on Spark 2.0 preview MLLib documents. Also, this contains some editorial change. **Fix broken links** * mllib-data-types.md * mllib-decision-tree.md * mllib-ensembles.md * mllib-feature-extraction.md * mllib-pmml-model-export.md * mllib-statistics.md **Fix malformed section header and scala coding style** * mllib-linear-methods.md **Replace indirect forward links with direct one** * ml-classification-regression.md ## How was this patch tested? Manual tests (with `cd docs; jekyll build`.) Author: Dongjoon Hyun Closes #13608 from dongjoon-hyun/SPARK-15883. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ad102af1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ad102af1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ad102af1 Branch: refs/heads/master Commit: ad102af169c7344b30d3b84aa16452fcdc22542c Parents: 3761330 Author: Dongjoon Hyun Authored: Sat Jun 11 12:55:38 2016 +0100 Committer: Sean Owen Committed: Sat Jun 11 12:55:38 2016 +0100 -- docs/ml-classification-regression.md | 4 ++-- docs/mllib-data-types.md | 16 ++-- docs/mllib-decision-tree.md | 6 +++--- docs/mllib-ensembles.md | 6 +++--- docs/mllib-feature-extraction.md | 2 +- docs/mllib-linear-methods.md | 10 +- docs/mllib-pmml-model-export.md | 2 +- docs/mllib-statistics.md | 8 8 files changed, 25 insertions(+), 29 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ad102af1/docs/ml-classification-regression.md -- diff --git a/docs/ml-classification-regression.md b/docs/ml-classification-regression.md index 88457d4..d7e5521 100644 --- a/docs/ml-classification-regression.md +++ b/docs/ml-classification-regression.md @@ -815,7 +815,7 @@ The main differences between this API and the [original MLlib ensembles API](mll ## Random Forests [Random forests](http://en.wikipedia.org/wiki/Random_forest) -are ensembles of [decision trees](ml-decision-tree.html). +are ensembles of [decision trees](ml-classification-regression.html#decision-trees). Random forests combine many decision trees in order to reduce the risk of overfitting. The `spark.ml` implementation supports random forests for binary and multiclass classification and for regression, using both continuous and categorical features. @@ -896,7 +896,7 @@ All output columns are optional; to exclude an output column, set its correspond ## Gradient-Boosted Trees (GBTs) [Gradient-Boosted Trees (GBTs)](http://en.wikipedia.org/wiki/Gradient_boosting) -are ensembles of [decision trees](ml-decision-tree.html). +are ensembles of [decision trees](ml-classification-regression.html#decision-trees). GBTs iteratively train decision trees in order to minimize a loss function. The `spark.ml` implementation supports GBTs for binary classification and for regression, using both continuous and categorical features. http://git-wip-us.apache.org/repos/asf/spark/blob/ad102af1/docs/mllib-data-types.md -- diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md index 2ffe0f1..ef56aeb 100644 --- a/docs/mllib-data-types.md +++ b/docs/mllib-data-types.md @@ -33,7 +33,7 @@ implementations: [`DenseVector`](api/scala/index.html#org.apache.spark.mllib.lin using the factory methods implemented in [`Vectors`](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) to create local vectors. 
-Refer to the [`Vector` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and [`Vectors` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors) for details on the API. +Refer to the [`Vector` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and [`Vectors` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) for details on the API. {% highlight scala %} import org.apache.spark.mllib.linalg.{Vector, Vectors} @@ -199,7 +199,7 @@ After loading, the feature indices are converted to zero-based. [`MLUtils.loadLibSVMFile`](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$) reads training examples stored in LIBSVM format. -Refer to the [`MLUtils` Scala docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils) for details on the API. +Refer to the [`MLUtils` Scala docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$) for details on the API. {% highlight scala %} imp
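The link corrections above point the `Vectors` reference at the companion object's Scaladoc page (the trailing `$` in the URL), since the object is the factory the docs tell readers to use. As a reminder of what that factory usage looks like, here is a minimal, self-contained sketch using the standard `spark.mllib` local-vector API (the object name `LocalVectorExample` is just illustrative):

{% highlight scala %}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object LocalVectorExample {
  def main(args: Array[String]): Unit = {
    // A dense vector (1.0, 0.0, 3.0), built with the Vectors factory object.
    val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)

    // The same vector in sparse form: size 3, non-zero entries at indices 0 and 2.
    val sv: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

    println(dv)
    println(sv)
  }
}
{% endhighlight %}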
spark git commit: [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents
Repository: spark Updated Branches: refs/heads/branch-2.0 4c29c55f2 -> 8cf33fb8a [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents ## What changes were proposed in this pull request? This issue fixes all broken links on Spark 2.0 preview MLLib documents. Also, this contains some editorial change. **Fix broken links** * mllib-data-types.md * mllib-decision-tree.md * mllib-ensembles.md * mllib-feature-extraction.md * mllib-pmml-model-export.md * mllib-statistics.md **Fix malformed section header and scala coding style** * mllib-linear-methods.md **Replace indirect forward links with direct one** * ml-classification-regression.md ## How was this patch tested? Manual tests (with `cd docs; jekyll build`.) Author: Dongjoon Hyun Closes #13608 from dongjoon-hyun/SPARK-15883. (cherry picked from commit ad102af169c7344b30d3b84aa16452fcdc22542c) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8cf33fb8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8cf33fb8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8cf33fb8 Branch: refs/heads/branch-2.0 Commit: 8cf33fb8a945e8f76833f68fc99b1ad5dee13641 Parents: 4c29c55 Author: Dongjoon Hyun Authored: Sat Jun 11 12:55:38 2016 +0100 Committer: Sean Owen Committed: Sat Jun 11 12:55:48 2016 +0100 -- docs/ml-classification-regression.md | 4 ++-- docs/mllib-data-types.md | 16 ++-- docs/mllib-decision-tree.md | 6 +++--- docs/mllib-ensembles.md | 6 +++--- docs/mllib-feature-extraction.md | 2 +- docs/mllib-linear-methods.md | 10 +- docs/mllib-pmml-model-export.md | 2 +- docs/mllib-statistics.md | 8 8 files changed, 25 insertions(+), 29 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8cf33fb8/docs/ml-classification-regression.md -- diff --git a/docs/ml-classification-regression.md b/docs/ml-classification-regression.md index 88457d4..d7e5521 100644 --- a/docs/ml-classification-regression.md +++ b/docs/ml-classification-regression.md @@ -815,7 +815,7 @@ The main differences between this API and the [original MLlib ensembles API](mll ## Random Forests [Random forests](http://en.wikipedia.org/wiki/Random_forest) -are ensembles of [decision trees](ml-decision-tree.html). +are ensembles of [decision trees](ml-classification-regression.html#decision-trees). Random forests combine many decision trees in order to reduce the risk of overfitting. The `spark.ml` implementation supports random forests for binary and multiclass classification and for regression, using both continuous and categorical features. @@ -896,7 +896,7 @@ All output columns are optional; to exclude an output column, set its correspond ## Gradient-Boosted Trees (GBTs) [Gradient-Boosted Trees (GBTs)](http://en.wikipedia.org/wiki/Gradient_boosting) -are ensembles of [decision trees](ml-decision-tree.html). +are ensembles of [decision trees](ml-classification-regression.html#decision-trees). GBTs iteratively train decision trees in order to minimize a loss function. The `spark.ml` implementation supports GBTs for binary classification and for regression, using both continuous and categorical features. 
http://git-wip-us.apache.org/repos/asf/spark/blob/8cf33fb8/docs/mllib-data-types.md -- diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md index 2ffe0f1..ef56aeb 100644 --- a/docs/mllib-data-types.md +++ b/docs/mllib-data-types.md @@ -33,7 +33,7 @@ implementations: [`DenseVector`](api/scala/index.html#org.apache.spark.mllib.lin using the factory methods implemented in [`Vectors`](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) to create local vectors. -Refer to the [`Vector` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and [`Vectors` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors) for details on the API. +Refer to the [`Vector` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and [`Vectors` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) for details on the API. {% highlight scala %} import org.apache.spark.mllib.linalg.{Vector, Vectors} @@ -199,7 +199,7 @@ After loading, the feature indices are converted to zero-based. [`MLUtils.loadLibSVMFile`](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$) reads training examples stored in LIBSVM format. -Refer to the [`MLUtils` Scala docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils) for details on the API. +Refer to the [`MLUtils` Scala docs](api/scala
spark git commit: [SPARK-15878][CORE][TEST] fix cleanup in EventLoggingListenerSuite and ReplayListenerSuite
Repository: spark Updated Branches: refs/heads/master 9e204c62c -> 8cc22b008 [SPARK-15878][CORE][TEST] fix cleanup in EventLoggingListenerSuite and ReplayListenerSuite ## What changes were proposed in this pull request? These tests weren't properly using `LocalSparkContext` so weren't cleaning up correctly when tests failed. ## How was this patch tested? Jenkins. Author: Imran Rashid Closes #13602 from squito/SPARK-15878_cleanup_replaylistener. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8cc22b00 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8cc22b00 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8cc22b00 Branch: refs/heads/master Commit: 8cc22b0085475a188f229536b4f83988ae889a8e Parents: 9e204c6 Author: Imran Rashid Authored: Sun Jun 12 12:54:57 2016 +0100 Committer: Sean Owen Committed: Sun Jun 12 12:54:57 2016 +0100 -- .../org/apache/spark/scheduler/EventLoggingListenerSuite.scala | 2 +- .../scala/org/apache/spark/scheduler/ReplayListenerSuite.scala | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8cc22b00/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala index 176d893..c4c80b5 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala @@ -181,7 +181,7 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit // into SPARK-6688. val conf = getLoggingConf(testDirPath, compressionCodec) .set("spark.hadoop.fs.defaultFS", "unsupported://example.com") -val sc = new SparkContext("local-cluster[2,2,1024]", "test", conf) +sc = new SparkContext("local-cluster[2,2,1024]", "test", conf) assert(sc.eventLogger.isDefined) val eventLogger = sc.eventLogger.get val eventLogPath = eventLogger.logPath http://git-wip-us.apache.org/repos/asf/spark/blob/8cc22b00/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala index 35215c1..1732aca 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala @@ -23,7 +23,7 @@ import java.net.URI import org.json4s.jackson.JsonMethods._ import org.scalatest.BeforeAndAfter -import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite} +import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkFunSuite} import org.apache.spark.deploy.SparkHadoopUtil import org.apache.spark.io.CompressionCodec import org.apache.spark.util.{JsonProtocol, JsonProtocolSuite, Utils} @@ -31,7 +31,7 @@ import org.apache.spark.util.{JsonProtocol, JsonProtocolSuite, Utils} /** * Test whether ReplayListenerBus replays events from logs correctly. 
*/ -class ReplayListenerSuite extends SparkFunSuite with BeforeAndAfter { +class ReplayListenerSuite extends SparkFunSuite with BeforeAndAfter with LocalSparkContext { private val fileSystem = Utils.getHadoopFileSystem("/", SparkHadoopUtil.get.newConfiguration(new SparkConf())) private var testDir: File = _ @@ -101,7 +101,7 @@ class ReplayListenerSuite extends SparkFunSuite with BeforeAndAfter { fileSystem.mkdirs(logDirPath) val conf = EventLoggingListenerSuite.getLoggingConf(logDirPath, codecName) -val sc = new SparkContext("local-cluster[2,1,1024]", "Test replay", conf) +sc = new SparkContext("local-cluster[2,1,1024]", "Test replay", conf) // Run a few jobs sc.parallelize(1 to 100, 1).count() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
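The point of the change above is that `LocalSparkContext` owns the `sc` field and stops it after every test, so assigning to that field (rather than declaring a local `val sc`) is what guarantees cleanup when an assertion fails mid-test. A simplified sketch of that kind of mixin is below; it is an assumed shape for illustration, not Spark's exact `LocalSparkContext` code:

{% highlight scala %}
import org.apache.spark.SparkContext
import org.scalatest.{BeforeAndAfterEach, Suite}

// Illustrative mixin: keep the SparkContext in a shared var and always stop it
// after each test, so a failed test cannot leak a running context into later tests.
trait LocalSparkContextSketch extends BeforeAndAfterEach { self: Suite =>
  @transient var sc: SparkContext = _

  override def afterEach(): Unit = {
    try {
      if (sc != null) {
        sc.stop()  // runs even when the test body threw
      }
      sc = null
    } finally {
      super.afterEach()
    }
  }
}
{% endhighlight %}

With a mixin like this, writing `sc = new SparkContext(...)` instead of `val sc = new SparkContext(...)` is exactly what lets the suite tear the context down after a failure, which is the change the diff makes.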
spark git commit: [SPARK-15878][CORE][TEST] fix cleanup in EventLoggingListenerSuite and ReplayListenerSuite
Repository: spark Updated Branches: refs/heads/branch-2.0 d494a483a -> 879e8fd09 [SPARK-15878][CORE][TEST] fix cleanup in EventLoggingListenerSuite and ReplayListenerSuite ## What changes were proposed in this pull request? These tests weren't properly using `LocalSparkContext` so weren't cleaning up correctly when tests failed. ## How was this patch tested? Jenkins. Author: Imran Rashid Closes #13602 from squito/SPARK-15878_cleanup_replaylistener. (cherry picked from commit 8cc22b0085475a188f229536b4f83988ae889a8e) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/879e8fd0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/879e8fd0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/879e8fd0 Branch: refs/heads/branch-2.0 Commit: 879e8fd09477fc78d66c9da9e0e117a513b0b046 Parents: d494a48 Author: Imran Rashid Authored: Sun Jun 12 12:54:57 2016 +0100 Committer: Sean Owen Committed: Sun Jun 12 12:55:17 2016 +0100 -- .../org/apache/spark/scheduler/EventLoggingListenerSuite.scala | 2 +- .../scala/org/apache/spark/scheduler/ReplayListenerSuite.scala | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/879e8fd0/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala index 176d893..c4c80b5 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala @@ -181,7 +181,7 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit // into SPARK-6688. val conf = getLoggingConf(testDirPath, compressionCodec) .set("spark.hadoop.fs.defaultFS", "unsupported://example.com") -val sc = new SparkContext("local-cluster[2,2,1024]", "test", conf) +sc = new SparkContext("local-cluster[2,2,1024]", "test", conf) assert(sc.eventLogger.isDefined) val eventLogger = sc.eventLogger.get val eventLogPath = eventLogger.logPath http://git-wip-us.apache.org/repos/asf/spark/blob/879e8fd0/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala index 35215c1..1732aca 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala @@ -23,7 +23,7 @@ import java.net.URI import org.json4s.jackson.JsonMethods._ import org.scalatest.BeforeAndAfter -import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite} +import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkFunSuite} import org.apache.spark.deploy.SparkHadoopUtil import org.apache.spark.io.CompressionCodec import org.apache.spark.util.{JsonProtocol, JsonProtocolSuite, Utils} @@ -31,7 +31,7 @@ import org.apache.spark.util.{JsonProtocol, JsonProtocolSuite, Utils} /** * Test whether ReplayListenerBus replays events from logs correctly. 
*/ -class ReplayListenerSuite extends SparkFunSuite with BeforeAndAfter { +class ReplayListenerSuite extends SparkFunSuite with BeforeAndAfter with LocalSparkContext { private val fileSystem = Utils.getHadoopFileSystem("/", SparkHadoopUtil.get.newConfiguration(new SparkConf())) private var testDir: File = _ @@ -101,7 +101,7 @@ class ReplayListenerSuite extends SparkFunSuite with BeforeAndAfter { fileSystem.mkdirs(logDirPath) val conf = EventLoggingListenerSuite.getLoggingConf(logDirPath, codecName) -val sc = new SparkContext("local-cluster[2,1,1024]", "Test replay", conf) +sc = new SparkContext("local-cluster[2,1,1024]", "Test replay", conf) // Run a few jobs sc.parallelize(1 to 100, 1).count() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15781][DOCUMENTATION] remove deprecated environment variable doc
Repository: spark Updated Branches: refs/heads/branch-2.0 879e8fd09 -> 8c294f4ad [SPARK-15781][DOCUMENTATION] remove deprecated environment variable doc ## What changes were proposed in this pull request? Like `SPARK_JAVA_OPTS` and `SPARK_CLASSPATH`, we will remove the document for `SPARK_WORKER_INSTANCES` to discourage user not to use them. If they are actually used, SparkConf will show a warning message as before. ## How was this patch tested? Manually tested. Author: bomeng Closes #13533 from bomeng/SPARK-15781. (cherry picked from commit 3fd3ee038b89821f51f30a4ecd4452b5b3bc6568) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8c294f4a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8c294f4a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8c294f4a Branch: refs/heads/branch-2.0 Commit: 8c294f4ad95e95f6c8873d7b346394d34cc40975 Parents: 879e8fd Author: bomeng Authored: Sun Jun 12 12:58:34 2016 +0100 Committer: Sean Owen Committed: Sun Jun 12 12:58:41 2016 +0100 -- docs/spark-standalone.md | 9 - 1 file changed, 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8c294f4a/docs/spark-standalone.md -- diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md index fd94c34..40c7293 100644 --- a/docs/spark-standalone.md +++ b/docs/spark-standalone.md @@ -134,15 +134,6 @@ You can optionally configure the cluster further by setting environment variable Port for the worker web UI (default: 8081). -SPARK_WORKER_INSTANCES - - Number of worker instances to run on each machine (default: 1). You can make this more than 1 if - you have have very large machines and would like multiple Spark worker processes. If you do set - this, make sure to also set SPARK_WORKER_CORES explicitly to limit the cores per worker, - or else each worker will try to use all the cores. - - - SPARK_WORKER_DIR Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work). - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15781][DOCUMENTATION] remove deprecated environment variable doc
Repository: spark Updated Branches: refs/heads/master 8cc22b008 -> 3fd3ee038 [SPARK-15781][DOCUMENTATION] remove deprecated environment variable doc ## What changes were proposed in this pull request? Like `SPARK_JAVA_OPTS` and `SPARK_CLASSPATH`, we will remove the document for `SPARK_WORKER_INSTANCES` to discourage user not to use them. If they are actually used, SparkConf will show a warning message as before. ## How was this patch tested? Manually tested. Author: bomeng Closes #13533 from bomeng/SPARK-15781. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3fd3ee03 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3fd3ee03 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3fd3ee03 Branch: refs/heads/master Commit: 3fd3ee038b89821f51f30a4ecd4452b5b3bc6568 Parents: 8cc22b0 Author: bomeng Authored: Sun Jun 12 12:58:34 2016 +0100 Committer: Sean Owen Committed: Sun Jun 12 12:58:34 2016 +0100 -- docs/spark-standalone.md | 9 - 1 file changed, 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3fd3ee03/docs/spark-standalone.md -- diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md index fd94c34..40c7293 100644 --- a/docs/spark-standalone.md +++ b/docs/spark-standalone.md @@ -134,15 +134,6 @@ You can optionally configure the cluster further by setting environment variable Port for the worker web UI (default: 8081). -SPARK_WORKER_INSTANCES - - Number of worker instances to run on each machine (default: 1). You can make this more than 1 if - you have have very large machines and would like multiple Spark worker processes. If you do set - this, make sure to also set SPARK_WORKER_CORES explicitly to limit the cores per worker, - or else each worker will try to use all the cores. - - - SPARK_WORKER_DIR Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work). - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15806][DOCUMENTATION] update doc for SPARK_MASTER_IP
Repository: spark Updated Branches: refs/heads/master 3fd3ee038 -> 50248dcff [SPARK-15806][DOCUMENTATION] update doc for SPARK_MASTER_IP ## What changes were proposed in this pull request? SPARK_MASTER_IP is a deprecated environment variable. It is replaced by SPARK_MASTER_HOST according to MasterArguments.scala. ## How was this patch tested? Manually verified. Author: bomeng Closes #13543 from bomeng/SPARK-15806. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/50248dcf Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/50248dcf Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/50248dcf Branch: refs/heads/master Commit: 50248dcfff3ba79b73323f3a804c1e19a8be6097 Parents: 3fd3ee0 Author: bomeng Authored: Sun Jun 12 14:25:48 2016 +0100 Committer: Sean Owen Committed: Sun Jun 12 14:25:48 2016 +0100 -- conf/spark-env.sh.template | 2 +- .../org/apache/spark/deploy/master/MasterArguments.scala | 8 +++- docs/spark-standalone.md | 4 ++-- sbin/start-master.sh | 6 +++--- sbin/start-slaves.sh | 6 +++--- 5 files changed, 16 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/50248dcf/conf/spark-env.sh.template -- diff --git a/conf/spark-env.sh.template b/conf/spark-env.sh.template index 9cffdc3..c750c72 100755 --- a/conf/spark-env.sh.template +++ b/conf/spark-env.sh.template @@ -42,7 +42,7 @@ # - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G) # Options for the daemons used in the standalone deploy mode -# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname +# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master # - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y") # - SPARK_WORKER_CORES, to set the number of cores to use on this machine http://git-wip-us.apache.org/repos/asf/spark/blob/50248dcf/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala b/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala index 585e083..c63793c 100644 --- a/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala +++ b/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala @@ -20,18 +20,24 @@ package org.apache.spark.deploy.master import scala.annotation.tailrec import org.apache.spark.SparkConf +import org.apache.spark.internal.Logging import org.apache.spark.util.{IntParam, Utils} /** * Command-line parser for the master. 
*/ -private[master] class MasterArguments(args: Array[String], conf: SparkConf) { +private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging { var host = Utils.localHostName() var port = 7077 var webUiPort = 8080 var propertiesFile: String = null // Check for settings in environment variables + if (System.getenv("SPARK_MASTER_IP") != null) { +logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST") +host = System.getenv("SPARK_MASTER_IP") + } + if (System.getenv("SPARK_MASTER_HOST") != null) { host = System.getenv("SPARK_MASTER_HOST") } http://git-wip-us.apache.org/repos/asf/spark/blob/50248dcf/docs/spark-standalone.md -- diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md index 40c7293..c864c90 100644 --- a/docs/spark-standalone.md +++ b/docs/spark-standalone.md @@ -94,8 +94,8 @@ You can optionally configure the cluster further by setting environment variable Environment VariableMeaning -SPARK_MASTER_IP -Bind the master to a specific IP address, for example a public one. +SPARK_MASTER_HOST +Bind the master to a specific hostname or IP address, for example a public one. SPARK_MASTER_PORT http://git-wip-us.apache.org/repos/asf/spark/blob/50248dcf/sbin/start-master.sh -- diff --git a/sbin/start-master.sh b/sbin/start-master.sh index ce7f177..981cb15 100755 --- a/sbin/start-master.sh +++ b/sbin/start-master.sh @@ -47,8 +47,8 @@ if [ "$SPARK_MASTER_PORT" = "" ]; then SPARK_MASTER_PORT=7077 fi -if [ "$SPARK_MASTER_IP" = "" ]; then - SPARK_MASTER_IP=`hostname` +if [ "$SPARK_MASTER_HOST" = "" ]; then + SPARK_MASTER_
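A detail worth noting in the MasterArguments change above: the deprecated `SPARK_MASTER_IP` is read first (with a warning) and `SPARK_MASTER_HOST` is read afterwards, so the new variable wins when both are set. A standalone sketch of that precedence, with illustrative names rather than Spark's actual classes:

{% highlight scala %}
// Standalone sketch of the env-var precedence added in MasterArguments
// (names here are illustrative, not Spark's own).
object MasterHostResolution {
  def resolveHost(env: Map[String, String], default: String): String = {
    var host = default
    env.get("SPARK_MASTER_IP").foreach { ip =>
      println("warning: SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
      host = ip
    }
    // Checked last, so it overrides the deprecated variable when both are present.
    env.get("SPARK_MASTER_HOST").foreach { h =>
      host = h
    }
    host
  }

  def main(args: Array[String]): Unit = {
    // Both set: the new variable wins.
    println(resolveHost(Map("SPARK_MASTER_IP" -> "10.0.0.1", "SPARK_MASTER_HOST" -> "master.local"), "localhost"))
    // Only the deprecated one set: it is still honoured, after the warning.
    println(resolveHost(Map("SPARK_MASTER_IP" -> "10.0.0.1"), "localhost"))
  }
}
{% endhighlight %}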
spark git commit: [SPARK-15806][DOCUMENTATION] update doc for SPARK_MASTER_IP
Repository: spark Updated Branches: refs/heads/branch-2.0 8c294f4ad -> b75d1c201 [SPARK-15806][DOCUMENTATION] update doc for SPARK_MASTER_IP ## What changes were proposed in this pull request? SPARK_MASTER_IP is a deprecated environment variable. It is replaced by SPARK_MASTER_HOST according to MasterArguments.scala. ## How was this patch tested? Manually verified. Author: bomeng Closes #13543 from bomeng/SPARK-15806. (cherry picked from commit 50248dcfff3ba79b73323f3a804c1e19a8be6097) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b75d1c20 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b75d1c20 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b75d1c20 Branch: refs/heads/branch-2.0 Commit: b75d1c20131b438999645d0be6ea5765a2f7da80 Parents: 8c294f4 Author: bomeng Authored: Sun Jun 12 14:25:48 2016 +0100 Committer: Sean Owen Committed: Sun Jun 12 14:25:56 2016 +0100 -- conf/spark-env.sh.template | 2 +- .../org/apache/spark/deploy/master/MasterArguments.scala | 8 +++- docs/spark-standalone.md | 4 ++-- sbin/start-master.sh | 6 +++--- sbin/start-slaves.sh | 6 +++--- 5 files changed, 16 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b75d1c20/conf/spark-env.sh.template -- diff --git a/conf/spark-env.sh.template b/conf/spark-env.sh.template index 9cffdc3..c750c72 100755 --- a/conf/spark-env.sh.template +++ b/conf/spark-env.sh.template @@ -42,7 +42,7 @@ # - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G) # Options for the daemons used in the standalone deploy mode -# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname +# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master # - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y") # - SPARK_WORKER_CORES, to set the number of cores to use on this machine http://git-wip-us.apache.org/repos/asf/spark/blob/b75d1c20/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala b/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala index 585e083..c63793c 100644 --- a/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala +++ b/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala @@ -20,18 +20,24 @@ package org.apache.spark.deploy.master import scala.annotation.tailrec import org.apache.spark.SparkConf +import org.apache.spark.internal.Logging import org.apache.spark.util.{IntParam, Utils} /** * Command-line parser for the master. 
*/ -private[master] class MasterArguments(args: Array[String], conf: SparkConf) { +private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging { var host = Utils.localHostName() var port = 7077 var webUiPort = 8080 var propertiesFile: String = null // Check for settings in environment variables + if (System.getenv("SPARK_MASTER_IP") != null) { +logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST") +host = System.getenv("SPARK_MASTER_IP") + } + if (System.getenv("SPARK_MASTER_HOST") != null) { host = System.getenv("SPARK_MASTER_HOST") } http://git-wip-us.apache.org/repos/asf/spark/blob/b75d1c20/docs/spark-standalone.md -- diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md index 40c7293..c864c90 100644 --- a/docs/spark-standalone.md +++ b/docs/spark-standalone.md @@ -94,8 +94,8 @@ You can optionally configure the cluster further by setting environment variable Environment VariableMeaning -SPARK_MASTER_IP -Bind the master to a specific IP address, for example a public one. +SPARK_MASTER_HOST +Bind the master to a specific hostname or IP address, for example a public one. SPARK_MASTER_PORT http://git-wip-us.apache.org/repos/asf/spark/blob/b75d1c20/sbin/start-master.sh -- diff --git a/sbin/start-master.sh b/sbin/start-master.sh index ce7f177..981cb15 100755 --- a/sbin/start-master.sh +++ b/sbin/start-master.sh @@ -47,8 +47,8 @@ if [ "$SPARK_MASTER_PORT" = "" ]; then SPARK_MASTER_PORT=7077 fi -if [ "$SPARK_MASTER
spark git commit: [SPARK-15813] Improve Canceling log message to make it less ambiguous
Repository: spark Updated Branches: refs/heads/branch-2.0 b96e7f6aa -> 41f309bfb [SPARK-15813] Improve Canceling log message to make it less ambiguous ## What changes were proposed in this pull request? Add new desired executor number to make the log message less ambiguous. ## How was this patch tested? This is a trivial change Author: Peter Ableda Closes #13552 from peterableda/patch-1. (cherry picked from commit d681742b2d37bd68cf5d8d3161e0f48846f6f9d4) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/41f309bf Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/41f309bf Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/41f309bf Branch: refs/heads/branch-2.0 Commit: 41f309bfbcefcc9612efb7c0571a4009147e5896 Parents: b96e7f6 Author: Peter Ableda Authored: Mon Jun 13 09:40:17 2016 +0100 Committer: Sean Owen Committed: Mon Jun 13 09:40:25 2016 +0100 -- .../main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/41f309bf/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala -- diff --git a/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala b/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala index b110d82..1b80071 100644 --- a/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala +++ b/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala @@ -354,7 +354,8 @@ private[yarn] class YarnAllocator( } else if (missing < 0) { val numToCancel = math.min(numPendingAllocate, -missing) - logInfo(s"Canceling requests for $numToCancel executor containers") + logInfo(s"Canceling requests for $numToCancel executor container(s) to have a new desired " + +s"total $targetNumExecutors executors.") val matchingRequests = amClient.getMatchingRequests(RM_REQUEST_PRIORITY, ANY_HOST, resource) if (!matchingRequests.isEmpty) { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
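For context on the log message being improved: the surrounding allocator logic compares the desired executor total against what is already pending plus running, and when the result is negative it cancels at most the number of still-outstanding requests, which is the `numToCancel` the message reports. A simplified, self-contained sketch of that arithmetic (variable names mirror the diff, but this is an illustration, not the YarnAllocator source):

{% highlight scala %}
// Simplified sketch of the cancellation arithmetic around the new log message.
object AllocatorMath {
  def requestsToCancel(targetNumExecutors: Int,
                       numPendingAllocate: Int,
                       numRunning: Int): Int = {
    // Assumed shape: how far the current pipeline overshoots the desired total.
    val missing = targetNumExecutors - numPendingAllocate - numRunning
    if (missing < 0) {
      // Only outstanding (not yet granted) requests can be cancelled.
      math.min(numPendingAllocate, -missing)
    } else {
      0
    }
  }

  def main(args: Array[String]): Unit = {
    // Target dropped to 5 while 4 requests are pending and 8 executors run:
    // cancel all 4 pending requests to head toward the new desired total of 5.
    println(requestsToCancel(targetNumExecutors = 5, numPendingAllocate = 4, numRunning = 8))
  }
}
{% endhighlight %}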
spark git commit: [SPARK-15813] Improve Canceling log message to make it less ambiguous
Repository: spark Updated Branches: refs/heads/master e2ab79d5e -> d681742b2 [SPARK-15813] Improve Canceling log message to make it less ambiguous ## What changes were proposed in this pull request? Add new desired executor number to make the log message less ambiguous. ## How was this patch tested? This is a trivial change Author: Peter Ableda Closes #13552 from peterableda/patch-1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d681742b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d681742b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d681742b Branch: refs/heads/master Commit: d681742b2d37bd68cf5d8d3161e0f48846f6f9d4 Parents: e2ab79d Author: Peter Ableda Authored: Mon Jun 13 09:40:17 2016 +0100 Committer: Sean Owen Committed: Mon Jun 13 09:40:17 2016 +0100 -- .../main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d681742b/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala -- diff --git a/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala b/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala index b110d82..1b80071 100644 --- a/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala +++ b/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala @@ -354,7 +354,8 @@ private[yarn] class YarnAllocator( } else if (missing < 0) { val numToCancel = math.min(numPendingAllocate, -missing) - logInfo(s"Canceling requests for $numToCancel executor containers") + logInfo(s"Canceling requests for $numToCancel executor container(s) to have a new desired " + +s"total $targetNumExecutors executors.") val matchingRequests = amClient.getMatchingRequests(RM_REQUEST_PRIORITY, ANY_HOST, resource) if (!matchingRequests.isEmpty) { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [DOCUMENTATION] fixed typos in python programming guide
Repository: spark Updated Branches: refs/heads/master 688b6ef9d -> a87a56f5c [DOCUMENTATION] fixed typos in python programming guide ## What changes were proposed in this pull request? minor typo ## How was this patch tested? minor typo in the doc, should be self explanatory Author: Mortada Mehyar Closes #13639 from mortada/typo. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a87a56f5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a87a56f5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a87a56f5 Branch: refs/heads/master Commit: a87a56f5c70792eccbb57046f6b26d40494c380a Parents: 688b6ef Author: Mortada Mehyar Authored: Tue Jun 14 09:45:46 2016 +0100 Committer: Sean Owen Committed: Tue Jun 14 09:45:46 2016 +0100 -- docs/programming-guide.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a87a56f5/docs/programming-guide.md -- diff --git a/docs/programming-guide.md b/docs/programming-guide.md index 3f081a0..97bcb51 100644 --- a/docs/programming-guide.md +++ b/docs/programming-guide.md @@ -491,7 +491,7 @@ for examples of using Cassandra / HBase ```InputFormat``` and ```OutputFormat``` RDDs support two types of operations: *transformations*, which create a new dataset from an existing one, and *actions*, which return a value to the driver program after running a computation on the dataset. For example, `map` is a transformation that passes each dataset element through a function and returns a new RDD representing the results. On the other hand, `reduce` is an action that aggregates all the elements of the RDD using some function and returns the final result to the driver program (although there is also a parallel `reduceByKey` that returns a distributed dataset). -All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program. This design enables Spark to run more efficiently -- for example, we can realize that a dataset created through `map` will be used in a `reduce` and return only the result of the `reduce` to the driver, rather than the larger mapped dataset. +All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program. This design enables Spark to run more efficiently. For example, we can realize that a dataset created through `map` will be used in a `reduce` and return only the result of the `reduce` to the driver, rather than the larger mapped dataset. By default, each transformed RDD may be recomputed each time you run an action on it. However, you may also *persist* an RDD in memory using the `persist` (or `cache`) method, in which case Spark will keep the elements around on the cluster for much faster access the next time you query it. There is also support for persisting RDDs on disk, or replicated across multiple nodes. 
@@ -618,7 +618,7 @@ class MyClass { } {% endhighlight %} -Here, if we create a `new MyClass` and call `doStuff` on it, the `map` inside there references the +Here, if we create a new `MyClass` instance and call `doStuff` on it, the `map` inside there references the `func1` method *of that `MyClass` instance*, so the whole object needs to be sent to the cluster. It is similar to writing `rdd.map(x => this.func1(x))`. @@ -1156,7 +1156,7 @@ to disk, incurring the additional overhead of disk I/O and increased garbage col Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files are preserved until the corresponding RDDs are no longer used and are garbage collected. This is done so the shuffle files don't need to be re-created if the lineage is re-computed. -Garbage collection may happen only after a long period time, if the application retains references +Garbage collection may happen only after a long period of time, if the application retains references to these RDDs or if GC does not kick in frequently. This means that long-running Spark jobs may consume a large amount of disk space. The temporary storage directory is specified by the `spark.local.dir` configuration parameter when configuring the Spark context. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional comma
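The passage being corrected above describes lazy evaluation: transformations such as `map` only record lineage, and nothing executes until an action such as `reduce` asks for a result, at which point only that result returns to the driver. A minimal self-contained illustration in Scala (local mode, assuming a Spark 2.x dependency; app and object names are arbitrary):

{% highlight scala %}
import org.apache.spark.{SparkConf, SparkContext}

object LazyTransformations {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lazy-demo").setMaster("local[2]"))
    try {
      val lines = sc.parallelize(Seq("a", "bb", "ccc"))

      // `map` is a transformation: it only records the lineage, nothing runs yet.
      val lengths = lines.map(_.length)

      // `reduce` is an action: it triggers the computation and returns only the
      // final value (6) to the driver, not the intermediate mapped dataset.
      val total = lengths.reduce(_ + _)
      println(total)
    } finally {
      sc.stop()
    }
  }
}
{% endhighlight %}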
spark git commit: [DOCUMENTATION] fixed typos in python programming guide
Repository: spark Updated Branches: refs/heads/branch-2.0 974be6241 -> cf52375b9 [DOCUMENTATION] fixed typos in python programming guide ## What changes were proposed in this pull request? minor typo ## How was this patch tested? minor typo in the doc, should be self explanatory Author: Mortada Mehyar Closes #13639 from mortada/typo. (cherry picked from commit a87a56f5c70792eccbb57046f6b26d40494c380a) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cf52375b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cf52375b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cf52375b Branch: refs/heads/branch-2.0 Commit: cf52375b9f3da84d6aad31134d4f2859de7d447c Parents: 974be62 Author: Mortada Mehyar Authored: Tue Jun 14 09:45:46 2016 +0100 Committer: Sean Owen Committed: Tue Jun 14 09:45:56 2016 +0100 -- docs/programming-guide.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cf52375b/docs/programming-guide.md -- diff --git a/docs/programming-guide.md b/docs/programming-guide.md index 3f081a0..97bcb51 100644 --- a/docs/programming-guide.md +++ b/docs/programming-guide.md @@ -491,7 +491,7 @@ for examples of using Cassandra / HBase ```InputFormat``` and ```OutputFormat``` RDDs support two types of operations: *transformations*, which create a new dataset from an existing one, and *actions*, which return a value to the driver program after running a computation on the dataset. For example, `map` is a transformation that passes each dataset element through a function and returns a new RDD representing the results. On the other hand, `reduce` is an action that aggregates all the elements of the RDD using some function and returns the final result to the driver program (although there is also a parallel `reduceByKey` that returns a distributed dataset). -All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program. This design enables Spark to run more efficiently -- for example, we can realize that a dataset created through `map` will be used in a `reduce` and return only the result of the `reduce` to the driver, rather than the larger mapped dataset. +All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program. This design enables Spark to run more efficiently. For example, we can realize that a dataset created through `map` will be used in a `reduce` and return only the result of the `reduce` to the driver, rather than the larger mapped dataset. By default, each transformed RDD may be recomputed each time you run an action on it. However, you may also *persist* an RDD in memory using the `persist` (or `cache`) method, in which case Spark will keep the elements around on the cluster for much faster access the next time you query it. There is also support for persisting RDDs on disk, or replicated across multiple nodes. 
@@ -618,7 +618,7 @@ class MyClass { } {% endhighlight %} -Here, if we create a `new MyClass` and call `doStuff` on it, the `map` inside there references the +Here, if we create a new `MyClass` instance and call `doStuff` on it, the `map` inside there references the `func1` method *of that `MyClass` instance*, so the whole object needs to be sent to the cluster. It is similar to writing `rdd.map(x => this.func1(x))`. @@ -1156,7 +1156,7 @@ to disk, incurring the additional overhead of disk I/O and increased garbage col Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files are preserved until the corresponding RDDs are no longer used and are garbage collected. This is done so the shuffle files don't need to be re-created if the lineage is re-computed. -Garbage collection may happen only after a long period time, if the application retains references +Garbage collection may happen only after a long period of time, if the application retains references to these RDDs or if GC does not kick in frequently. This means that long-running Spark jobs may consume a large amount of disk space. The temporary storage directory is specified by the `spark.local.dir` configuration parameter when configuring the Spark context. ---
spark git commit: [SPARK-15821][DOCS] Include parallel build info
Repository: spark Updated Branches: refs/heads/master 96c3500c6 -> a431e3f1f [SPARK-15821][DOCS] Include parallel build info ## What changes were proposed in this pull request? We should mention that users can build Spark using multiple threads to decrease build times; either here or in "Building Spark" ## How was this patch tested? Built on machines with between one core to 192 cores using mvn -T 1C and observed faster build times with no loss in stability In response to the question here https://issues.apache.org/jira/browse/SPARK-15821 I think we should suggest this option as we know it works for Spark and can result in faster builds Author: Adam Roberts Closes #13562 from a-roberts/patch-3. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a431e3f1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a431e3f1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a431e3f1 Branch: refs/heads/master Commit: a431e3f1f8575e2498650ac767e69fbc903e9929 Parents: 96c3500 Author: Adam Roberts Authored: Tue Jun 14 13:59:01 2016 +0100 Committer: Sean Owen Committed: Tue Jun 14 13:59:01 2016 +0100 -- README.md| 2 ++ dev/make-distribution.sh | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a431e3f1/README.md -- diff --git a/README.md b/README.md index d5804d1..c77c429 100644 --- a/README.md +++ b/README.md @@ -25,6 +25,8 @@ To build Spark and its example programs, run: build/mvn -DskipTests clean package (You do not need to do this if you downloaded a pre-built package.) + +You can build Spark using more than one thread by using the -T option with Maven, see ["Parallel builds in Maven 3"](https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3). More detailed documentation is available from the project site, at ["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html). For developing Spark using an IDE, see [Eclipse](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-Eclipse) http://git-wip-us.apache.org/repos/asf/spark/blob/a431e3f1/dev/make-distribution.sh -- diff --git a/dev/make-distribution.sh b/dev/make-distribution.sh index 4f7544f..9be4fdf 100755 --- a/dev/make-distribution.sh +++ b/dev/make-distribution.sh @@ -53,7 +53,7 @@ while (( "$#" )); do --hadoop) echo "Error: '--hadoop' is no longer supported:" echo "Error: use Maven profiles and options -Dhadoop.version and -Dyarn.version instead." - echo "Error: Related profiles include hadoop-2.2, hadoop-2.3 and hadoop-2.4." + echo "Error: Related profiles include hadoop-2.2, hadoop-2.3, hadoop-2.4, hadoop-2.6 and hadoop-2.7." exit_with_usage ;; --with-yarn) @@ -150,7 +150,7 @@ export MAVEN_OPTS="${MAVEN_OPTS:--Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCac # Store the command as an array because $MVN variable might have spaces in it. # Normal quoting tricks don't work. # See: http://mywiki.wooledge.org/BashFAQ/050 -BUILD_COMMAND=("$MVN" clean package -DskipTests $@) +BUILD_COMMAND=("$MVN" -T 1C clean package -DskipTests $@) # Actually build the jar echo -e "\nBuilding with..." - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15821][DOCS] Include parallel build info
Repository: spark Updated Branches: refs/heads/branch-2.0 d59859d38 -> 0d80bc291 [SPARK-15821][DOCS] Include parallel build info ## What changes were proposed in this pull request? We should mention that users can build Spark using multiple threads to decrease build times; either here or in "Building Spark" ## How was this patch tested? Built on machines with between one core to 192 cores using mvn -T 1C and observed faster build times with no loss in stability In response to the question here https://issues.apache.org/jira/browse/SPARK-15821 I think we should suggest this option as we know it works for Spark and can result in faster builds Author: Adam Roberts Closes #13562 from a-roberts/patch-3. (cherry picked from commit a431e3f1f8575e2498650ac767e69fbc903e9929) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0d80bc29 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0d80bc29 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0d80bc29 Branch: refs/heads/branch-2.0 Commit: 0d80bc291f8c96359b22bda2df8cb7b835e31339 Parents: d59859d Author: Adam Roberts Authored: Tue Jun 14 13:59:01 2016 +0100 Committer: Sean Owen Committed: Tue Jun 14 13:59:16 2016 +0100 -- README.md| 2 ++ dev/make-distribution.sh | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0d80bc29/README.md -- diff --git a/README.md b/README.md index d5804d1..c77c429 100644 --- a/README.md +++ b/README.md @@ -25,6 +25,8 @@ To build Spark and its example programs, run: build/mvn -DskipTests clean package (You do not need to do this if you downloaded a pre-built package.) + +You can build Spark using more than one thread by using the -T option with Maven, see ["Parallel builds in Maven 3"](https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3). More detailed documentation is available from the project site, at ["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html). For developing Spark using an IDE, see [Eclipse](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-Eclipse) http://git-wip-us.apache.org/repos/asf/spark/blob/0d80bc29/dev/make-distribution.sh -- diff --git a/dev/make-distribution.sh b/dev/make-distribution.sh index 4f7544f..9be4fdf 100755 --- a/dev/make-distribution.sh +++ b/dev/make-distribution.sh @@ -53,7 +53,7 @@ while (( "$#" )); do --hadoop) echo "Error: '--hadoop' is no longer supported:" echo "Error: use Maven profiles and options -Dhadoop.version and -Dyarn.version instead." - echo "Error: Related profiles include hadoop-2.2, hadoop-2.3 and hadoop-2.4." + echo "Error: Related profiles include hadoop-2.2, hadoop-2.3, hadoop-2.4, hadoop-2.6 and hadoop-2.7." exit_with_usage ;; --with-yarn) @@ -150,7 +150,7 @@ export MAVEN_OPTS="${MAVEN_OPTS:--Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCac # Store the command as an array because $MVN variable might have spaces in it. # Normal quoting tricks don't work. # See: http://mywiki.wooledge.org/BashFAQ/050 -BUILD_COMMAND=("$MVN" clean package -DskipTests $@) +BUILD_COMMAND=("$MVN" -T 1C clean package -DskipTests $@) # Actually build the jar echo -e "\nBuilding with..." - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: doc fix of HiveThriftServer
Repository: spark Updated Branches: refs/heads/master a431e3f1f -> 53bb03084 doc fix of HiveThriftServer ## What changes were proposed in this pull request? Just minor doc fix. \cc yhuai Author: Jeff Zhang Closes #13659 from zjffdu/doc_fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/53bb0308 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/53bb0308 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/53bb0308 Branch: refs/heads/master Commit: 53bb03084796231f724ff8369490df520e1ee33c Parents: a431e3f Author: Jeff Zhang Authored: Tue Jun 14 14:28:40 2016 +0100 Committer: Sean Owen Committed: Tue Jun 14 14:28:40 2016 +0100 -- .../apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala | 2 +- .../spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala | 4 ++-- .../apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/53bb0308/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala -- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala index c82fa4e..2e0fa1e 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala @@ -30,7 +30,7 @@ import org.apache.spark.ui._ import org.apache.spark.ui.UIUtils._ -/** Page for Spark Web UI that shows statistics of a thrift server */ +/** Page for Spark Web UI that shows statistics of the thrift server */ private[ui] class ThriftServerPage(parent: ThriftServerTab) extends WebUIPage("") with Logging { private val listener = parent.listener http://git-wip-us.apache.org/repos/asf/spark/blob/53bb0308/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala -- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala index 008108a..f39e9dc 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala @@ -29,7 +29,7 @@ import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.{ExecutionInfo, import org.apache.spark.ui._ import org.apache.spark.ui.UIUtils._ -/** Page for Spark Web UI that shows statistics of a streaming job */ +/** Page for Spark Web UI that shows statistics of jobs running in the thrift server */ private[ui] class ThriftServerSessionPage(parent: ThriftServerTab) extends WebUIPage("session") with Logging { @@ -60,7 +60,7 @@ private[ui] class ThriftServerSessionPage(parent: ThriftServerTab) UIUtils.headerSparkPage("JDBC/ODBC Session", content, parent, Some(5000)) } - /** Generate basic stats of the streaming program */ + /** Generate basic stats of the thrift server program */ private def generateBasicStats(): Seq[Node] = { val timeSinceStart = System.currentTimeMillis() - startTime.getTime 
http://git-wip-us.apache.org/repos/asf/spark/blob/53bb0308/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala -- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala index 923ba8a..db20660 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala @@ -24,7 +24,7 @@ import org.apache.spark.sql.hive.thriftserver.ui.ThriftServerTab._ import org.apache.spark.ui.{SparkUI, SparkUITab} /** - * Spark Web UI tab that shows statistics of a streaming job. + * Spark Web UI tab that shows statistics of jobs running in the thrift server. * This assumes the given SparkContext has enabled its SparkUI. */ private[thriftserver] class ThriftServerTab(sparkContext: SparkContext) ---
spark git commit: doc fix of HiveThriftServer
Repository: spark Updated Branches: refs/heads/branch-2.0 0d80bc291 -> e90ba2287 doc fix of HiveThriftServer ## What changes were proposed in this pull request? Just minor doc fix. \cc yhuai Author: Jeff Zhang Closes #13659 from zjffdu/doc_fix. (cherry picked from commit 53bb03084796231f724ff8369490df520e1ee33c) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e90ba228 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e90ba228 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e90ba228 Branch: refs/heads/branch-2.0 Commit: e90ba228787c0a8b50855bafb0bc16eddee8329b Parents: 0d80bc2 Author: Jeff Zhang Authored: Tue Jun 14 14:28:40 2016 +0100 Committer: Sean Owen Committed: Tue Jun 14 14:28:54 2016 +0100 -- .../apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala | 2 +- .../spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala | 4 ++-- .../apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e90ba228/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala -- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala index c82fa4e..2e0fa1e 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala @@ -30,7 +30,7 @@ import org.apache.spark.ui._ import org.apache.spark.ui.UIUtils._ -/** Page for Spark Web UI that shows statistics of a thrift server */ +/** Page for Spark Web UI that shows statistics of the thrift server */ private[ui] class ThriftServerPage(parent: ThriftServerTab) extends WebUIPage("") with Logging { private val listener = parent.listener http://git-wip-us.apache.org/repos/asf/spark/blob/e90ba228/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala -- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala index 008108a..f39e9dc 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala @@ -29,7 +29,7 @@ import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.{ExecutionInfo, import org.apache.spark.ui._ import org.apache.spark.ui.UIUtils._ -/** Page for Spark Web UI that shows statistics of a streaming job */ +/** Page for Spark Web UI that shows statistics of jobs running in the thrift server */ private[ui] class ThriftServerSessionPage(parent: ThriftServerTab) extends WebUIPage("session") with Logging { @@ -60,7 +60,7 @@ private[ui] class ThriftServerSessionPage(parent: ThriftServerTab) UIUtils.headerSparkPage("JDBC/ODBC Session", content, parent, Some(5000)) } - /** Generate basic stats of the streaming program */ + /** Generate basic stats of the thrift server program */ private def generateBasicStats(): Seq[Node] = { val timeSinceStart = System.currentTimeMillis() - 
startTime.getTime http://git-wip-us.apache.org/repos/asf/spark/blob/e90ba228/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala -- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala index 923ba8a..db20660 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala @@ -24,7 +24,7 @@ import org.apache.spark.sql.hive.thriftserver.ui.ThriftServerTab._ import org.apache.spark.ui.{SparkUI, SparkUITab} /** - * Spark Web UI tab that shows statistics of a streaming job. + * Spark Web UI tab that shows statistics of jobs running in the thrift server. * This assumes the given SparkContext has enabled its SparkUI. */ private[thrif
spark git commit: [MINOR] Clean up several build warnings, mostly due to internal use of old accumulators
Repository: spark Updated Branches: refs/heads/branch-2.0 e03c25193 -> 24539223b [MINOR] Clean up several build warnings, mostly due to internal use of old accumulators Another PR to clean up recent build warnings. This particularly cleans up several instances of the old accumulator API usage in tests that are straightforward to update. I think this qualifies as "minor". Jenkins Author: Sean Owen Closes #13642 from srowen/BuildWarnings. (cherry picked from commit 6151d2641f91c8e3ec0c324e78afb46cdb2ef111) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/24539223 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/24539223 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/24539223 Branch: refs/heads/branch-2.0 Commit: 24539223b043b621a377251bdab206833af78d0c Parents: e03c251 Author: Sean Owen Authored: Tue Jun 14 09:40:07 2016 -0700 Committer: Sean Owen Committed: Tue Jun 14 20:36:30 2016 +0100 -- core/pom.xml| 6 +- .../spark/scheduler/DAGSchedulerSuite.scala | 12 +-- .../spark/scheduler/TaskContextSuite.scala | 9 +- .../spark/sql/execution/debug/package.scala | 34 +++--- .../sql/execution/metric/SQLMetricsSuite.scala | 105 +-- .../spark/deploy/yarn/YarnAllocatorSuite.scala | 1 + 6 files changed, 31 insertions(+), 136 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/24539223/core/pom.xml -- diff --git a/core/pom.xml b/core/pom.xml index f5fdb40..90c8f97 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -356,12 +356,12 @@ generate-resources - + - + - + run http://git-wip-us.apache.org/repos/asf/spark/blob/24539223/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala index 5bcc8ff..ce4e7a2 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala @@ -1593,13 +1593,11 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with Timeou } test("misbehaved accumulator should not crash DAGScheduler and SparkContext") { -val acc = new Accumulator[Int](0, new AccumulatorParam[Int] { - override def addAccumulator(t1: Int, t2: Int): Int = t1 + t2 - override def zero(initialValue: Int): Int = 0 - override def addInPlace(r1: Int, r2: Int): Int = { -throw new DAGSchedulerSuiteDummyException - } -}) +val acc = new LongAccumulator { + override def add(v: java.lang.Long): Unit = throw new DAGSchedulerSuiteDummyException + override def add(v: Long): Unit = throw new DAGSchedulerSuiteDummyException +} +sc.register(acc) // Run this on executors sc.parallelize(1 to 10, 2).foreach { item => acc.add(1) } http://git-wip-us.apache.org/repos/asf/spark/blob/24539223/core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala index 368668b..9eda79a 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala @@ -146,14 +146,13 @@ class TaskContextSuite extends SparkFunSuite with BeforeAndAfter with LocalSpark test("accumulators are updated on exception failures") { // This means use 1 core and 4 max task failures sc = new SparkContext("local[1,4]", 
"test") -val param = AccumulatorParam.LongAccumulatorParam // Create 2 accumulators, one that counts failed values and another that doesn't -val acc1 = new Accumulator(0L, param, Some("x"), countFailedValues = true) -val acc2 = new Accumulator(0L, param, Some("y"), countFailedValues = false) +val acc1 = AccumulatorSuite.createLongAccum("x", true) +val acc2 = AccumulatorSuite.createLongAccum("y", false) // Fail first 3 attempts of every task. This means each task should be run 4 times. sc.parallelize(1 to
spark git commit: [SPARK-15922][MLLIB] `toIndexedRowMatrix` should consider the case `cols < offset+colsPerBlock`
Repository: spark Updated Branches: refs/heads/master f9bf15d9b -> 36110a830 [SPARK-15922][MLLIB] `toIndexedRowMatrix` should consider the case `cols < offset+colsPerBlock` ## What changes were proposed in this pull request? SPARK-15922 reports the following scenario throwing an exception due to the mismatched vector sizes. This PR handles the exceptional case, `cols < (offset + colsPerBlock)`. **Before** ```scala scala> import org.apache.spark.mllib.linalg.distributed._ scala> import org.apache.spark.mllib.linalg._ scala> val rows = IndexedRow(0L, new DenseVector(Array(1,2,3))) :: IndexedRow(1L, new DenseVector(Array(1,2,3))):: IndexedRow(2L, new DenseVector(Array(1,2,3))):: Nil scala> val rdd = sc.parallelize(rows) scala> val matrix = new IndexedRowMatrix(rdd, 3, 3) scala> val bmat = matrix.toBlockMatrix scala> val imat = bmat.toIndexedRowMatrix scala> imat.rows.collect ... // java.lang.IllegalArgumentException: requirement failed: Vectors must be the same length! ``` **After** ```scala ... scala> imat.rows.collect res0: Array[org.apache.spark.mllib.linalg.distributed.IndexedRow] = Array(IndexedRow(0,[1.0,2.0,3.0]), IndexedRow(1,[1.0,2.0,3.0]), IndexedRow(2,[1.0,2.0,3.0])) ``` ## How was this patch tested? Pass the Jenkins tests (including the above case) Author: Dongjoon Hyun Closes #13643 from dongjoon-hyun/SPARK-15922. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/36110a83 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/36110a83 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/36110a83 Branch: refs/heads/master Commit: 36110a8306608186696c536028d2776e022d305a Parents: f9bf15d Author: Dongjoon Hyun Authored: Thu Jun 16 23:02:46 2016 +0200 Committer: Sean Owen Committed: Thu Jun 16 23:02:46 2016 +0200 -- .../org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala | 2 +- .../spark/mllib/linalg/distributed/BlockMatrixSuite.scala | 5 + 2 files changed, 6 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/36110a83/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala index 7a24617..639295c 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala @@ -288,7 +288,7 @@ class BlockMatrix @Since("1.3.0") ( vectors.foreach { case (blockColIdx: Int, vec: BV[Double]) => val offset = colsPerBlock * blockColIdx -wholeVector(offset until offset + colsPerBlock) := vec +wholeVector(offset until Math.min(cols, offset + colsPerBlock)) := vec } new IndexedRow(rowIdx, Vectors.fromBreeze(wholeVector)) } http://git-wip-us.apache.org/repos/asf/spark/blob/36110a83/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala -- diff --git a/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala b/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala index e5a2cbb..61266f3 100644 --- a/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala @@ -135,6 +135,11 @@ class BlockMatrixSuite extends SparkFunSuite with MLlibTestSparkContext { assert(rowMat.numCols() === 
n) assert(rowMat.toBreeze() === gridBasedMat.toBreeze()) +// SPARK-15922: BlockMatrix to IndexedRowMatrix throws an error" +val bmat = rowMat.toBlockMatrix +val imat = bmat.toIndexedRowMatrix +imat.rows.collect + val rows = 1 val cols = 10 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
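The one-line fix above clamps the destination slice so the last (possibly narrower) column block no longer overruns the row vector. A standalone sketch of the index arithmetic, with values chosen to mirror the 3-column example in the commit message (the object name is made up):

```scala
// With 3 columns split into blocks of 2, the last block starts at offset 2 but only
// holds 1 column; an unclamped slice of width colsPerBlock would run past `cols`.
object SliceClampSketch {
  def main(args: Array[String]): Unit = {
    val cols = 3
    val colsPerBlock = 2
    for (blockColIdx <- 0 to 1) {
      val offset = colsPerBlock * blockColIdx
      val unclampedEnd = offset + colsPerBlock               // 2, then 4 (past the end)
      val clampedEnd = math.min(cols, offset + colsPerBlock) // 2, then 3 (the fix)
      println(s"block $blockColIdx: copy columns [$offset, $clampedEnd) rather than [$offset, $unclampedEnd)")
    }
  }
}
```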
spark git commit: [SPARK-15922][MLLIB] `toIndexedRowMatrix` should consider the case `cols < offset+colsPerBlock`
Repository: spark Updated Branches: refs/heads/branch-2.0 5b003c9bc -> 579268426 [SPARK-15922][MLLIB] `toIndexedRowMatrix` should consider the case `cols < offset+colsPerBlock` ## What changes were proposed in this pull request? SPARK-15922 reports the following scenario throwing an exception due to the mismatched vector sizes. This PR handles the exceptional case, `cols < (offset + colsPerBlock)`. **Before** ```scala scala> import org.apache.spark.mllib.linalg.distributed._ scala> import org.apache.spark.mllib.linalg._ scala> val rows = IndexedRow(0L, new DenseVector(Array(1,2,3))) :: IndexedRow(1L, new DenseVector(Array(1,2,3))):: IndexedRow(2L, new DenseVector(Array(1,2,3))):: Nil scala> val rdd = sc.parallelize(rows) scala> val matrix = new IndexedRowMatrix(rdd, 3, 3) scala> val bmat = matrix.toBlockMatrix scala> val imat = bmat.toIndexedRowMatrix scala> imat.rows.collect ... // java.lang.IllegalArgumentException: requirement failed: Vectors must be the same length! ``` **After** ```scala ... scala> imat.rows.collect res0: Array[org.apache.spark.mllib.linalg.distributed.IndexedRow] = Array(IndexedRow(0,[1.0,2.0,3.0]), IndexedRow(1,[1.0,2.0,3.0]), IndexedRow(2,[1.0,2.0,3.0])) ``` ## How was this patch tested? Pass the Jenkins tests (including the above case) Author: Dongjoon Hyun Closes #13643 from dongjoon-hyun/SPARK-15922. (cherry picked from commit 36110a8306608186696c536028d2776e022d305a) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/57926842 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/57926842 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/57926842 Branch: refs/heads/branch-2.0 Commit: 5792684268b273562e694855eb671c21c4044280 Parents: 5b003c9 Author: Dongjoon Hyun Authored: Thu Jun 16 23:02:46 2016 +0200 Committer: Sean Owen Committed: Thu Jun 16 23:03:00 2016 +0200 -- .../org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala | 2 +- .../spark/mllib/linalg/distributed/BlockMatrixSuite.scala | 5 + 2 files changed, 6 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/57926842/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala index 7a24617..639295c 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala @@ -288,7 +288,7 @@ class BlockMatrix @Since("1.3.0") ( vectors.foreach { case (blockColIdx: Int, vec: BV[Double]) => val offset = colsPerBlock * blockColIdx -wholeVector(offset until offset + colsPerBlock) := vec +wholeVector(offset until Math.min(cols, offset + colsPerBlock)) := vec } new IndexedRow(rowIdx, Vectors.fromBreeze(wholeVector)) } http://git-wip-us.apache.org/repos/asf/spark/blob/57926842/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala -- diff --git a/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala b/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala index e5a2cbb..61266f3 100644 --- a/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala @@ -135,6 +135,11 @@ 
class BlockMatrixSuite extends SparkFunSuite with MLlibTestSparkContext { assert(rowMat.numCols() === n) assert(rowMat.toBreeze() === gridBasedMat.toBreeze()) +// SPARK-15922: BlockMatrix to IndexedRowMatrix throws an error" +val bmat = rowMat.toBlockMatrix +val imat = bmat.toIndexedRowMatrix +imat.rows.collect + val rows = 1 val cols = 10 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config
Repository: spark Updated Branches: refs/heads/master 36110a830 -> 457126e42 [SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config ## What changes were proposed in this pull request? Reduce `spark.memory.fraction` default to 0.6 in order to make it fit within default JVM old generation size (2/3 heap). See JIRA discussion. This means a full cache doesn't spill into the new gen. CC andrewor14 ## How was this patch tested? Jenkins tests. Author: Sean Owen Closes #13618 from srowen/SPARK-15796. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/457126e4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/457126e4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/457126e4 Branch: refs/heads/master Commit: 457126e420e66228cc68def4bc3d87e7a282069a Parents: 36110a8 Author: Sean Owen Authored: Thu Jun 16 23:04:10 2016 +0200 Committer: Sean Owen Committed: Thu Jun 16 23:04:10 2016 +0200 -- .../spark/memory/UnifiedMemoryManager.scala | 8 .../scala/org/apache/spark/DistributedSuite.scala | 2 +- docs/configuration.md | 7 --- docs/tuning.md| 18 +- 4 files changed, 26 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/457126e4/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala b/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala index ae747c1..c7b36be 100644 --- a/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala +++ b/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala @@ -25,9 +25,9 @@ import org.apache.spark.storage.BlockId * either side can borrow memory from the other. * * The region shared between execution and storage is a fraction of (the total heap space - 300MB) - * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary + * configurable through `spark.memory.fraction` (default 0.6). The position of the boundary * within this space is further determined by `spark.memory.storageFraction` (default 0.5). - * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default. + * This means the size of the storage region is 0.6 * 0.5 = 0.3 of the heap space by default. * * Storage can borrow as much execution memory as is free until execution reclaims its space. * When this happens, cached blocks will be evicted from memory until sufficient borrowed @@ -187,7 +187,7 @@ object UnifiedMemoryManager { // Set aside a fixed amount of memory for non-storage, non-execution purposes. // This serves a function similar to `spark.memory.fraction`, but guarantees that we reserve // sufficient memory for the system even for small heaps. E.g. if we have a 1GB JVM, then - // the memory used for execution and storage will be (1024 - 300) * 0.75 = 543MB by default. + // the memory used for execution and storage will be (1024 - 300) * 0.6 = 434MB by default. 
private val RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024 def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = { @@ -223,7 +223,7 @@ object UnifiedMemoryManager { } } val usableMemory = systemMemory - reservedMemory -val memoryFraction = conf.getDouble("spark.memory.fraction", 0.75) +val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6) (usableMemory * memoryFraction).toLong } } http://git-wip-us.apache.org/repos/asf/spark/blob/457126e4/core/src/test/scala/org/apache/spark/DistributedSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/DistributedSuite.scala b/core/src/test/scala/org/apache/spark/DistributedSuite.scala index 6e69fc4..0515e6e 100644 --- a/core/src/test/scala/org/apache/spark/DistributedSuite.scala +++ b/core/src/test/scala/org/apache/spark/DistributedSuite.scala @@ -223,7 +223,7 @@ class DistributedSuite extends SparkFunSuite with Matchers with LocalSparkContex test("compute when only some partitions fit in memory") { val size = 1 -val numPartitions = 10 +val numPartitions = 20 val conf = new SparkConf() .set("spark.storage.unrollMemoryThreshold", "1024") .set("spark.testing.memory", size.toString) http://git-wip-us.apache.org/repos/asf/spark/blob/457126e4/docs/configuration.md -- diff
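The new default interacts with the fixed 300 MB reservation, which is why 0.75 could overrun a default-sized old generation. A back-of-the-envelope sketch of the sizing formula (plain arithmetic, not Spark code; the heap sizes are examples):

```scala
object MemoryFractionSketch {
  def main(args: Array[String]): Unit = {
    val reservedMb = 300 // RESERVED_SYSTEM_MEMORY_BYTES, expressed in MB
    for (heapMb <- Seq(1024, 4096); fraction <- Seq(0.75, 0.6)) {
      // Unified execution + storage region = (heap - reserved) * spark.memory.fraction
      val unifiedMb = ((heapMb - reservedMb) * fraction).toLong
      println(f"heap=${heapMb}MB fraction=$fraction%.2f -> execution+storage=${unifiedMb}MB")
    }
  }
}
```

For a 1 GB heap this reproduces the numbers in the updated comment: 543 MB at 0.75 versus 434 MB at 0.6, the latter fitting inside a default 2/3-heap old generation.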
spark git commit: [SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config
Repository: spark Updated Branches: refs/heads/branch-2.0 579268426 -> 095ddb4c9 [SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config ## What changes were proposed in this pull request? Reduce `spark.memory.fraction` default to 0.6 in order to make it fit within default JVM old generation size (2/3 heap). See JIRA discussion. This means a full cache doesn't spill into the new gen. CC andrewor14 ## How was this patch tested? Jenkins tests. Author: Sean Owen Closes #13618 from srowen/SPARK-15796. (cherry picked from commit 457126e420e66228cc68def4bc3d87e7a282069a) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/095ddb4c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/095ddb4c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/095ddb4c Branch: refs/heads/branch-2.0 Commit: 095ddb4c9e7ab9193c15c69eb057a9bb2dbdaed1 Parents: 5792684 Author: Sean Owen Authored: Thu Jun 16 23:04:10 2016 +0200 Committer: Sean Owen Committed: Thu Jun 16 23:04:19 2016 +0200 -- .../spark/memory/UnifiedMemoryManager.scala | 8 .../scala/org/apache/spark/DistributedSuite.scala | 2 +- docs/configuration.md | 7 --- docs/tuning.md| 18 +- 4 files changed, 26 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/095ddb4c/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala -- diff --git a/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala b/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala index ae747c1..c7b36be 100644 --- a/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala +++ b/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala @@ -25,9 +25,9 @@ import org.apache.spark.storage.BlockId * either side can borrow memory from the other. * * The region shared between execution and storage is a fraction of (the total heap space - 300MB) - * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary + * configurable through `spark.memory.fraction` (default 0.6). The position of the boundary * within this space is further determined by `spark.memory.storageFraction` (default 0.5). - * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default. + * This means the size of the storage region is 0.6 * 0.5 = 0.3 of the heap space by default. * * Storage can borrow as much execution memory as is free until execution reclaims its space. * When this happens, cached blocks will be evicted from memory until sufficient borrowed @@ -187,7 +187,7 @@ object UnifiedMemoryManager { // Set aside a fixed amount of memory for non-storage, non-execution purposes. // This serves a function similar to `spark.memory.fraction`, but guarantees that we reserve // sufficient memory for the system even for small heaps. E.g. if we have a 1GB JVM, then - // the memory used for execution and storage will be (1024 - 300) * 0.75 = 543MB by default. + // the memory used for execution and storage will be (1024 - 300) * 0.6 = 434MB by default. 
private val RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024 def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = { @@ -223,7 +223,7 @@ object UnifiedMemoryManager { } } val usableMemory = systemMemory - reservedMemory -val memoryFraction = conf.getDouble("spark.memory.fraction", 0.75) +val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6) (usableMemory * memoryFraction).toLong } } http://git-wip-us.apache.org/repos/asf/spark/blob/095ddb4c/core/src/test/scala/org/apache/spark/DistributedSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/DistributedSuite.scala b/core/src/test/scala/org/apache/spark/DistributedSuite.scala index 6e69fc4..0515e6e 100644 --- a/core/src/test/scala/org/apache/spark/DistributedSuite.scala +++ b/core/src/test/scala/org/apache/spark/DistributedSuite.scala @@ -223,7 +223,7 @@ class DistributedSuite extends SparkFunSuite with Matchers with LocalSparkContex test("compute when only some partitions fit in memory") { val size = 1 -val numPartitions = 10 +val numPartitions = 20 val conf = new SparkConf() .set("spark.storage.unrollMemoryThreshold", "1024") .set("spark.testing.memory", size.toString) http://git-w
spark git commit: [SPARK-15942][REPL] Unblock `:reset` command in REPL.
Repository: spark Updated Branches: refs/heads/master 001a58960 -> 1b3a9b966 [SPARK-15942][REPL] Unblock `:reset` command in REPL. ## What changes were proposed in this pull (Paste from JIRA issue.) As a follow up for SPARK-15697, I have following semantics for `:reset` command. On `:reset` we forget all that user has done but not the initialization of spark. To avoid confusion or make it more clear, we show the message `spark` and `sc` are not erased, infact they are in same state as they were left by previous operations done by the user. While doing above, somewhere I felt that this is not usually what reset means. But an accidental shutdown of a cluster can be very costly, so may be in that sense this is less surprising and still useful. ## How was this patch tested? Manually, by calling `:reset` command, by both altering the state of SparkContext and creating some local variables. Author: Prashant Sharma Author: Prashant Sharma Closes #13661 from ScrapCodes/repl-reset-command. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1b3a9b96 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1b3a9b96 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1b3a9b96 Branch: refs/heads/master Commit: 1b3a9b966a7813e2406dfb020e83605af22f9ef3 Parents: 001a589 Author: Prashant Sharma Authored: Sun Jun 19 20:12:00 2016 +0100 Committer: Sean Owen Committed: Sun Jun 19 20:12:00 2016 +0100 -- .../scala/org/apache/spark/repl/SparkILoop.scala| 16 ++-- .../scala/org/apache/spark/repl/ReplSuite.scala | 3 ++- 2 files changed, 16 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1b3a9b96/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala -- diff --git a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala index dcf3209..2707b08 100644 --- a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala +++ b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala @@ -36,7 +36,11 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) def initializeSpark() { intp.beQuietDuring { processLine(""" -@transient val spark = org.apache.spark.repl.Main.createSparkSession() +@transient val spark = if (org.apache.spark.repl.Main.sparkSession != null) { +org.apache.spark.repl.Main.sparkSession + } else { +org.apache.spark.repl.Main.createSparkSession() + } @transient val sc = { val _sc = spark.sparkContext _sc.uiWebUrl.foreach(webUrl => println(s"Spark context Web UI available at ${webUrl}")) @@ -50,6 +54,7 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) processLine("import spark.implicits._") processLine("import spark.sql") processLine("import org.apache.spark.sql.functions._") + replayCommandStack = Nil // remove above commands from session history. } } @@ -70,7 +75,8 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) echo("Type :help for more information.") } - private val blockedCommands = Set[String]("reset") + /** Add repl commands that needs to be blocked. e.g. 
reset */ + private val blockedCommands = Set[String]() /** Standard commands */ lazy val sparkStandardCommands: List[SparkILoop.this.LoopCommand] = @@ -88,6 +94,12 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) initializeSpark() super.loadFiles(settings) } + + override def resetCommand(line: String): Unit = { +super.resetCommand(line) +initializeSpark() +echo("Note that after :reset, state of SparkSession and SparkContext is unchanged.") + } } object SparkILoop { http://git-wip-us.apache.org/repos/asf/spark/blob/1b3a9b96/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala -- diff --git a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala index 2444e93..c10db94 100644 --- a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala +++ b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala @@ -49,7 +49,8 @@ class ReplSuite extends SparkFunSuite { val oldExecutorClasspath = System.getProperty(CONF_EXECUTOR_CLASSPATH) System.setProperty(CONF_EXECUTOR_CLASSPATH, classpath) - +Main.sparkContext = null +Main.sparkSession = null // causes recreation of SparkContext for each test. Main.conf.set("spark.master",
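The core of the change is that re-running the initialization after `:reset` reuses the existing session instead of building a new one. A minimal sketch of that reuse-or-create pattern outside the REPL (object and field names here are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

object SessionHolder {
  @volatile var sparkSession: SparkSession = _

  def getOrCreate(): SparkSession = {
    if (sparkSession != null) {
      sparkSession // reuse whatever earlier commands left behind
    } else {
      sparkSession = SparkSession.builder().master("local[2]").appName("reset-sketch").getOrCreate()
      sparkSession
    }
  }
}

object ResetSketch {
  def main(args: Array[String]): Unit = {
    val first = SessionHolder.getOrCreate()
    val second = SessionHolder.getOrCreate() // what a re-run of the initialization sees after :reset
    assert(first eq second) // the SparkSession (and its SparkContext) are not recreated
    first.stop()
  }
}
```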
spark git commit: [SPARK-15942][REPL] Unblock `:reset` command in REPL.
Repository: spark Updated Branches: refs/heads/branch-2.0 dc85bd0a0 -> 2c1c337ba [SPARK-15942][REPL] Unblock `:reset` command in REPL. ## What changes were proposed in this pull (Paste from JIRA issue.) As a follow up for SPARK-15697, I have following semantics for `:reset` command. On `:reset` we forget all that user has done but not the initialization of spark. To avoid confusion or make it more clear, we show the message `spark` and `sc` are not erased, infact they are in same state as they were left by previous operations done by the user. While doing above, somewhere I felt that this is not usually what reset means. But an accidental shutdown of a cluster can be very costly, so may be in that sense this is less surprising and still useful. ## How was this patch tested? Manually, by calling `:reset` command, by both altering the state of SparkContext and creating some local variables. Author: Prashant Sharma Author: Prashant Sharma Closes #13661 from ScrapCodes/repl-reset-command. (cherry picked from commit 1b3a9b966a7813e2406dfb020e83605af22f9ef3) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2c1c337b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2c1c337b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2c1c337b Branch: refs/heads/branch-2.0 Commit: 2c1c337ba5984b9e495b4d02bf865e56fd83ab03 Parents: dc85bd0 Author: Prashant Sharma Authored: Sun Jun 19 20:12:00 2016 +0100 Committer: Sean Owen Committed: Sun Jun 19 20:12:08 2016 +0100 -- .../scala/org/apache/spark/repl/SparkILoop.scala| 16 ++-- .../scala/org/apache/spark/repl/ReplSuite.scala | 3 ++- 2 files changed, 16 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2c1c337b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala -- diff --git a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala index dcf3209..2707b08 100644 --- a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala +++ b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala @@ -36,7 +36,11 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) def initializeSpark() { intp.beQuietDuring { processLine(""" -@transient val spark = org.apache.spark.repl.Main.createSparkSession() +@transient val spark = if (org.apache.spark.repl.Main.sparkSession != null) { +org.apache.spark.repl.Main.sparkSession + } else { +org.apache.spark.repl.Main.createSparkSession() + } @transient val sc = { val _sc = spark.sparkContext _sc.uiWebUrl.foreach(webUrl => println(s"Spark context Web UI available at ${webUrl}")) @@ -50,6 +54,7 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) processLine("import spark.implicits._") processLine("import spark.sql") processLine("import org.apache.spark.sql.functions._") + replayCommandStack = Nil // remove above commands from session history. } } @@ -70,7 +75,8 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) echo("Type :help for more information.") } - private val blockedCommands = Set[String]("reset") + /** Add repl commands that needs to be blocked. e.g. 
reset */ + private val blockedCommands = Set[String]() /** Standard commands */ lazy val sparkStandardCommands: List[SparkILoop.this.LoopCommand] = @@ -88,6 +94,12 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) initializeSpark() super.loadFiles(settings) } + + override def resetCommand(line: String): Unit = { +super.resetCommand(line) +initializeSpark() +echo("Note that after :reset, state of SparkSession and SparkContext is unchanged.") + } } object SparkILoop { http://git-wip-us.apache.org/repos/asf/spark/blob/2c1c337b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala -- diff --git a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala index 2444e93..c10db94 100644 --- a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala +++ b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala @@ -49,7 +49,8 @@ class ReplSuite extends SparkFunSuite { val oldExecutorClasspath = System.getProperty(CONF_EXECUTOR_CLASSPATH) System.setProperty(CONF_EXECUTOR_CLASSPATH, classpath) - +Main.sparkContext = null +Main.spar
spark git commit: [SPARK-16040][MLLIB][DOC] spark.mllib PIC document extra line of reference
Repository: spark Updated Branches: refs/heads/branch-2.0 2c1c337ba -> 80c6d4e3a [SPARK-16040][MLLIB][DOC] spark.mllib PIC document extra line of refernece ## What changes were proposed in this pull request? In the 2.0 document, Line "A full example that produces the experiment described in the PIC paper can be found under examples/." is redundant. There is already "Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala" in the Spark repo.". We should remove the first line, which is consistent with other documents. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Manual test Author: wm...@hotmail.com Closes #13755 from wangmiao1981/doc. (cherry picked from commit 5930d7a2e95b2fe4d470cf39546e5a12306553fe) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/80c6d4e3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/80c6d4e3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/80c6d4e3 Branch: refs/heads/branch-2.0 Commit: 80c6d4e3a49fad4dac46738fe5458641f21b96a1 Parents: 2c1c337 Author: wm...@hotmail.com Authored: Sun Jun 19 20:19:40 2016 +0100 Committer: Sean Owen Committed: Sun Jun 19 20:19:48 2016 +0100 -- docs/mllib-clustering.md | 4 1 file changed, 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/80c6d4e3/docs/mllib-clustering.md -- diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md index 6897ba4..073927c 100644 --- a/docs/mllib-clustering.md +++ b/docs/mllib-clustering.md @@ -170,10 +170,6 @@ which contains the computed clustering assignments. Refer to the [`PowerIterationClustering` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClustering) and [`PowerIterationClusteringModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClusteringModel) for details on the API. {% include_example scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala %} - -A full example that produces the experiment described in the PIC paper can be found under -[`examples/`](https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala). - - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16040][MLLIB][DOC] spark.mllib PIC document extra line of reference
Repository: spark Updated Branches: refs/heads/master 1b3a9b966 -> 5930d7a2e [SPARK-16040][MLLIB][DOC] spark.mllib PIC document extra line of refernece ## What changes were proposed in this pull request? In the 2.0 document, Line "A full example that produces the experiment described in the PIC paper can be found under examples/." is redundant. There is already "Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala" in the Spark repo.". We should remove the first line, which is consistent with other documents. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Manual test Author: wm...@hotmail.com Closes #13755 from wangmiao1981/doc. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5930d7a2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5930d7a2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5930d7a2 Branch: refs/heads/master Commit: 5930d7a2e95b2fe4d470cf39546e5a12306553fe Parents: 1b3a9b9 Author: wm...@hotmail.com Authored: Sun Jun 19 20:19:40 2016 +0100 Committer: Sean Owen Committed: Sun Jun 19 20:19:40 2016 +0100 -- docs/mllib-clustering.md | 4 1 file changed, 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5930d7a2/docs/mllib-clustering.md -- diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md index 6897ba4..073927c 100644 --- a/docs/mllib-clustering.md +++ b/docs/mllib-clustering.md @@ -170,10 +170,6 @@ which contains the computed clustering assignments. Refer to the [`PowerIterationClustering` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClustering) and [`PowerIterationClusteringModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClusteringModel) for details on the API. {% include_example scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala %} - -A full example that produces the experiment described in the PIC paper can be found under -[`examples/`](https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala). - - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR] Closing stale pull requests.
Repository: spark Updated Branches: refs/heads/master 359c2e827 -> 92514232e [MINOR] Closing stale pull requests. Closes #13114 Closes #10187 Closes #13432 Closes #13550 Author: Sean Owen Closes #13781 from srowen/CloseStalePR. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/92514232 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/92514232 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/92514232 Branch: refs/heads/master Commit: 92514232e52af0f5f0413ed97b9571b1b9daaa90 Parents: 359c2e8 Author: Sean Owen Authored: Mon Jun 20 22:12:55 2016 +0100 Committer: Sean Owen Committed: Mon Jun 20 22:12:55 2016 +0100 -- -- - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16084][SQL] Minor comments update for "DESCRIBE" table
Repository: spark Updated Branches: refs/heads/master a58f40239 -> f3a768b7b [SPARK-16084][SQL] Minor comments update for "DESCRIBE" table ## What changes were proposed in this pull request? 1. FORMATTED is actually supported, but partition is not supported; 2. Remove parenthesis as it is not necessary just like anywhere else. ## How was this patch tested? Minor issue. I do not think it needs a test case! Author: bomeng Closes #13791 from bomeng/SPARK-16084. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f3a768b7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f3a768b7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f3a768b7 Branch: refs/heads/master Commit: f3a768b7b96f00f33d2fe4e6c0bf4acf373ad4f4 Parents: a58f402 Author: bomeng Authored: Tue Jun 21 08:51:43 2016 +0100 Committer: Sean Owen Committed: Tue Jun 21 08:51:43 2016 +0100 -- .../scala/org/apache/spark/sql/execution/SparkSqlParser.scala | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f3a768b7/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala index 154c25a..2ae8380 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala @@ -279,15 +279,15 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { * Create a [[DescribeTableCommand]] logical plan. */ override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) { -// FORMATTED and columns are not supported. Return null and let the parser decide what to do -// with this (create an exception or pass it on to a different system). +// Describe partition and column are not supported yet. Return null and let the parser decide +// what to do with this (create an exception or pass it on to a different system). if (ctx.describeColName != null || ctx.partitionSpec != null) { null } else { DescribeTableCommand( visitTableIdentifier(ctx.tableIdentifier), ctx.EXTENDED != null, -ctx.FORMATTED() != null) +ctx.FORMATTED != null) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16084][SQL] Minor comments update for "DESCRIBE" table
Repository: spark Updated Branches: refs/heads/branch-2.0 0499ed961 -> 34a8e23c7 [SPARK-16084][SQL] Minor comments update for "DESCRIBE" table ## What changes were proposed in this pull request? 1. FORMATTED is actually supported, but partition is not supported; 2. Remove parenthesis as it is not necessary just like anywhere else. ## How was this patch tested? Minor issue. I do not think it needs a test case! Author: bomeng Closes #13791 from bomeng/SPARK-16084. (cherry picked from commit f3a768b7b96f00f33d2fe4e6c0bf4acf373ad4f4) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/34a8e23c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/34a8e23c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/34a8e23c Branch: refs/heads/branch-2.0 Commit: 34a8e23c739532cd2cb059d9d4e785368d6d0a98 Parents: 0499ed9 Author: bomeng Authored: Tue Jun 21 08:51:43 2016 +0100 Committer: Sean Owen Committed: Tue Jun 21 08:51:57 2016 +0100 -- .../scala/org/apache/spark/sql/execution/SparkSqlParser.scala | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/34a8e23c/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala index 154c25a..2ae8380 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala @@ -279,15 +279,15 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { * Create a [[DescribeTableCommand]] logical plan. */ override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) { -// FORMATTED and columns are not supported. Return null and let the parser decide what to do -// with this (create an exception or pass it on to a different system). +// Describe partition and column are not supported yet. Return null and let the parser decide +// what to do with this (create an exception or pass it on to a different system). if (ctx.describeColName != null || ctx.partitionSpec != null) { null } else { DescribeTableCommand( visitTableIdentifier(ctx.tableIdentifier), ctx.EXTENDED != null, -ctx.FORMATTED() != null) +ctx.FORMATTED != null) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-6005][TESTS] Fix flaky test: o.a.s.streaming.kafka.DirectKafkaStreamSuite.offset recovery
Repository: spark Updated Branches: refs/heads/branch-1.6 d98fb19c1 -> 4fdac3c27 [SPARK-6005][TESTS] Fix flaky test: o.a.s.streaming.kafka.DirectKafkaStreamSuite.offset recovery ## What changes were proposed in this pull request? Because this test extracts data from `DStream.generatedRDDs` before stopping, it may get data before checkpointing. Then after recovering from the checkpoint, `recoveredOffsetRanges` may contain something not in `offsetRangesBeforeStop`, which will fail the test. Adding `Thread.sleep(1000)` before `ssc.stop()` will reproduce this failure. This PR just moves the logic of `offsetRangesBeforeStop` (also renamed to `offsetRangesAfterStop`) after `ssc.stop()` to fix the flaky test. ## How was this patch tested? Jenkins unit tests. Author: Shixiong Zhu Closes #12903 from zsxwing/SPARK-6005. (cherry picked from commit 9533f5390a3ad7ab96a7bea01cdb6aed89503a51) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4fdac3c2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4fdac3c2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4fdac3c2 Branch: refs/heads/branch-1.6 Commit: 4fdac3c271eccc5db69c45788af15e955752a163 Parents: d98fb19 Author: Shixiong Zhu Authored: Tue May 10 13:26:53 2016 -0700 Committer: Sean Owen Committed: Wed Jun 22 14:10:50 2016 +0100 -- .../kafka/DirectKafkaStreamSuite.scala | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4fdac3c2/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala -- diff --git a/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala b/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala index 02225d5..feea0ae 100644 --- a/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala +++ b/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala @@ -280,14 +280,20 @@ class DirectKafkaStreamSuite sendDataAndWaitForReceive(i) } +ssc.stop() + // Verify that offset ranges were generated -val offsetRangesBeforeStop = getOffsetRanges(kafkaStream) -assert(offsetRangesBeforeStop.size >= 1, "No offset ranges generated") +// Since "offsetRangesAfterStop" will be used to compare with "recoveredOffsetRanges", we should +// collect offset ranges after stopping. Otherwise, because new RDDs keep being generated before +// stopping, we may not be able to get the latest RDDs, then "recoveredOffsetRanges" will +// contain something not in "offsetRangesAfterStop". 
+val offsetRangesAfterStop = getOffsetRanges(kafkaStream) +assert(offsetRangesAfterStop.size >= 1, "No offset ranges generated") assert( - offsetRangesBeforeStop.head._2.forall { _.fromOffset === 0 }, + offsetRangesAfterStop.head._2.forall { _.fromOffset === 0 }, "starting offset not zero" ) -ssc.stop() + logInfo("== RESTARTING ") // Recover context from checkpoints @@ -297,12 +303,14 @@ class DirectKafkaStreamSuite // Verify offset ranges have been recovered val recoveredOffsetRanges = getOffsetRanges(recoveredStream) assert(recoveredOffsetRanges.size > 0, "No offset ranges recovered") -val earlierOffsetRangesAsSets = offsetRangesBeforeStop.map { x => (x._1, x._2.toSet) } +val earlierOffsetRangesAsSets = offsetRangesAfterStop.map { x => (x._1, x._2.toSet) } assert( recoveredOffsetRanges.forall { or => earlierOffsetRangesAsSets.contains((or._1, or._2.toSet)) }, - "Recovered ranges are not the same as the ones generated" + "Recovered ranges are not the same as the ones generated\n" + +s"recoveredOffsetRanges: $recoveredOffsetRanges\n" + +s"earlierOffsetRangesAsSets: $earlierOffsetRangesAsSets" ) // Restart context, give more data and verify the total at the end // If the total is write that means each records has been received only once - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15660][CORE] Update RDD `variance/stdev` description and add popVariance/popStdev
Repository: spark Updated Branches: refs/heads/master 4374a46bf -> 5eef1e6c6 [SPARK-15660][CORE] Update RDD `variance/stdev` description and add popVariance/popStdev ## What changes were proposed in this pull request? In Spark-11490, `variance/stdev` are redefined as the **sample** `variance/stdev` instead of population ones. This PR updates the other old documentations to prevent users from misunderstanding. This will update the following Scala/Java API docs. - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.api.java.JavaDoubleRDD - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.rdd.DoubleRDDFunctions - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter - http://spark.apache.org/docs/2.0.0-preview/api/java/org/apache/spark/api/java/JavaDoubleRDD.html - http://spark.apache.org/docs/2.0.0-preview/api/java/org/apache/spark/rdd/DoubleRDDFunctions.html - http://spark.apache.org/docs/2.0.0-preview/api/java/org/apache/spark/util/StatCounter.html Also, this PR adds them `popVariance` and `popStdev` functions clearly. ## How was this patch tested? Pass the updated Jenkins tests. Author: Dongjoon Hyun Closes #13403 from dongjoon-hyun/SPARK-15660. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5eef1e6c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5eef1e6c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5eef1e6c Branch: refs/heads/master Commit: 5eef1e6c6a8b6202fc6db4a90c4caab5169e86c6 Parents: 4374a46 Author: Dongjoon Hyun Authored: Thu Jun 23 11:07:34 2016 +0100 Committer: Sean Owen Committed: Thu Jun 23 11:07:34 2016 +0100 -- .../apache/spark/api/java/JavaDoubleRDD.scala | 17 +-- .../apache/spark/rdd/DoubleRDDFunctions.scala | 21 +-- .../org/apache/spark/util/StatCounter.scala | 22 .../java/org/apache/spark/JavaAPISuite.java | 2 ++ .../org/apache/spark/PartitioningSuite.scala| 4 5 files changed, 58 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5eef1e6c/core/src/main/scala/org/apache/spark/api/java/JavaDoubleRDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/api/java/JavaDoubleRDD.scala b/core/src/main/scala/org/apache/spark/api/java/JavaDoubleRDD.scala index 0d3a523..0026fc9 100644 --- a/core/src/main/scala/org/apache/spark/api/java/JavaDoubleRDD.scala +++ b/core/src/main/scala/org/apache/spark/api/java/JavaDoubleRDD.scala @@ -22,6 +22,7 @@ import java.lang.{Double => JDouble} import scala.language.implicitConversions import scala.reflect.ClassTag +import org.apache.spark.annotation.Since import org.apache.spark.Partitioner import org.apache.spark.api.java.function.{Function => JFunction} import org.apache.spark.partial.{BoundedDouble, PartialResult} @@ -184,10 +185,10 @@ class JavaDoubleRDD(val srdd: RDD[scala.Double]) /** Compute the mean of this RDD's elements. */ def mean(): JDouble = srdd.mean() - /** Compute the variance of this RDD's elements. */ + /** Compute the population variance of this RDD's elements. */ def variance(): JDouble = srdd.variance() - /** Compute the standard deviation of this RDD's elements. */ + /** Compute the population standard deviation of this RDD's elements. */ def stdev(): JDouble = srdd.stdev() /** @@ -202,6 +203,18 @@ class JavaDoubleRDD(val srdd: RDD[scala.Double]) */ def sampleVariance(): JDouble = srdd.sampleVariance() + /** + * Compute the population standard deviation of this RDD's elements. 
+ */ + @Since("2.1.0") + def popStdev(): JDouble = srdd.popStdev() + + /** + * Compute the population variance of this RDD's elements. + */ + @Since("2.1.0") + def popVariance(): JDouble = srdd.popVariance() + /** Return the approximate mean of the elements in this RDD. */ def meanApprox(timeout: Long, confidence: JDouble): PartialResult[BoundedDouble] = srdd.meanApprox(timeout, confidence) http://git-wip-us.apache.org/repos/asf/spark/blob/5eef1e6c/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala b/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala index 368916a..a05a770 100644 --- a/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala +++ b/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala @@ -17,6 +17,7 @@ package org.apache.spark.rdd +import org.apache.spark.annotation.Since import org.apache.spark.TaskContext import org.apache.spark.internal.L
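A small standalone example of the sample-versus-population distinction that the updated docs spell out, using the public `StatCounter` that backs these RDD methods (the data values are arbitrary):

```scala
import org.apache.spark.util.StatCounter

object VarianceSketch {
  def main(args: Array[String]): Unit = {
    val stats = StatCounter(Seq(1.0, 2.0, 3.0, 4.0)) // mean = 2.5, sum of squared deviations = 5.0
    println(s"variance       = ${stats.variance}")       // population: 5.0 / 4  = 1.25
    println(s"stdev          = ${stats.stdev}")          // sqrt(1.25)
    println(s"sampleVariance = ${stats.sampleVariance}") // sample: 5.0 / 3 ≈ 1.667
    println(s"sampleStdev    = ${stats.sampleStdev}")    // sqrt(5.0 / 3)
    // From Spark 2.1.0 (this patch) the population statistics also get explicit names,
    // returning the same values as variance/stdev:
    // stats.popVariance; stats.popStdev
  }
}
```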
spark git commit: [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite
Repository: spark Updated Branches: refs/heads/master 2d2f607bf -> f4fd7432f [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite ## What changes were proposed in this pull request? Since SPARK-13220(Deprecate "yarn-client" and "yarn-cluster"), YarnClusterSuite doesn't test "yarn cluster" mode correctly. This pull request fixes it. ## How was this patch tested? Unit test (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: peng.zhang Closes #13836 from renozhang/SPARK-16125-test-yarn-cluster-mode. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f4fd7432 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f4fd7432 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f4fd7432 Branch: refs/heads/master Commit: f4fd7432fb9cf7b197ccada1378c4f2a6d427522 Parents: 2d2f607 Author: peng.zhang Authored: Fri Jun 24 08:28:32 2016 +0100 Committer: Sean Owen Committed: Fri Jun 24 08:28:32 2016 +0100 -- core/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 3 ++- python/pyspark/context.py| 4 .../src/main/scala/org/apache/spark/repl/SparkILoop.scala| 2 -- .../scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala| 2 +- 4 files changed, 3 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f4fd7432/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala index e3a8e83..df279b5 100644 --- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala +++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala @@ -754,7 +754,8 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging { test("isDynamicAllocationEnabled") { val conf = new SparkConf() -conf.set("spark.master", "yarn-client") +conf.set("spark.master", "yarn") +conf.set("spark.submit.deployMode", "client") assert(Utils.isDynamicAllocationEnabled(conf) === false) assert(Utils.isDynamicAllocationEnabled( conf.set("spark.dynamicAllocation.enabled", "false")) === false) http://git-wip-us.apache.org/repos/asf/spark/blob/f4fd7432/python/pyspark/context.py -- diff --git a/python/pyspark/context.py b/python/pyspark/context.py index aec0215..7217a99 100644 --- a/python/pyspark/context.py +++ b/python/pyspark/context.py @@ -155,10 +155,6 @@ class SparkContext(object): self.appName = self._conf.get("spark.app.name") self.sparkHome = self._conf.get("spark.home", None) -# Let YARN know it's a pyspark app, so it distributes needed libraries. 
-if self.master == "yarn-client": -self._conf.set("spark.yarn.isPython", "true") - for (k, v) in self._conf.getAll(): if k.startswith("spark.executorEnv."): varName = k[len("spark.executorEnv."):] http://git-wip-us.apache.org/repos/asf/spark/blob/f4fd7432/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala -- diff --git a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala index 8fcab38..e871004 100644 --- a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala +++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala @@ -943,8 +943,6 @@ class SparkILoop( }) private def process(settings: Settings): Boolean = savingContextLoader { -if (getMaster() == "yarn-client") System.setProperty("SPARK_YARN_MODE", "true") - this.settings = settings createInterpreter() http://git-wip-us.apache.org/repos/asf/spark/blob/f4fd7432/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala -- diff --git a/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala b/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala index 4ce33e0..6b20dea 100644 --- a/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala +++ b/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala @@ -312,7 +312,7 @@ private object YarnClusterDriver extends Logging with Matchers { // If we are running in yarn-cluster mode, verify that driver logs links and present and are // in the expected format. -if (conf.get("spark.master") == "yarn-cluster") { +if (con
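For code outside the test suite, the non-deprecated replacement for the old `yarn-client`/`yarn-cluster` masters splits the information across two settings, which is what the fixed assertions check. A minimal sketch (configuration only; it does not need a cluster to run):

```scala
import org.apache.spark.SparkConf

object YarnModeSketch {
  def main(args: Array[String]): Unit = {
    val clientMode = new SparkConf()
      .set("spark.master", "yarn")
      .set("spark.submit.deployMode", "client")  // replaces master "yarn-client"
    val clusterMode = new SparkConf()
      .set("spark.master", "yarn")
      .set("spark.submit.deployMode", "cluster") // replaces master "yarn-cluster"
    println(clientMode.get("spark.master") + " / " + clientMode.get("spark.submit.deployMode"))
    println(clusterMode.get("spark.master") + " / " + clusterMode.get("spark.submit.deployMode"))
  }
}
```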
spark git commit: [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite
Repository: spark Updated Branches: refs/heads/branch-2.0 3ccdd6b9c -> b6420db9e [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite ## What changes were proposed in this pull request? Since SPARK-13220(Deprecate "yarn-client" and "yarn-cluster"), YarnClusterSuite doesn't test "yarn cluster" mode correctly. This pull request fixes it. ## How was this patch tested? Unit test (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: peng.zhang Closes #13836 from renozhang/SPARK-16125-test-yarn-cluster-mode. (cherry picked from commit f4fd7432fb9cf7b197ccada1378c4f2a6d427522) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b6420db9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b6420db9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b6420db9 Branch: refs/heads/branch-2.0 Commit: b6420db9ebc59c453a6a523aba68addf5762bb2c Parents: 3ccdd6b Author: peng.zhang Authored: Fri Jun 24 08:28:32 2016 +0100 Committer: Sean Owen Committed: Fri Jun 24 08:28:45 2016 +0100 -- core/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 3 ++- python/pyspark/context.py| 4 .../src/main/scala/org/apache/spark/repl/SparkILoop.scala| 2 -- .../scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala| 2 +- 4 files changed, 3 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b6420db9/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala index e3a8e83..df279b5 100644 --- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala +++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala @@ -754,7 +754,8 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging { test("isDynamicAllocationEnabled") { val conf = new SparkConf() -conf.set("spark.master", "yarn-client") +conf.set("spark.master", "yarn") +conf.set("spark.submit.deployMode", "client") assert(Utils.isDynamicAllocationEnabled(conf) === false) assert(Utils.isDynamicAllocationEnabled( conf.set("spark.dynamicAllocation.enabled", "false")) === false) http://git-wip-us.apache.org/repos/asf/spark/blob/b6420db9/python/pyspark/context.py -- diff --git a/python/pyspark/context.py b/python/pyspark/context.py index aec0215..7217a99 100644 --- a/python/pyspark/context.py +++ b/python/pyspark/context.py @@ -155,10 +155,6 @@ class SparkContext(object): self.appName = self._conf.get("spark.app.name") self.sparkHome = self._conf.get("spark.home", None) -# Let YARN know it's a pyspark app, so it distributes needed libraries. 
-if self.master == "yarn-client": -self._conf.set("spark.yarn.isPython", "true") - for (k, v) in self._conf.getAll(): if k.startswith("spark.executorEnv."): varName = k[len("spark.executorEnv."):] http://git-wip-us.apache.org/repos/asf/spark/blob/b6420db9/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala -- diff --git a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala index 8fcab38..e871004 100644 --- a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala +++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala @@ -943,8 +943,6 @@ class SparkILoop( }) private def process(settings: Settings): Boolean = savingContextLoader { -if (getMaster() == "yarn-client") System.setProperty("SPARK_YARN_MODE", "true") - this.settings = settings createInterpreter() http://git-wip-us.apache.org/repos/asf/spark/blob/b6420db9/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala -- diff --git a/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala b/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala index 4ce33e0..6b20dea 100644 --- a/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala +++ b/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala @@ -312,7 +312,7 @@ private object YarnClusterDriver extends Logging with Matchers { // If we are running in yarn-cluster mode, verify that driver logs links and present and
spark git commit: [SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor of commons-lang3
Repository: spark Updated Branches: refs/heads/master f4fd7432f -> 158af162e [SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor of commons-lang3 ## What changes were proposed in this pull request? Replace use of `commons-lang` in favor of `commons-lang3` and forbid the former via scalastyle; remove `NotImplementedException` from `comons-lang` in favor of JDK `UnsupportedOperationException` ## How was this patch tested? Jenkins tests Author: Sean Owen Closes #13843 from srowen/SPARK-16129. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/158af162 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/158af162 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/158af162 Branch: refs/heads/master Commit: 158af162eac7348464c6751c8acd48fc6c117688 Parents: f4fd743 Author: Sean Owen Authored: Fri Jun 24 10:35:54 2016 +0100 Committer: Sean Owen Committed: Fri Jun 24 10:35:54 2016 +0100 -- .../scala/org/apache/spark/SparkContext.scala | 5 ++-- scalastyle-config.xml | 6 + .../sql/catalyst/expressions/TimeWindow.scala | 2 +- .../spark/sql/catalyst/trees/TreeNode.scala | 2 +- .../parquet/VectorizedColumnReader.java | 25 ++-- .../sql/execution/vectorized/ColumnVector.java | 17 +++-- .../execution/vectorized/ColumnVectorUtils.java | 6 ++--- .../sql/execution/vectorized/ColumnarBatch.java | 12 -- .../spark/sql/execution/ExistingRDD.scala | 2 +- .../execution/columnar/InMemoryRelation.scala | 2 +- .../service/cli/session/HiveSessionImpl.java| 2 +- .../spark/streaming/StreamingContext.scala | 5 ++-- .../streaming/scheduler/JobScheduler.scala | 6 ++--- 13 files changed, 44 insertions(+), 48 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/158af162/core/src/main/scala/org/apache/spark/SparkContext.scala -- diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index d870181..fe15052 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -24,7 +24,6 @@ import java.util.{Arrays, Locale, Properties, ServiceLoader, UUID} import java.util.concurrent.ConcurrentMap import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger, AtomicReference} -import scala.annotation.tailrec import scala.collection.JavaConverters._ import scala.collection.Map import scala.collection.generic.Growable @@ -34,7 +33,7 @@ import scala.reflect.{classTag, ClassTag} import scala.util.control.NonFatal import com.google.common.collect.MapMaker -import org.apache.commons.lang.SerializationUtils +import org.apache.commons.lang3.SerializationUtils import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.Path import org.apache.hadoop.io.{ArrayWritable, BooleanWritable, BytesWritable, DoubleWritable, @@ -334,7 +333,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli override protected def childValue(parent: Properties): Properties = { // Note: make a clone such that changes in the parent properties aren't reflected in // the those of the children threads, which has confusing semantics (SPARK-10563). 
- SerializationUtils.clone(parent).asInstanceOf[Properties] + SerializationUtils.clone(parent) } override protected def initialValue(): Properties = new Properties() } http://git-wip-us.apache.org/repos/asf/spark/blob/158af162/scalastyle-config.xml -- diff --git a/scalastyle-config.xml b/scalastyle-config.xml index 270104f..9a35183 100644 --- a/scalastyle-config.xml +++ b/scalastyle-config.xml @@ -210,6 +210,12 @@ This file is divided into 3 sections: scala.collection.JavaConverters._ and use .asScala / .asJava methods + +org\.apache\.commons\.lang\. +Use Commons Lang 3 classes (package org.apache.commons.lang3.*) instead +of Commons Lang 2 (package org.apache.commons.lang.*) + + java,scala,3rdParty,spark http://git-wip-us.apache.org/repos/asf/spark/blob/158af162/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala index 83fa447..66c4bf2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWind
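A small, self-contained sketch, not part of the patch, of why the asInstanceOf[Properties] cast in SparkContext could simply be dropped: Commons Lang 3's SerializationUtils.clone is generic (<T extends Serializable> T clone(T)), so the clone already comes back with the caller's type.

```scala
import java.util.Properties
import org.apache.commons.lang3.SerializationUtils

object Lang3CloneExample {
  def main(args: Array[String]): Unit = {
    val parent = new Properties()
    parent.setProperty("spark.app.name", "demo")

    // The generic signature means the result is already a Properties; no cast needed,
    // which is why the asInstanceOf disappears in the SparkContext hunk above.
    val child: Properties = SerializationUtils.clone(parent)
    child.setProperty("spark.app.name", "changed")

    // The clone is a deep copy, so the parent's value is untouched.
    println(parent.getProperty("spark.app.name")) // demo
    println(child.getProperty("spark.app.name"))  // changed
  }
}
```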
spark git commit: [SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor of commons-lang3
Repository: spark Updated Branches: refs/heads/branch-2.0 b6420db9e -> 201d5e8db [SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor of commons-lang3 ## What changes were proposed in this pull request? Replace use of `commons-lang` in favor of `commons-lang3` and forbid the former via scalastyle; remove `NotImplementedException` from `comons-lang` in favor of JDK `UnsupportedOperationException` ## How was this patch tested? Jenkins tests Author: Sean Owen Closes #13843 from srowen/SPARK-16129. (cherry picked from commit 158af162eac7348464c6751c8acd48fc6c117688) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/201d5e8d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/201d5e8d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/201d5e8d Branch: refs/heads/branch-2.0 Commit: 201d5e8db3fd29898a6cd69e015ca491e5721b08 Parents: b6420db Author: Sean Owen Authored: Fri Jun 24 10:35:54 2016 +0100 Committer: Sean Owen Committed: Fri Jun 24 10:36:04 2016 +0100 -- .../scala/org/apache/spark/SparkContext.scala | 5 ++-- scalastyle-config.xml | 6 + .../sql/catalyst/expressions/TimeWindow.scala | 2 +- .../spark/sql/catalyst/trees/TreeNode.scala | 2 +- .../parquet/VectorizedColumnReader.java | 25 ++-- .../sql/execution/vectorized/ColumnVector.java | 17 +++-- .../execution/vectorized/ColumnVectorUtils.java | 6 ++--- .../sql/execution/vectorized/ColumnarBatch.java | 12 -- .../spark/sql/execution/ExistingRDD.scala | 2 +- .../execution/columnar/InMemoryRelation.scala | 2 +- .../service/cli/session/HiveSessionImpl.java| 2 +- .../spark/streaming/StreamingContext.scala | 5 ++-- .../streaming/scheduler/JobScheduler.scala | 6 ++--- 13 files changed, 44 insertions(+), 48 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/201d5e8d/core/src/main/scala/org/apache/spark/SparkContext.scala -- diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index d870181..fe15052 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -24,7 +24,6 @@ import java.util.{Arrays, Locale, Properties, ServiceLoader, UUID} import java.util.concurrent.ConcurrentMap import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger, AtomicReference} -import scala.annotation.tailrec import scala.collection.JavaConverters._ import scala.collection.Map import scala.collection.generic.Growable @@ -34,7 +33,7 @@ import scala.reflect.{classTag, ClassTag} import scala.util.control.NonFatal import com.google.common.collect.MapMaker -import org.apache.commons.lang.SerializationUtils +import org.apache.commons.lang3.SerializationUtils import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.Path import org.apache.hadoop.io.{ArrayWritable, BooleanWritable, BytesWritable, DoubleWritable, @@ -334,7 +333,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli override protected def childValue(parent: Properties): Properties = { // Note: make a clone such that changes in the parent properties aren't reflected in // the those of the children threads, which has confusing semantics (SPARK-10563). 
- SerializationUtils.clone(parent).asInstanceOf[Properties] + SerializationUtils.clone(parent) } override protected def initialValue(): Properties = new Properties() } http://git-wip-us.apache.org/repos/asf/spark/blob/201d5e8d/scalastyle-config.xml -- diff --git a/scalastyle-config.xml b/scalastyle-config.xml index 270104f..9a35183 100644 --- a/scalastyle-config.xml +++ b/scalastyle-config.xml @@ -210,6 +210,12 @@ This file is divided into 3 sections: scala.collection.JavaConverters._ and use .asScala / .asJava methods + +org\.apache\.commons\.lang\. +Use Commons Lang 3 classes (package org.apache.commons.lang3.*) instead +of Commons Lang 2 (package org.apache.commons.lang.*) + + java,scala,3rdParty,spark http://git-wip-us.apache.org/repos/asf/spark/blob/201d5e8d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala index
spark git commit: [MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates ArrayIndexOutOfBoundsException. I have found the bug and tested the solution.
Repository: spark Updated Branches: refs/heads/master a7d29499d -> a3c7b4187 [MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates ArrayIndexOutOfBoundsException. I have found the bug and tested the solution. ## What changes were proposed in this pull request? Just adjust the size of an array in line 58 so it does not cause an ArrayOutOfBoundsException in line 66. ## How was this patch tested? Manual tests. I have recompiled the entire project with the fix, it has been built successfully and I have run the code, also with good results. line 66: val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1) + rnd.nextGaussian() * 0.1 crashes because trueWeights has length "nfeatures + 1" while "x" has length "features", and they should have the same length. To fix this just make trueWeights be the same length as x. I have recompiled the project with the change and it is working now: [spark-1.6.1]$ spark-submit --master local[*] --class org.apache.spark.mllib.util.SVMDataGenerator mllib/target/spark-mllib_2.11-1.6.1.jar local /home/user/test And it generates the data successfully now in the specified folder. Author: José Antonio Closes #13895 from j4munoz/patch-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a3c7b418 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a3c7b418 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a3c7b418 Branch: refs/heads/master Commit: a3c7b4187bad00dad87df7e3b5929a44d29568ed Parents: a7d2949 Author: José Antonio Authored: Sat Jun 25 09:11:25 2016 +0100 Committer: Sean Owen Committed: Sat Jun 25 09:11:25 2016 +0100 -- .../main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a3c7b418/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala index cde5979..c946860 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala @@ -55,7 +55,7 @@ object SVMDataGenerator { val sc = new SparkContext(sparkMaster, "SVMGenerator") val globalRnd = new Random(94720) -val trueWeights = Array.fill[Double](nfeatures + 1)(globalRnd.nextGaussian()) +val trueWeights = Array.fill[Double](nfeatures)(globalRnd.nextGaussian()) val data: RDD[LabeledPoint] = sc.parallelize(0 until nexamples, parts).map { idx => val rnd = new Random(42 + idx) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
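A dependency-free sketch of the failure mode being fixed, using a plain Scala dot product in place of the netlib BLAS ddot call the real generator makes; the object name and the explicit length check are invented for illustration. The point is simply that the weight vector and each feature vector must have the same length, which the one-line change restores.

```scala
import scala.util.Random

object SvmDataSketch {
  // Plain dot product standing in for blas.ddot; it requires equal-length vectors.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  def main(args: Array[String]): Unit = {
    val nfeatures = 5
    val globalRnd = new Random(94720)
    // Before the fix this array had nfeatures + 1 entries, so computing a dot product
    // over trueWeights.length elements walked past the end of the feature vector.
    val trueWeights = Array.fill(nfeatures)(globalRnd.nextGaussian())

    val rnd = new Random(42)
    val x = Array.fill(nfeatures)(rnd.nextDouble() * 2.0 - 1.0)
    val label = dot(trueWeights, x) + rnd.nextGaussian() * 0.1
    println(s"generated label = $label")
  }
}
```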
spark git commit: [MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates ArrayIndexOutOfBoundsException. I have found the bug and tested the solution.
Repository: spark Updated Branches: refs/heads/branch-2.0 d079b5de7 -> cbfcdcfb6 [MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates ArrayIndexOutOfBoundsException. I have found the bug and tested the solution. ## What changes were proposed in this pull request? Just adjust the size of an array in line 58 so it does not cause an ArrayOutOfBoundsException in line 66. ## How was this patch tested? Manual tests. I have recompiled the entire project with the fix, it has been built successfully and I have run the code, also with good results. line 66: val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1) + rnd.nextGaussian() * 0.1 crashes because trueWeights has length "nfeatures + 1" while "x" has length "features", and they should have the same length. To fix this just make trueWeights be the same length as x. I have recompiled the project with the change and it is working now: [spark-1.6.1]$ spark-submit --master local[*] --class org.apache.spark.mllib.util.SVMDataGenerator mllib/target/spark-mllib_2.11-1.6.1.jar local /home/user/test And it generates the data successfully now in the specified folder. Author: José Antonio Closes #13895 from j4munoz/patch-2. (cherry picked from commit a3c7b4187bad00dad87df7e3b5929a44d29568ed) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cbfcdcfb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cbfcdcfb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cbfcdcfb Branch: refs/heads/branch-2.0 Commit: cbfcdcfb60d41126e17cddda52922d6058f1a401 Parents: d079b5d Author: José Antonio Authored: Sat Jun 25 09:11:25 2016 +0100 Committer: Sean Owen Committed: Sat Jun 25 09:11:35 2016 +0100 -- .../main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cbfcdcfb/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala index cde5979..c946860 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala @@ -55,7 +55,7 @@ object SVMDataGenerator { val sc = new SparkContext(sparkMaster, "SVMGenerator") val globalRnd = new Random(94720) -val trueWeights = Array.fill[Double](nfeatures + 1)(globalRnd.nextGaussian()) +val trueWeights = Array.fill[Double](nfeatures)(globalRnd.nextGaussian()) val data: RDD[LabeledPoint] = sc.parallelize(0 until nexamples, parts).map { idx => val rnd = new Random(42 + idx) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates ArrayIndexOutOfBoundsException. I have found the bug and tested the solution.
Repository: spark Updated Branches: refs/heads/branch-1.6 b7acc1b71 -> 24d59fb64 [MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates ArrayIndexOutOfBoundsException. I have found the bug and tested the solution. ## What changes were proposed in this pull request? Just adjust the size of an array in line 58 so it does not cause an ArrayOutOfBoundsException in line 66. ## How was this patch tested? Manual tests. I have recompiled the entire project with the fix, it has been built successfully and I have run the code, also with good results. line 66: val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1) + rnd.nextGaussian() * 0.1 crashes because trueWeights has length "nfeatures + 1" while "x" has length "features", and they should have the same length. To fix this just make trueWeights be the same length as x. I have recompiled the project with the change and it is working now: [spark-1.6.1]$ spark-submit --master local[*] --class org.apache.spark.mllib.util.SVMDataGenerator mllib/target/spark-mllib_2.11-1.6.1.jar local /home/user/test And it generates the data successfully now in the specified folder. Author: José Antonio Closes #13895 from j4munoz/patch-2. (cherry picked from commit a3c7b4187bad00dad87df7e3b5929a44d29568ed) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/24d59fb6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/24d59fb6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/24d59fb6 Branch: refs/heads/branch-1.6 Commit: 24d59fb64770fb8951794df9ee6398329838359a Parents: b7acc1b Author: José Antonio Authored: Sat Jun 25 09:11:25 2016 +0100 Committer: Sean Owen Committed: Sat Jun 25 09:11:47 2016 +0100 -- .../main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/24d59fb6/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala index cde5979..c946860 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala @@ -55,7 +55,7 @@ object SVMDataGenerator { val sc = new SparkContext(sparkMaster, "SVMGenerator") val globalRnd = new Random(94720) -val trueWeights = Array.fill[Double](nfeatures + 1)(globalRnd.nextGaussian()) +val trueWeights = Array.fill[Double](nfeatures)(globalRnd.nextGaussian()) val data: RDD[LabeledPoint] = sc.parallelize(0 until nexamples, parts).map { idx => val rnd = new Random(42 + idx) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15958] Make initial buffer size for the Sorter configurable
Repository: spark Updated Branches: refs/heads/master a3c7b4187 -> bf665a958 [SPARK-15958] Make initial buffer size for the Sorter configurable ## What changes were proposed in this pull request? Currently the initial buffer size in the sorter is hard coded inside the code and is too small for large workload. As a result, the sorter spends significant time expanding the buffer size and copying the data. It would be useful to have it configurable. ## How was this patch tested? Tested by running a job on the cluster. Author: Sital Kedia Closes #13699 from sitalkedia/config_sort_buffer_upstream. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bf665a95 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bf665a95 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bf665a95 Branch: refs/heads/master Commit: bf665a958631125a1670504ef5966ef1a0e14798 Parents: a3c7b41 Author: Sital Kedia Authored: Sat Jun 25 09:13:39 2016 +0100 Committer: Sean Owen Committed: Sat Jun 25 09:13:39 2016 +0100 -- .../org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java| 7 +-- .../apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java | 4 ++-- .../apache/spark/sql/execution/UnsafeExternalRowSorter.java | 4 +++- .../apache/spark/sql/execution/UnsafeKVExternalSorter.java| 7 +-- 4 files changed, 15 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bf665a95/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java -- diff --git a/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java b/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java index daa63d4..05fa04c 100644 --- a/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java +++ b/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java @@ -61,7 +61,7 @@ public class UnsafeShuffleWriter extends ShuffleWriter { private static final ClassTag OBJECT_CLASS_TAG = ClassTag$.MODULE$.Object(); @VisibleForTesting - static final int INITIAL_SORT_BUFFER_SIZE = 4096; + static final int DEFAULT_INITIAL_SORT_BUFFER_SIZE = 4096; private final BlockManager blockManager; private final IndexShuffleBlockResolver shuffleBlockResolver; @@ -74,6 +74,7 @@ public class UnsafeShuffleWriter extends ShuffleWriter { private final TaskContext taskContext; private final SparkConf sparkConf; private final boolean transferToEnabled; + private final int initialSortBufferSize; @Nullable private MapStatus mapStatus; @Nullable private ShuffleExternalSorter sorter; @@ -122,6 +123,8 @@ public class UnsafeShuffleWriter extends ShuffleWriter { this.taskContext = taskContext; this.sparkConf = sparkConf; this.transferToEnabled = sparkConf.getBoolean("spark.file.transferTo", true); +this.initialSortBufferSize = sparkConf.getInt("spark.shuffle.sort.initialBufferSize", + DEFAULT_INITIAL_SORT_BUFFER_SIZE); open(); } @@ -187,7 +190,7 @@ public class UnsafeShuffleWriter extends ShuffleWriter { memoryManager, blockManager, taskContext, - INITIAL_SORT_BUFFER_SIZE, + initialSortBufferSize, partitioner.numPartitions(), sparkConf, writeMetrics); http://git-wip-us.apache.org/repos/asf/spark/blob/bf665a95/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java -- diff --git a/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java b/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java index 7dd61f8..daeb467 100644 --- 
a/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java +++ b/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java @@ -413,10 +413,10 @@ public class UnsafeShuffleWriterSuite { } private void writeEnoughRecordsToTriggerSortBufferExpansionAndSpill() throws Exception { -memoryManager.limit(UnsafeShuffleWriter.INITIAL_SORT_BUFFER_SIZE * 16); +memoryManager.limit(UnsafeShuffleWriter.DEFAULT_INITIAL_SORT_BUFFER_SIZE * 16); final UnsafeShuffleWriter writer = createWriter(false); final ArrayList> dataToWrite = new ArrayList<>(); -for (int i = 0; i < UnsafeShuffleWriter.INITIAL_SORT_BUFFER_SIZE + 1; i++) { +for (int i = 0; i < UnsafeShuffleWriter.DEFAULT_INITIAL_SORT_BUFFER_SIZE + 1; i++) { dataToWrite.add(new Tuple2(i, i)); } writer.write(dataToWrite.iterator()); http://git-wip-us.apache.org/repos/asf/spark/blob/bf665a95/sql/catalyst/src/main/java/org/apache/spark/sq
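A short sketch of how the new knob might be used from a job's configuration; spark.shuffle.sort.initialBufferSize and its 4096-entry default come from the patch above, while the surrounding object and the 64K value are purely illustrative.

```scala
import org.apache.spark.SparkConf

object SortBufferConfig {
  val DefaultInitialSortBufferSize = 4096 // the previously hard-coded value

  def main(args: Array[String]): Unit = {
    // A workload with large shuffle tasks can start the in-memory sorter bigger up front,
    // avoiding repeated buffer expansion and copying.
    val conf = new SparkConf(loadDefaults = false)
      .set("spark.shuffle.sort.initialBufferSize", (64 * 1024).toString)

    val initialSize =
      conf.getInt("spark.shuffle.sort.initialBufferSize", DefaultInitialSortBufferSize)
    println(s"UnsafeShuffleWriter would start its sort buffer at $initialSize entries")
  }
}
```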
spark git commit: [SPARK-1301][WEB UI] Added anchor links to Accumulators and Tasks on StagePage
Repository: spark Updated Branches: refs/heads/master bf665a958 -> 3ee9695d1 [SPARK-1301][WEB UI] Added anchor links to Accumulators and Tasks on StagePage ## What changes were proposed in this pull request? Sometimes the "Aggregated Metrics by Executor" table on the Stage page can get very long so actor links to the Accumulators and Tasks tables below it have been added to the summary at the top of the page. This has been done in the same way as the Jobs and Stages pages. Note: the Accumulators link only displays when the table exists. ## How was this patch tested? Manually Tested and dev/run-tests ![justtasks](https://cloud.githubusercontent.com/assets/13952758/15165269/6e8efe8c-16c9-11e6-9784-cffe966fdcf0.png) ![withaccumulators](https://cloud.githubusercontent.com/assets/13952758/15165270/7019ec9e-16c9-11e6-8649-db69ed7a317d.png) Author: Alex Bozarth Closes #13037 from ajbozarth/spark1301. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3ee9695d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3ee9695d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3ee9695d Branch: refs/heads/master Commit: 3ee9695d1fcf3750cbf7896a56f8a1ba93f4e82f Parents: bf665a9 Author: Alex Bozarth Authored: Sat Jun 25 09:27:22 2016 +0100 Committer: Sean Owen Committed: Sat Jun 25 09:27:22 2016 +0100 -- .../org/apache/spark/ui/static/webui.css| 4 +- .../org/apache/spark/ui/static/webui.js | 47 .../scala/org/apache/spark/ui/UIUtils.scala | 1 + .../org/apache/spark/ui/jobs/StagePage.scala| 16 ++- 4 files changed, 64 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3ee9695d/core/src/main/resources/org/apache/spark/ui/static/webui.css -- diff --git a/core/src/main/resources/org/apache/spark/ui/static/webui.css b/core/src/main/resources/org/apache/spark/ui/static/webui.css index 595e80a..b157f3e 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/webui.css +++ b/core/src/main/resources/org/apache/spark/ui/static/webui.css @@ -155,7 +155,7 @@ pre { display: none; } -span.expand-additional-metrics, span.expand-dag-viz { +span.expand-additional-metrics, span.expand-dag-viz, span.collapse-table { cursor: pointer; } @@ -163,7 +163,7 @@ span.additional-metric-title { cursor: pointer; } -.additional-metrics.collapsed { +.additional-metrics.collapsed, .collapsible-table.collapsed { display: none; } http://git-wip-us.apache.org/repos/asf/spark/blob/3ee9695d/core/src/main/resources/org/apache/spark/ui/static/webui.js -- diff --git a/core/src/main/resources/org/apache/spark/ui/static/webui.js b/core/src/main/resources/org/apache/spark/ui/static/webui.js new file mode 100644 index 000..e37307a --- /dev/null +++ b/core/src/main/resources/org/apache/spark/ui/static/webui.js @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +function collapseTablePageLoad(name, table){ + if (window.localStorage.getItem(name) == "true") { +// Set it to false so that the click function can revert it +window.localStorage.setItem(name, "false"); +collapseTable(name, table); + } +} + +function collapseTable(thisName, table){ +var status = window.localStorage.getItem(thisName) == "true"; +status = !status; + +thisClass = '.' + thisName + +// Expand the list of additional metrics. +var tableDiv = $(thisClass).parent().find('.' + table); +$(tableDiv).toggleClass('collapsed'); + +// Switch the class of the arrow from open to closed. +$(thisClass).find('.collapse-table-arrow').toggleClass('arrow-open'); +$(thisClass).find('.collapse-table-arrow').toggleClass('arrow-closed'); + +window.localStorage.setItem(thisName, "" + status); +} + +// Add a call to collapseTablePageLoad() on each collapsible table +// to remember if it's collapsed on each page re
spark git commit: [SPARK-16193][TESTS] Address flaky ExternalAppendOnlyMapSuite spilling tests
Repository: spark Updated Branches: refs/heads/branch-2.0 cbfcdcfb6 -> b03b0976f [SPARK-16193][TESTS] Address flaky ExternalAppendOnlyMapSuite spilling tests ## What changes were proposed in this pull request? Make spill tests wait until job has completed before returning the number of stages that spilled ## How was this patch tested? Existing Jenkins tests. Author: Sean Owen Closes #13896 from srowen/SPARK-16193. (cherry picked from commit e87741589a24821b5fe73e5d9ee2164247998580) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b03b0976 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b03b0976 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b03b0976 Branch: refs/heads/branch-2.0 Commit: b03b0976fac878bf7e5d1721441179a4d4d9c317 Parents: cbfcdcf Author: Sean Owen Authored: Sat Jun 25 12:14:14 2016 +0100 Committer: Sean Owen Committed: Sat Jun 25 12:14:24 2016 +0100 -- core/src/main/scala/org/apache/spark/TestUtils.scala | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b03b0976/core/src/main/scala/org/apache/spark/TestUtils.scala -- diff --git a/core/src/main/scala/org/apache/spark/TestUtils.scala b/core/src/main/scala/org/apache/spark/TestUtils.scala index 43c89b2..871b9d1 100644 --- a/core/src/main/scala/org/apache/spark/TestUtils.scala +++ b/core/src/main/scala/org/apache/spark/TestUtils.scala @@ -22,6 +22,7 @@ import java.net.{URI, URL} import java.nio.charset.StandardCharsets import java.nio.file.Paths import java.util.Arrays +import java.util.concurrent.{CountDownLatch, TimeUnit} import java.util.jar.{JarEntry, JarOutputStream} import scala.collection.JavaConverters._ @@ -190,8 +191,14 @@ private[spark] object TestUtils { private class SpillListener extends SparkListener { private val stageIdToTaskMetrics = new mutable.HashMap[Int, ArrayBuffer[TaskMetrics]] private val spilledStageIds = new mutable.HashSet[Int] + private val stagesDone = new CountDownLatch(1) - def numSpilledStages: Int = spilledStageIds.size + def numSpilledStages: Int = { +// Long timeout, just in case somehow the job end isn't notified. +// Fails if a timeout occurs +assert(stagesDone.await(10, TimeUnit.SECONDS)) +spilledStageIds.size + } override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = { stageIdToTaskMetrics.getOrElseUpdate( @@ -206,4 +213,8 @@ private class SpillListener extends SparkListener { spilledStageIds += stageId } } + + override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = { +stagesDone.countDown() + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
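The essence of the fix is a listener that refuses to answer until the job-end event has arrived. Below is an illustrative listener built on the same pattern (a CountDownLatch released from onJobEnd); it is not the SpillListener from TestUtils, and the class, field, and method names are invented. It would be registered with SparkContext.addSparkListener before the job runs.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerTaskEnd}

// Accumulates per-task information as events arrive, but only exposes the result once
// the job has finished, so a caller can never read a partially collected value.
class JobEndAwareListener extends SparkListener {
  private val jobDone = new CountDownLatch(1)
  @volatile private var tasksSeen = 0

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    tasksSeen += 1 // listener events are delivered on a single bus thread
  }

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    jobDone.countDown()
  }

  // Blocks (with a generous timeout) until onJobEnd has fired, mirroring numSpilledStages.
  def numTasks: Int = {
    require(jobDone.await(10, TimeUnit.SECONDS), "job end was never reported")
    tasksSeen
  }
}
```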
spark git commit: [SPARK-16193][TESTS] Address flaky ExternalAppendOnlyMapSuite spilling tests
Repository: spark Updated Branches: refs/heads/branch-1.6 24d59fb64 -> 60e095b9b [SPARK-16193][TESTS] Address flaky ExternalAppendOnlyMapSuite spilling tests ## What changes were proposed in this pull request? Make spill tests wait until job has completed before returning the number of stages that spilled ## How was this patch tested? Existing Jenkins tests. Author: Sean Owen Closes #13896 from srowen/SPARK-16193. (cherry picked from commit e87741589a24821b5fe73e5d9ee2164247998580) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60e095b9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60e095b9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60e095b9 Branch: refs/heads/branch-1.6 Commit: 60e095b9bea3caa3e9d1e768d116f911a048d8ec Parents: 24d59fb Author: Sean Owen Authored: Sat Jun 25 12:14:14 2016 +0100 Committer: Sean Owen Committed: Sat Jun 25 12:14:40 2016 +0100 -- core/src/main/scala/org/apache/spark/TestUtils.scala | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/60e095b9/core/src/main/scala/org/apache/spark/TestUtils.scala -- diff --git a/core/src/main/scala/org/apache/spark/TestUtils.scala b/core/src/main/scala/org/apache/spark/TestUtils.scala index 43c89b2..871b9d1 100644 --- a/core/src/main/scala/org/apache/spark/TestUtils.scala +++ b/core/src/main/scala/org/apache/spark/TestUtils.scala @@ -22,6 +22,7 @@ import java.net.{URI, URL} import java.nio.charset.StandardCharsets import java.nio.file.Paths import java.util.Arrays +import java.util.concurrent.{CountDownLatch, TimeUnit} import java.util.jar.{JarEntry, JarOutputStream} import scala.collection.JavaConverters._ @@ -190,8 +191,14 @@ private[spark] object TestUtils { private class SpillListener extends SparkListener { private val stageIdToTaskMetrics = new mutable.HashMap[Int, ArrayBuffer[TaskMetrics]] private val spilledStageIds = new mutable.HashSet[Int] + private val stagesDone = new CountDownLatch(1) - def numSpilledStages: Int = spilledStageIds.size + def numSpilledStages: Int = { +// Long timeout, just in case somehow the job end isn't notified. +// Fails if a timeout occurs +assert(stagesDone.await(10, TimeUnit.SECONDS)) +spilledStageIds.size + } override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = { stageIdToTaskMetrics.getOrElseUpdate( @@ -206,4 +213,8 @@ private class SpillListener extends SparkListener { spilledStageIds += stageId } } + + override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = { +stagesDone.countDown() + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16193][TESTS] Address flaky ExternalAppendOnlyMapSuite spilling tests
Repository: spark Updated Branches: refs/heads/master 3ee9695d1 -> e87741589 [SPARK-16193][TESTS] Address flaky ExternalAppendOnlyMapSuite spilling tests ## What changes were proposed in this pull request? Make spill tests wait until job has completed before returning the number of stages that spilled ## How was this patch tested? Existing Jenkins tests. Author: Sean Owen Closes #13896 from srowen/SPARK-16193. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e8774158 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e8774158 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e8774158 Branch: refs/heads/master Commit: e87741589a24821b5fe73e5d9ee2164247998580 Parents: 3ee9695 Author: Sean Owen Authored: Sat Jun 25 12:14:14 2016 +0100 Committer: Sean Owen Committed: Sat Jun 25 12:14:14 2016 +0100 -- core/src/main/scala/org/apache/spark/TestUtils.scala | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e8774158/core/src/main/scala/org/apache/spark/TestUtils.scala -- diff --git a/core/src/main/scala/org/apache/spark/TestUtils.scala b/core/src/main/scala/org/apache/spark/TestUtils.scala index 43c89b2..871b9d1 100644 --- a/core/src/main/scala/org/apache/spark/TestUtils.scala +++ b/core/src/main/scala/org/apache/spark/TestUtils.scala @@ -22,6 +22,7 @@ import java.net.{URI, URL} import java.nio.charset.StandardCharsets import java.nio.file.Paths import java.util.Arrays +import java.util.concurrent.{CountDownLatch, TimeUnit} import java.util.jar.{JarEntry, JarOutputStream} import scala.collection.JavaConverters._ @@ -190,8 +191,14 @@ private[spark] object TestUtils { private class SpillListener extends SparkListener { private val stageIdToTaskMetrics = new mutable.HashMap[Int, ArrayBuffer[TaskMetrics]] private val spilledStageIds = new mutable.HashSet[Int] + private val stagesDone = new CountDownLatch(1) - def numSpilledStages: Int = spilledStageIds.size + def numSpilledStages: Int = { +// Long timeout, just in case somehow the job end isn't notified. +// Fails if a timeout occurs +assert(stagesDone.await(10, TimeUnit.SECONDS)) +spilledStageIds.size + } override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = { stageIdToTaskMetrics.getOrElseUpdate( @@ -206,4 +213,8 @@ private class SpillListener extends SparkListener { spilledStageIds += stageId } } + + override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = { +stagesDone.countDown() + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16214][EXAMPLES] fix the denominator of SparkPi
Repository: spark Updated Branches: refs/heads/branch-2.0 e01776395 -> efce6e17c [SPARK-16214][EXAMPLES] fix the denominator of SparkPi ## What changes were proposed in this pull request? reduce the denominator of SparkPi by 1 ## How was this patch tested? integration tests Author: 杨浩 Closes #13910 from yanghaogn/patch-1. (cherry picked from commit b452026324da20f76f7d8b78e5ba1c007712e585) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/efce6e17 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/efce6e17 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/efce6e17 Branch: refs/heads/branch-2.0 Commit: efce6e17c3a7c2c63b9d40bd02fe4f4fec4085bd Parents: e017763 Author: 杨浩 Authored: Mon Jun 27 08:31:52 2016 +0100 Committer: Sean Owen Committed: Mon Jun 27 08:32:01 2016 +0100 -- examples/src/main/scala/org/apache/spark/examples/SparkPi.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/efce6e17/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala b/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala index 42f6cef..272c1a4 100644 --- a/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala +++ b/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala @@ -36,7 +36,7 @@ object SparkPi { val y = random * 2 - 1 if (x*x + y*y < 1) 1 else 0 }.reduce(_ + _) -println("Pi is roughly " + 4.0 * count / n) +println("Pi is roughly " + 4.0 * count / (n - 1)) spark.stop() } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
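For reference, a plain-Scala rendering of the estimate the example computes. The motivation for the new denominator is that the example's sampling range (1 until n) contains n - 1 elements; that range is not visible in the hunk above, so treat the sketch below as an illustration of the arithmetic rather than a transcript of SparkPi.

```scala
import scala.util.Random

object PiSketch {
  def main(args: Array[String]): Unit = {
    val rng = new Random()
    val samples = 100000 - 1 // mirrors "1 until n": n - 1 points are actually drawn
    val hits = (1 to samples).count { _ =>
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      x * x + y * y < 1 // point falls inside the unit circle
    }
    // The unit circle covers pi/4 of the bounding square, so pi ~= 4 * hits / samples.
    println("Pi is roughly " + 4.0 * hits / samples)
  }
}
```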
spark git commit: [SPARK-16214][EXAMPLES] fix the denominator of SparkPi
Repository: spark Updated Branches: refs/heads/master 30b182bcc -> b45202632 [SPARK-16214][EXAMPLES] fix the denominator of SparkPi ## What changes were proposed in this pull request? reduce the denominator of SparkPi by 1 ## How was this patch tested? integration tests Author: 杨浩 Closes #13910 from yanghaogn/patch-1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b4520263 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b4520263 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b4520263 Branch: refs/heads/master Commit: b452026324da20f76f7d8b78e5ba1c007712e585 Parents: 30b182b Author: 杨浩 Authored: Mon Jun 27 08:31:52 2016 +0100 Committer: Sean Owen Committed: Mon Jun 27 08:31:52 2016 +0100 -- examples/src/main/scala/org/apache/spark/examples/SparkPi.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b4520263/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala b/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala index 42f6cef..272c1a4 100644 --- a/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala +++ b/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala @@ -36,7 +36,7 @@ object SparkPi { val y = random * 2 - 1 if (x*x + y*y < 1) 1 else 0 }.reduce(_ + _) -println("Pi is roughly " + 4.0 * count / n) +println("Pi is roughly " + 4.0 * count / (n - 1)) spark.stop() } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16214][EXAMPLES] fix the denominator of SparkPi
Repository: spark Updated Branches: refs/heads/branch-1.6 60e095b9b -> 22a496d2a [SPARK-16214][EXAMPLES] fix the denominator of SparkPi ## What changes were proposed in this pull request? reduce the denominator of SparkPi by 1 ## How was this patch tested? integration tests Author: 杨浩 Closes #13910 from yanghaogn/patch-1. (cherry picked from commit b452026324da20f76f7d8b78e5ba1c007712e585) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/22a496d2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/22a496d2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/22a496d2 Branch: refs/heads/branch-1.6 Commit: 22a496d2a12e24f97977d324c38f5aa6ff260588 Parents: 60e095b Author: 杨浩 Authored: Mon Jun 27 08:31:52 2016 +0100 Committer: Sean Owen Committed: Mon Jun 27 08:32:12 2016 +0100 -- examples/src/main/scala/org/apache/spark/examples/SparkPi.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/22a496d2/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala b/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala index 818d4f2..ead8f46 100644 --- a/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala +++ b/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala @@ -34,7 +34,7 @@ object SparkPi { val y = random * 2 - 1 if (x*x + y*y < 1) 1 else 0 }.reduce(_ + _) -println("Pi is roughly " + 4.0 * count / n) +println("Pi is roughly " + 4.0 * count / (n - 1)) spark.stop() } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR][CORE] Fix display wrong free memory size in the log
Repository: spark Updated Branches: refs/heads/master b45202632 -> 52d4fe057 [MINOR][CORE] Fix display wrong free memory size in the log ## What changes were proposed in this pull request? Free memory size displayed in the log is wrong (used memory), fix to make it correct. ## How was this patch tested? N/A Author: jerryshao Closes #13804 from jerryshao/memory-log-fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/52d4fe05 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/52d4fe05 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/52d4fe05 Branch: refs/heads/master Commit: 52d4fe057909e8d431ae36f538dc4cafb351cdb5 Parents: b452026 Author: jerryshao Authored: Mon Jun 27 09:23:58 2016 +0100 Committer: Sean Owen Committed: Mon Jun 27 09:23:58 2016 +0100 -- .../main/scala/org/apache/spark/storage/memory/MemoryStore.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/52d4fe05/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala -- diff --git a/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala b/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala index 99be4de..0349da0 100644 --- a/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala +++ b/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala @@ -377,7 +377,8 @@ private[spark] class MemoryStore( entries.put(blockId, entry) } logInfo("Block %s stored as bytes in memory (estimated size %s, free %s)".format( -blockId, Utils.bytesToString(entry.size), Utils.bytesToString(blocksMemoryUsed))) +blockId, Utils.bytesToString(entry.size), +Utils.bytesToString(maxMemory - blocksMemoryUsed))) Right(entry.size) } else { // We ran out of space while unrolling the values for this block - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
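A tiny illustration, with made-up numbers and a simplified byte formatter, of what the corrected message reports: the free figure is maxMemory minus the memory already taken by blocks, whereas the old message reprinted the used figure.

```scala
object MemoryStoreLogSketch {
  def main(args: Array[String]): Unit = {
    val maxMemory = 2L * 1024 * 1024 * 1024   // storage memory available, illustrative
    val blocksMemoryUsed = 512L * 1024 * 1024 // memory already occupied by blocks
    val entrySize = 64L * 1024 * 1024         // the block that was just stored

    def mb(size: Long): String = f"${size / 1024.0 / 1024.0}%.1f MB"

    // The corrected log line reports maxMemory - blocksMemoryUsed as "free".
    println(s"Block rdd_0_0 stored as bytes in memory " +
      s"(estimated size ${mb(entrySize)}, free ${mb(maxMemory - blocksMemoryUsed)})")
  }
}
```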
spark git commit: [MINOR][CORE] Fix display wrong free memory size in the log
Repository: spark Updated Branches: refs/heads/branch-2.0 efce6e17c -> ea8d419c1 [MINOR][CORE] Fix display wrong free memory size in the log ## What changes were proposed in this pull request? Free memory size displayed in the log is wrong (used memory), fix to make it correct. ## How was this patch tested? N/A Author: jerryshao Closes #13804 from jerryshao/memory-log-fix. (cherry picked from commit 52d4fe057909e8d431ae36f538dc4cafb351cdb5) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ea8d419c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ea8d419c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ea8d419c Branch: refs/heads/branch-2.0 Commit: ea8d419c106ad90f8f5b48e6bf897b0ff3f49f1f Parents: efce6e1 Author: jerryshao Authored: Mon Jun 27 09:23:58 2016 +0100 Committer: Sean Owen Committed: Mon Jun 27 09:24:06 2016 +0100 -- .../main/scala/org/apache/spark/storage/memory/MemoryStore.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ea8d419c/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala -- diff --git a/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala b/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala index 99be4de..0349da0 100644 --- a/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala +++ b/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala @@ -377,7 +377,8 @@ private[spark] class MemoryStore( entries.put(blockId, entry) } logInfo("Block %s stored as bytes in memory (estimated size %s, free %s)".format( -blockId, Utils.bytesToString(entry.size), Utils.bytesToString(blocksMemoryUsed))) +blockId, Utils.bytesToString(entry.size), +Utils.bytesToString(maxMemory - blocksMemoryUsed))) Right(entry.size) } else { // We ran out of space while unrolling the values for this block - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r1750410 [2/2] - in /spark: ./ _plugins/ mllib/ releases/_posts/ site/ site/mllib/ site/news/ site/releases/ site/sql/ site/streaming/ sql/ streaming/
Modified: spark/site/releases/spark-release-1-1-0.html URL: http://svn.apache.org/viewvc/spark/site/releases/spark-release-1-1-0.html?rev=1750410&r1=1750409&r2=1750410&view=diff == --- spark/site/releases/spark-release-1-1-0.html (original) +++ spark/site/releases/spark-release-1-1-0.html Mon Jun 27 20:31:41 2016 @@ -197,7 +197,7 @@ Spark SQL adds a number of new features and performance improvements in this release. A http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#running-the-thrift-jdbc-server";>JDBC/ODBC server allows users to connect to SparkSQL from many different applications and provides shared access to cached tables. A new module provides http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets";>support for loading JSON data directly into Sparkâs SchemaRDD format, including automatic schema inference. Spark SQL introduces http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#other-configuration-options";>dynamic bytecode generation in this release, a technique which significantly speeds up execution for queries that perform complex expression evaluation. This release also adds support for registering Python, Scala, and Java lambda functions as UDFs, which can then be called directly in SQL. Spark 1.1 adds a http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#programmatically-specifying-the-schema";>public types API to allow users to create SchemaRDDâs from custom data sources. Finally, many optimizations have been added to the native Parquet support as well as throughout the engine. MLlib -MLlib adds several new algorithms and optimizations in this release. 1.1 introduces a https://issues.apache.org/jira/browse/SPARK-2359";>new library of statistical packages which provides exploratory analytic functions. These include stratified sampling, correlations, chi-squared tests and support for creating random datasets. This release adds utilities for feature extraction (https://issues.apache.org/jira/browse/SPARK-2510";>Word2Vec and https://issues.apache.org/jira/browse/SPARK-2511";>TF-IDF) and feature transformation (https://issues.apache.org/jira/browse/SPARK-2272";>normalization and standard scaling). Also new are support for https://issues.apache.org/jira/browse/SPARK-1553";>nonnegative matrix factorization and https://issues.apache.org/jira/browse/SPARK-1782";>SVD via Lanczos. The decision tree algorithm has been https://issues.apache.org/jira/browse/SPARK-2478";>added in Python and Java< /a>. A tree aggregation primitive has been added to help optimize many existing algorithms. Performance improves across the board in MLlib 1.1, with improvements of around 2-3X for many algorithms and up to 5X for large scale decision tree problems. +MLlib adds several new algorithms and optimizations in this release. 1.1 introduces a https://issues.apache.org/jira/browse/SPARK-2359";>new library of statistical packages which provides exploratory analytic functions. These include stratified sampling, correlations, chi-squared tests and support for creating random datasets. This release adds utilities for feature extraction (https://issues.apache.org/jira/browse/SPARK-2510";>Word2Vec and https://issues.apache.org/jira/browse/SPARK-2511";>TF-IDF) and feature transformation (https://issues.apache.org/jira/browse/SPARK-2272";>normalization and standard scaling). Also new are support for https://issues.apache.org/jira/browse/SPARK-1553";>nonnegative matrix factorization and https://issues.apache.org/jira/browse/SPARK-1782";>SVD via Lanczos. 
The decision tree algorithm has been https://issues.apache.org/jira/browse/SPARK-2478";>added in Python and Java< /a>. A tree aggregation primitive has been added to help optimize many existing algorithms. Performance improves across the board in MLlib 1.1, with improvements of around 2-3X for many algorithms and up to 5X for large scale decision tree problems. GraphX and Spark Streaming Spark streaming adds a new data source https://issues.apache.org/jira/browse/SPARK-1981";>Amazon Kinesis. For the Apache Flume, a new mode is supported which https://issues.apache.org/jira/browse/SPARK-1729";>pulls data from Flume, simplifying deployment and providing high availability. The first of a set of https://issues.apache.org/jira/browse/SPARK-2438";>streaming machine learning algorithms is introduced with streaming linear regression. Finally, https://issues.apache.org/jira/browse/SPARK-1341";>rate limiting has been added for streaming inputs. GraphX adds https://issues.apache.org/jira/browse/SPARK-1991";>custom storage levels for vertices and edges along with https://issues.apache.org/jira/browse/SPARK-2748";>improved numerical precision across the board. Finally, GraphX adds a new label propagation algorithm. @@ -215,7 +215,7 @@ The default value of spark.io.compression.codec is now snappy f
svn commit: r1750410 [1/2] - in /spark: ./ _plugins/ mllib/ releases/_posts/ site/ site/mllib/ site/news/ site/releases/ site/sql/ site/streaming/ sql/ streaming/
Author: srowen Date: Mon Jun 27 20:31:41 2016 New Revision: 1750410 URL: http://svn.apache.org/viewvc?rev=1750410&view=rev Log: Remove Spark site plugins (not used/working); fix jekyll build warning and one bad heading tag; remove inactive {% extra %} tag; commit current output of jekyll for consistency (mostly minor whitespace changes) Removed: spark/_plugins/ Modified: spark/_config.yml spark/index.md spark/mllib/index.md spark/releases/_posts/2016-01-04-spark-release-1-6-0.md spark/site/documentation.html spark/site/examples.html spark/site/index.html spark/site/mllib/index.html spark/site/news/index.html spark/site/news/spark-0-9-1-released.html spark/site/news/spark-0-9-2-released.html spark/site/news/spark-1-1-0-released.html spark/site/news/spark-1-2-2-released.html spark/site/news/spark-and-shark-in-the-news.html spark/site/news/spark-summit-east-2015-videos-posted.html spark/site/releases/spark-release-0-8-0.html spark/site/releases/spark-release-0-9-1.html spark/site/releases/spark-release-1-0-1.html spark/site/releases/spark-release-1-0-2.html spark/site/releases/spark-release-1-1-0.html spark/site/releases/spark-release-1-2-0.html spark/site/releases/spark-release-1-3-0.html spark/site/releases/spark-release-1-3-1.html spark/site/releases/spark-release-1-4-0.html spark/site/releases/spark-release-1-5-0.html spark/site/releases/spark-release-1-6-0.html spark/site/sql/index.html spark/site/streaming/index.html spark/sql/index.md spark/streaming/index.md Modified: spark/_config.yml URL: http://svn.apache.org/viewvc/spark/_config.yml?rev=1750410&r1=1750409&r2=1750410&view=diff == --- spark/_config.yml (original) +++ spark/_config.yml Mon Jun 27 20:31:41 2016 @@ -1,12 +1,10 @@ -# pygments option has been renamed to highlighter. -# pygments: true highlighter: pygments markdown: kramdown kramdown: entity_output: symbol permalink: none destination: site -exclude: README.md +exclude: ['README.md'] keep_files: ['docs', '.svn'] # The recommended way of viewing the website on your local machine is via jekyll using @@ -16,5 +14,3 @@ keep_files: ['docs', '.svn'] # E.g. 
on OS X this might be: #url: file:///Users/andyk/Development/spark/website/site/ url: / - -shark_url: http://shark.cs.berkeley.edu Modified: spark/index.md URL: http://svn.apache.org/viewvc/spark/index.md?rev=1750410&r1=1750409&r2=1750410&view=diff == --- spark/index.md (original) +++ spark/index.md Mon Jun 27 20:31:41 2016 @@ -123,9 +123,6 @@ navigation: -{% extra %} - - Community @@ -190,5 +187,3 @@ navigation: Download Apache Spark - -{% endextra %} Modified: spark/mllib/index.md URL: http://svn.apache.org/viewvc/spark/mllib/index.md?rev=1750410&r1=1750409&r2=1750410&view=diff == --- spark/mllib/index.md (original) +++ spark/mllib/index.md Mon Jun 27 20:31:41 2016 @@ -76,9 +76,6 @@ subproject: MLlib -{% extra %} - - Algorithms @@ -148,5 +145,3 @@ subproject: MLlib - -{% endextra %} Modified: spark/releases/_posts/2016-01-04-spark-release-1-6-0.md URL: http://svn.apache.org/viewvc/spark/releases/_posts/2016-01-04-spark-release-1-6-0.md?rev=1750410&r1=1750409&r2=1750410&view=diff == --- spark/releases/_posts/2016-01-04-spark-release-1-6-0.md (original) +++ spark/releases/_posts/2016-01-04-spark-release-1-6-0.md Mon Jun 27 20:31:41 2016 @@ -82,7 +82,7 @@ You can consult JIRA for the [detailed c - [SPARK-11337](https://issues.apache.org/jira/browse/SPARK-11337) **Testable example code** - Automated testing for code in user guide examples -##Deprecations +## Deprecations * In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated. * In spark.ml.classification.LogisticRegressionModel and spark.ml.regression.LinearRegressionModel, the "weights" field has been deprecated, in favor of the new name "coefficients." This helps disambiguate from instance (row) weights given to algorithms. Modified: spark/site/documentation.html URL: http://svn.apache.org/viewvc/spark/site/documentation.html?rev=1750410&r1=1750409&r2=1750410&view=diff == --- spark/site/documentation.html (original) +++ spark/site/documentation.html Mon Jun 27 20:31:41 2016 @@ -249,12 +249,13 @@ Meetup Talk Videos -In addition to the videos listed below, you can also view http://www.meetup.com/spark-users/files/";>
spark git commit: [SPARK-15858][ML] Fix calculating error by tree stack over flow prob…
Repository: spark Updated Branches: refs/heads/master 21385d02a -> 393db655c [SPARK-15858][ML] Fix calculating error by tree stack over flow prob⦠## What changes were proposed in this pull request? What changes were proposed in this pull request? Improving evaluateEachIteration function in mllib as it fails when trying to calculate error by tree for a model that has more than 500 trees ## How was this patch tested? the batch tested on productions data set (2K rows x 2K features) training a gradient boosted model without validation with 1000 maxIteration settings, then trying to produce the error by tree, the new patch was able to perform the calculation within 30 seconds, while previously it was take hours then fail. **PS**: It would be better if this PR can be cherry picked into release branches 1.6.1 and 2.0 Author: Mahmoud Rawas Author: Mahmoud Rawas Closes #13624 from mhmoudr/SPARK-15858.master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/393db655 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/393db655 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/393db655 Branch: refs/heads/master Commit: 393db655c3c43155305fbba1b2f8c48a95f18d93 Parents: 21385d0 Author: Mahmoud Rawas Authored: Wed Jun 29 13:12:17 2016 +0100 Committer: Sean Owen Committed: Wed Jun 29 13:12:17 2016 +0100 -- .../ml/tree/impl/GradientBoostedTrees.scala | 40 ++-- .../mllib/tree/model/treeEnsembleModels.scala | 37 -- 2 files changed, 34 insertions(+), 43 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/393db655/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala index a0faff2..7bef899 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala @@ -205,31 +205,29 @@ private[spark] object GradientBoostedTrees extends Logging { case _ => data } -val numIterations = trees.length -val evaluationArray = Array.fill(numIterations)(0.0) -val localTreeWeights = treeWeights - -var predictionAndError = computeInitialPredictionAndError( - remappedData, localTreeWeights(0), trees(0), loss) - -evaluationArray(0) = predictionAndError.values.mean() - val broadcastTrees = sc.broadcast(trees) -(1 until numIterations).foreach { nTree => - predictionAndError = remappedData.zip(predictionAndError).mapPartitions { iter => -val currentTree = broadcastTrees.value(nTree) -val currentTreeWeight = localTreeWeights(nTree) -iter.map { case (point, (pred, error)) => - val newPred = updatePrediction(point.features, pred, currentTree, currentTreeWeight) - val newError = loss.computeError(newPred, point.label) - (newPred, newError) -} +val localTreeWeights = treeWeights +val treesIndices = trees.indices + +val dataCount = remappedData.count() +val evaluation = remappedData.map { point => + treesIndices.map { idx => +val prediction = broadcastTrees.value(idx) + .rootNode + .predictImpl(point.features) + .prediction +prediction * localTreeWeights(idx) } - evaluationArray(nTree) = predictionAndError.values.mean() + .scanLeft(0.0)(_ + _).drop(1) + .map(prediction => loss.computeError(prediction, point.label)) } +.aggregate(treesIndices.map(_ => 0.0))( + (aggregated, row) => treesIndices.map(idx => aggregated(idx) + row(idx)), + (a, b) => 
treesIndices.map(idx => a(idx) + b(idx))) +.map(_ / dataCount) -broadcastTrees.unpersist() -evaluationArray +broadcastTrees.destroy() +evaluation.toArray } /** http://git-wip-us.apache.org/repos/asf/spark/blob/393db655/mllib/src/main/scala/org/apache/spark/mllib/tree/model/treeEnsembleModels.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/tree/model/treeEnsembleModels.scala b/mllib/src/main/scala/org/apache/spark/mllib/tree/model/treeEnsembleModels.scala index f7d9b22..657ed0a 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/tree/model/treeEnsembleModels.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/tree/model/treeEnsembleModels.scala @@ -151,31 +151,24 @@ class GradientBoostedTreesModel @Since("1.2.0") ( case _ => data } -val numIterations = trees.length -val evaluationArray =
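The patch above replaces the per-iteration zip-and-update loop with a single pass that uses `scanLeft` to accumulate the ensemble prediction per point, producing the error for every prefix of trees at once. A minimal, self-contained sketch of that cumulative-prediction idea (plain Scala, no Spark; hypothetical per-tree predictors and a squared-error loss stand in for the real model internals):

```scala
object EvaluateEachIterationSketch {

  // Each "tree" is just a weighted prediction function in this sketch.
  val weightedTreePredictions: Seq[Double => Double] =
    Seq(x => 0.5 * x, x => 0.3 * x, x => 0.1 * x)

  def squaredError(prediction: Double, label: Double): Double = {
    val diff = prediction - label
    diff * diff
  }

  def main(args: Array[String]): Unit = {
    // (feature, label) pairs standing in for the RDD of labeled points.
    val data: Seq[(Double, Double)] = Seq((1.0, 1.0), (2.0, 1.9), (3.0, 3.2))

    // One pass per point: scanLeft accumulates the ensemble prediction tree by
    // tree, so the error for every prefix of trees is produced at once instead
    // of re-zipping the data set once per iteration.
    val perPointErrors: Seq[Seq[Double]] = data.map { case (feature, label) =>
      weightedTreePredictions
        .map(predict => predict(feature))
        .scanLeft(0.0)(_ + _)
        .drop(1) // drop the 0.0 seed so element i is the prediction of trees 0..i
        .map(cumulativePrediction => squaredError(cumulativePrediction, label))
    }

    // Column-wise mean: the average error when using 1, 2, ..., n trees.
    val errorByNumTrees: Seq[Double] =
      perPointErrors.transpose.map(errs => errs.sum / errs.length)

    println(errorByNumTrees.mkString(", "))
  }
}
```

The key point is that the cost becomes one pass over the data regardless of the number of trees, which is why the 1000-tree case above finishes in seconds rather than hours.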
spark git commit: [SPARK-16257][BUILD] Update spark_ec2.py to support Spark 1.6.2 and 1.6.3.
Repository: spark Updated Branches: refs/heads/branch-1.6 1ac830aca -> ccc7fa357 [SPARK-16257][BUILD] Update spark_ec2.py to support Spark 1.6.2 and 1.6.3. ## What changes were proposed in this pull request? - Adds 1.6.2 and 1.6.3 as supported Spark versions within the bundled spark-ec2 script. - Makes the default Spark version 1.6.3 to keep in sync with the upcoming release. - Does not touch the newer spark-ec2 scripts in the separate amplabs repository. ## How was this patch tested? - Manual script execution: export AWS_SECRET_ACCESS_KEY=_snip_ export AWS_ACCESS_KEY_ID=_snip_ $SPARK_HOME/ec2/spark-ec2 \ --key-pair=_snip_ \ --identity-file=_snip_ \ --region=us-east-1 \ --vpc-id=_snip_ \ --slaves=1 \ --instance-type=t1.micro \ --spark-version=1.6.2 \ --hadoop-major-version=yarn \ launch test-cluster - Result: Successful creation of a 1.6.2-based Spark cluster. This contribution is my original work and I license the work to the project under the project's open source license. Author: Brian Uri Closes #13947 from briuri/branch-1.6-bug-spark-16257. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ccc7fa35 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ccc7fa35 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ccc7fa35 Branch: refs/heads/branch-1.6 Commit: ccc7fa357099e0f621cfc02448ba20d3f6fabc14 Parents: 1ac830a Author: Brian Uri Authored: Thu Jun 30 07:52:28 2016 +0100 Committer: Sean Owen Committed: Thu Jun 30 07:52:28 2016 +0100 -- ec2/spark_ec2.py | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ccc7fa35/ec2/spark_ec2.py -- diff --git a/ec2/spark_ec2.py b/ec2/spark_ec2.py index 76c09f0..b28b4c5 100755 --- a/ec2/spark_ec2.py +++ b/ec2/spark_ec2.py @@ -51,7 +51,7 @@ else: raw_input = input xrange = range -SPARK_EC2_VERSION = "1.6.1" +SPARK_EC2_VERSION = "1.6.3" SPARK_EC2_DIR = os.path.dirname(os.path.realpath(__file__)) VALID_SPARK_VERSIONS = set([ @@ -77,6 +77,8 @@ VALID_SPARK_VERSIONS = set([ "1.5.2", "1.6.0", "1.6.1", +"1.6.2", +"1.6.3", ]) SPARK_TACHYON_MAP = { @@ -96,6 +98,8 @@ SPARK_TACHYON_MAP = { "1.5.2": "0.7.1", "1.6.0": "0.8.2", "1.6.1": "0.8.2", +"1.6.2": "0.8.2", +"1.6.3": "0.8.2", } DEFAULT_SPARK_VERSION = SPARK_EC2_VERSION @@ -103,7 +107,7 @@ DEFAULT_SPARK_GITHUB_REPO = "https://github.com/apache/spark"; # Default location to get the spark-ec2 scripts (and ami-list) from DEFAULT_SPARK_EC2_GITHUB_REPO = "https://github.com/amplab/spark-ec2"; -DEFAULT_SPARK_EC2_BRANCH = "branch-1.5" +DEFAULT_SPARK_EC2_BRANCH = "branch-1.6" def setup_external_libs(libs): - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16182][CORE] Utils.scala -- terminateProcess() should call Process.destroyForcibly() if and only if Process.destroy() fails
Repository: spark Updated Branches: refs/heads/master fbfd0ab9d -> 2075bf8ef [SPARK-16182][CORE] Utils.scala -- terminateProcess() should call Process.destroyForcibly() if and only if Process.destroy() fails ## What changes were proposed in this pull request? Utils.terminateProcess should `destroy()` first and only fall back to `destroyForcibly()` if it fails. It's kind of bad that we're force-killing executors -- and only in Java 8. See JIRA for an example of the impact: no shutdown While here: `Utils.waitForProcess` should use the Java 8 method if available instead of a custom implementation. ## How was this patch tested? Existing tests, which cover the force-kill case, and Amplab tests, which will cover both Java 7 and Java 8 eventually. However I tested locally on Java 8 and the PR builder will try Java 7 here. Author: Sean Owen Closes #13973 from srowen/SPARK-16182. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2075bf8e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2075bf8e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2075bf8e Branch: refs/heads/master Commit: 2075bf8ef6035fd7606bcf20dc2cd7d7b9cda446 Parents: fbfd0ab Author: Sean Owen Authored: Fri Jul 1 09:22:27 2016 +0100 Committer: Sean Owen Committed: Fri Jul 1 09:22:27 2016 +0100 -- .../scala/org/apache/spark/util/Utils.scala | 76 .../org/apache/spark/util/UtilsSuite.scala | 2 +- 2 files changed, 47 insertions(+), 31 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2075bf8e/core/src/main/scala/org/apache/spark/util/Utils.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index f77cc2f..0c23f3c 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark/util/Utils.scala @@ -1772,50 +1772,66 @@ private[spark] object Utils extends Logging { } /** - * Terminates a process waiting for at most the specified duration. Returns whether - * the process terminated. + * Terminates a process waiting for at most the specified duration. + * + * @return the process exit value if it was successfully terminated, else None */ def terminateProcess(process: Process, timeoutMs: Long): Option[Int] = { -try { - // Java8 added a new API which will more forcibly kill the process. Use that if available. - val destroyMethod = process.getClass().getMethod("destroyForcibly"); - destroyMethod.setAccessible(true) - destroyMethod.invoke(process) -} catch { - case NonFatal(e) => -if (!e.isInstanceOf[NoSuchMethodException]) { - logWarning("Exception when attempting to kill process", e) -} -process.destroy() -} +// Politely destroy first +process.destroy() + if (waitForProcess(process, timeoutMs)) { + // Successful exit Option(process.exitValue()) } else { - None + // Java 8 added a new API which will more forcibly kill the process. Use that if available. + try { +classOf[Process].getMethod("destroyForcibly").invoke(process) + } catch { +case _: NoSuchMethodException => return None // Not available; give up +case NonFatal(e) => logWarning("Exception when attempting to kill process", e) + } + // Wait, again, although this really should return almost immediately + if (waitForProcess(process, timeoutMs)) { +Option(process.exitValue()) + } else { +logWarning("Timed out waiting to forcibly kill process") +None + } } } /** * Wait for a process to terminate for at most the specified duration. 
- * Return whether the process actually terminated after the given timeout. + * + * @return whether the process actually terminated before the given timeout. */ def waitForProcess(process: Process, timeoutMs: Long): Boolean = { -var terminated = false -val startTime = System.currentTimeMillis -while (!terminated) { - try { -process.exitValue() -terminated = true - } catch { -case e: IllegalThreadStateException => - // Process not terminated yet - if (System.currentTimeMillis - startTime > timeoutMs) { -return false +try { + // Use Java 8 method if available + classOf[Process].getMethod("waitFor", java.lang.Long.TYPE, classOf[TimeUnit]) +.invoke(process, timeoutMs.asInstanceOf[java.lang.Long], TimeUnit.MILLISECONDS) +.asInstanceOf[Boolean] +} catch { +
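The change above inverts the old order: destroy politely, wait up to the timeout, and only then escalate. A short sketch of the same polite-then-forceful pattern, assuming Java 8+ so the `Process` timeout and forcible-kill APIs can be called directly rather than through the reflection the patch uses to stay compatible with Java 7 (hypothetical helper, not the Spark source; the demo assumes a Unix-like `sleep` binary):

```scala
import java.util.concurrent.TimeUnit

object GentleTerminationSketch {

  def terminate(process: Process, timeoutMs: Long): Option[Int] = {
    process.destroy() // ask nicely first
    if (process.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
      Some(process.exitValue()) // exited within the timeout
    } else {
      process.destroyForcibly() // escalate only after the polite attempt times out
      if (process.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
        Some(process.exitValue())
      } else {
        None // still not dead; give up
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val process = new ProcessBuilder("sleep", "60").start()
    println(s"exit value: ${terminate(process, 2000)}")
  }
}
```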
spark git commit: [SPARK-16182][CORE] Utils.scala -- terminateProcess() should call Process.destroyForcibly() if and only if Process.destroy() fails
Repository: spark Updated Branches: refs/heads/branch-2.0 1932bb683 -> 972106dd3 [SPARK-16182][CORE] Utils.scala -- terminateProcess() should call Process.destroyForcibly() if and only if Process.destroy() fails ## What changes were proposed in this pull request? Utils.terminateProcess should `destroy()` first and only fall back to `destroyForcibly()` if it fails. It's kind of bad that we're force-killing executors -- and only in Java 8. See JIRA for an example of the impact: no shutdown While here: `Utils.waitForProcess` should use the Java 8 method if available instead of a custom implementation. ## How was this patch tested? Existing tests, which cover the force-kill case, and Amplab tests, which will cover both Java 7 and Java 8 eventually. However I tested locally on Java 8 and the PR builder will try Java 7 here. Author: Sean Owen Closes #13973 from srowen/SPARK-16182. (cherry picked from commit 2075bf8ef6035fd7606bcf20dc2cd7d7b9cda446) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/972106dd Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/972106dd Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/972106dd Branch: refs/heads/branch-2.0 Commit: 972106dd3bdc40b0980949a09783d6d460e8d268 Parents: 1932bb6 Author: Sean Owen Authored: Fri Jul 1 09:22:27 2016 +0100 Committer: Sean Owen Committed: Fri Jul 1 09:22:36 2016 +0100 -- .../scala/org/apache/spark/util/Utils.scala | 76 .../org/apache/spark/util/UtilsSuite.scala | 2 +- 2 files changed, 47 insertions(+), 31 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/972106dd/core/src/main/scala/org/apache/spark/util/Utils.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index f77cc2f..0c23f3c 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark/util/Utils.scala @@ -1772,50 +1772,66 @@ private[spark] object Utils extends Logging { } /** - * Terminates a process waiting for at most the specified duration. Returns whether - * the process terminated. + * Terminates a process waiting for at most the specified duration. + * + * @return the process exit value if it was successfully terminated, else None */ def terminateProcess(process: Process, timeoutMs: Long): Option[Int] = { -try { - // Java8 added a new API which will more forcibly kill the process. Use that if available. - val destroyMethod = process.getClass().getMethod("destroyForcibly"); - destroyMethod.setAccessible(true) - destroyMethod.invoke(process) -} catch { - case NonFatal(e) => -if (!e.isInstanceOf[NoSuchMethodException]) { - logWarning("Exception when attempting to kill process", e) -} -process.destroy() -} +// Politely destroy first +process.destroy() + if (waitForProcess(process, timeoutMs)) { + // Successful exit Option(process.exitValue()) } else { - None + // Java 8 added a new API which will more forcibly kill the process. Use that if available. 
+ try { +classOf[Process].getMethod("destroyForcibly").invoke(process) + } catch { +case _: NoSuchMethodException => return None // Not available; give up +case NonFatal(e) => logWarning("Exception when attempting to kill process", e) + } + // Wait, again, although this really should return almost immediately + if (waitForProcess(process, timeoutMs)) { +Option(process.exitValue()) + } else { +logWarning("Timed out waiting to forcibly kill process") +None + } } } /** * Wait for a process to terminate for at most the specified duration. - * Return whether the process actually terminated after the given timeout. + * + * @return whether the process actually terminated before the given timeout. */ def waitForProcess(process: Process, timeoutMs: Long): Boolean = { -var terminated = false -val startTime = System.currentTimeMillis -while (!terminated) { - try { -process.exitValue() -terminated = true - } catch { -case e: IllegalThreadStateException => - // Process not terminated yet - if (System.currentTimeMillis - startTime > timeoutMs) { -return false +try { + // Use Java 8 method if available + classOf[Process].getMethod("waitFor", java.lang.Long.TYPE, classOf[TimeUnit]) +.invoke(process, timeou
spark git commit: [SPARK-16182][CORE] Utils.scala -- terminateProcess() should call Process.destroyForcibly() if and only if Process.destroy() fails
Repository: spark Updated Branches: refs/heads/branch-1.6 ccc7fa357 -> 83f860448 [SPARK-16182][CORE] Utils.scala -- terminateProcess() should call Process.destroyForcibly() if and only if Process.destroy() fails ## What changes were proposed in this pull request? Utils.terminateProcess should `destroy()` first and only fall back to `destroyForcibly()` if it fails. It's kind of bad that we're force-killing executors -- and only in Java 8. See JIRA for an example of the impact: no shutdown While here: `Utils.waitForProcess` should use the Java 8 method if available instead of a custom implementation. ## How was this patch tested? Existing tests, which cover the force-kill case, and Amplab tests, which will cover both Java 7 and Java 8 eventually. However I tested locally on Java 8 and the PR builder will try Java 7 here. Author: Sean Owen Closes #13973 from srowen/SPARK-16182. (cherry picked from commit 2075bf8ef6035fd7606bcf20dc2cd7d7b9cda446) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/83f86044 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/83f86044 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/83f86044 Branch: refs/heads/branch-1.6 Commit: 83f86044879b3c6bbfb0f3075cba552070b064cf Parents: ccc7fa3 Author: Sean Owen Authored: Fri Jul 1 09:22:27 2016 +0100 Committer: Sean Owen Committed: Fri Jul 1 09:25:02 2016 +0100 -- .../scala/org/apache/spark/util/Utils.scala | 76 .../org/apache/spark/util/UtilsSuite.scala | 2 +- 2 files changed, 47 insertions(+), 31 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/83f86044/core/src/main/scala/org/apache/spark/util/Utils.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index 36ab3ac..427b382 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark/util/Utils.scala @@ -1732,50 +1732,66 @@ private[spark] object Utils extends Logging { } /** - * Terminates a process waiting for at most the specified duration. Returns whether - * the process terminated. + * Terminates a process waiting for at most the specified duration. + * + * @return the process exit value if it was successfully terminated, else None */ def terminateProcess(process: Process, timeoutMs: Long): Option[Int] = { -try { - // Java8 added a new API which will more forcibly kill the process. Use that if available. - val destroyMethod = process.getClass().getMethod("destroyForcibly"); - destroyMethod.setAccessible(true) - destroyMethod.invoke(process) -} catch { - case NonFatal(e) => -if (!e.isInstanceOf[NoSuchMethodException]) { - logWarning("Exception when attempting to kill process", e) -} -process.destroy() -} +// Politely destroy first +process.destroy() + if (waitForProcess(process, timeoutMs)) { + // Successful exit Option(process.exitValue()) } else { - None + // Java 8 added a new API which will more forcibly kill the process. Use that if available. 
+ try { +classOf[Process].getMethod("destroyForcibly").invoke(process) + } catch { +case _: NoSuchMethodException => return None // Not available; give up +case NonFatal(e) => logWarning("Exception when attempting to kill process", e) + } + // Wait, again, although this really should return almost immediately + if (waitForProcess(process, timeoutMs)) { +Option(process.exitValue()) + } else { +logWarning("Timed out waiting to forcibly kill process") +None + } } } /** * Wait for a process to terminate for at most the specified duration. - * Return whether the process actually terminated after the given timeout. + * + * @return whether the process actually terminated before the given timeout. */ def waitForProcess(process: Process, timeoutMs: Long): Boolean = { -var terminated = false -val startTime = System.currentTimeMillis -while (!terminated) { - try { -process.exitValue() -terminated = true - } catch { -case e: IllegalThreadStateException => - // Process not terminated yet - if (System.currentTimeMillis - startTime > timeoutMs) { -return false +try { + // Use Java 8 method if available + classOf[Process].getMethod("waitFor", java.lang.Long.TYPE, classOf[TimeUnit]) +.invoke(process, timeou
spark git commit: [SPARK-15761][MLLIB][PYSPARK] Load ipython when default python is Python3
Repository: spark Updated Branches: refs/heads/branch-1.6 83f860448 -> 1026aba16 [SPARK-15761][MLLIB][PYSPARK] Load ipython when default python is Python3 ## What changes were proposed in this pull request? I would like to use IPython with Python 3.5. It is annoying when it fails with IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON when I have a version greater than 2.7 ## How was this patch tested It now works with IPython and Python3 Author: MechCoder Closes #13503 from MechCoder/spark-15761. (cherry picked from commit 66283ee0b25de2a5daaa21d50a05a7fadec1de77) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1026aba1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1026aba1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1026aba1 Branch: refs/heads/branch-1.6 Commit: 1026aba16554f6c5b5a6a3fdc2b9bdb7911a9fcc Parents: 83f8604 Author: MechCoder Authored: Fri Jul 1 09:27:34 2016 +0100 Committer: Sean Owen Committed: Fri Jul 1 09:27:54 2016 +0100 -- bin/pyspark | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1026aba1/bin/pyspark -- diff --git a/bin/pyspark b/bin/pyspark index 5eaa17d..42af597 100755 --- a/bin/pyspark +++ b/bin/pyspark @@ -54,9 +54,11 @@ elif [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"$DEFAULT_PYTHON"}" fi +WORKS_WITH_IPYTHON=$($DEFAULT_PYTHON -c 'import sys; print(sys.version_info >= (2, 7, 0))') + # Determine the Python executable to use for the executors: if [[ -z "$PYSPARK_PYTHON" ]]; then - if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && $DEFAULT_PYTHON != "python2.7" ]]; then + if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && ! WORKS_WITH_IPYTHON ]]; then echo "IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" 1>&2 exit 1 else - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15761][MLLIB][PYSPARK] Load ipython when default python is Python3
Repository: spark Updated Branches: refs/heads/master 2075bf8ef -> 66283ee0b [SPARK-15761][MLLIB][PYSPARK] Load ipython when default python is Python3 ## What changes were proposed in this pull request? I would like to use IPython with Python 3.5. It is annoying when it fails with IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON when I have a version greater than 2.7 ## How was this patch tested It now works with IPython and Python3 Author: MechCoder Closes #13503 from MechCoder/spark-15761. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/66283ee0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/66283ee0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/66283ee0 Branch: refs/heads/master Commit: 66283ee0b25de2a5daaa21d50a05a7fadec1de77 Parents: 2075bf8 Author: MechCoder Authored: Fri Jul 1 09:27:34 2016 +0100 Committer: Sean Owen Committed: Fri Jul 1 09:27:34 2016 +0100 -- bin/pyspark | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/66283ee0/bin/pyspark -- diff --git a/bin/pyspark b/bin/pyspark index 396a07c..ac8aa04 100755 --- a/bin/pyspark +++ b/bin/pyspark @@ -50,9 +50,11 @@ if [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"$DEFAULT_PYTHON"}" fi +WORKS_WITH_IPYTHON=$($DEFAULT_PYTHON -c 'import sys; print(sys.version_info >= (2, 7, 0))') + # Determine the Python executable to use for the executors: if [[ -z "$PYSPARK_PYTHON" ]]; then - if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && $DEFAULT_PYTHON != "python2.7" ]]; then + if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && ! WORKS_WITH_IPYTHON ]]; then echo "IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" 1>&2 exit 1 else - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15761][MLLIB][PYSPARK] Load ipython when default python is Python3
Repository: spark Updated Branches: refs/heads/branch-2.0 972106dd3 -> 0b64543c5 [SPARK-15761][MLLIB][PYSPARK] Load ipython when default python is Python3 ## What changes were proposed in this pull request? I would like to use IPython with Python 3.5. It is annoying when it fails with IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON when I have a version greater than 2.7 ## How was this patch tested It now works with IPython and Python3 Author: MechCoder Closes #13503 from MechCoder/spark-15761. (cherry picked from commit 66283ee0b25de2a5daaa21d50a05a7fadec1de77) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0b64543c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0b64543c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0b64543c Branch: refs/heads/branch-2.0 Commit: 0b64543c5ba6a943294f189b7ca02e0debbfad9c Parents: 972106d Author: MechCoder Authored: Fri Jul 1 09:27:34 2016 +0100 Committer: Sean Owen Committed: Fri Jul 1 09:27:42 2016 +0100 -- bin/pyspark | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0b64543c/bin/pyspark -- diff --git a/bin/pyspark b/bin/pyspark index 396a07c..ac8aa04 100755 --- a/bin/pyspark +++ b/bin/pyspark @@ -50,9 +50,11 @@ if [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"$DEFAULT_PYTHON"}" fi +WORKS_WITH_IPYTHON=$($DEFAULT_PYTHON -c 'import sys; print(sys.version_info >= (2, 7, 0))') + # Determine the Python executable to use for the executors: if [[ -z "$PYSPARK_PYTHON" ]]; then - if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && $DEFAULT_PYTHON != "python2.7" ]]; then + if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && ! WORKS_WITH_IPYTHON ]]; then echo "IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" 1>&2 exit 1 else - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16222][SQL] JDBC Sources - Handling illegal input values for `fetchsize` and `batchsize`
Repository: spark Updated Branches: refs/heads/branch-2.0 0b64543c5 -> 3665927c6 [SPARK-16222][SQL] JDBC Sources - Handling illegal input values for `fetchsize` and `batchsize` What changes were proposed in this pull request? For JDBC data sources, users can specify `batchsize` for multi-row inserts and `fetchsize` for multi-row fetch. A few issues exist: - The property keys are case sensitive. Thus, the existing test cases for `fetchsize` use incorrect names, `fetchSize`. Basically, the test cases are broken. - No test case exists for `batchsize`. - We do not detect the illegal input values for `fetchsize` and `batchsize`. For example, when `batchsize` is zero, we got the following exception: ``` Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ArithmeticException: / by zero ``` when `fetchsize` is less than zero, we got the exception from the underlying JDBC driver: ``` Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.h2.jdbc.JdbcSQLException: Invalid value "-1" for parameter "rows" [90008-183] ``` This PR fixes all the above issues, and issue the appropriate exceptions when detecting the illegal inputs for `fetchsize` and `batchsize`. Also update the function descriptions. How was this patch tested? Test cases are fixed and added. Author: gatorsmile Closes #13919 from gatorsmile/jdbcProperties. (cherry picked from commit 0ad6ce7e54b1d8f5946dde652fa5341d15059158) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3665927c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3665927c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3665927c Branch: refs/heads/branch-2.0 Commit: 3665927c6f5fa4794a59718fd2d339310c70a985 Parents: 0b64543 Author: gatorsmile Authored: Fri Jul 1 09:54:02 2016 +0100 Committer: Sean Owen Committed: Fri Jul 1 09:54:10 2016 +0100 -- .../org/apache/spark/sql/DataFrameReader.scala | 6 +- .../org/apache/spark/sql/DataFrameWriter.scala | 3 +- .../execution/datasources/jdbc/JDBCRDD.scala| 6 +- .../execution/datasources/jdbc/JdbcUtils.scala | 10 +++- .../apache/spark/sql/jdbc/PostgresDialect.scala | 2 +- .../org/apache/spark/sql/jdbc/JDBCSuite.scala | 62 .../apache/spark/sql/jdbc/JDBCWriteSuite.scala | 54 - 7 files changed, 98 insertions(+), 45 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3665927c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala index 35ba522..e8c2885 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala @@ -177,7 +177,8 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { * clause expressions used to split the column `columnName` evenly. * @param connectionProperties JDBC database connection arguments, a list of arbitrary string * tag/value. Normally at least a "user" and "password" property - * should be included. + * should be included. "fetchsize" can be used to control the + * number of rows per fetch. 
* @since 1.4.0 */ def jdbc( @@ -207,7 +208,8 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { * @param predicates Condition in the where clause for each partition. * @param connectionProperties JDBC database connection arguments, a list of arbitrary string * tag/value. Normally at least a "user" and "password" property - * should be included. + * should be included. "fetchsize" can be used to control the + * number of rows per fetch. * @since 1.4.0 */ def jdbc( http://git-wip-us.apache.org/repos/asf/spark/blob/3665927c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala index ca3972d..f77af76 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala +++ b/sql/core/src
spark git commit: [SPARK-16222][SQL] JDBC Sources - Handling illegal input values for `fetchsize` and `batchsize`
Repository: spark Updated Branches: refs/heads/master 66283ee0b -> 0ad6ce7e5 [SPARK-16222][SQL] JDBC Sources - Handling illegal input values for `fetchsize` and `batchsize` What changes were proposed in this pull request? For JDBC data sources, users can specify `batchsize` for multi-row inserts and `fetchsize` for multi-row fetch. A few issues exist: - The property keys are case sensitive. Thus, the existing test cases for `fetchsize` use incorrect names, `fetchSize`. Basically, the test cases are broken. - No test case exists for `batchsize`. - We do not detect the illegal input values for `fetchsize` and `batchsize`. For example, when `batchsize` is zero, we got the following exception: ``` Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ArithmeticException: / by zero ``` when `fetchsize` is less than zero, we got the exception from the underlying JDBC driver: ``` Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.h2.jdbc.JdbcSQLException: Invalid value "-1" for parameter "rows" [90008-183] ``` This PR fixes all the above issues, and issue the appropriate exceptions when detecting the illegal inputs for `fetchsize` and `batchsize`. Also update the function descriptions. How was this patch tested? Test cases are fixed and added. Author: gatorsmile Closes #13919 from gatorsmile/jdbcProperties. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0ad6ce7e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0ad6ce7e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0ad6ce7e Branch: refs/heads/master Commit: 0ad6ce7e54b1d8f5946dde652fa5341d15059158 Parents: 66283ee Author: gatorsmile Authored: Fri Jul 1 09:54:02 2016 +0100 Committer: Sean Owen Committed: Fri Jul 1 09:54:02 2016 +0100 -- .../org/apache/spark/sql/DataFrameReader.scala | 6 +- .../org/apache/spark/sql/DataFrameWriter.scala | 3 +- .../execution/datasources/jdbc/JDBCRDD.scala| 6 +- .../execution/datasources/jdbc/JdbcUtils.scala | 10 +++- .../apache/spark/sql/jdbc/PostgresDialect.scala | 2 +- .../org/apache/spark/sql/jdbc/JDBCSuite.scala | 62 .../apache/spark/sql/jdbc/JDBCWriteSuite.scala | 54 - 7 files changed, 98 insertions(+), 45 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0ad6ce7e/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala index 35ba522..e8c2885 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala @@ -177,7 +177,8 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { * clause expressions used to split the column `columnName` evenly. * @param connectionProperties JDBC database connection arguments, a list of arbitrary string * tag/value. Normally at least a "user" and "password" property - * should be included. + * should be included. "fetchsize" can be used to control the + * number of rows per fetch. * @since 1.4.0 */ def jdbc( @@ -207,7 +208,8 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { * @param predicates Condition in the where clause for each partition. 
* @param connectionProperties JDBC database connection arguments, a list of arbitrary string * tag/value. Normally at least a "user" and "password" property - * should be included. + * should be included. "fetchsize" can be used to control the + * number of rows per fetch. * @since 1.4.0 */ def jdbc( http://git-wip-us.apache.org/repos/asf/spark/blob/0ad6ce7e/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala index ca3972d..f77af76 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala @@ -391,7 +391,8 @@ final class DataFrameWriter[T
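For reference, a minimal usage sketch of the two options this patch validates, with hypothetical connection details (in-memory H2 URL, table names, credentials). After the change, an illegal value such as `batchsize = 0` or a negative `fetchsize` fails fast with a descriptive exception instead of surfacing as a divide-by-zero or a driver-level error:

```scala
import java.util.Properties

import org.apache.spark.sql.SparkSession

object JdbcSizeOptionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-size-options").getOrCreate()

    val readProps = new Properties()
    readProps.setProperty("user", "test")
    readProps.setProperty("password", "secret")
    readProps.setProperty("fetchsize", "100") // rows per round trip on read; must not be negative

    val people = spark.read.jdbc("jdbc:h2:mem:testdb", "PEOPLE", readProps)

    val writeProps = new Properties()
    writeProps.setProperty("user", "test")
    writeProps.setProperty("password", "secret")
    writeProps.setProperty("batchsize", "200") // rows per multi-row insert on write; must be positive

    people.write.mode("append").jdbc("jdbc:h2:mem:testdb", "PEOPLE_COPY", writeProps)

    spark.stop()
  }
}
```

Note that the keys are the lowercase `fetchsize` and `batchsize`, which is exactly the case-sensitivity pitfall the fixed test cases had tripped over.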
spark git commit: [GRAPHX][EXAMPLES] move graphx test data directory and update graphx document
Repository: spark Updated Branches: refs/heads/master bad0f7dbb -> 192d1f9cf [GRAPHX][EXAMPLES] move graphx test data directory and update graphx document ## What changes were proposed in this pull request? There are two test data files used for graphx examples existing in directory "graphx/data" I move it into "data/" directory because the "graphx" directory is used for code files and other test data files (such as mllib, streaming test data) are all in there. I also update the graphx document where reference the data files which I move place. ## How was this patch tested? N/A Author: WeichenXu Closes #14010 from WeichenXu123/move_graphx_data_dir. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/192d1f9c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/192d1f9c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/192d1f9c Branch: refs/heads/master Commit: 192d1f9cf3463d050b87422939448f2acf86acc9 Parents: bad0f7d Author: WeichenXu Authored: Sat Jul 2 08:40:23 2016 +0100 Committer: Sean Owen Committed: Sat Jul 2 08:40:23 2016 +0100 -- data/graphx/followers.txt| 8 data/graphx/users.txt| 7 +++ docs/graphx-programming-guide.md | 18 +- graphx/data/followers.txt| 8 graphx/data/users.txt| 7 --- 5 files changed, 24 insertions(+), 24 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/data/graphx/followers.txt -- diff --git a/data/graphx/followers.txt b/data/graphx/followers.txt new file mode 100644 index 000..7bb8e90 --- /dev/null +++ b/data/graphx/followers.txt @@ -0,0 +1,8 @@ +2 1 +4 1 +1 2 +6 3 +7 3 +7 6 +6 7 +3 7 http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/data/graphx/users.txt -- diff --git a/data/graphx/users.txt b/data/graphx/users.txt new file mode 100644 index 000..982d19d --- /dev/null +++ b/data/graphx/users.txt @@ -0,0 +1,7 @@ +1,BarackObama,Barack Obama +2,ladygaga,Goddess of Love +3,jeresig,John Resig +4,justinbieber,Justin Bieber +6,matei_zaharia,Matei Zaharia +7,odersky,Martin Odersky +8,anonsys http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index 81cf174..e376b66 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -1007,15 +1007,15 @@ PageRank measures the importance of each vertex in a graph, assuming an edge fro GraphX comes with static and dynamic implementations of PageRank as methods on the [`PageRank` object][PageRank]. Static PageRank runs for a fixed number of iterations, while dynamic PageRank runs until the ranks converge (i.e., stop changing by more than a specified tolerance). [`GraphOps`][GraphOps] allows calling these algorithms directly as methods on `Graph`. -GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in `graphx/data/users.txt`, and a set of relationships between users is given in `graphx/data/followers.txt`. We compute the PageRank of each user as follows: +GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in `data/graphx/users.txt`, and a set of relationships between users is given in `data/graphx/followers.txt`. 
We compute the PageRank of each user as follows: {% highlight scala %} // Load the edges as a graph -val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt") +val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt") // Run PageRank val ranks = graph.pageRank(0.0001).vertices // Join the ranks with the usernames -val users = sc.textFile("graphx/data/users.txt").map { line => +val users = sc.textFile("data/graphx/users.txt").map { line => val fields = line.split(",") (fields(0).toLong, fields(1)) } @@ -1032,11 +1032,11 @@ The connected components algorithm labels each connected component of the graph {% highlight scala %} // Load the graph as in the PageRank example -val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt") +val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt") // Find the connected components val cc = graph.connectedComponents().vertices // Join the connected components with the usernames -val users = sc.textFile("graphx/data/users.txt").map { line => +val users = sc.textFile("data/graphx/users.txt").map { line => val fields = line.split(",") (fields(0).toLong, fields(1)) } @@ -1053,11 +1053,11 @@ A ver
spark git commit: [GRAPHX][EXAMPLES] move graphx test data directory and update graphx document
Repository: spark Updated Branches: refs/heads/branch-2.0 ab4303800 -> f3a359939 [GRAPHX][EXAMPLES] move graphx test data directory and update graphx document ## What changes were proposed in this pull request? There are two test data files used for graphx examples existing in directory "graphx/data" I move it into "data/" directory because the "graphx" directory is used for code files and other test data files (such as mllib, streaming test data) are all in there. I also update the graphx document where reference the data files which I move place. ## How was this patch tested? N/A Author: WeichenXu Closes #14010 from WeichenXu123/move_graphx_data_dir. (cherry picked from commit 192d1f9cf3463d050b87422939448f2acf86acc9) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f3a35993 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f3a35993 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f3a35993 Branch: refs/heads/branch-2.0 Commit: f3a359939afb25c8b91fabe5955e1cdf609be521 Parents: ab43038 Author: WeichenXu Authored: Sat Jul 2 08:40:23 2016 +0100 Committer: Sean Owen Committed: Sat Jul 2 08:40:31 2016 +0100 -- data/graphx/followers.txt| 8 data/graphx/users.txt| 7 +++ docs/graphx-programming-guide.md | 18 +- graphx/data/followers.txt| 8 graphx/data/users.txt| 7 --- 5 files changed, 24 insertions(+), 24 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f3a35993/data/graphx/followers.txt -- diff --git a/data/graphx/followers.txt b/data/graphx/followers.txt new file mode 100644 index 000..7bb8e90 --- /dev/null +++ b/data/graphx/followers.txt @@ -0,0 +1,8 @@ +2 1 +4 1 +1 2 +6 3 +7 3 +7 6 +6 7 +3 7 http://git-wip-us.apache.org/repos/asf/spark/blob/f3a35993/data/graphx/users.txt -- diff --git a/data/graphx/users.txt b/data/graphx/users.txt new file mode 100644 index 000..982d19d --- /dev/null +++ b/data/graphx/users.txt @@ -0,0 +1,7 @@ +1,BarackObama,Barack Obama +2,ladygaga,Goddess of Love +3,jeresig,John Resig +4,justinbieber,Justin Bieber +6,matei_zaharia,Matei Zaharia +7,odersky,Martin Odersky +8,anonsys http://git-wip-us.apache.org/repos/asf/spark/blob/f3a35993/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index 81cf174..e376b66 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -1007,15 +1007,15 @@ PageRank measures the importance of each vertex in a graph, assuming an edge fro GraphX comes with static and dynamic implementations of PageRank as methods on the [`PageRank` object][PageRank]. Static PageRank runs for a fixed number of iterations, while dynamic PageRank runs until the ranks converge (i.e., stop changing by more than a specified tolerance). [`GraphOps`][GraphOps] allows calling these algorithms directly as methods on `Graph`. -GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in `graphx/data/users.txt`, and a set of relationships between users is given in `graphx/data/followers.txt`. We compute the PageRank of each user as follows: +GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in `data/graphx/users.txt`, and a set of relationships between users is given in `data/graphx/followers.txt`. 
We compute the PageRank of each user as follows: {% highlight scala %} // Load the edges as a graph -val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt") +val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt") // Run PageRank val ranks = graph.pageRank(0.0001).vertices // Join the ranks with the usernames -val users = sc.textFile("graphx/data/users.txt").map { line => +val users = sc.textFile("data/graphx/users.txt").map { line => val fields = line.split(",") (fields(0).toLong, fields(1)) } @@ -1032,11 +1032,11 @@ The connected components algorithm labels each connected component of the graph {% highlight scala %} // Load the graph as in the PageRank example -val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt") +val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt") // Find the connected components val cc = graph.connectedComponents().vertices // Join the connected components with the usernames -val users = sc.textFile("graphx/data/users.txt").map { line => +val users = sc.textFile("data/graphx/users.txt").map { l
spark git commit: [SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming guide example snippets from source files instead of hard code them
Repository: spark Updated Branches: refs/heads/master 192d1f9cf -> 0bd7cd18b [SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming guide example snippets from source files instead of hard code them ## What changes were proposed in this pull request? I extract 6 example programs from GraphX programming guide and replace them with `include_example` label. The 6 example programs are: - AggregateMessagesExample.scala - SSSPExample.scala - TriangleCountingExample.scala - ConnectedComponentsExample.scala - ComprehensiveExample.scala - PageRankExample.scala All the example code can run using `bin/run-example graphx.EXAMPLE_NAME` ## How was this patch tested? Manual. Author: WeichenXu Closes #14015 from WeichenXu123/graphx_example_plugin. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0bd7cd18 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0bd7cd18 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0bd7cd18 Branch: refs/heads/master Commit: 0bd7cd18bc4d535b0c4499913f6747b3f6315ac2 Parents: 192d1f9 Author: WeichenXu Authored: Sat Jul 2 16:29:00 2016 +0100 Committer: Sean Owen Committed: Sat Jul 2 16:29:00 2016 +0100 -- docs/graphx-programming-guide.md| 133 +-- .../graphx/AggregateMessagesExample.scala | 72 ++ .../examples/graphx/ComprehensiveExample.scala | 80 +++ .../graphx/ConnectedComponentsExample.scala | 68 ++ .../spark/examples/graphx/PageRankExample.scala | 61 + .../spark/examples/graphx/SSSPExample.scala | 69 ++ .../graphx/TriangleCountingExample.scala| 70 ++ 7 files changed, 426 insertions(+), 127 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0bd7cd18/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index e376b66..2e9966c 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -603,29 +603,7 @@ slightly unreliable and instead opted for more explicit user control. In the following example we use the [`aggregateMessages`][Graph.aggregateMessages] operator to compute the average age of the more senior followers of each user. -{% highlight scala %} -// Import random graph generation library -import org.apache.spark.graphx.util.GraphGenerators -// Create a graph with "age" as the vertex property. Here we use a random graph for simplicity. 
-val graph: Graph[Double, Int] = - GraphGenerators.logNormalGraph(sc, numVertices = 100).mapVertices( (id, _) => id.toDouble ) -// Compute the number of older followers and their total age -val olderFollowers: VertexRDD[(Int, Double)] = graph.aggregateMessages[(Int, Double)]( - triplet => { // Map Function -if (triplet.srcAttr > triplet.dstAttr) { - // Send message to destination vertex containing counter and age - triplet.sendToDst(1, triplet.srcAttr) -} - }, - // Add counter and age - (a, b) => (a._1 + b._1, a._2 + b._2) // Reduce Function -) -// Divide total age by number of older followers to get average age of older followers -val avgAgeOfOlderFollowers: VertexRDD[Double] = - olderFollowers.mapValues( (id, value) => value match { case (count, totalAge) => totalAge / count } ) -// Display the results -avgAgeOfOlderFollowers.collect.foreach(println(_)) -{% endhighlight %} +{% include_example scala/org/apache/spark/examples/graphx/AggregateMessagesExample.scala %} > The `aggregateMessages` operation performs optimally when the messages (and > the sums of > messages) are constant sized (e.g., floats and addition instead of lists and > concatenation). @@ -793,29 +771,7 @@ second argument list contains the user defined functions for receiving messages We can use the Pregel operator to express computation such as single source shortest path in the following example. -{% highlight scala %} -import org.apache.spark.graphx._ -// Import random graph generation library -import org.apache.spark.graphx.util.GraphGenerators -// A graph with edge attributes containing distances -val graph: Graph[Long, Double] = - GraphGenerators.logNormalGraph(sc, numVertices = 100).mapEdges(e => e.attr.toDouble) -val sourceId: VertexId = 42 // The ultimate source -// Initialize the graph such that all vertices except the root have distance infinity. -val initialGraph = graph.mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity) -val sssp = initialGraph.pregel(Double.PositiveInfinity)( - (id, dist, newDist) => math.min(dist, newDist), // Vertex Program - triplet => { // Send Message -if (triplet.srcAttr + triplet.attr < triplet.dstAttr) { - Iterator((triplet.
spark git commit: [SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming guide example snippets from source files instead of hard code them
Repository: spark Updated Branches: refs/heads/branch-2.0 f3a359939 -> 0d0b41609 [SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming guide example snippets from source files instead of hard code them ## What changes were proposed in this pull request? I extract 6 example programs from GraphX programming guide and replace them with `include_example` label. The 6 example programs are: - AggregateMessagesExample.scala - SSSPExample.scala - TriangleCountingExample.scala - ConnectedComponentsExample.scala - ComprehensiveExample.scala - PageRankExample.scala All the example code can run using `bin/run-example graphx.EXAMPLE_NAME` ## How was this patch tested? Manual. Author: WeichenXu Closes #14015 from WeichenXu123/graphx_example_plugin. (cherry picked from commit 0bd7cd18bc4d535b0c4499913f6747b3f6315ac2) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0d0b4160 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0d0b4160 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0d0b4160 Branch: refs/heads/branch-2.0 Commit: 0d0b416097a095fa771a7d5ae368546c26cb2d8b Parents: f3a3599 Author: WeichenXu Authored: Sat Jul 2 16:29:00 2016 +0100 Committer: Sean Owen Committed: Sat Jul 2 16:29:26 2016 +0100 -- docs/graphx-programming-guide.md| 133 +-- .../graphx/AggregateMessagesExample.scala | 72 ++ .../examples/graphx/ComprehensiveExample.scala | 80 +++ .../graphx/ConnectedComponentsExample.scala | 68 ++ .../spark/examples/graphx/PageRankExample.scala | 61 + .../spark/examples/graphx/SSSPExample.scala | 69 ++ .../graphx/TriangleCountingExample.scala| 70 ++ 7 files changed, 426 insertions(+), 127 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0d0b4160/docs/graphx-programming-guide.md -- diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md index e376b66..2e9966c 100644 --- a/docs/graphx-programming-guide.md +++ b/docs/graphx-programming-guide.md @@ -603,29 +603,7 @@ slightly unreliable and instead opted for more explicit user control. In the following example we use the [`aggregateMessages`][Graph.aggregateMessages] operator to compute the average age of the more senior followers of each user. -{% highlight scala %} -// Import random graph generation library -import org.apache.spark.graphx.util.GraphGenerators -// Create a graph with "age" as the vertex property. Here we use a random graph for simplicity. 
-val graph: Graph[Double, Int] = - GraphGenerators.logNormalGraph(sc, numVertices = 100).mapVertices( (id, _) => id.toDouble ) -// Compute the number of older followers and their total age -val olderFollowers: VertexRDD[(Int, Double)] = graph.aggregateMessages[(Int, Double)]( - triplet => { // Map Function -if (triplet.srcAttr > triplet.dstAttr) { - // Send message to destination vertex containing counter and age - triplet.sendToDst(1, triplet.srcAttr) -} - }, - // Add counter and age - (a, b) => (a._1 + b._1, a._2 + b._2) // Reduce Function -) -// Divide total age by number of older followers to get average age of older followers -val avgAgeOfOlderFollowers: VertexRDD[Double] = - olderFollowers.mapValues( (id, value) => value match { case (count, totalAge) => totalAge / count } ) -// Display the results -avgAgeOfOlderFollowers.collect.foreach(println(_)) -{% endhighlight %} +{% include_example scala/org/apache/spark/examples/graphx/AggregateMessagesExample.scala %} > The `aggregateMessages` operation performs optimally when the messages (and > the sums of > messages) are constant sized (e.g., floats and addition instead of lists and > concatenation). @@ -793,29 +771,7 @@ second argument list contains the user defined functions for receiving messages We can use the Pregel operator to express computation such as single source shortest path in the following example. -{% highlight scala %} -import org.apache.spark.graphx._ -// Import random graph generation library -import org.apache.spark.graphx.util.GraphGenerators -// A graph with edge attributes containing distances -val graph: Graph[Long, Double] = - GraphGenerators.logNormalGraph(sc, numVertices = 100).mapEdges(e => e.attr.toDouble) -val sourceId: VertexId = 42 // The ultimate source -// Initialize the graph such that all vertices except the root have distance infinity. -val initialGraph = graph.mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity) -val sssp = initialGraph.pregel(Double.PositiveInfinity)( - (id, dist, newDist) => math.min(dist, newDist), // Vertex Program - triplet => {
spark git commit: [MINOR][BUILD] Fix Java linter errors
Repository: spark Updated Branches: refs/heads/master 0bd7cd18b -> 3000b4b29 [MINOR][BUILD] Fix Java linter errors ## What changes were proposed in this pull request? This PR fixes the minor Java linter errors like the following. ``` -public int read(char cbuf[], int off, int len) throws IOException { +public int read(char[] cbuf, int off, int len) throws IOException { ``` ## How was this patch tested? Manual. ``` $ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install $ dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. ``` Author: Dongjoon Hyun Closes #14017 from dongjoon-hyun/minor_build_java_linter_error. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3000b4b2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3000b4b2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3000b4b2 Branch: refs/heads/master Commit: 3000b4b29f9165f436f186a8c1ba818e24f90615 Parents: 0bd7cd1 Author: Dongjoon Hyun Authored: Sat Jul 2 16:31:06 2016 +0100 Committer: Sean Owen Committed: Sat Jul 2 16:31:06 2016 +0100 -- .../shuffle/sort/ShuffleExternalSorter.java | 3 ++- .../unsafe/sort/UnsafeExternalSorter.java| 12 ++-- .../catalyst/expressions/xml/UDFXPathUtil.java | 19 +++ .../sql/execution/UnsafeExternalRowSorter.java | 4 ++-- .../UnsafeFixedWidthAggregationMap.java | 4 ++-- .../sql/execution/UnsafeKVExternalSorter.java| 3 ++- 6 files changed, 25 insertions(+), 20 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3000b4b2/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java -- diff --git a/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java b/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java index 696ee73..cf38a04 100644 --- a/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java +++ b/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java @@ -376,7 +376,8 @@ final class ShuffleExternalSorter extends MemoryConsumer { // for tests assert(inMemSorter != null); if (inMemSorter.numRecords() >= numElementsForSpillThreshold) { - logger.info("Spilling data because number of spilledRecords crossed the threshold " + numElementsForSpillThreshold); + logger.info("Spilling data because number of spilledRecords crossed the threshold " + +numElementsForSpillThreshold); spill(); } http://git-wip-us.apache.org/repos/asf/spark/blob/3000b4b2/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java -- diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index d6a255e..8d596f8 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -27,7 +27,6 @@ import com.google.common.annotations.VisibleForTesting; import org.slf4j.Logger; import org.slf4j.LoggerFactory; -import org.apache.spark.SparkEnv; import org.apache.spark.TaskContext; import org.apache.spark.executor.ShuffleWriteMetrics; import org.apache.spark.memory.MemoryConsumer; @@ -99,8 +98,8 @@ public final class UnsafeExternalSorter extends MemoryConsumer { long numElementsForSpillThreshold, UnsafeInMemorySorter inMemorySorter) throws IOException { UnsafeExternalSorter sorter = 
new UnsafeExternalSorter(taskMemoryManager, blockManager, - serializerManager, taskContext, recordComparator, prefixComparator, initialSize, numElementsForSpillThreshold, -pageSizeBytes, inMemorySorter, false /* ignored */); + serializerManager, taskContext, recordComparator, prefixComparator, initialSize, +numElementsForSpillThreshold, pageSizeBytes, inMemorySorter, false /* ignored */); sorter.spill(Long.MAX_VALUE, sorter); // The external sorter will be used to insert records, in-memory sorter is not needed. sorter.inMemSorter = null; @@ -119,8 +118,8 @@ public final class UnsafeExternalSorter extends MemoryConsumer { long numElementsForSpillThreshold, boolean canUseRadixSort) { return new UnsafeExternalSorter(taskMemoryManager, blockManager, serializerManager, - taskContext, recordComparator, prefixComparator, initialSize, pageSizeBytes, numElementsForSpillThreshold, null, - canUseRadixSort); +
spark git commit: [MINOR][BUILD] Fix Java linter errors
Repository: spark Updated Branches: refs/heads/branch-2.0 0d0b41609 -> 0c6fd03fa [MINOR][BUILD] Fix Java linter errors This PR fixes the minor Java linter errors like the following. ``` -public int read(char cbuf[], int off, int len) throws IOException { +public int read(char[] cbuf, int off, int len) throws IOException { ``` Manual. ``` $ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install $ dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. ``` Author: Dongjoon Hyun Closes #14017 from dongjoon-hyun/minor_build_java_linter_error. (cherry picked from commit 3000b4b29f9165f436f186a8c1ba818e24f90615) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0c6fd03f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0c6fd03f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0c6fd03f Branch: refs/heads/branch-2.0 Commit: 0c6fd03fa763df4afb77ac4738c76f0b73e46ad0 Parents: 0d0b416 Author: Dongjoon Hyun Authored: Sat Jul 2 16:31:06 2016 +0100 Committer: Sean Owen Committed: Sat Jul 2 16:33:22 2016 +0100 -- .../spark/shuffle/sort/ShuffleExternalSorter.java | 3 ++- .../collection/unsafe/sort/UnsafeExternalSorter.java| 12 ++-- .../spark/sql/execution/UnsafeExternalRowSorter.java| 4 ++-- .../sql/execution/UnsafeFixedWidthAggregationMap.java | 4 ++-- .../spark/sql/execution/UnsafeKVExternalSorter.java | 3 ++- 5 files changed, 14 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0c6fd03f/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java -- diff --git a/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java b/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java index 696ee73..cf38a04 100644 --- a/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java +++ b/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java @@ -376,7 +376,8 @@ final class ShuffleExternalSorter extends MemoryConsumer { // for tests assert(inMemSorter != null); if (inMemSorter.numRecords() >= numElementsForSpillThreshold) { - logger.info("Spilling data because number of spilledRecords crossed the threshold " + numElementsForSpillThreshold); + logger.info("Spilling data because number of spilledRecords crossed the threshold " + +numElementsForSpillThreshold); spill(); } http://git-wip-us.apache.org/repos/asf/spark/blob/0c6fd03f/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java -- diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index 8a980d4..50f5b06 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -27,7 +27,6 @@ import com.google.common.annotations.VisibleForTesting; import org.slf4j.Logger; import org.slf4j.LoggerFactory; -import org.apache.spark.SparkEnv; import org.apache.spark.TaskContext; import org.apache.spark.executor.ShuffleWriteMetrics; import org.apache.spark.memory.MemoryConsumer; @@ -99,8 +98,8 @@ public final class UnsafeExternalSorter extends MemoryConsumer { long numElementsForSpillThreshold, UnsafeInMemorySorter inMemorySorter) throws IOException { UnsafeExternalSorter 
sorter = new UnsafeExternalSorter(taskMemoryManager, blockManager, - serializerManager, taskContext, recordComparator, prefixComparator, initialSize, numElementsForSpillThreshold, -pageSizeBytes, inMemorySorter, false /* ignored */); + serializerManager, taskContext, recordComparator, prefixComparator, initialSize, +numElementsForSpillThreshold, pageSizeBytes, inMemorySorter, false /* ignored */); sorter.spill(Long.MAX_VALUE, sorter); // The external sorter will be used to insert records, in-memory sorter is not needed. sorter.inMemSorter = null; @@ -119,8 +118,8 @@ public final class UnsafeExternalSorter extends MemoryConsumer { long numElementsForSpillThreshold, boolean canUseRadixSort) { return new UnsafeExternalSorter(taskMemoryManager, blockManager, serializerManager, - taskContext, recordComparator, prefixComparator, initialSize, pageSizeBytes, numElementsForSpillThreshold, null, - canUseRadixSort); + taskContext, record
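For context on what `dev/lint-java` enforces here, the sketch below is a hypothetical class (it appears neither in this patch nor in Spark) that illustrates the two Checkstyle patterns fixed by this commit: Java-style array declarations instead of C-style ones, and wrapping statements that would otherwise run past the line-length limit. The 100-character limit mentioned in the comments is an assumption for illustration, not a quote from Spark's Checkstyle configuration.

```java
// Hypothetical example, not Spark code: the style dev/lint-java expects.
import java.io.IOException;
import java.io.Reader;

public class LinterStyleExample extends Reader {

  // Java-style array declaration ("char[] cbuf") rather than the C-style
  // "char cbuf[]" flagged in the commit message above.
  @Override
  public int read(char[] cbuf, int off, int len) throws IOException {
    return -1;  // placeholder: a real Reader would copy characters into cbuf
  }

  @Override
  public void close() {
    // nothing to release in this sketch
  }

  // A long statement is wrapped with the operator starting the continuation
  // line, keeping every line under the (assumed) 100-character limit, as in
  // the ShuffleExternalSorter and UnsafeExternalSorter hunks above.
  void logSpill(long numRecords, long threshold) {
    if (numRecords >= threshold) {
      System.out.println("Spilling data because number of spilledRecords crossed the threshold "
          + threshold);
    }
  }
}
```

After a change in this style, re-running `dev/lint-java` should again report `Checkstyle checks passed.` as in the test output above.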
spark git commit: [MINOR][DOCS] Remove unused images; crush PNGs that could use it for good measure
Repository: spark Updated Branches: refs/heads/branch-2.0 3ecee573c -> ecbb44709 [MINOR][DOCS] Remove unused images; crush PNGs that could use it for good measure ## What changes were proposed in this pull request? Coincidentally, I discovered that a couple images were unused in `docs/`, and then searched and found more, and then realized some PNGs were pretty big and could be crushed, and before I knew it, had done the same for the ASF site (not committed yet). No functional change at all, just less superfluous image data. ## How was this patch tested? `jekyll serve` Author: Sean Owen Closes #14029 from srowen/RemoveCompressImages. (cherry picked from commit 18fb57f58a04685823408f3a174a8722f155fd4d) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ecbb4470 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ecbb4470 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ecbb4470 Branch: refs/heads/branch-2.0 Commit: ecbb44709bfbaaf3412127dc4569732ade16a6ba Parents: 3ecee57 Author: Sean Owen Authored: Mon Jul 4 09:21:58 2016 +0100 Committer: Sean Owen Committed: Mon Jul 4 09:22:09 2016 +0100 -- .../spark/ui/static/spark-logo-77x50px-hd.png | Bin 4182 -> 3077 bytes docs/img/cluster-overview.png | Bin 33565 -> 22912 bytes docs/img/edge-cut.png | Bin 12563 -> 0 bytes docs/img/edge_cut_vs_vertex_cut.png | Bin 79745 -> 51015 bytes docs/img/graph_parallel.png | Bin 92288 -> 0 bytes docs/img/graphx_logo.png| Bin 40324 -> 22875 bytes docs/img/graphx_performance_comparison.png | Bin 166343 -> 0 bytes docs/img/ml-Pipeline.png| Bin 74030 -> 38536 bytes docs/img/ml-PipelineModel.png | Bin 76019 -> 39228 bytes docs/img/property_graph.png | Bin 225151 -> 135699 bytes docs/img/spark-logo-hd.png | Bin 16418 -> 11306 bytes docs/img/spark-webui-accumulators.png | Bin 231065 -> 160167 bytes docs/img/streaming-arch.png | Bin 78954 -> 51972 bytes docs/img/streaming-dstream-ops.png | Bin 48429 -> 33495 bytes docs/img/streaming-dstream-window.png | Bin 40938 -> 26622 bytes docs/img/streaming-dstream.png | Bin 26823 -> 17843 bytes docs/img/streaming-flow.png | Bin 31544 -> 20425 bytes docs/img/streaming-kinesis-arch.png | Bin 115277 -> 86336 bytes docs/img/structured-streaming-example-model.png | Bin 125504 -> 79409 bytes docs/img/structured-streaming-late-data.png | Bin 138226 -> 91513 bytes docs/img/structured-streaming-model.png | Bin 66098 -> 37321 bytes .../structured-streaming-stream-as-a-table.png | Bin 82251 -> 47791 bytes docs/img/structured-streaming-window.png| Bin 132875 -> 88102 bytes docs/img/triplet.png| Bin 31489 -> 19255 bytes docs/img/vertex-cut.png | Bin 12246 -> 0 bytes docs/img/vertex_routing_edge_tables.png | Bin 570007 -> 323162 bytes 26 files changed, 0 insertions(+), 0 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ecbb4470/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png -- diff --git a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png index ffe2550..cee2891 100644 Binary files a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png and b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png differ http://git-wip-us.apache.org/repos/asf/spark/blob/ecbb4470/docs/img/cluster-overview.png -- diff --git a/docs/img/cluster-overview.png b/docs/img/cluster-overview.png index 317554c..b1b7c1a 100644 
Binary files a/docs/img/cluster-overview.png and b/docs/img/cluster-overview.png differ http://git-wip-us.apache.org/repos/asf/spark/blob/ecbb4470/docs/img/edge-cut.png -- diff --git a/docs/img/edge-cut.png b/docs/img/edge-cut.png deleted file mode 100644 index 698f4ff..000 Binary files a/docs/img/edge-cut.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/spark/blob/ecbb4470/docs/img/edge_cut_vs_vertex_cut.png -- diff --git a/docs/img/edge_cut_vs_vertex_cut.png b/docs/img
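The commit message above describes finding images under `docs/` that nothing references. As a rough, hypothetical sketch only (not the tooling used for this commit), a check like the following could reproduce that search: it collects PNG names under `docs/img` and reports any that no Markdown or HTML page under `docs/` mentions.

```java
// Hypothetical helper, not part of Spark: lists PNGs under docs/img whose file
// names never appear in any .md or .html page under docs/. Assumes it is run
// from the repository root (or pass a different docs path as the first argument).
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FindUnusedImages {
  public static void main(String[] args) throws IOException {
    Path docs = Paths.get(args.length > 0 ? args[0] : "docs");

    // Every PNG under docs/img.
    List<Path> images;
    try (Stream<Path> walk = Files.walk(docs.resolve("img"))) {
      images = walk.filter(p -> p.toString().endsWith(".png")).collect(Collectors.toList());
    }

    // Concatenate the text of every Markdown/HTML page under docs/.
    List<Path> pages;
    try (Stream<Path> walk = Files.walk(docs)) {
      pages = walk.filter(p -> p.toString().endsWith(".md") || p.toString().endsWith(".html"))
          .collect(Collectors.toList());
    }
    StringBuilder corpus = new StringBuilder();
    for (Path page : pages) {
      corpus.append(new String(Files.readAllBytes(page), StandardCharsets.UTF_8));
    }

    // Report images whose file name is never mentioned in any page.
    for (Path image : images) {
      if (corpus.indexOf(image.getFileName().toString()) < 0) {
        System.out.println("possibly unused: " + image);
      }
    }
  }
}
```

A name with no hits is only "possibly" unused, since images can also be referenced from CSS or generated pages, so a manual check such as the `jekyll serve` run mentioned above is still needed before deleting anything.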
spark git commit: [MINOR][DOCS] Remove unused images; crush PNGs that could use it for good measure
Repository: spark Updated Branches: refs/heads/master a539b724c -> 18fb57f58 [MINOR][DOCS] Remove unused images; crush PNGs that could use it for good measure ## What changes were proposed in this pull request? Coincidentally, I discovered that a couple images were unused in `docs/`, and then searched and found more, and then realized some PNGs were pretty big and could be crushed, and before I knew it, had done the same for the ASF site (not committed yet). No functional change at all, just less superfluous image data. ## How was this patch tested? `jekyll serve` Author: Sean Owen Closes #14029 from srowen/RemoveCompressImages. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/18fb57f5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/18fb57f5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/18fb57f5 Branch: refs/heads/master Commit: 18fb57f58a04685823408f3a174a8722f155fd4d Parents: a539b72 Author: Sean Owen Authored: Mon Jul 4 09:21:58 2016 +0100 Committer: Sean Owen Committed: Mon Jul 4 09:21:58 2016 +0100 -- .../spark/ui/static/spark-logo-77x50px-hd.png | Bin 4182 -> 3077 bytes docs/img/cluster-overview.png | Bin 33565 -> 22912 bytes docs/img/edge-cut.png | Bin 12563 -> 0 bytes docs/img/edge_cut_vs_vertex_cut.png | Bin 79745 -> 51015 bytes docs/img/graph_parallel.png | Bin 92288 -> 0 bytes docs/img/graphx_logo.png| Bin 40324 -> 22875 bytes docs/img/graphx_performance_comparison.png | Bin 166343 -> 0 bytes docs/img/ml-Pipeline.png| Bin 74030 -> 38536 bytes docs/img/ml-PipelineModel.png | Bin 76019 -> 39228 bytes docs/img/property_graph.png | Bin 225151 -> 135699 bytes docs/img/spark-logo-hd.png | Bin 16418 -> 11306 bytes docs/img/spark-webui-accumulators.png | Bin 231065 -> 160167 bytes docs/img/streaming-arch.png | Bin 78954 -> 51972 bytes docs/img/streaming-dstream-ops.png | Bin 48429 -> 33495 bytes docs/img/streaming-dstream-window.png | Bin 40938 -> 26622 bytes docs/img/streaming-dstream.png | Bin 26823 -> 17843 bytes docs/img/streaming-flow.png | Bin 31544 -> 20425 bytes docs/img/streaming-kinesis-arch.png | Bin 115277 -> 86336 bytes docs/img/structured-streaming-example-model.png | Bin 125504 -> 79409 bytes docs/img/structured-streaming-late-data.png | Bin 138226 -> 91513 bytes docs/img/structured-streaming-model.png | Bin 66098 -> 37321 bytes .../structured-streaming-stream-as-a-table.png | Bin 82251 -> 47791 bytes docs/img/structured-streaming-window.png| Bin 132875 -> 88102 bytes docs/img/triplet.png| Bin 31489 -> 19255 bytes docs/img/vertex-cut.png | Bin 12246 -> 0 bytes docs/img/vertex_routing_edge_tables.png | Bin 570007 -> 323162 bytes 26 files changed, 0 insertions(+), 0 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/18fb57f5/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png -- diff --git a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png index ffe2550..cee2891 100644 Binary files a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png and b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png differ http://git-wip-us.apache.org/repos/asf/spark/blob/18fb57f5/docs/img/cluster-overview.png -- diff --git a/docs/img/cluster-overview.png b/docs/img/cluster-overview.png index 317554c..b1b7c1a 100644 Binary files a/docs/img/cluster-overview.png and b/docs/img/cluster-overview.png differ 
http://git-wip-us.apache.org/repos/asf/spark/blob/18fb57f5/docs/img/edge-cut.png -- diff --git a/docs/img/edge-cut.png b/docs/img/edge-cut.png deleted file mode 100644 index 698f4ff..000 Binary files a/docs/img/edge-cut.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/spark/blob/18fb57f5/docs/img/edge_cut_vs_vertex_cut.png -- diff --git a/docs/img/edge_cut_vs_vertex_cut.png b/docs/img/edge_cut_vs_vertex_cut.png index ae30396..5b1ed78 100644 Binary files a/docs/img/edge_cut_vs_vertex_cut.png and b/docs/img/edg
svn commit: r1751226 - in /spark: _includes/ images/ site/images/
Author: srowen Date: Mon Jul 4 08:31:32 2016 New Revision: 1751226 URL: http://svn.apache.org/viewvc?rev=1751226&view=rev Log: Remove unused images from Spark site; crush large PNGs; remove obsolete .html _includes Removed: spark/_includes/footer.html spark/_includes/navbar.html spark/images/Summit-Logo-FINALtr-150x150px.png spark/images/amplab-small.png spark/images/download.png spark/images/incubator-logo.png spark/images/logistic-regression2.png spark/images/scaling.png spark/images/spark-lr.png spark/images/spark-project-header1-cropped.png spark/images/spark-project-header1.png spark/images/spark-streaming-throughput.png spark/site/images/Summit-Logo-FINALtr-150x150px.png spark/site/images/amplab-small.png spark/site/images/download.png spark/site/images/incubator-logo.png spark/site/images/logistic-regression2.png spark/site/images/scaling.png spark/site/images/spark-lr.png spark/site/images/spark-project-header1-cropped.png spark/site/images/spark-project-header1.png spark/site/images/spark-streaming-throughput.png Modified: spark/images/0.8.0-ui-screenshot.png spark/images/graphx-perf-comparison.png spark/images/jdbc.png spark/images/logistic-regression.png spark/images/spark-logo-trademark.png spark/images/spark-logo.png spark/images/spark-runs-everywhere.png spark/images/spark-stack.png spark/images/spark-streaming-recovery.png spark/images/sql-hive-arch.png spark/site/images/0.8.0-ui-screenshot.png spark/site/images/graphx-perf-comparison.png spark/site/images/jdbc.png spark/site/images/logistic-regression.png spark/site/images/spark-logo-trademark.png spark/site/images/spark-logo.png spark/site/images/spark-runs-everywhere.png spark/site/images/spark-stack.png spark/site/images/spark-streaming-recovery.png spark/site/images/sql-hive-arch.png Modified: spark/images/0.8.0-ui-screenshot.png URL: http://svn.apache.org/viewvc/spark/images/0.8.0-ui-screenshot.png?rev=1751226&r1=1751225&r2=1751226&view=diff == Binary files - no diff available. Modified: spark/images/graphx-perf-comparison.png URL: http://svn.apache.org/viewvc/spark/images/graphx-perf-comparison.png?rev=1751226&r1=1751225&r2=1751226&view=diff == Binary files - no diff available. Modified: spark/images/jdbc.png URL: http://svn.apache.org/viewvc/spark/images/jdbc.png?rev=1751226&r1=1751225&r2=1751226&view=diff == Binary files - no diff available. Modified: spark/images/logistic-regression.png URL: http://svn.apache.org/viewvc/spark/images/logistic-regression.png?rev=1751226&r1=1751225&r2=1751226&view=diff == Binary files - no diff available. Modified: spark/images/spark-logo-trademark.png URL: http://svn.apache.org/viewvc/spark/images/spark-logo-trademark.png?rev=1751226&r1=1751225&r2=1751226&view=diff == Binary files - no diff available. Modified: spark/images/spark-logo.png URL: http://svn.apache.org/viewvc/spark/images/spark-logo.png?rev=1751226&r1=1751225&r2=1751226&view=diff == Binary files - no diff available. Modified: spark/images/spark-runs-everywhere.png URL: http://svn.apache.org/viewvc/spark/images/spark-runs-everywhere.png?rev=1751226&r1=1751225&r2=1751226&view=diff == Binary files - no diff available. Modified: spark/images/spark-stack.png URL: http://svn.apache.org/viewvc/spark/images/spark-stack.png?rev=1751226&r1=1751225&r2=1751226&view=diff == Binary files - no diff available. Modified: spark/images/spark-streaming-recovery.png URL: http://svn.apache.org/viewvc/spark/images/spark-streaming-recovery.png?rev=1751226&r1=1751225&r2=1751226&view=diff == Binary files - no diff available. 
Modified: spark/images/sql-hive-arch.png URL: http://svn.apache.org/viewvc/spark/images/sql-hive-arch.png?rev=1751226&r1=1751225&r2=1751226&view=diff == Binary files - no diff available. Modified: spark/site/images/0.8.0-ui-screenshot.png URL: http://svn.apache.org/viewvc/spark/site/images/0.8.0-ui-screenshot.png?rev=1751226&r1=1751225&r2=1751226&view=diff