spark git commit: [MINOR][BUILD] Add modernizr MIT license; specify "2014 and onwards" in license copyright

2016-06-04 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 729730159 -> ed1e20207


[MINOR][BUILD] Add modernizr MIT license; specify "2014 and onwards" in license 
copyright

## What changes were proposed in this pull request?

Per conversation on dev list, add missing modernizr license.
Specify "2014 and onwards" in copyright statement.

## How was this patch tested?

(none required)

Author: Sean Owen 

Closes #13510 from srowen/ModernizrLicense.

(cherry picked from commit 681387b2dc9a094cfba84188a1dd1ac9192bb99c)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ed1e2020
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ed1e2020
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ed1e2020

Branch: refs/heads/branch-2.0
Commit: ed1e20207c1c2e503a22d5ad2cdf505ef6ecbcad
Parents: 7297301
Author: Sean Owen 
Authored: Sat Jun 4 21:41:27 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 4 21:41:35 2016 +0100

--
 LICENSE|  1 +
 NOTICE |  2 +-
 licenses/LICENSE-modernizr.txt | 21 +
 3 files changed, 23 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ed1e2020/LICENSE
--
diff --git a/LICENSE b/LICENSE
index f403640..94fd46f 100644
--- a/LICENSE
+++ b/LICENSE
@@ -296,3 +296,4 @@ The text of each license is also included at 
licenses/LICENSE-[project].txt.
  (MIT License) blockUI (http://jquery.malsup.com/block/)
  (MIT License) RowsGroup (http://datatables.net/license/mit)
  (MIT License) jsonFormatter 
(http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
+ (MIT License) modernizr 
(https://github.com/Modernizr/Modernizr/blob/master/LICENSE)

http://git-wip-us.apache.org/repos/asf/spark/blob/ed1e2020/NOTICE
--
diff --git a/NOTICE b/NOTICE
index f4b1260..69b513e 100644
--- a/NOTICE
+++ b/NOTICE
@@ -1,5 +1,5 @@
 Apache Spark
-Copyright 2014 The Apache Software Foundation.
+Copyright 2014 and onwards The Apache Software Foundation.
 
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).

http://git-wip-us.apache.org/repos/asf/spark/blob/ed1e2020/licenses/LICENSE-modernizr.txt
--
diff --git a/licenses/LICENSE-modernizr.txt b/licenses/LICENSE-modernizr.txt
new file mode 100644
index 000..2bf24b9
--- /dev/null
+++ b/licenses/LICENSE-modernizr.txt
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c)  
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
\ No newline at end of file


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15707][SQL] Make Code Neat - Use map instead of if check.

2016-06-04 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 091f81e1f -> 0f307db5e


[SPARK-15707][SQL] Make Code Neat - Use map instead of if check.

## What changes were proposed in this pull request?
In the forType function of object RandomDataGenerator, the following code:

if (maybeSqlTypeGenerator.isDefined) {
  ...
  Some(generator)
} else {
  None
}

is replaced with a call to maybeSqlTypeGenerator.map.
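
For reference, a minimal sketch of the Option.map idiom, using a hypothetical `shout` helper (not the actual Spark code):

```scala
// An isDefined/get branch that wraps its result in Some/None collapses into a
// single Option.map call; a None input simply maps to None.
def shout(maybeWord: Option[String]): Option[String] = {
  // Before: if (maybeWord.isDefined) Some(maybeWord.get.toUpperCase) else None
  // After:
  maybeWord.map(_.toUpperCase)
}

// shout(Some("spark")) == Some("SPARK");  shout(None) == None
```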

## How was this patch tested?
All of the current unit tests passed.

Author: Weiqing Yang 

Closes #13448 from Sherry302/master.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0f307db5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0f307db5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0f307db5

Branch: refs/heads/master
Commit: 0f307db5e17e1e8a655cfa751218ac4ed88717a7
Parents: 091f81e
Author: Weiqing Yang 
Authored: Sat Jun 4 22:44:03 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 4 22:44:03 2016 +0100

--
 .../scala/org/apache/spark/sql/RandomDataGenerator.scala | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0f307db5/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
index 711e870..8508697 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
@@ -236,9 +236,8 @@ object RandomDataGenerator {
 // convert it to catalyst value to call udt's deserialize.
 val toCatalystType = 
CatalystTypeConverters.createToCatalystConverter(udt.sqlType)
 
-if (maybeSqlTypeGenerator.isDefined) {
-  val sqlTypeGenerator = maybeSqlTypeGenerator.get
-  val generator = () => {
+maybeSqlTypeGenerator.map { sqlTypeGenerator =>
+  () => {
 val generatedScalaValue = sqlTypeGenerator.apply()
 if (generatedScalaValue == null) {
   null
@@ -246,9 +245,6 @@ object RandomDataGenerator {
   udt.deserialize(toCatalystType(generatedScalaValue))
 }
   }
-  Some(generator)
-} else {
-  None
 }
   case unsupportedType => None
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15707][SQL] Make Code Neat - Use map instead of if check.

2016-06-04 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 7e4c9dd55 -> 32a64d8fc


[SPARK-15707][SQL] Make Code Neat - Use map instead of if check.

## What changes were proposed in this pull request?
In the forType function of object RandomDataGenerator, the following code:

if (maybeSqlTypeGenerator.isDefined) {
  ...
  Some(generator)
} else {
  None
}

is replaced with a call to maybeSqlTypeGenerator.map.

## How was this patch tested?
All of the current unit tests passed.

Author: Weiqing Yang 

Closes #13448 from Sherry302/master.

(cherry picked from commit 0f307db5e17e1e8a655cfa751218ac4ed88717a7)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/32a64d8f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/32a64d8f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/32a64d8f

Branch: refs/heads/branch-2.0
Commit: 32a64d8fc9e7ddaf993bdd7e679113dc605a69a7
Parents: 7e4c9dd
Author: Weiqing Yang 
Authored: Sat Jun 4 22:44:03 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 4 22:44:12 2016 +0100

--
 .../scala/org/apache/spark/sql/RandomDataGenerator.scala | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/32a64d8f/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
index 711e870..8508697 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
@@ -236,9 +236,8 @@ object RandomDataGenerator {
 // convert it to catalyst value to call udt's deserialize.
 val toCatalystType = 
CatalystTypeConverters.createToCatalystConverter(udt.sqlType)
 
-if (maybeSqlTypeGenerator.isDefined) {
-  val sqlTypeGenerator = maybeSqlTypeGenerator.get
-  val generator = () => {
+maybeSqlTypeGenerator.map { sqlTypeGenerator =>
+  () => {
 val generatedScalaValue = sqlTypeGenerator.apply()
 if (generatedScalaValue == null) {
   null
@@ -246,9 +245,6 @@ object RandomDataGenerator {
   udt.deserialize(toCatalystType(generatedScalaValue))
 }
   }
-  Some(generator)
-} else {
-  None
 }
   case unsupportedType => None
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15723] Fixed local-timezone-brittle test where short-timezone form "EST" is …

2016-06-05 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 0f307db5e -> 4e767d0f9


[SPARK-15723] Fixed local-timezone-brittle test where short-timezone form "EST" 
is …

## What changes were proposed in this pull request?

Stop using the abbreviated and ambiguous timezone "EST" in a test, since its
interpretation depends on the machine's local default timezone and the test
fails in other timezones.
Fixed [SPARK-15723](https://issues.apache.org/jira/browse/SPARK-15723).

## How was this patch tested?

Note that to reproduce this problem in any locale/timezone, you can modify the 
scalatest-maven-plugin argLine to add a timezone:

-ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} 
-XX:ReservedCodeCacheSize=${CodeCacheSize} 
-Duser.timezone="Australia/Sydney"

and run

$ mvn test -DwildcardSuites=org.apache.spark.status.api.v1.SimpleDateParamSuite -Dtest=none

Equally, this will fix it in an affected timezone:

-ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} 
-XX:ReservedCodeCacheSize=${CodeCacheSize} 
-Duser.timezone="America/New_York"

To test the fix, apply the above change to `pom.xml` to set test TZ to 
`Australia/Sydney`, and confirm the test now passes.
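
As an illustration of the ambiguity, here is a small sketch (not the actual SimpleDateParam class); the expected millisecond value is taken from the test in the diff below:

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

object TimezoneParseDemo {
  // A date pattern matching the test's input strings; when parsing, 'z'
  // accepts both short zone names ("EST") and numeric offsets ("-0500").
  private val pattern = "yyyy-MM-dd'T'HH:mm:ss.SSSz"

  def parseMillis(s: String): Long =
    new SimpleDateFormat(pattern).parse(s).getTime

  def main(args: Array[String]): Unit = {
    // With the default timezone forced to Australia/Sydney, the short name
    // "EST" can resolve to Australian Eastern time, whereas the numeric
    // offset parses to the same instant on every machine.
    TimeZone.setDefault(TimeZone.getTimeZone("Australia/Sydney"))
    println(parseMillis("2015-02-20T17:21:17.190-0500")) // 1424470877190
  }
}
```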

Author: Brett Randall 

Closes #13462 from javabrett/SPARK-15723-SimpleDateParamSuite.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4e767d0f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4e767d0f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4e767d0f

Branch: refs/heads/master
Commit: 4e767d0f9042bfea6074c2637438859699ec4dc3
Parents: 0f307db
Author: Brett Randall 
Authored: Sun Jun 5 15:31:56 2016 +0100
Committer: Sean Owen 
Committed: Sun Jun 5 15:31:56 2016 +0100

--
 .../org/apache/spark/status/api/v1/SimpleDateParamSuite.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4e767d0f/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala 
b/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala
index 63b0e77..18baeb1 100644
--- 
a/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala
@@ -26,7 +26,8 @@ class SimpleDateParamSuite extends SparkFunSuite with 
Matchers {
 
   test("date parsing") {
 new SimpleDateParam("2015-02-20T23:21:17.190GMT").timestamp should be 
(1424474477190L)
-new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be 
(1424470877190L)
+// don't use EST, it is ambiguous, use -0500 instead, see SPARK-15723
+new SimpleDateParam("2015-02-20T17:21:17.190-0500").timestamp should be 
(1424470877190L)
 new SimpleDateParam("2015-02-20").timestamp should be (1424390400000L) // GMT
 intercept[WebApplicationException] {
   new SimpleDateParam("invalid date")


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15723] Fixed local-timezone-brittle test where short-timezone form "EST" is …

2016-06-05 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 32a64d8fc -> 8c0ec85e6


[SPARK-15723] Fixed local-timezone-brittle test where short-timezone form "EST" 
is …

## What changes were proposed in this pull request?

Stop using the abbreviated and ambiguous timezone "EST" in a test, since its
interpretation depends on the machine's local default timezone and the test
fails in other timezones.
Fixed [SPARK-15723](https://issues.apache.org/jira/browse/SPARK-15723).

## How was this patch tested?

Note that to reproduce this problem in any locale/timezone, you can modify the 
scalatest-maven-plugin argLine to add a timezone:

-ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} 
-XX:ReservedCodeCacheSize=${CodeCacheSize} 
-Duser.timezone="Australia/Sydney"

and run

$ mvn test -DwildcardSuites=org.apache.spark.status.api.v1.SimpleDateParamSuite -Dtest=none

Equally, this will fix it in an affected timezone:

-ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} 
-XX:ReservedCodeCacheSize=${CodeCacheSize} 
-Duser.timezone="America/New_York"

To test the fix, apply the above change to `pom.xml` to set test TZ to 
`Australia/Sydney`, and confirm the test now passes.

Author: Brett Randall 

Closes #13462 from javabrett/SPARK-15723-SimpleDateParamSuite.

(cherry picked from commit 4e767d0f9042bfea6074c2637438859699ec4dc3)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8c0ec85e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8c0ec85e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8c0ec85e

Branch: refs/heads/branch-2.0
Commit: 8c0ec85e62f762c11e0686d1c35d1dfec05df9de
Parents: 32a64d8
Author: Brett Randall 
Authored: Sun Jun 5 15:31:56 2016 +0100
Committer: Sean Owen 
Committed: Sun Jun 5 16:12:24 2016 +0100

--
 .../org/apache/spark/status/api/v1/SimpleDateParamSuite.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8c0ec85e/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala 
b/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala
index 63b0e77..18baeb1 100644
--- 
a/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala
@@ -26,7 +26,8 @@ class SimpleDateParamSuite extends SparkFunSuite with 
Matchers {
 
   test("date parsing") {
 new SimpleDateParam("2015-02-20T23:21:17.190GMT").timestamp should be 
(1424474477190L)
-new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be 
(1424470877190L)
+// don't use EST, it is ambiguous, use -0500 instead, see SPARK-15723
+new SimpleDateParam("2015-02-20T17:21:17.190-0500").timestamp should be 
(1424470877190L)
 new SimpleDateParam("2015-02-20").timestamp should be (1424390400000L) // GMT
 intercept[WebApplicationException] {
   new SimpleDateParam("invalid date")


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15723] Fixed local-timezone-brittle test where short-timezone form "EST" is …

2016-06-05 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.6 a0cf7d0b2 -> 6a9f19dd5


[SPARK-15723] Fixed local-timezone-brittle test where short-timezone form "EST" 
is …

## What changes were proposed in this pull request?

Stop using the abbreviated and ambiguous timezone "EST" in a test, since its
interpretation depends on the machine's local default timezone and the test
fails in other timezones.
Fixed [SPARK-15723](https://issues.apache.org/jira/browse/SPARK-15723).

## How was this patch tested?

Note that to reproduce this problem in any locale/timezone, you can modify the 
scalatest-maven-plugin argLine to add a timezone:

-ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} 
-XX:ReservedCodeCacheSize=${CodeCacheSize} 
-Duser.timezone="Australia/Sydney"

and run

$ mvn test -DwildcardSuites=org.apache.spark.status.api.v1.SimpleDateParamSuite -Dtest=none

Equally, this will fix it in an affected timezone:

-ea -Xmx3g -XX:MaxPermSize=${MaxPermGen} 
-XX:ReservedCodeCacheSize=${CodeCacheSize} 
-Duser.timezone="America/New_York"

To test the fix, apply the above change to `pom.xml` to set test TZ to 
`Australia/Sydney`, and confirm the test now passes.

Author: Brett Randall 

Closes #13462 from javabrett/SPARK-15723-SimpleDateParamSuite.

(cherry picked from commit 4e767d0f9042bfea6074c2637438859699ec4dc3)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a9f19dd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a9f19dd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a9f19dd

Branch: refs/heads/branch-1.6
Commit: 6a9f19dd57dadb80bccc328cf1d099bed04f7f18
Parents: a0cf7d0
Author: Brett Randall 
Authored: Sun Jun 5 15:31:56 2016 +0100
Committer: Sean Owen 
Committed: Sun Jun 5 16:12:49 2016 +0100

--
 .../org/apache/spark/status/api/v1/SimpleDateParamSuite.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6a9f19dd/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala 
b/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala
index 63b0e77..18baeb1 100644
--- 
a/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/status/api/v1/SimpleDateParamSuite.scala
@@ -26,7 +26,8 @@ class SimpleDateParamSuite extends SparkFunSuite with 
Matchers {
 
   test("date parsing") {
 new SimpleDateParam("2015-02-20T23:21:17.190GMT").timestamp should be 
(1424474477190L)
-new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be 
(1424470877190L)
+// don't use EST, it is ambiguous, use -0500 instead, see SPARK-15723
+new SimpleDateParam("2015-02-20T17:21:17.190-0500").timestamp should be 
(1424470877190L)
 new SimpleDateParam("2015-02-20").timestamp should be (1424390400000L) // GMT
 intercept[WebApplicationException] {
   new SimpleDateParam("invalid date")


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [MINOR] Fix Typos 'an -> a'

2016-06-06 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 32f2f95db -> fd8af3971


[MINOR] Fix Typos 'an -> a'

## What changes were proposed in this pull request?

`an -> a`

Use cmds like `find . -name '*.R' | xargs -i sh -c "grep -in ' an [^aeiou]' {} 
&& echo {}"` to generate candidates, and review them one by one.

## How was this patch tested?
manual tests

Author: Zheng RuiFeng 

Closes #13515 from zhengruifeng/an_a.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fd8af397
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fd8af397
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fd8af397

Branch: refs/heads/master
Commit: fd8af397132fa1415a4c19d7f5cb5a41aa6ddb27
Parents: 32f2f95
Author: Zheng RuiFeng 
Authored: Mon Jun 6 09:35:47 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 6 09:35:47 2016 +0100

--
 R/pkg/R/utils.R   |  2 +-
 .../src/main/scala/org/apache/spark/Accumulable.scala |  2 +-
 .../org/apache/spark/api/java/JavaSparkContext.scala  |  2 +-
 .../scala/org/apache/spark/api/python/PythonRDD.scala |  2 +-
 .../scala/org/apache/spark/deploy/SparkSubmit.scala   |  6 +++---
 .../src/main/scala/org/apache/spark/rdd/JdbcRDD.scala |  6 +++---
 .../main/scala/org/apache/spark/scheduler/Pool.scala  |  2 +-
 .../org/apache/spark/broadcast/BroadcastSuite.scala   |  2 +-
 .../spark/deploy/rest/StandaloneRestSubmitSuite.scala |  2 +-
 .../test/scala/org/apache/spark/rpc/RpcEnvSuite.scala |  2 +-
 .../apache/spark/scheduler/DAGSchedulerSuite.scala|  4 ++--
 .../org/apache/spark/util/JsonProtocolSuite.scala |  2 +-
 .../spark/streaming/flume/FlumeBatchFetcher.scala |  2 +-
 .../spark/graphx/impl/VertexPartitionBaseOps.scala|  2 +-
 .../scala/org/apache/spark/ml/linalg/Vectors.scala|  2 +-
 .../src/main/scala/org/apache/spark/ml/Pipeline.scala |  2 +-
 .../spark/ml/classification/LogisticRegression.scala  |  4 ++--
 .../org/apache/spark/ml/tree/impl/RandomForest.scala  |  2 +-
 .../mllib/classification/LogisticRegression.scala |  2 +-
 .../org/apache/spark/mllib/classification/SVM.scala   |  2 +-
 .../spark/mllib/feature/VectorTransformer.scala   |  2 +-
 .../scala/org/apache/spark/mllib/linalg/Vectors.scala |  2 +-
 .../mllib/linalg/distributed/CoordinateMatrix.scala   |  2 +-
 .../apache/spark/mllib/rdd/MLPairRDDFunctions.scala   |  2 +-
 python/pyspark/ml/classification.py   |  4 ++--
 python/pyspark/ml/pipeline.py |  2 +-
 python/pyspark/mllib/classification.py|  2 +-
 python/pyspark/mllib/common.py|  2 +-
 python/pyspark/rdd.py |  4 ++--
 python/pyspark/sql/session.py |  2 +-
 python/pyspark/sql/streaming.py   |  2 +-
 python/pyspark/sql/types.py   |  2 +-
 python/pyspark/streaming/dstream.py   |  4 ++--
 .../src/main/scala/org/apache/spark/sql/Row.scala |  2 +-
 .../apache/spark/sql/catalyst/analysis/Analyzer.scala |  4 ++--
 .../sql/catalyst/analysis/FunctionRegistry.scala  |  2 +-
 .../sql/catalyst/analysis/MultiInstanceRelation.scala |  2 +-
 .../spark/sql/catalyst/catalog/SessionCatalog.scala   |  6 +++---
 .../sql/catalyst/catalog/functionResources.scala  |  2 +-
 .../sql/catalyst/expressions/ExpectsInputTypes.scala  |  2 +-
 .../spark/sql/catalyst/expressions/Projection.scala   |  4 ++--
 .../sql/catalyst/expressions/complexTypeCreator.scala |  2 +-
 .../org/apache/spark/sql/types/AbstractDataType.scala |  2 +-
 .../scala/org/apache/spark/sql/DataFrameReader.scala  |  2 +-
 .../main/scala/org/apache/spark/sql/SQLContext.scala  | 14 +++---
 .../scala/org/apache/spark/sql/SQLImplicits.scala |  2 +-
 .../scala/org/apache/spark/sql/SparkSession.scala | 14 +++---
 .../org/apache/spark/sql/catalyst/SQLBuilder.scala|  2 +-
 .../aggregate/SortBasedAggregationIterator.scala  |  2 +-
 .../apache/spark/sql/execution/aggregate/udaf.scala   |  2 +-
 .../execution/columnar/GenerateColumnAccessor.scala   |  2 +-
 .../execution/datasources/FileSourceStrategy.scala|  2 +-
 .../execution/datasources/json/JacksonParser.scala|  2 +-
 .../datasources/parquet/CatalystRowConverter.scala|  2 +-
 .../sql/execution/exchange/ExchangeCoordinator.scala  | 10 +-
 .../spark/sql/execution/joins/SortMergeJoinExec.scala |  2 +-
 .../spark/sql/execution/r/MapPartitionsRWrapper.scala |  2 +-
 .../scala/org/apache/spark/sql/expressions/udaf.scala |  2 +-
 .../org/apache/spark/sql/internal/SharedState.scala   |  2 +-
 .../apache/spark/sql/streaming/ContinuousQuery.scala  |  2 +-
 .../org/apache/spark/sql/hive/client/HiveClient.scala |  2 +-
 .../apache/spark/sql/hive/orc/OrcFileOperator.scala   |  2 +-
 .../spark/sql/hive/execution/HiveComparisonTest.scala |  

spark git commit: [SPARK-15771][ML][EXAMPLES] Use 'accuracy' rather than 'precision' in many ML examples

2016-06-06 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master fd8af3971 -> a95252823


[SPARK-15771][ML][EXAMPLES] Use 'accuracy' rather than 'precision' in many ML 
examples

## What changes were proposed in this pull request?
Since [SPARK-15617](https://issues.apache.org/jira/browse/SPARK-15617)
deprecated ```precision``` in ```MulticlassClassificationEvaluator```, many ML
examples are broken:
```python
pyspark.sql.utils.IllegalArgumentException: 
u'MulticlassClassificationEvaluator_4c3bb1d73d8cc0cedae6 parameter metricName 
given invalid value precision.'
```
We should use ```accuracy``` to replace ```precision``` in these examples.
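
For reference, the updated evaluator configuration in its Scala form (column names follow the example code shown in the diffs below):

```scala
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

// "precision" as an overall metric is deprecated by SPARK-15617;
// "accuracy" is the replacement used throughout the examples.
val evaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("indexedLabel")
  .setPredictionCol("prediction")
  .setMetricName("accuracy")

// val accuracy = evaluator.evaluate(predictions)  // predictions: the fitted model's output DataFrame
```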

## How was this patch tested?
Offline tests.

Author: Yanbo Liang 

Closes #13519 from yanboliang/spark-15771.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a9525282
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a9525282
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a9525282

Branch: refs/heads/master
Commit: a95252823e09939b654dd425db38dadc4100bc87
Parents: fd8af39
Author: Yanbo Liang 
Authored: Mon Jun 6 09:36:34 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 6 09:36:34 2016 +0100

--
 .../examples/ml/JavaDecisionTreeClassificationExample.java | 2 +-
 .../examples/ml/JavaGradientBoostedTreeClassifierExample.java  | 2 +-
 .../examples/ml/JavaMultilayerPerceptronClassifierExample.java | 6 +++---
 .../org/apache/spark/examples/ml/JavaNaiveBayesExample.java| 6 +++---
 .../org/apache/spark/examples/ml/JavaOneVsRestExample.java | 6 +++---
 .../spark/examples/ml/JavaRandomForestClassifierExample.java   | 2 +-
 .../src/main/python/ml/decision_tree_classification_example.py | 2 +-
 .../main/python/ml/gradient_boosted_tree_classifier_example.py | 2 +-
 .../src/main/python/ml/multilayer_perceptron_classification.py | 6 +++---
 examples/src/main/python/ml/naive_bayes_example.py | 6 +++---
 examples/src/main/python/ml/one_vs_rest_example.py | 6 +++---
 .../src/main/python/ml/random_forest_classifier_example.py | 2 +-
 .../spark/examples/ml/DecisionTreeClassificationExample.scala  | 2 +-
 .../examples/ml/GradientBoostedTreeClassifierExample.scala | 2 +-
 .../examples/ml/MultilayerPerceptronClassifierExample.scala| 6 +++---
 .../scala/org/apache/spark/examples/ml/NaiveBayesExample.scala | 6 +++---
 .../scala/org/apache/spark/examples/ml/OneVsRestExample.scala  | 6 +++---
 .../spark/examples/ml/RandomForestClassifierExample.scala  | 2 +-
 python/pyspark/ml/evaluation.py| 2 +-
 19 files changed, 37 insertions(+), 37 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a9525282/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
--
diff --git 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
index bdb76f0..a9c6e7f 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
@@ -90,7 +90,7 @@ public class JavaDecisionTreeClassificationExample {
 MulticlassClassificationEvaluator evaluator = new 
MulticlassClassificationEvaluator()
   .setLabelCol("indexedLabel")
   .setPredictionCol("prediction")
-  .setMetricName("precision");
+  .setMetricName("accuracy");
 double accuracy = evaluator.evaluate(predictions);
 System.out.println("Test Error = " + (1.0 - accuracy));
 

http://git-wip-us.apache.org/repos/asf/spark/blob/a9525282/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
--
diff --git 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
index 5c2e03e..3e9eb99 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
@@ -92,7 +92,7 @@ public class JavaGradientBoostedTreeClassifierExample {
 MulticlassClassificationEvaluator evaluator = new 
MulticlassClassificationEvaluator()
   .setLabelCol("indexedLabel")
   .setPredictionCol("prediction")
-  .setMetricName("precision");
+  .setMetricName("accuracy");
 double accuracy = evaluator.evaluate(predictions);
 System.out.println("Test Error = " + (1.

spark git commit: [MINOR] Fix Typos 'an -> a'

2016-06-06 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 7d10e4bdd -> 90e94b826


[MINOR] Fix Typos 'an -> a'

## What changes were proposed in this pull request?

`an -> a`

Use cmds like `find . -name '*.R' | xargs -i sh -c "grep -in ' an [^aeiou]' {} 
&& echo {}"` to generate candidates, and review them one by one.

## How was this patch tested?
manual tests

Author: Zheng RuiFeng 

Closes #13515 from zhengruifeng/an_a.

(cherry picked from commit fd8af397132fa1415a4c19d7f5cb5a41aa6ddb27)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/90e94b82
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/90e94b82
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/90e94b82

Branch: refs/heads/branch-2.0
Commit: 90e94b82649d9816cd4065549678b82751238552
Parents: 7d10e4b
Author: Zheng RuiFeng 
Authored: Mon Jun 6 09:35:47 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 6 09:35:57 2016 +0100

--
 R/pkg/R/utils.R   |  2 +-
 .../src/main/scala/org/apache/spark/Accumulable.scala |  2 +-
 .../org/apache/spark/api/java/JavaSparkContext.scala  |  2 +-
 .../scala/org/apache/spark/api/python/PythonRDD.scala |  2 +-
 .../scala/org/apache/spark/deploy/SparkSubmit.scala   |  6 +++---
 .../src/main/scala/org/apache/spark/rdd/JdbcRDD.scala |  6 +++---
 .../main/scala/org/apache/spark/scheduler/Pool.scala  |  2 +-
 .../org/apache/spark/broadcast/BroadcastSuite.scala   |  2 +-
 .../spark/deploy/rest/StandaloneRestSubmitSuite.scala |  2 +-
 .../test/scala/org/apache/spark/rpc/RpcEnvSuite.scala |  2 +-
 .../apache/spark/scheduler/DAGSchedulerSuite.scala|  4 ++--
 .../org/apache/spark/util/JsonProtocolSuite.scala |  2 +-
 .../spark/streaming/flume/FlumeBatchFetcher.scala |  2 +-
 .../spark/graphx/impl/VertexPartitionBaseOps.scala|  2 +-
 .../scala/org/apache/spark/ml/linalg/Vectors.scala|  2 +-
 .../src/main/scala/org/apache/spark/ml/Pipeline.scala |  2 +-
 .../spark/ml/classification/LogisticRegression.scala  |  4 ++--
 .../org/apache/spark/ml/tree/impl/RandomForest.scala  |  2 +-
 .../mllib/classification/LogisticRegression.scala |  2 +-
 .../org/apache/spark/mllib/classification/SVM.scala   |  2 +-
 .../spark/mllib/feature/VectorTransformer.scala   |  2 +-
 .../scala/org/apache/spark/mllib/linalg/Vectors.scala |  2 +-
 .../mllib/linalg/distributed/CoordinateMatrix.scala   |  2 +-
 .../apache/spark/mllib/rdd/MLPairRDDFunctions.scala   |  2 +-
 python/pyspark/ml/classification.py   |  4 ++--
 python/pyspark/ml/pipeline.py |  2 +-
 python/pyspark/mllib/classification.py|  2 +-
 python/pyspark/mllib/common.py|  2 +-
 python/pyspark/rdd.py |  4 ++--
 python/pyspark/sql/session.py |  2 +-
 python/pyspark/sql/streaming.py   |  2 +-
 python/pyspark/sql/types.py   |  2 +-
 python/pyspark/streaming/dstream.py   |  4 ++--
 .../src/main/scala/org/apache/spark/sql/Row.scala |  2 +-
 .../apache/spark/sql/catalyst/analysis/Analyzer.scala |  4 ++--
 .../sql/catalyst/analysis/FunctionRegistry.scala  |  2 +-
 .../sql/catalyst/analysis/MultiInstanceRelation.scala |  2 +-
 .../spark/sql/catalyst/catalog/SessionCatalog.scala   |  6 +++---
 .../sql/catalyst/catalog/functionResources.scala  |  2 +-
 .../sql/catalyst/expressions/ExpectsInputTypes.scala  |  2 +-
 .../spark/sql/catalyst/expressions/Projection.scala   |  4 ++--
 .../sql/catalyst/expressions/complexTypeCreator.scala |  2 +-
 .../org/apache/spark/sql/types/AbstractDataType.scala |  2 +-
 .../scala/org/apache/spark/sql/DataFrameReader.scala  |  2 +-
 .../main/scala/org/apache/spark/sql/SQLContext.scala  | 14 +++---
 .../scala/org/apache/spark/sql/SQLImplicits.scala |  2 +-
 .../scala/org/apache/spark/sql/SparkSession.scala | 14 +++---
 .../org/apache/spark/sql/catalyst/SQLBuilder.scala|  2 +-
 .../aggregate/SortBasedAggregationIterator.scala  |  2 +-
 .../apache/spark/sql/execution/aggregate/udaf.scala   |  2 +-
 .../execution/columnar/GenerateColumnAccessor.scala   |  2 +-
 .../execution/datasources/FileSourceStrategy.scala|  2 +-
 .../execution/datasources/json/JacksonParser.scala|  2 +-
 .../datasources/parquet/CatalystRowConverter.scala|  2 +-
 .../sql/execution/exchange/ExchangeCoordinator.scala  | 10 +-
 .../spark/sql/execution/joins/SortMergeJoinExec.scala |  2 +-
 .../spark/sql/execution/r/MapPartitionsRWrapper.scala |  2 +-
 .../scala/org/apache/spark/sql/expressions/udaf.scala |  2 +-
 .../org/apache/spark/sql/internal/SharedState.scala   |  2 +-
 .../apache/spark/sql/streaming/ContinuousQuery.scala  |  2 +-
 .../org/apache/spark/sql/hive/client/HiveClient.scala |  2 +-
 .../apache/spark

spark git commit: [SPARK-15771][ML][EXAMPLES] Use 'accuracy' rather than 'precision' in many ML examples

2016-06-06 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 90e94b826 -> 86a35a229


[SPARK-15771][ML][EXAMPLES] Use 'accuracy' rather than 'precision' in many ML 
examples

## What changes were proposed in this pull request?
Since [SPARK-15617](https://issues.apache.org/jira/browse/SPARK-15617)
deprecated ```precision``` in ```MulticlassClassificationEvaluator```, many ML
examples are broken:
```python
pyspark.sql.utils.IllegalArgumentException: 
u'MulticlassClassificationEvaluator_4c3bb1d73d8cc0cedae6 parameter metricName 
given invalid value precision.'
```
We should use ```accuracy``` to replace ```precision``` in these examples.

## How was this patch tested?
Offline tests.

Author: Yanbo Liang 

Closes #13519 from yanboliang/spark-15771.

(cherry picked from commit a95252823e09939b654dd425db38dadc4100bc87)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/86a35a22
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/86a35a22
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/86a35a22

Branch: refs/heads/branch-2.0
Commit: 86a35a22985b9e592744e6ef31453995f2322a31
Parents: 90e94b8
Author: Yanbo Liang 
Authored: Mon Jun 6 09:36:34 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 6 09:36:43 2016 +0100

--
 .../examples/ml/JavaDecisionTreeClassificationExample.java | 2 +-
 .../examples/ml/JavaGradientBoostedTreeClassifierExample.java  | 2 +-
 .../examples/ml/JavaMultilayerPerceptronClassifierExample.java | 6 +++---
 .../org/apache/spark/examples/ml/JavaNaiveBayesExample.java| 6 +++---
 .../org/apache/spark/examples/ml/JavaOneVsRestExample.java | 6 +++---
 .../spark/examples/ml/JavaRandomForestClassifierExample.java   | 2 +-
 .../src/main/python/ml/decision_tree_classification_example.py | 2 +-
 .../main/python/ml/gradient_boosted_tree_classifier_example.py | 2 +-
 .../src/main/python/ml/multilayer_perceptron_classification.py | 6 +++---
 examples/src/main/python/ml/naive_bayes_example.py | 6 +++---
 examples/src/main/python/ml/one_vs_rest_example.py | 6 +++---
 .../src/main/python/ml/random_forest_classifier_example.py | 2 +-
 .../spark/examples/ml/DecisionTreeClassificationExample.scala  | 2 +-
 .../examples/ml/GradientBoostedTreeClassifierExample.scala | 2 +-
 .../examples/ml/MultilayerPerceptronClassifierExample.scala| 6 +++---
 .../scala/org/apache/spark/examples/ml/NaiveBayesExample.scala | 6 +++---
 .../scala/org/apache/spark/examples/ml/OneVsRestExample.scala  | 6 +++---
 .../spark/examples/ml/RandomForestClassifierExample.scala  | 2 +-
 python/pyspark/ml/evaluation.py| 2 +-
 19 files changed, 37 insertions(+), 37 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/86a35a22/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
--
diff --git 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
index bdb76f0..a9c6e7f 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
@@ -90,7 +90,7 @@ public class JavaDecisionTreeClassificationExample {
 MulticlassClassificationEvaluator evaluator = new 
MulticlassClassificationEvaluator()
   .setLabelCol("indexedLabel")
   .setPredictionCol("prediction")
-  .setMetricName("precision");
+  .setMetricName("accuracy");
 double accuracy = evaluator.evaluate(predictions);
 System.out.println("Test Error = " + (1.0 - accuracy));
 

http://git-wip-us.apache.org/repos/asf/spark/blob/86a35a22/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
--
diff --git 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
index 5c2e03e..3e9eb99 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
@@ -92,7 +92,7 @@ public class JavaGradientBoostedTreeClassifierExample {
 MulticlassClassificationEvaluator evaluator = new 
MulticlassClassificationEvaluator()
   .setLabelCol("indexedLabel")
   .setPredictionCol("prediction")
-  .setMetricName("precision");
+  .setMetricName("accuracy"

spark git commit: [SPARK-14900][ML][PYSPARK] Add accuracy and deprecate precision, recall, f1

2016-06-06 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master a95252823 -> 00ad4f054


[SPARK-14900][ML][PYSPARK] Add accuracy and deprecate precision,recall,f1

## What changes were proposed in this pull request?
1. Add accuracy for MulticlassMetrics.
2. Deprecate the overall precision, recall, f1 and recommend using accuracy instead.
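
A minimal Scala sketch of the recommended usage (the PySpark class wraps the same underlying MulticlassMetrics; `predictionAndLabels` is a placeholder name):

```scala
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.rdd.RDD

// Prefer the single `accuracy` metric; the overall precision/recall/f1 are
// deprecated because, computed over all instances, they equal accuracy.
def overallAccuracy(predictionAndLabels: RDD[(Double, Double)]): Double =
  new MulticlassMetrics(predictionAndLabels).accuracy
```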

## How was this patch tested?
manual tests in pyspark shell

Author: Zheng RuiFeng 

Closes #13511 from zhengruifeng/deprecate_py_precisonrecall.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/00ad4f05
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/00ad4f05
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/00ad4f05

Branch: refs/heads/master
Commit: 00ad4f054cd044e17d29b7c2c62efd8616462619
Parents: a952528
Author: Zheng RuiFeng 
Authored: Mon Jun 6 15:19:22 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 6 15:19:22 2016 +0100

--
 python/pyspark/mllib/evaluation.py | 18 ++
 1 file changed, 18 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/00ad4f05/python/pyspark/mllib/evaluation.py
--
diff --git a/python/pyspark/mllib/evaluation.py 
b/python/pyspark/mllib/evaluation.py
index 5f32f09..2eaac87 100644
--- a/python/pyspark/mllib/evaluation.py
+++ b/python/pyspark/mllib/evaluation.py
@@ -15,6 +15,8 @@
 # limitations under the License.
 #
 
+import warnings
+
 from pyspark import since
 from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc
 from pyspark.sql import SQLContext
@@ -181,6 +183,8 @@ class MulticlassMetrics(JavaModelWrapper):
 0.66...
 >>> metrics.recall()
 0.66...
+>>> metrics.accuracy()
+0.66...
 >>> metrics.weightedFalsePositiveRate
 0.19...
 >>> metrics.weightedPrecision
@@ -233,6 +237,8 @@ class MulticlassMetrics(JavaModelWrapper):
 Returns precision or precision for a given label (category) if 
specified.
 """
 if label is None:
+# note:: Deprecated in 2.0.0. Use accuracy.
+warnings.warn("Deprecated in 2.0.0. Use accuracy.")
 return self.call("precision")
 else:
 return self.call("precision", float(label))
@@ -243,6 +249,8 @@ class MulticlassMetrics(JavaModelWrapper):
 Returns recall or recall for a given label (category) if specified.
 """
 if label is None:
+# note:: Deprecated in 2.0.0. Use accuracy.
+warnings.warn("Deprecated in 2.0.0. Use accuracy.")
 return self.call("recall")
 else:
 return self.call("recall", float(label))
@@ -254,6 +262,8 @@ class MulticlassMetrics(JavaModelWrapper):
 """
 if beta is None:
 if label is None:
+# note:: Deprecated in 2.0.0. Use accuracy.
+warnings.warn("Deprecated in 2.0.0. Use accuracy.")
 return self.call("fMeasure")
 else:
 return self.call("fMeasure", label)
@@ -263,6 +273,14 @@ class MulticlassMetrics(JavaModelWrapper):
 else:
 return self.call("fMeasure", label, beta)
 
+@since('2.0.0')
+def accuracy(self):
+"""
+Returns accuracy (equals to the total number of correctly classified 
instances
+out of the total number of instances).
+"""
+return self.call("accuracy")
+
 @property
 @since('1.4.0')
 def weightedTruePositiveRate(self):


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-14900][ML][PYSPARK] Add accuracy and deprecate precision, recall, f1

2016-06-06 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 86a35a229 -> e38ff70e6


[SPARK-14900][ML][PYSPARK] Add accuracy and deprecate precision,recall,f1

## What changes were proposed in this pull request?
1. Add accuracy for MulticlassMetrics.
2. Deprecate the overall precision, recall, f1 and recommend using accuracy instead.

## How was this patch tested?
manual tests in pyspark shell

Author: Zheng RuiFeng 

Closes #13511 from zhengruifeng/deprecate_py_precisonrecall.

(cherry picked from commit 00ad4f054cd044e17d29b7c2c62efd8616462619)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e38ff70e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e38ff70e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e38ff70e

Branch: refs/heads/branch-2.0
Commit: e38ff70e6bacf1c85edc390d28f8a8d5ecc6cbc3
Parents: 86a35a2
Author: Zheng RuiFeng 
Authored: Mon Jun 6 15:19:22 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 6 15:19:38 2016 +0100

--
 python/pyspark/mllib/evaluation.py | 18 ++
 1 file changed, 18 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e38ff70e/python/pyspark/mllib/evaluation.py
--
diff --git a/python/pyspark/mllib/evaluation.py 
b/python/pyspark/mllib/evaluation.py
index 5f32f09..2eaac87 100644
--- a/python/pyspark/mllib/evaluation.py
+++ b/python/pyspark/mllib/evaluation.py
@@ -15,6 +15,8 @@
 # limitations under the License.
 #
 
+import warnings
+
 from pyspark import since
 from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc
 from pyspark.sql import SQLContext
@@ -181,6 +183,8 @@ class MulticlassMetrics(JavaModelWrapper):
 0.66...
 >>> metrics.recall()
 0.66...
+>>> metrics.accuracy()
+0.66...
 >>> metrics.weightedFalsePositiveRate
 0.19...
 >>> metrics.weightedPrecision
@@ -233,6 +237,8 @@ class MulticlassMetrics(JavaModelWrapper):
 Returns precision or precision for a given label (category) if 
specified.
 """
 if label is None:
+# note:: Deprecated in 2.0.0. Use accuracy.
+warnings.warn("Deprecated in 2.0.0. Use accuracy.")
 return self.call("precision")
 else:
 return self.call("precision", float(label))
@@ -243,6 +249,8 @@ class MulticlassMetrics(JavaModelWrapper):
 Returns recall or recall for a given label (category) if specified.
 """
 if label is None:
+# note:: Deprecated in 2.0.0. Use accuracy.
+warnings.warn("Deprecated in 2.0.0. Use accuracy.")
 return self.call("recall")
 else:
 return self.call("recall", float(label))
@@ -254,6 +262,8 @@ class MulticlassMetrics(JavaModelWrapper):
 """
 if beta is None:
 if label is None:
+# note:: Deprecated in 2.0.0. Use accuracy.
+warnings.warn("Deprecated in 2.0.0. Use accuracy.")
 return self.call("fMeasure")
 else:
 return self.call("fMeasure", label)
@@ -263,6 +273,14 @@ class MulticlassMetrics(JavaModelWrapper):
 else:
 return self.call("fMeasure", label, beta)
 
+@since('2.0.0')
+def accuracy(self):
+"""
+Returns accuracy (equals to the total number of correctly classified 
instances
+out of the total number of instances).
+"""
+return self.call("accuracy")
+
 @property
 @since('1.4.0')
 def weightedTruePositiveRate(self):


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r1747061 - in /spark: downloads.md js/downloads.js site/downloads.html site/js/downloads.js

2016-06-06 Thread srowen
Author: srowen
Date: Mon Jun  6 19:56:07 2016
New Revision: 1747061

URL: http://svn.apache.org/viewvc?rev=1747061&view=rev
Log:
SPARK-15778 add spark-2.0.0-preview release to options and other minor related 
updates

Modified:
spark/downloads.md
spark/js/downloads.js
spark/site/downloads.html
spark/site/js/downloads.js

Modified: spark/downloads.md
URL: 
http://svn.apache.org/viewvc/spark/downloads.md?rev=1747061&r1=1747060&r2=1747061&view=diff
==
--- spark/downloads.md (original)
+++ spark/downloads.md Mon Jun  6 19:56:07 2016
@@ -16,7 +16,7 @@ $(document).ready(function() {
 
 ## Download Apache Spark™
 
-Our latest version is Apache Spark 1.6.1, released on March 9, 2016
+Our latest stable version is Apache Spark 1.6.1, released on March 9, 2016
 (release notes)
 https://github.com/apache/spark/releases/tag/v1.6.1";>(git 
tag)
 
@@ -36,6 +36,17 @@ Our latest version is Apache Spark 1.6.1
 _Note: Scala 2.11 users should download the Spark source package and build
 [with Scala 2.11 
support](http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211)._
 
+### Latest Preview Release
+
+Preview releases, as the name suggests, are releases for previewing upcoming 
features.
+Unlike nightly packages, preview releases have been audited by the project's 
management committee
+to satisfy the legal requirements of Apache Software Foundation's release 
policy.
+Preview releases are not meant to be functional, i.e. they can and highly 
likely will contain
+critical bugs or documentation errors.
+
+The latest preview release is Spark 2.0.0-preview, published on May 24, 2016.
+You can select and download it above.
+
 ### Link with Spark
 Spark artifacts are [hosted in Maven 
Central](http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.spark%22).
 You can add a Maven dependency with the following coordinates:
 
@@ -54,14 +65,9 @@ If you are interested in working with th
 
 Once you've downloaded Spark, you can find instructions for installing and 
building it on the documentation 
page.
 
-Stable Releases
-
-
-### Latest Preview Release (Spark 2.0.0-preview)
-Preview releases, as the name suggests, are releases for previewing upcoming 
features. Unlike nightly packages, preview releases have been audited by the 
project's management committee to satisfy the legal requirements of Apache 
Software Foundation's release policy.Preview releases are not meant to be 
functional, i.e. they can and highly likely will contain critical bugs or 
documentation errors.
-
-The latest preview release is Spark 2.0.0-preview, published on May 24, 2016. 
You can https://dist.apache.org/repos/dist/release/spark/spark-2.0.0-preview/";>download
 it here.
+### Release Notes for Stable Releases
 
+
 
 ### Nightly Packages and Artifacts
 For developers, Spark maintains nightly builds and SNAPSHOT artifacts. More 
information is available on the [Spark developer 
Wiki](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-NightlyBuilds).

Modified: spark/js/downloads.js
URL: 
http://svn.apache.org/viewvc/spark/js/downloads.js?rev=1747061&r1=1747060&r2=1747061&view=diff
==
--- spark/js/downloads.js (original)
+++ spark/js/downloads.js Mon Jun  6 19:56:07 2016
@@ -3,8 +3,8 @@
 
 releases = {};
 
-function addRelease(version, releaseDate, packages, downloadable) {
-  releases[version] = {released: releaseDate, packages: packages, 
downloadable: downloadable};
+function addRelease(version, releaseDate, packages, downloadable, stable) {
+  releases[version] = {released: releaseDate, packages: packages, 
downloadable: downloadable, stable: stable};
 }
 
 var sources = {pretty: "Source Code [can build several Hadoop versions]", tag: 
"sources"};
@@ -13,8 +13,9 @@ var hadoop1 = {pretty: "Pre-built for Ha
 var cdh4 = {pretty: "Pre-built for CDH 4", tag: "cdh4"};
 var hadoop2 = {pretty: "Pre-built for Hadoop 2.2", tag: "hadoop2"};
 var hadoop2p3 = {pretty: "Pre-built for Hadoop 2.3", tag: "hadoop2.3"};
-var hadoop2p4 = {pretty: "Pre-built for Hadoop 2.4 and later", tag: 
"hadoop2.4"};
-var hadoop2p6 = {pretty: "Pre-built for Hadoop 2.6 and later", tag: 
"hadoop2.6"};
+var hadoop2p4 = {pretty: "Pre-built for Hadoop 2.4", tag: "hadoop2.4"};
+var hadoop2p6 = {pretty: "Pre-built for Hadoop 2.6", tag: "hadoop2.6"};
+var hadoop2p7 = {pretty: "Pre-built for Hadoop 2.7 and later", tag: 
"hadoop2.7"};
 var mapr3 = {pretty: "Pre-built for MapR 3.X", tag: "mapr3"};
 var mapr4 = {pretty: "Pre-built for MapR 4.X", tag: "mapr4"};
 
@@ -31,32 +3

svn commit: r1747076 - in /spark: js/downloads.js site/js/downloads.js

2016-06-06 Thread srowen
Author: srowen
Date: Mon Jun  6 20:59:54 2016
New Revision: 1747076

URL: http://svn.apache.org/viewvc?rev=1747076&view=rev
Log:
SPARK-15778 part 2: group preview/stable releases in download version dropdown

Modified:
spark/js/downloads.js
spark/site/js/downloads.js

Modified: spark/js/downloads.js
URL: 
http://svn.apache.org/viewvc/spark/js/downloads.js?rev=1747076&r1=1747075&r2=1747076&view=diff
==
--- spark/js/downloads.js (original)
+++ spark/js/downloads.js Mon Jun  6 20:59:54 2016
@@ -53,18 +53,18 @@ addRelease("1.1.0", new Date("9/11/2014"
 addRelease("1.0.2", new Date("8/5/2014"), sources.concat(packagesV3), true, 
true);
 addRelease("1.0.1", new Date("7/11/2014"), sources.concat(packagesV3), false, 
true);
 addRelease("1.0.0", new Date("5/30/2014"), sources.concat(packagesV2), false, 
true);
-addRelease("0.9.2", new Date("7/23/2014"), sources.concat(packagesV2), true, 
false);
-addRelease("0.9.1", new Date("4/9/2014"), sources.concat(packagesV2), false, 
false);
-addRelease("0.9.0-incubating", new Date("2/2/2014"), 
sources.concat(packagesV2), false, false);
-addRelease("0.8.1-incubating", new Date("12/19/2013"), 
sources.concat(packagesV2), true, false);
-addRelease("0.8.0-incubating", new Date("9/25/2013"), 
sources.concat(packagesV1), true, false);
-addRelease("0.7.3", new Date("7/16/2013"), sources.concat(packagesV1), true, 
false);
-addRelease("0.7.2", new Date("2/6/2013"), sources.concat(packagesV1), false, 
false);
-addRelease("0.7.0", new Date("2/27/2013"), sources, false, false);
+addRelease("0.9.2", new Date("7/23/2014"), sources.concat(packagesV2), true, 
true);
+addRelease("0.9.1", new Date("4/9/2014"), sources.concat(packagesV2), false, 
true);
+addRelease("0.9.0-incubating", new Date("2/2/2014"), 
sources.concat(packagesV2), false, true);
+addRelease("0.8.1-incubating", new Date("12/19/2013"), 
sources.concat(packagesV2), true, true);
+addRelease("0.8.0-incubating", new Date("9/25/2013"), 
sources.concat(packagesV1), true, true);
+addRelease("0.7.3", new Date("7/16/2013"), sources.concat(packagesV1), true, 
true);
+addRelease("0.7.2", new Date("2/6/2013"), sources.concat(packagesV1), false, 
true);
+addRelease("0.7.0", new Date("2/27/2013"), sources, false, true);
 
 function append(el, contents) {
-  el.innerHTML = el.innerHTML + contents;
-};
+  el.innerHTML += contents;
+}
 
 function empty(el) {
   el.innerHTML = "";
@@ -79,27 +79,25 @@ function versionShort(version) { return
 function initDownloads() {
   var versionSelect = document.getElementById("sparkVersionSelect");
 
-  // Populate versions
-  var markedDefault = false;
+  // Populate stable versions
+  append(versionSelect, "");
   for (var version in releases) {
+if (!releases[version].downloadable || !releases[version].stable) { 
continue; }
 var releaseDate = releases[version].released;
-var downloadable = releases[version].downloadable;
-var stable = releases[version].stable;
-
-if (!downloadable) { continue; }
-
-var selected = false;
-if (!markedDefault && stable) {
-  selected = true;
-  markedDefault = true;
-}
+var title = versionShort(version) + " (" + 
releaseDate.toDateString().slice(4) + ")";
+append(versionSelect, "" + title + 
"");
+  }
+  append(versionSelect, "");
 
-// Don't display incubation status here
+  // Populate other versions
+  append(versionSelect, "");
+  for (var version in releases) {
+if (!releases[version].downloadable || releases[version].stable) { 
continue; }
+var releaseDate = releases[version].released;
 var title = versionShort(version) + " (" + 
releaseDate.toDateString().slice(4) + ")";
-append(versionSelect, 
-  "" +
-  title + "");
+append(versionSelect, "" + title + 
"");
   }
+  append(versionSelect, "");
 
   // Populate packages and (transitively) releases
   onVersionSelect();

Modified: spark/site/js/downloads.js
URL: 
http://svn.apache.org/viewvc/spark/site/js/downloads.js?rev=1747076&r1=1747075&r2=1747076&view=diff
==
--- spark/site/js/downloads.js (original)
+++ spark/site/js/downloads.js Mon Jun  6 20:59:54 2016
@@ -53,18 +53,18 @@ addRelease("1.1.0", new Date("9/11/2014"
 addRelease("1.0.2", new Date("8/5

spark git commit: [SPARK-12655][GRAPHX] GraphX does not unpersist RDDs

2016-06-07 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.6 6a9f19dd5 -> 5830828ef


[SPARK-12655][GRAPHX] GraphX does not unpersist RDDs

Some VertexRDD and EdgeRDD are created during the intermediate step of 
g.connectedComponents() but unnecessarily left cached after the method is done. 
The fix is to unpersist these RDDs once they are no longer in use.

A test case is added to confirm the fix for the reported bug.
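
A condensed Scala sketch of that check, assuming a running SparkContext `sc` (the full test appears in the diff below):

```scala
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

val vert  = sc.parallelize(List((1L, "a"), (2L, "b"), (3L, "c")), 1)
val edges = sc.parallelize(List(Edge[Long](1L, 2L), Edge[Long](1L, 3L)), 1)
val g0 = Graph(vert, edges)
val g  = g0.partitionBy(PartitionStrategy.EdgePartition2D, 2)
val cc = g.connectedComponents()
assert(sc.getPersistentRDDs.nonEmpty)

// Once the caller unpersists everything it created, no intermediate
// VertexRDD/EdgeRDD should remain cached.
cc.unpersist(); g.unpersist(); g0.unpersist(); vert.unpersist(); edges.unpersist()
assert(sc.getPersistentRDDs.isEmpty) // fails before this fix, passes after it
```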

Author: Jason Lee 

Closes #10713 from jasoncl/SPARK-12655.

(cherry picked from commit d0a5c32bd05841f411a342a80c5da9f73f30d69a)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5830828e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5830828e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5830828e

Branch: refs/heads/branch-1.6
Commit: 5830828efbf863df510a2b5b17d76214863ff48f
Parents: 6a9f19d
Author: Jason Lee 
Authored: Fri Jan 15 12:04:05 2016 +
Committer: Sean Owen 
Committed: Tue Jun 7 09:25:04 2016 +0100

--
 .../scala/org/apache/spark/graphx/Pregel.scala |  2 +-
 .../spark/graphx/lib/ConnectedComponents.scala |  4 +++-
 .../scala/org/apache/spark/graphx/GraphSuite.scala | 17 +
 3 files changed, 21 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5830828e/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
--
diff --git a/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala 
b/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
index 2ca60d5..8a89295 100644
--- a/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
+++ b/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
@@ -151,7 +151,7 @@ object Pregel extends Logging {
   // count the iteration
   i += 1
 }
-
+messages.unpersist(blocking = false)
 g
   } // end of apply
 

http://git-wip-us.apache.org/repos/asf/spark/blob/5830828e/graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala
--
diff --git 
a/graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala 
b/graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala
index 859f896..f72cbb1 100644
--- 
a/graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala
+++ 
b/graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala
@@ -47,9 +47,11 @@ object ConnectedComponents {
   }
 }
 val initialMessage = Long.MaxValue
-Pregel(ccGraph, initialMessage, activeDirection = EdgeDirection.Either)(
+val pregelGraph = Pregel(ccGraph, initialMessage, activeDirection = 
EdgeDirection.Either)(
   vprog = (id, attr, msg) => math.min(attr, msg),
   sendMsg = sendMessage,
   mergeMsg = (a, b) => math.min(a, b))
+ccGraph.unpersist()
+pregelGraph
   } // end of connectedComponents
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/5830828e/graphx/src/test/scala/org/apache/spark/graphx/GraphSuite.scala
--
diff --git a/graphx/src/test/scala/org/apache/spark/graphx/GraphSuite.scala 
b/graphx/src/test/scala/org/apache/spark/graphx/GraphSuite.scala
index 9acbd79..a46c5da 100644
--- a/graphx/src/test/scala/org/apache/spark/graphx/GraphSuite.scala
+++ b/graphx/src/test/scala/org/apache/spark/graphx/GraphSuite.scala
@@ -428,6 +428,23 @@ class GraphSuite extends SparkFunSuite with 
LocalSparkContext {
 }
   }
 
+  test("unpersist graph RDD") {
+withSpark { sc =>
+  val vert = sc.parallelize(List((1L, "a"), (2L, "b"), (3L, "c")), 1)
+  val edges = sc.parallelize(List(Edge[Long](1L, 2L), Edge[Long](1L, 3L)), 
1)
+  val g0 = Graph(vert, edges)
+  val g = g0.partitionBy(PartitionStrategy.EdgePartition2D, 2)
+  val cc = g.connectedComponents()
+  assert(sc.getPersistentRDDs.nonEmpty)
+  cc.unpersist()
+  g.unpersist()
+  g0.unpersist()
+  vert.unpersist()
+  edges.unpersist()
+  assert(sc.getPersistentRDDs.isEmpty)
+}
+  }
+
   test("SPARK-14219: pickRandomVertex") {
 withSpark { sc =>
   val vert = sc.parallelize(List((1L, "a")), 1)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [MINOR] fix typo in documents

2016-06-07 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 57dd4efcd -> a7e9e60df


[MINOR] fix typo in documents

## What changes were proposed in this pull request?

I used spell-check tools to find typos in the Spark documents and fixed them.

## How was this patch tested?

N/A

Author: WeichenXu 

Closes #13538 from WeichenXu123/fix_doc_typo.

(cherry picked from commit 1e2c9311871968426e019164b129652fd6d0037f)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a7e9e60d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a7e9e60d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a7e9e60d

Branch: refs/heads/branch-2.0
Commit: a7e9e60df5c10a90c06883ea3203ec895b9b1f82
Parents: 57dd4ef
Author: WeichenXu 
Authored: Tue Jun 7 13:29:27 2016 +0100
Committer: Sean Owen 
Committed: Tue Jun 7 13:29:36 2016 +0100

--
 docs/graphx-programming-guide.md| 2 +-
 docs/hardware-provisioning.md   | 2 +-
 docs/streaming-programming-guide.md | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a7e9e60d/docs/graphx-programming-guide.md
--
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 9dea9b5..81cf174 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -132,7 +132,7 @@ var graph: Graph[VertexProperty, String] = null
 
 Like RDDs, property graphs are immutable, distributed, and fault-tolerant.  
Changes to the values or
 structure of the graph are accomplished by producing a new graph with the 
desired changes.  Note
-that substantial parts of the original graph (i.e., unaffected structure, 
attributes, and indicies)
+that substantial parts of the original graph (i.e., unaffected structure, 
attributes, and indices)
 are reused in the new graph reducing the cost of this inherently functional 
data structure.  The
 graph is partitioned across the executors using a range of vertex partitioning 
heuristics.  As with
 RDDs, each partition of the graph can be recreated on a different machine in 
the event of a failure.

http://git-wip-us.apache.org/repos/asf/spark/blob/a7e9e60d/docs/hardware-provisioning.md
--
diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md
index 60ecb4f..bb6f616 100644
--- a/docs/hardware-provisioning.md
+++ b/docs/hardware-provisioning.md
@@ -22,7 +22,7 @@ Hadoop and Spark on a common cluster manager like 
[Mesos](running-on-mesos.html)
 
 * If this is not possible, run Spark on different nodes in the same local-area 
network as HDFS.
 
-* For low-latency data stores like HBase, it may be preferrable to run 
computing jobs on different
+* For low-latency data stores like HBase, it may be preferable to run 
computing jobs on different
 nodes than the storage system to avoid interference.
 
 # Local Disks

http://git-wip-us.apache.org/repos/asf/spark/blob/a7e9e60d/docs/streaming-programming-guide.md
--
diff --git a/docs/streaming-programming-guide.md 
b/docs/streaming-programming-guide.md
index 78ae6a7..0a6a039 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -1259,7 +1259,7 @@ dstream.foreachRDD(sendRecord)
 
 
 This is incorrect as this requires the connection object to be serialized and 
sent from the
-driver to the worker. Such connection objects are rarely transferrable across 
machines. This
+driver to the worker. Such connection objects are rarely transferable across 
machines. This
 error may manifest as serialization errors (connection object not 
serializable), initialization
 errors (connection object needs to be initialized at the workers), etc. The 
correct solution is
 to create the connection object at the worker.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [MINOR] fix typo in documents

2016-06-07 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 5f731d685 -> 1e2c93118


[MINOR] fix typo in documents

## What changes were proposed in this pull request?

I used spell-check tools to find typos in the Spark documents and fixed them.

## How was this patch tested?

N/A

Author: WeichenXu 

Closes #13538 from WeichenXu123/fix_doc_typo.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1e2c9311
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1e2c9311
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1e2c9311

Branch: refs/heads/master
Commit: 1e2c9311871968426e019164b129652fd6d0037f
Parents: 5f731d6
Author: WeichenXu 
Authored: Tue Jun 7 13:29:27 2016 +0100
Committer: Sean Owen 
Committed: Tue Jun 7 13:29:27 2016 +0100

--
 docs/graphx-programming-guide.md| 2 +-
 docs/hardware-provisioning.md   | 2 +-
 docs/streaming-programming-guide.md | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1e2c9311/docs/graphx-programming-guide.md
--
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 9dea9b5..81cf174 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -132,7 +132,7 @@ var graph: Graph[VertexProperty, String] = null
 
 Like RDDs, property graphs are immutable, distributed, and fault-tolerant.  
Changes to the values or
 structure of the graph are accomplished by producing a new graph with the 
desired changes.  Note
-that substantial parts of the original graph (i.e., unaffected structure, 
attributes, and indicies)
+that substantial parts of the original graph (i.e., unaffected structure, 
attributes, and indices)
 are reused in the new graph reducing the cost of this inherently functional 
data structure.  The
 graph is partitioned across the executors using a range of vertex partitioning 
heuristics.  As with
 RDDs, each partition of the graph can be recreated on a different machine in 
the event of a failure.

http://git-wip-us.apache.org/repos/asf/spark/blob/1e2c9311/docs/hardware-provisioning.md
--
diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md
index 60ecb4f..bb6f616 100644
--- a/docs/hardware-provisioning.md
+++ b/docs/hardware-provisioning.md
@@ -22,7 +22,7 @@ Hadoop and Spark on a common cluster manager like 
[Mesos](running-on-mesos.html)
 
 * If this is not possible, run Spark on different nodes in the same local-area 
network as HDFS.
 
-* For low-latency data stores like HBase, it may be preferrable to run 
computing jobs on different
+* For low-latency data stores like HBase, it may be preferable to run 
computing jobs on different
 nodes than the storage system to avoid interference.
 
 # Local Disks

http://git-wip-us.apache.org/repos/asf/spark/blob/1e2c9311/docs/streaming-programming-guide.md
--
diff --git a/docs/streaming-programming-guide.md 
b/docs/streaming-programming-guide.md
index 78ae6a7..0a6a039 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -1259,7 +1259,7 @@ dstream.foreachRDD(sendRecord)
 
 
 This is incorrect as this requires the connection object to be serialized and 
sent from the
-driver to the worker. Such connection objects are rarely transferrable across 
machines. This
+driver to the worker. Such connection objects are rarely transferable across 
machines. This
 error may manifest as serialization errors (connection object not 
serializable), initialization
 errors (connection object needs to be initialized at the workers), etc. The 
correct solution is
 to create the connection object at the worker.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15793][ML] Add maxSentenceLength for ml.Word2Vec

2016-06-08 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 91fbc880b -> 87706eb66


[SPARK-15793][ML] Add maxSentenceLength for ml.Word2Vec

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-15793

Word2Vec in the ML package should have a maxSentenceLength parameter, for feature 
parity with the MLlib implementation.

## How was this patch tested?

Tested with a Spark unit test.
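
A minimal usage sketch of the new setter (assuming a DataFrame `docDF` with an 
Array[String] column named "text"; those names are illustrative, not part of this 
change):

```
import org.apache.spark.ml.feature.Word2Vec

// docDF is assumed to exist with a column "text" of type Array[String].
val word2Vec = new Word2Vec()
  .setInputCol("text")
  .setOutputCol("result")
  .setVectorSize(100)
  .setMaxSentenceLength(500) // new setter; the default remains 1000

val model = word2Vec.fit(docDF)
val vectors = model.transform(docDF)
```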

Author: yinxusen 

Closes #13536 from yinxusen/SPARK-15793.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/87706eb6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/87706eb6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/87706eb6

Branch: refs/heads/master
Commit: 87706eb66cd1370862a1f8ea447484c80969e45f
Parents: 91fbc88
Author: yinxusen 
Authored: Wed Jun 8 09:18:04 2016 +0100
Committer: Sean Owen 
Committed: Wed Jun 8 09:18:04 2016 +0100

--
 .../org/apache/spark/ml/feature/Word2Vec.scala   | 19 +++
 .../apache/spark/ml/feature/Word2VecSuite.scala  |  1 +
 2 files changed, 20 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/87706eb6/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala 
b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
index 2d89eb0..33515b2 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
@@ -87,6 +87,21 @@ private[feature] trait Word2VecBase extends Params
   /** @group getParam */
   def getMinCount: Int = $(minCount)
 
+  /**
+   * Sets the maximum length (in words) of each sentence in the input data.
+   * Any sentence longer than this threshold will be divided into chunks of
+   * up to `maxSentenceLength` size.
+   * Default: 1000
+   * @group param
+   */
+  final val maxSentenceLength = new IntParam(this, "maxSentenceLength", 
"Maximum length " +
+"(in words) of each sentence in the input data. Any sentence longer than 
this threshold will " +
+"be divided into chunks up to the size.")
+  setDefault(maxSentenceLength -> 1000)
+
+  /** @group getParam */
+  def getMaxSentenceLength: Int = $(maxSentenceLength)
+
   setDefault(stepSize -> 0.025)
   setDefault(maxIter -> 1)
 
@@ -137,6 +152,9 @@ final class Word2Vec(override val uid: String) extends 
Estimator[Word2VecModel]
   /** @group setParam */
   def setMinCount(value: Int): this.type = set(minCount, value)
 
+  /** @group setParam */
+  def setMaxSentenceLength(value: Int): this.type = set(maxSentenceLength, 
value)
+
   @Since("2.0.0")
   override def fit(dataset: Dataset[_]): Word2VecModel = {
 transformSchema(dataset.schema, logging = true)
@@ -149,6 +167,7 @@ final class Word2Vec(override val uid: String) extends 
Estimator[Word2VecModel]
   .setSeed($(seed))
   .setVectorSize($(vectorSize))
   .setWindowSize($(windowSize))
+  .setMaxSentenceLength($(maxSentenceLength))
   .fit(input)
 copyValues(new Word2VecModel(uid, wordVectors).setParent(this))
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/87706eb6/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala
--
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala 
b/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala
index 280a36f..16c74f6 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala
@@ -191,6 +191,7 @@ class Word2VecSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
   .setSeed(42L)
   .setStepSize(0.01)
   .setVectorSize(100)
+  .setMaxSentenceLength(500)
 testDefaultReadWrite(t)
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15793][ML] Add maxSentenceLength for ml.Word2Vec

2016-06-08 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 141e910af -> a790ac579


[SPARK-15793][ML] Add maxSentenceLength for ml.Word2Vec

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-15793

Word2Vec in the ML package should have a maxSentenceLength parameter, for feature 
parity with the MLlib implementation.

## How was this patch tested?

Tested with a Spark unit test.

Author: yinxusen 

Closes #13536 from yinxusen/SPARK-15793.

(cherry picked from commit 87706eb66cd1370862a1f8ea447484c80969e45f)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a790ac57
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a790ac57
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a790ac57

Branch: refs/heads/branch-2.0
Commit: a790ac5793e1988895341fa878f947b09b275926
Parents: 141e910
Author: yinxusen 
Authored: Wed Jun 8 09:18:04 2016 +0100
Committer: Sean Owen 
Committed: Wed Jun 8 09:18:17 2016 +0100

--
 .../org/apache/spark/ml/feature/Word2Vec.scala   | 19 +++
 .../apache/spark/ml/feature/Word2VecSuite.scala  |  1 +
 2 files changed, 20 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a790ac57/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala 
b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
index 2d89eb0..33515b2 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
@@ -87,6 +87,21 @@ private[feature] trait Word2VecBase extends Params
   /** @group getParam */
   def getMinCount: Int = $(minCount)
 
+  /**
+   * Sets the maximum length (in words) of each sentence in the input data.
+   * Any sentence longer than this threshold will be divided into chunks of
+   * up to `maxSentenceLength` size.
+   * Default: 1000
+   * @group param
+   */
+  final val maxSentenceLength = new IntParam(this, "maxSentenceLength", 
"Maximum length " +
+"(in words) of each sentence in the input data. Any sentence longer than 
this threshold will " +
+"be divided into chunks up to the size.")
+  setDefault(maxSentenceLength -> 1000)
+
+  /** @group getParam */
+  def getMaxSentenceLength: Int = $(maxSentenceLength)
+
   setDefault(stepSize -> 0.025)
   setDefault(maxIter -> 1)
 
@@ -137,6 +152,9 @@ final class Word2Vec(override val uid: String) extends 
Estimator[Word2VecModel]
   /** @group setParam */
   def setMinCount(value: Int): this.type = set(minCount, value)
 
+  /** @group setParam */
+  def setMaxSentenceLength(value: Int): this.type = set(maxSentenceLength, 
value)
+
   @Since("2.0.0")
   override def fit(dataset: Dataset[_]): Word2VecModel = {
 transformSchema(dataset.schema, logging = true)
@@ -149,6 +167,7 @@ final class Word2Vec(override val uid: String) extends 
Estimator[Word2VecModel]
   .setSeed($(seed))
   .setVectorSize($(vectorSize))
   .setWindowSize($(windowSize))
+  .setMaxSentenceLength($(maxSentenceLength))
   .fit(input)
 copyValues(new Word2VecModel(uid, wordVectors).setParent(this))
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/a790ac57/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala
--
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala 
b/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala
index 280a36f..16c74f6 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala
@@ -191,6 +191,7 @@ class Word2VecSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
   .setSeed(42L)
   .setStepSize(0.01)
   .setVectorSize(100)
+  .setMaxSentenceLength(500)
 testDefaultReadWrite(t)
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r1747385 - in /spark: ./ site/ site/docs/ site/docs/2.0.0-preview/ site/docs/2.0.0-preview/api/ site/docs/2.0.0-preview/api/R/ site/docs/2.0.0-preview/api/java/ site/docs/2.0.0-preview/api

2016-06-08 Thread srowen
Author: srowen
Date: Wed Jun  8 12:04:28 2016
New Revision: 1747385

URL: http://svn.apache.org/viewvc?rev=1747385&view=rev
Log:
Uploaded Spark 2.0.0 preview docs and added preview docs section on site


[This commit notification would consist of 1214 parts, which exceeds the 
limit of 50, so it was shortened to a summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [MINOR] Fix Java Lint errors introduced by #13286 and #13280

2016-06-08 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 a790ac579 -> 5e9a8e715


[MINOR] Fix Java Lint errors introduced by #13286 and #13280

## What changes were proposed in this pull request?

revived #13464

Fix Java Lint errors introduced by #13286 and #13280
Before:
```
Using `mvn` from path: 
/Users/pichu/Project/spark/build/apache-maven-3.3.9/bin/mvn
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; 
support was removed in 8.0
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[340,5] 
(whitespace) FileTabCharacter: Line contains a tab character.
[ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[341,5] 
(whitespace) FileTabCharacter: Line contains a tab character.
[ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[342,5] 
(whitespace) FileTabCharacter: Line contains a tab character.
[ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[343,5] 
(whitespace) FileTabCharacter: Line contains a tab character.
[ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[41,28] 
(naming) MethodName: Method name 'Append' must match pattern 
'^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
[ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[52,28] 
(naming) MethodName: Method name 'Complete' must match pattern 
'^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
[ERROR] 
src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[61,8]
 (imports) UnusedImports: Unused import - 
org.apache.parquet.schema.PrimitiveType.
[ERROR] 
src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[62,8]
 (imports) UnusedImports: Unused import - org.apache.parquet.schema.Type.
```

## How was this patch tested?
ran `dev/lint-java` locally

Author: Sandeep Singh 

Closes #13559 from techaddict/minor-3.

(cherry picked from commit f958c1c3e292aba98d283637606890f353a9836c)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5e9a8e71
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5e9a8e71
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5e9a8e71

Branch: refs/heads/branch-2.0
Commit: 5e9a8e715953feadaa16ecd0f8e1818272b9c952
Parents: a790ac5
Author: Sandeep Singh 
Authored: Wed Jun 8 14:51:00 2016 +0100
Committer: Sean Owen 
Committed: Wed Jun 8 14:51:10 2016 +0100

--
 dev/checkstyle-suppressions.xml  | 2 ++
 .../main/java/org/apache/spark/launcher/LauncherServer.java  | 8 
 2 files changed, 6 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5e9a8e71/dev/checkstyle-suppressions.xml
--
diff --git a/dev/checkstyle-suppressions.xml b/dev/checkstyle-suppressions.xml
index bfc2e73..31656ca 100644
--- a/dev/checkstyle-suppressions.xml
+++ b/dev/checkstyle-suppressions.xml
@@ -42,4 +42,6 @@
   
files="src/main/java/org/apache/hive/service/auth/PasswdAuthenticationProvider.java"/>
 
+
 

http://git-wip-us.apache.org/repos/asf/spark/blob/5e9a8e71/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java
--
diff --git 
a/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java 
b/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java
index 28e9420..ae43f56 100644
--- a/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java
+++ b/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java
@@ -337,10 +337,10 @@ class LauncherServer implements Closeable {
   }
   super.close();
   if (handle != null) {
-   if (!handle.getState().isFinal()) {
- LOG.log(Level.WARNING, "Lost connection to spark application.");
- handle.setState(SparkAppHandle.State.LOST);
-   }
+if (!handle.getState().isFinal()) {
+  LOG.log(Level.WARNING, "Lost connection to spark application.");
+  handle.setState(SparkAppHandle.State.LOST);
+}
 handle.disconnect();
   }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [MINOR] Fix Java Lint errors introduced by #13286 and #13280

2016-06-08 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 87706eb66 -> f958c1c3e


[MINOR] Fix Java Lint errors introduced by #13286 and #13280

## What changes were proposed in this pull request?

revived #13464

Fix Java Lint errors introduced by #13286 and #13280
Before:
```
Using `mvn` from path: 
/Users/pichu/Project/spark/build/apache-maven-3.3.9/bin/mvn
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; 
support was removed in 8.0
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[340,5] 
(whitespace) FileTabCharacter: Line contains a tab character.
[ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[341,5] 
(whitespace) FileTabCharacter: Line contains a tab character.
[ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[342,5] 
(whitespace) FileTabCharacter: Line contains a tab character.
[ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[343,5] 
(whitespace) FileTabCharacter: Line contains a tab character.
[ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[41,28] 
(naming) MethodName: Method name 'Append' must match pattern 
'^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
[ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[52,28] 
(naming) MethodName: Method name 'Complete' must match pattern 
'^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
[ERROR] 
src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[61,8]
 (imports) UnusedImports: Unused import - 
org.apache.parquet.schema.PrimitiveType.
[ERROR] 
src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[62,8]
 (imports) UnusedImports: Unused import - org.apache.parquet.schema.Type.
```

## How was this patch tested?
ran `dev/lint-java` locally

Author: Sandeep Singh 

Closes #13559 from techaddict/minor-3.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f958c1c3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f958c1c3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f958c1c3

Branch: refs/heads/master
Commit: f958c1c3e292aba98d283637606890f353a9836c
Parents: 87706eb
Author: Sandeep Singh 
Authored: Wed Jun 8 14:51:00 2016 +0100
Committer: Sean Owen 
Committed: Wed Jun 8 14:51:00 2016 +0100

--
 dev/checkstyle-suppressions.xml  | 2 ++
 .../main/java/org/apache/spark/launcher/LauncherServer.java  | 8 
 .../datasources/parquet/SpecificParquetRecordReaderBase.java | 2 --
 3 files changed, 6 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f958c1c3/dev/checkstyle-suppressions.xml
--
diff --git a/dev/checkstyle-suppressions.xml b/dev/checkstyle-suppressions.xml
index bfc2e73..31656ca 100644
--- a/dev/checkstyle-suppressions.xml
+++ b/dev/checkstyle-suppressions.xml
@@ -42,4 +42,6 @@
   
files="src/main/java/org/apache/hive/service/auth/PasswdAuthenticationProvider.java"/>
 
+
 

http://git-wip-us.apache.org/repos/asf/spark/blob/f958c1c3/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java
--
diff --git 
a/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java 
b/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java
index 28e9420..ae43f56 100644
--- a/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java
+++ b/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java
@@ -337,10 +337,10 @@ class LauncherServer implements Closeable {
   }
   super.close();
   if (handle != null) {
-   if (!handle.getState().isFinal()) {
- LOG.log(Level.WARNING, "Lost connection to spark application.");
- handle.setState(SparkAppHandle.State.LOST);
-   }
+if (!handle.getState().isFinal()) {
+  LOG.log(Level.WARNING, "Lost connection to spark application.");
+  handle.setState(SparkAppHandle.State.LOST);
+}
 handle.disconnect();
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/f958c1c3/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java
--
diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java
index 3f7a872..14626e5 100644
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecor

spark git commit: [DOCUMENTATION] Fixed target JAR path

2016-06-08 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master f958c1c3e -> ca70ab27c


[DOCUMENTATION] Fixed target JAR path

## What changes were proposed in this pull request?

The Scala version mentioned in the sbt configuration file is 2.11, so the path of 
the target JAR should be `/target/scala-2.11/simple-project_2.11-1.0.jar`.

## How was this patch tested?

n/a

Author: prabs 
Author: Prabeesh K 

Closes #13554 from prabeesh/master.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ca70ab27
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ca70ab27
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ca70ab27

Branch: refs/heads/master
Commit: ca70ab27cc73f6ea7fce5d179ca8f13459c8ba95
Parents: f958c1c
Author: prabs 
Authored: Wed Jun 8 17:22:55 2016 +0100
Committer: Sean Owen 
Committed: Wed Jun 8 17:22:55 2016 +0100

--
 docs/quick-start.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ca70ab27/docs/quick-start.md
--
diff --git a/docs/quick-start.md b/docs/quick-start.md
index 72372a6..1b961fd 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -289,13 +289,13 @@ $ find .
 # Package a jar containing your application
 $ sbt package
 ...
-[info] Packaging {..}/{..}/target/scala-2.10/simple-project_2.10-1.0.jar
+[info] Packaging 
{..}/{..}/target/scala-{{site.SCALA_BINARY_VERSION}}/simple-project_{{site.SCALA_BINARY_VERSION}}-1.0.jar
 
 # Use spark-submit to run your application
 $ YOUR_SPARK_HOME/bin/spark-submit \
   --class "SimpleApp" \
   --master local[4] \
-  target/scala-2.10/simple-project_2.10-1.0.jar
+  
target/scala-{{site.SCALA_BINARY_VERSION}}/simple-project_{{site.SCALA_BINARY_VERSION}}-1.0.jar
 ...
 Lines with a: 46, Lines with b: 23
 {% endhighlight %}


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [DOCUMENTATION] Fixed target JAR path

2016-06-08 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 5e9a8e715 -> b2778c8bb


[DOCUMENTATION] Fixed target JAR path

## What changes were proposed in this pull request?

The Scala version mentioned in the sbt configuration file is 2.11, so the path of 
the target JAR should be `/target/scala-2.11/simple-project_2.11-1.0.jar`.

## How was this patch tested?

n/a

Author: prabs 
Author: Prabeesh K 

Closes #13554 from prabeesh/master.

(cherry picked from commit ca70ab27cc73f6ea7fce5d179ca8f13459c8ba95)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b2778c8b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b2778c8b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b2778c8b

Branch: refs/heads/branch-2.0
Commit: b2778c8bbdf3b3a2e650b17346f87f2568f88295
Parents: 5e9a8e7
Author: prabs 
Authored: Wed Jun 8 17:22:55 2016 +0100
Committer: Sean Owen 
Committed: Wed Jun 8 17:23:03 2016 +0100

--
 docs/quick-start.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b2778c8b/docs/quick-start.md
--
diff --git a/docs/quick-start.md b/docs/quick-start.md
index 72372a6..1b961fd 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -289,13 +289,13 @@ $ find .
 # Package a jar containing your application
 $ sbt package
 ...
-[info] Packaging {..}/{..}/target/scala-2.10/simple-project_2.10-1.0.jar
+[info] Packaging 
{..}/{..}/target/scala-{{site.SCALA_BINARY_VERSION}}/simple-project_{{site.SCALA_BINARY_VERSION}}-1.0.jar
 
 # Use spark-submit to run your application
 $ YOUR_SPARK_HOME/bin/spark-submit \
   --class "SimpleApp" \
   --master local[4] \
-  target/scala-2.10/simple-project_2.10-1.0.jar
+  
target/scala-{{site.SCALA_BINARY_VERSION}}/simple-project_{{site.SCALA_BINARY_VERSION}}-1.0.jar
 ...
 Lines with a: 46, Lines with b: 23
 {% endhighlight %}


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15818][BUILD] Upgrade to Hadoop 2.7.2

2016-06-09 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 921fa40b1 -> 147c02082


[SPARK-15818][BUILD] Upgrade to Hadoop 2.7.2

## What changes were proposed in this pull request?

Update the Hadoop version from 2.7.0 to 2.7.2 when the hadoop-2.7 build 
profile is used.

## How was this patch tested?

Existing tests

I'd like us to use Hadoop 2.7.2, as the Hadoop release notes state that 
Hadoop 2.7.0 is not ready for production use.

https://hadoop.apache.org/docs/r2.7.0/ states

"Apache Hadoop 2.7.0 is a minor release in the 2.x.y release line, building 
upon the previous stable release 2.6.0.
This release is not yet ready for production use. Production users should use 
2.7.1 release and beyond."

Hadoop 2.7.1 release notes:
"Apache Hadoop 2.7.1 is a minor release in the 2.x.y release line, building 
upon the previous release 2.7.0. This is the next stable release after Apache 
Hadoop 2.6.x."

And then Hadoop 2.7.2 release notes:
"Apache Hadoop 2.7.2 is a minor release in the 2.x.y release line, building 
upon the previous stable release 2.7.1."

I've tested that this is OK with Intel hardware and IBM Java 8, so let's test it 
with OpenJDK; ideally this will be pushed to branch-2.0 and master.

Author: Adam Roberts 

Closes #13556 from a-roberts/patch-2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/147c0208
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/147c0208
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/147c0208

Branch: refs/heads/master
Commit: 147c020823080c60b495f7950629d8134bf895db
Parents: 921fa40
Author: Adam Roberts 
Authored: Thu Jun 9 10:34:01 2016 +0100
Committer: Sean Owen 
Committed: Thu Jun 9 10:34:01 2016 +0100

--
 dev/deps/spark-deps-hadoop-2.4 | 30 +++---
 dev/deps/spark-deps-hadoop-2.6 | 30 +++---
 dev/deps/spark-deps-hadoop-2.7 | 30 +++---
 pom.xml|  6 +++---
 4 files changed, 48 insertions(+), 48 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/147c0208/dev/deps/spark-deps-hadoop-2.4
--
diff --git a/dev/deps/spark-deps-hadoop-2.4 b/dev/deps/spark-deps-hadoop-2.4
index f0491ec..501bf58 100644
--- a/dev/deps/spark-deps-hadoop-2.4
+++ b/dev/deps/spark-deps-hadoop-2.4
@@ -53,21 +53,21 @@ eigenbase-properties-1.1.5.jar
 guava-14.0.1.jar
 guice-3.0.jar
 guice-servlet-3.0.jar
-hadoop-annotations-2.4.0.jar
-hadoop-auth-2.4.0.jar
-hadoop-client-2.4.0.jar
-hadoop-common-2.4.0.jar
-hadoop-hdfs-2.4.0.jar
-hadoop-mapreduce-client-app-2.4.0.jar
-hadoop-mapreduce-client-common-2.4.0.jar
-hadoop-mapreduce-client-core-2.4.0.jar
-hadoop-mapreduce-client-jobclient-2.4.0.jar
-hadoop-mapreduce-client-shuffle-2.4.0.jar
-hadoop-yarn-api-2.4.0.jar
-hadoop-yarn-client-2.4.0.jar
-hadoop-yarn-common-2.4.0.jar
-hadoop-yarn-server-common-2.4.0.jar
-hadoop-yarn-server-web-proxy-2.4.0.jar
+hadoop-annotations-2.4.1.jar
+hadoop-auth-2.4.1.jar
+hadoop-client-2.4.1.jar
+hadoop-common-2.4.1.jar
+hadoop-hdfs-2.4.1.jar
+hadoop-mapreduce-client-app-2.4.1.jar
+hadoop-mapreduce-client-common-2.4.1.jar
+hadoop-mapreduce-client-core-2.4.1.jar
+hadoop-mapreduce-client-jobclient-2.4.1.jar
+hadoop-mapreduce-client-shuffle-2.4.1.jar
+hadoop-yarn-api-2.4.1.jar
+hadoop-yarn-client-2.4.1.jar
+hadoop-yarn-common-2.4.1.jar
+hadoop-yarn-server-common-2.4.1.jar
+hadoop-yarn-server-web-proxy-2.4.1.jar
 hk2-api-2.4.0-b34.jar
 hk2-locator-2.4.0-b34.jar
 hk2-utils-2.4.0-b34.jar

http://git-wip-us.apache.org/repos/asf/spark/blob/147c0208/dev/deps/spark-deps-hadoop-2.6
--
diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/deps/spark-deps-hadoop-2.6
index b3dced6..b915727 100644
--- a/dev/deps/spark-deps-hadoop-2.6
+++ b/dev/deps/spark-deps-hadoop-2.6
@@ -58,21 +58,21 @@ gson-2.2.4.jar
 guava-14.0.1.jar
 guice-3.0.jar
 guice-servlet-3.0.jar
-hadoop-annotations-2.6.0.jar
-hadoop-auth-2.6.0.jar
-hadoop-client-2.6.0.jar
-hadoop-common-2.6.0.jar
-hadoop-hdfs-2.6.0.jar
-hadoop-mapreduce-client-app-2.6.0.jar
-hadoop-mapreduce-client-common-2.6.0.jar
-hadoop-mapreduce-client-core-2.6.0.jar
-hadoop-mapreduce-client-jobclient-2.6.0.jar
-hadoop-mapreduce-client-shuffle-2.6.0.jar
-hadoop-yarn-api-2.6.0.jar
-hadoop-yarn-client-2.6.0.jar
-hadoop-yarn-common-2.6.0.jar
-hadoop-yarn-server-common-2.6.0.jar
-hadoop-yarn-server-web-proxy-2.6.0.jar
+hadoop-annotations-2.6.4.jar
+hadoop-auth-2.6.4.jar
+hadoop-client-2.6.4.jar
+hadoop-common-2.6.4.jar
+hadoop-hdfs-2.6.4.jar
+hadoop-mapreduce-client-app-2.6.4.jar
+h

spark git commit: [SPARK-15818][BUILD] Upgrade to Hadoop 2.7.2

2016-06-09 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 8ee93eed9 -> 77c08d224


[SPARK-15818][BUILD] Upgrade to Hadoop 2.7.2

## What changes were proposed in this pull request?

Update the Hadoop version from 2.7.0 to 2.7.2 when the hadoop-2.7 build 
profile is used.

## How was this patch tested?

Existing tests

I'd like us to use Hadoop 2.7.2, as the Hadoop release notes state that 
Hadoop 2.7.0 is not ready for production use.

https://hadoop.apache.org/docs/r2.7.0/ states

"Apache Hadoop 2.7.0 is a minor release in the 2.x.y release line, building 
upon the previous stable release 2.6.0.
This release is not yet ready for production use. Production users should use 
2.7.1 release and beyond."

Hadoop 2.7.1 release notes:
"Apache Hadoop 2.7.1 is a minor release in the 2.x.y release line, building 
upon the previous release 2.7.0. This is the next stable release after Apache 
Hadoop 2.6.x."

And then Hadoop 2.7.2 release notes:
"Apache Hadoop 2.7.2 is a minor release in the 2.x.y release line, building 
upon the previous stable release 2.7.1."

I've tested that this is OK with Intel hardware and IBM Java 8, so let's test it 
with OpenJDK; ideally this will be pushed to branch-2.0 and master.

Author: Adam Roberts 

Closes #13556 from a-roberts/patch-2.

(cherry picked from commit 147c020823080c60b495f7950629d8134bf895db)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/77c08d22
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/77c08d22
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/77c08d22

Branch: refs/heads/branch-2.0
Commit: 77c08d2240bef7d814fc6e4dd0a53fbdf1e2f795
Parents: 8ee93ee
Author: Adam Roberts 
Authored: Thu Jun 9 10:34:01 2016 +0100
Committer: Sean Owen 
Committed: Thu Jun 9 10:34:15 2016 +0100

--
 dev/deps/spark-deps-hadoop-2.4 | 30 +++---
 dev/deps/spark-deps-hadoop-2.6 | 30 +++---
 dev/deps/spark-deps-hadoop-2.7 | 30 +++---
 pom.xml|  6 +++---
 4 files changed, 48 insertions(+), 48 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/77c08d22/dev/deps/spark-deps-hadoop-2.4
--
diff --git a/dev/deps/spark-deps-hadoop-2.4 b/dev/deps/spark-deps-hadoop-2.4
index 77d5266..3df292e 100644
--- a/dev/deps/spark-deps-hadoop-2.4
+++ b/dev/deps/spark-deps-hadoop-2.4
@@ -53,21 +53,21 @@ eigenbase-properties-1.1.5.jar
 guava-14.0.1.jar
 guice-3.0.jar
 guice-servlet-3.0.jar
-hadoop-annotations-2.4.0.jar
-hadoop-auth-2.4.0.jar
-hadoop-client-2.4.0.jar
-hadoop-common-2.4.0.jar
-hadoop-hdfs-2.4.0.jar
-hadoop-mapreduce-client-app-2.4.0.jar
-hadoop-mapreduce-client-common-2.4.0.jar
-hadoop-mapreduce-client-core-2.4.0.jar
-hadoop-mapreduce-client-jobclient-2.4.0.jar
-hadoop-mapreduce-client-shuffle-2.4.0.jar
-hadoop-yarn-api-2.4.0.jar
-hadoop-yarn-client-2.4.0.jar
-hadoop-yarn-common-2.4.0.jar
-hadoop-yarn-server-common-2.4.0.jar
-hadoop-yarn-server-web-proxy-2.4.0.jar
+hadoop-annotations-2.4.1.jar
+hadoop-auth-2.4.1.jar
+hadoop-client-2.4.1.jar
+hadoop-common-2.4.1.jar
+hadoop-hdfs-2.4.1.jar
+hadoop-mapreduce-client-app-2.4.1.jar
+hadoop-mapreduce-client-common-2.4.1.jar
+hadoop-mapreduce-client-core-2.4.1.jar
+hadoop-mapreduce-client-jobclient-2.4.1.jar
+hadoop-mapreduce-client-shuffle-2.4.1.jar
+hadoop-yarn-api-2.4.1.jar
+hadoop-yarn-client-2.4.1.jar
+hadoop-yarn-common-2.4.1.jar
+hadoop-yarn-server-common-2.4.1.jar
+hadoop-yarn-server-web-proxy-2.4.1.jar
 hk2-api-2.4.0-b34.jar
 hk2-locator-2.4.0-b34.jar
 hk2-utils-2.4.0-b34.jar

http://git-wip-us.apache.org/repos/asf/spark/blob/77c08d22/dev/deps/spark-deps-hadoop-2.6
--
diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/deps/spark-deps-hadoop-2.6
index 9afe50f..9540f58 100644
--- a/dev/deps/spark-deps-hadoop-2.6
+++ b/dev/deps/spark-deps-hadoop-2.6
@@ -58,21 +58,21 @@ gson-2.2.4.jar
 guava-14.0.1.jar
 guice-3.0.jar
 guice-servlet-3.0.jar
-hadoop-annotations-2.6.0.jar
-hadoop-auth-2.6.0.jar
-hadoop-client-2.6.0.jar
-hadoop-common-2.6.0.jar
-hadoop-hdfs-2.6.0.jar
-hadoop-mapreduce-client-app-2.6.0.jar
-hadoop-mapreduce-client-common-2.6.0.jar
-hadoop-mapreduce-client-core-2.6.0.jar
-hadoop-mapreduce-client-jobclient-2.6.0.jar
-hadoop-mapreduce-client-shuffle-2.6.0.jar
-hadoop-yarn-api-2.6.0.jar
-hadoop-yarn-client-2.6.0.jar
-hadoop-yarn-common-2.6.0.jar
-hadoop-yarn-server-common-2.6.0.jar
-hadoop-yarn-server-web-proxy-2.6.0.jar
+hadoop-annotations-2.6.4.jar
+hadoop-auth-2.6.4.jar
+hadoop-cl

spark git commit: [SPARK-15823][PYSPARK][ML] Add @property for 'accuracy' in MulticlassMetrics

2016-06-10 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 675a73715 -> 16ca32eac


[SPARK-15823][PYSPARK][ML] Add @property for 'accuracy' in MulticlassMetrics

## What changes were proposed in this pull request?
`accuracy` should be decorated with `@property` to keep in step with other members 
of `pyspark.MulticlassMetrics`, like `weightedPrecision`, `weightedRecall`, etc.

## How was this patch tested?
manual tests

Author: Zheng RuiFeng 

Closes #13560 from zhengruifeng/add_accuracy_property.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/16ca32ea
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/16ca32ea
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/16ca32ea

Branch: refs/heads/master
Commit: 16ca32eace39c423224b0ec25922038fd45c501a
Parents: 675a737
Author: Zheng RuiFeng 
Authored: Fri Jun 10 10:09:19 2016 +0100
Committer: Sean Owen 
Committed: Fri Jun 10 10:09:19 2016 +0100

--
 python/pyspark/mllib/evaluation.py | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/16ca32ea/python/pyspark/mllib/evaluation.py
--
diff --git a/python/pyspark/mllib/evaluation.py 
b/python/pyspark/mllib/evaluation.py
index 2eaac87..fc2a0b3 100644
--- a/python/pyspark/mllib/evaluation.py
+++ b/python/pyspark/mllib/evaluation.py
@@ -179,11 +179,7 @@ class MulticlassMetrics(JavaModelWrapper):
 1.0...
 >>> metrics.fMeasure(0.0, 2.0)
 0.52...
->>> metrics.precision()
-0.66...
->>> metrics.recall()
-0.66...
->>> metrics.accuracy()
+>>> metrics.accuracy
 0.66...
 >>> metrics.weightedFalsePositiveRate
 0.19...
@@ -273,6 +269,7 @@ class MulticlassMetrics(JavaModelWrapper):
 else:
 return self.call("fMeasure", label, beta)
 
+@property
 @since('2.0.0')
 def accuracy(self):
 """


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15823][PYSPARK][ML] Add @property for 'accuracy' in MulticlassMetrics

2016-06-10 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 84a8421e5 -> 6709ce1ae


[SPARK-15823][PYSPARK][ML] Add @property for 'accuracy' in MulticlassMetrics

## What changes were proposed in this pull request?
`accuracy` should be decorated with `@property` to keep in step with other members 
of `pyspark.MulticlassMetrics`, like `weightedPrecision`, `weightedRecall`, etc.

## How was this patch tested?
manual tests

Author: Zheng RuiFeng 

Closes #13560 from zhengruifeng/add_accuracy_property.

(cherry picked from commit 16ca32eace39c423224b0ec25922038fd45c501a)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6709ce1a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6709ce1a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6709ce1a

Branch: refs/heads/branch-2.0
Commit: 6709ce1aea4a8d7438722f48fd7f2ed0fc7fa5be
Parents: 84a8421
Author: Zheng RuiFeng 
Authored: Fri Jun 10 10:09:19 2016 +0100
Committer: Sean Owen 
Committed: Fri Jun 10 10:09:29 2016 +0100

--
 python/pyspark/mllib/evaluation.py | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6709ce1a/python/pyspark/mllib/evaluation.py
--
diff --git a/python/pyspark/mllib/evaluation.py 
b/python/pyspark/mllib/evaluation.py
index 2eaac87..fc2a0b3 100644
--- a/python/pyspark/mllib/evaluation.py
+++ b/python/pyspark/mllib/evaluation.py
@@ -179,11 +179,7 @@ class MulticlassMetrics(JavaModelWrapper):
 1.0...
 >>> metrics.fMeasure(0.0, 2.0)
 0.52...
->>> metrics.precision()
-0.66...
->>> metrics.recall()
-0.66...
->>> metrics.accuracy()
+>>> metrics.accuracy
 0.66...
 >>> metrics.weightedFalsePositiveRate
 0.19...
@@ -273,6 +269,7 @@ class MulticlassMetrics(JavaModelWrapper):
 else:
 return self.call("fMeasure", label, beta)
 
+@property
 @since('2.0.0')
 def accuracy(self):
 """


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15837][ML][PYSPARK] Word2vec python add maxsentence parameter

2016-06-10 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 16ca32eac -> cdd7f5a57


[SPARK-15837][ML][PYSPARK] Word2vec python add maxsentence parameter

## What changes were proposed in this pull request?

Add the `maxSentenceLength` parameter to the Python Word2Vec API.

## How was this patch tested?

Existing test.

Author: WeichenXu 

Closes #13578 from WeichenXu123/word2vec_python_add_maxsentence.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cdd7f5a5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cdd7f5a5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cdd7f5a5

Branch: refs/heads/master
Commit: cdd7f5a57a21d4a8f93456d149f65859c96190cf
Parents: 16ca32e
Author: WeichenXu 
Authored: Fri Jun 10 12:26:53 2016 +0100
Committer: Sean Owen 
Committed: Fri Jun 10 12:26:53 2016 +0100

--
 python/pyspark/ml/feature.py | 29 -
 1 file changed, 24 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/cdd7f5a5/python/pyspark/ml/feature.py
--
diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index ebe1300..bfb2fb7 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -2244,28 +2244,33 @@ class Word2Vec(JavaEstimator, HasStepSize, HasMaxIter, 
HasSeed, HasInputCol, Has
 windowSize = Param(Params._dummy(), "windowSize",
"the window size (context words from [-window, 
window]). Default value is 5",
typeConverter=TypeConverters.toInt)
+maxSentenceLength = Param(Params._dummy(), "maxSentenceLength",
+  "Maximum length (in words) of each sentence in 
the input data. " +
+  "Any sentence longer than this threshold will " +
+  "be divided into chunks up to the size.",
+  typeConverter=TypeConverters.toInt)
 
 @keyword_only
 def __init__(self, vectorSize=100, minCount=5, numPartitions=1, 
stepSize=0.025, maxIter=1,
- seed=None, inputCol=None, outputCol=None, windowSize=5):
+ seed=None, inputCol=None, outputCol=None, windowSize=5, 
maxSentenceLength=1000):
 """
 __init__(self, vectorSize=100, minCount=5, numPartitions=1, 
stepSize=0.025, maxIter=1, \
- seed=None, inputCol=None, outputCol=None, windowSize=5)
+ seed=None, inputCol=None, outputCol=None, windowSize=5, 
maxSentenceLength=1000)
 """
 super(Word2Vec, self).__init__()
 self._java_obj = 
self._new_java_obj("org.apache.spark.ml.feature.Word2Vec", self.uid)
 self._setDefault(vectorSize=100, minCount=5, numPartitions=1, 
stepSize=0.025, maxIter=1,
- seed=None, windowSize=5)
+ seed=None, windowSize=5, maxSentenceLength=1000)
 kwargs = self.__init__._input_kwargs
 self.setParams(**kwargs)
 
 @keyword_only
 @since("1.4.0")
 def setParams(self, vectorSize=100, minCount=5, numPartitions=1, 
stepSize=0.025, maxIter=1,
-  seed=None, inputCol=None, outputCol=None, windowSize=5):
+  seed=None, inputCol=None, outputCol=None, windowSize=5, 
maxSentenceLength=1000):
 """
 setParams(self, minCount=5, numPartitions=1, stepSize=0.025, 
maxIter=1, seed=None, \
- inputCol=None, outputCol=None, windowSize=5)
+ inputCol=None, outputCol=None, windowSize=5, 
maxSentenceLength=1000)
 Sets params for this Word2Vec.
 """
 kwargs = self.setParams._input_kwargs
@@ -2327,6 +2332,20 @@ class Word2Vec(JavaEstimator, HasStepSize, HasMaxIter, 
HasSeed, HasInputCol, Has
 """
 return self.getOrDefault(self.windowSize)
 
+@since("2.0.0")
+def setMaxSentenceLength(self, value):
+"""
+Sets the value of :py:attr:`maxSentenceLength`.
+"""
+return self._set(maxSentenceLength=value)
+
+@since("2.0.0")
+def getMaxSentenceLength(self):
+"""
+Gets the value of maxSentenceLength or its default value.
+"""
+return self.getOrDefault(self.maxSentenceLength)
+
 def _create_model(self, java_model):
 return Word2VecModel(java_model)
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15837][ML][PYSPARK] Word2vec python add maxsentence parameter

2016-06-10 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 6709ce1ae -> 54b4763d2


[SPARK-15837][ML][PYSPARK] Word2vec python add maxsentence parameter

## What changes were proposed in this pull request?

Add the `maxSentenceLength` parameter to the Python Word2Vec API.

## How was this patch tested?

Existing test.

Author: WeichenXu 

Closes #13578 from WeichenXu123/word2vec_python_add_maxsentence.

(cherry picked from commit cdd7f5a57a21d4a8f93456d149f65859c96190cf)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/54b4763d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/54b4763d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/54b4763d

Branch: refs/heads/branch-2.0
Commit: 54b4763d295d6aeab6105d0430470343dd4ca3a3
Parents: 6709ce1
Author: WeichenXu 
Authored: Fri Jun 10 12:26:53 2016 +0100
Committer: Sean Owen 
Committed: Fri Jun 10 12:27:04 2016 +0100

--
 python/pyspark/ml/feature.py | 29 -
 1 file changed, 24 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/54b4763d/python/pyspark/ml/feature.py
--
diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index ebe1300..bfb2fb7 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -2244,28 +2244,33 @@ class Word2Vec(JavaEstimator, HasStepSize, HasMaxIter, 
HasSeed, HasInputCol, Has
 windowSize = Param(Params._dummy(), "windowSize",
"the window size (context words from [-window, 
window]). Default value is 5",
typeConverter=TypeConverters.toInt)
+maxSentenceLength = Param(Params._dummy(), "maxSentenceLength",
+  "Maximum length (in words) of each sentence in 
the input data. " +
+  "Any sentence longer than this threshold will " +
+  "be divided into chunks up to the size.",
+  typeConverter=TypeConverters.toInt)
 
 @keyword_only
 def __init__(self, vectorSize=100, minCount=5, numPartitions=1, 
stepSize=0.025, maxIter=1,
- seed=None, inputCol=None, outputCol=None, windowSize=5):
+ seed=None, inputCol=None, outputCol=None, windowSize=5, 
maxSentenceLength=1000):
 """
 __init__(self, vectorSize=100, minCount=5, numPartitions=1, 
stepSize=0.025, maxIter=1, \
- seed=None, inputCol=None, outputCol=None, windowSize=5)
+ seed=None, inputCol=None, outputCol=None, windowSize=5, 
maxSentenceLength=1000)
 """
 super(Word2Vec, self).__init__()
 self._java_obj = 
self._new_java_obj("org.apache.spark.ml.feature.Word2Vec", self.uid)
 self._setDefault(vectorSize=100, minCount=5, numPartitions=1, 
stepSize=0.025, maxIter=1,
- seed=None, windowSize=5)
+ seed=None, windowSize=5, maxSentenceLength=1000)
 kwargs = self.__init__._input_kwargs
 self.setParams(**kwargs)
 
 @keyword_only
 @since("1.4.0")
 def setParams(self, vectorSize=100, minCount=5, numPartitions=1, 
stepSize=0.025, maxIter=1,
-  seed=None, inputCol=None, outputCol=None, windowSize=5):
+  seed=None, inputCol=None, outputCol=None, windowSize=5, 
maxSentenceLength=1000):
 """
 setParams(self, minCount=5, numPartitions=1, stepSize=0.025, 
maxIter=1, seed=None, \
- inputCol=None, outputCol=None, windowSize=5)
+ inputCol=None, outputCol=None, windowSize=5, 
maxSentenceLength=1000)
 Sets params for this Word2Vec.
 """
 kwargs = self.setParams._input_kwargs
@@ -2327,6 +2332,20 @@ class Word2Vec(JavaEstimator, HasStepSize, HasMaxIter, 
HasSeed, HasInputCol, Has
 """
 return self.getOrDefault(self.windowSize)
 
+@since("2.0.0")
+def setMaxSentenceLength(self, value):
+"""
+Sets the value of :py:attr:`maxSentenceLength`.
+"""
+return self._set(maxSentenceLength=value)
+
+@since("2.0.0")
+def getMaxSentenceLength(self):
+"""
+Gets the value of maxSentenceLength or its default value.
+"""
+return self.getOrDefault(self.maxSentenceLength)
+
 def _create_model(self, java_model):
 return Word2VecModel(java_model)
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"

2016-06-11 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 7504bc73f -> 3761330dd


[SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"

## What changes were proposed in this pull request?

Use new Spark logo including "Apache" (now, with crushed PNGs). Remove old 
unreferenced logo files.

## How was this patch tested?

Manual check of generated HTML site and Spark UI. I searched for references to 
the deleted files to make sure they were not used.

Author: Sean Owen 

Closes #13609 from srowen/SPARK-15879.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3761330d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3761330d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3761330d

Branch: refs/heads/master
Commit: 3761330dd0151d7369d7fba4d4c344e9863990ef
Parents: 7504bc7
Author: Sean Owen 
Authored: Sat Jun 11 12:46:07 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 11 12:46:07 2016 +0100

--
 .../spark/ui/static/spark-logo-77x50px-hd.png   | Bin 3536 -> 4182 bytes
 .../org/apache/spark/ui/static/spark_logo.png   | Bin 14233 -> 0 bytes
 docs/img/incubator-logo.png | Bin 11651 -> 0 bytes
 docs/img/spark-logo-100x40px.png| Bin 3635 -> 0 bytes
 docs/img/spark-logo-77x40px-hd.png  | Bin 1904 -> 0 bytes
 docs/img/spark-logo-77x50px-hd.png  | Bin 3536 -> 0 bytes
 docs/img/spark-logo-hd.png  | Bin 13512 -> 16418 bytes
 7 files changed, 0 insertions(+), 0 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png
--
diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png
index 6c5f099..ffe2550 100644
Binary files 
a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
and 
b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
differ

http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png
--
diff --git a/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png 
b/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png
deleted file mode 100644
index 4b18734..000
Binary files 
a/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png and 
/dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/incubator-logo.png
--
diff --git a/docs/img/incubator-logo.png b/docs/img/incubator-logo.png
deleted file mode 100644
index 33ca7f6..000
Binary files a/docs/img/incubator-logo.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/spark-logo-100x40px.png
--
diff --git a/docs/img/spark-logo-100x40px.png b/docs/img/spark-logo-100x40px.png
deleted file mode 100644
index 54c3187..000
Binary files a/docs/img/spark-logo-100x40px.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/spark-logo-77x40px-hd.png
--
diff --git a/docs/img/spark-logo-77x40px-hd.png 
b/docs/img/spark-logo-77x40px-hd.png
deleted file mode 100644
index 270402f..000
Binary files a/docs/img/spark-logo-77x40px-hd.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/spark-logo-77x50px-hd.png
--
diff --git a/docs/img/spark-logo-77x50px-hd.png 
b/docs/img/spark-logo-77x50px-hd.png
deleted file mode 100644
index 6c5f099..000
Binary files a/docs/img/spark-logo-77x50px-hd.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/3761330d/docs/img/spark-logo-hd.png
--
diff --git a/docs/img/spark-logo-hd.png b/docs/img/spark-logo-hd.png
index 1381e30..e4508e7 100644
Binary files a/docs/img/spark-logo-hd.png and b/docs/img/spark-logo-hd.png 
differ


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"

2016-06-11 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 f0fa0a894 -> 4c29c55f2


[SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"

## What changes were proposed in this pull request?

Use new Spark logo including "Apache" (now, with crushed PNGs). Remove old 
unreferenced logo files.

## How was this patch tested?

Manual check of generated HTML site and Spark UI. I searched for references to 
the deleted files to make sure they were not used.

Author: Sean Owen 

Closes #13609 from srowen/SPARK-15879.

(cherry picked from commit 3761330dd0151d7369d7fba4d4c344e9863990ef)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4c29c55f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4c29c55f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4c29c55f

Branch: refs/heads/branch-2.0
Commit: 4c29c55f22d57c5fbadd0b759155fbab4b07a70a
Parents: f0fa0a8
Author: Sean Owen 
Authored: Sat Jun 11 12:46:07 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 11 12:46:21 2016 +0100

--
 .../spark/ui/static/spark-logo-77x50px-hd.png   | Bin 3536 -> 4182 bytes
 .../org/apache/spark/ui/static/spark_logo.png   | Bin 14233 -> 0 bytes
 docs/img/incubator-logo.png | Bin 11651 -> 0 bytes
 docs/img/spark-logo-100x40px.png| Bin 3635 -> 0 bytes
 docs/img/spark-logo-77x40px-hd.png  | Bin 1904 -> 0 bytes
 docs/img/spark-logo-77x50px-hd.png  | Bin 3536 -> 0 bytes
 docs/img/spark-logo-hd.png  | Bin 13512 -> 16418 bytes
 7 files changed, 0 insertions(+), 0 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png
--
diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png
index 6c5f099..ffe2550 100644
Binary files 
a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
and 
b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
differ

http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png
--
diff --git a/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png 
b/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png
deleted file mode 100644
index 4b18734..000
Binary files 
a/core/src/main/resources/org/apache/spark/ui/static/spark_logo.png and 
/dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/incubator-logo.png
--
diff --git a/docs/img/incubator-logo.png b/docs/img/incubator-logo.png
deleted file mode 100644
index 33ca7f6..000
Binary files a/docs/img/incubator-logo.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/spark-logo-100x40px.png
--
diff --git a/docs/img/spark-logo-100x40px.png b/docs/img/spark-logo-100x40px.png
deleted file mode 100644
index 54c3187..000
Binary files a/docs/img/spark-logo-100x40px.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/spark-logo-77x40px-hd.png
--
diff --git a/docs/img/spark-logo-77x40px-hd.png 
b/docs/img/spark-logo-77x40px-hd.png
deleted file mode 100644
index 270402f..000
Binary files a/docs/img/spark-logo-77x40px-hd.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/spark-logo-77x50px-hd.png
--
diff --git a/docs/img/spark-logo-77x50px-hd.png 
b/docs/img/spark-logo-77x50px-hd.png
deleted file mode 100644
index 6c5f099..000
Binary files a/docs/img/spark-logo-77x50px-hd.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/4c29c55f/docs/img/spark-logo-hd.png
--
diff --git a/docs/img/spark-logo-hd.png b/docs/img/spark-logo-hd.png
index 1381e30..e4508e7 100644
Binary files a/docs/img/spark-logo-hd.png and b/docs/img/spark-logo-hd.png 
differ


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents

2016-06-11 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 3761330dd -> ad102af16


[SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents

## What changes were proposed in this pull request?

This issue fixes all broken links in the Spark 2.0 preview MLlib documents. It also 
contains some editorial changes. (The Scaladoc anchor convention behind the link 
fixes is sketched after the lists below.)

**Fix broken links**
  * mllib-data-types.md
  * mllib-decision-tree.md
  * mllib-ensembles.md
  * mllib-feature-extraction.md
  * mllib-pmml-model-export.md
  * mllib-statistics.md

**Fix malformed section header and scala coding style**
  * mllib-linear-methods.md

**Replace indirect forward links with direct one**
  * ml-classification-regression.md
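
Why the `$` suffix matters in the fixed links: Scaladoc publishes separate pages 
for a type and for an object, and the object's page anchor carries a trailing `$`. 
A minimal sketch with made-up names (an assumption for illustration, not Spark code):

```scala
// Sketch only; `Vectorish` and `VectorishFactories` are hypothetical names.
// Scaladoc of this era addresses a class or trait by its name and an object
// by the name plus a trailing `$`, e.g. in the links fixed here:
//   api/scala/index.html#org.apache.spark.mllib.linalg.Vector    (trait page)
//   api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$  (object page)
trait Vectorish {
  def size: Int
}

object VectorishFactories {
  // Factory methods live on an object, so Scaladoc links to them need the `$`.
  def dense(values: Double*): Vectorish = new Vectorish {
    def size: Int = values.length
  }
}
```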

## How was this patch tested?

Manual tests (with `cd docs; jekyll build`.)

Author: Dongjoon Hyun 

Closes #13608 from dongjoon-hyun/SPARK-15883.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ad102af1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ad102af1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ad102af1

Branch: refs/heads/master
Commit: ad102af169c7344b30d3b84aa16452fcdc22542c
Parents: 3761330
Author: Dongjoon Hyun 
Authored: Sat Jun 11 12:55:38 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 11 12:55:38 2016 +0100

--
 docs/ml-classification-regression.md |  4 ++--
 docs/mllib-data-types.md | 16 ++--
 docs/mllib-decision-tree.md  |  6 +++---
 docs/mllib-ensembles.md  |  6 +++---
 docs/mllib-feature-extraction.md |  2 +-
 docs/mllib-linear-methods.md | 10 +-
 docs/mllib-pmml-model-export.md  |  2 +-
 docs/mllib-statistics.md |  8 
 8 files changed, 25 insertions(+), 29 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ad102af1/docs/ml-classification-regression.md
--
diff --git a/docs/ml-classification-regression.md 
b/docs/ml-classification-regression.md
index 88457d4..d7e5521 100644
--- a/docs/ml-classification-regression.md
+++ b/docs/ml-classification-regression.md
@@ -815,7 +815,7 @@ The main differences between this API and the [original 
MLlib ensembles API](mll
 ## Random Forests
 
 [Random forests](http://en.wikipedia.org/wiki/Random_forest)
-are ensembles of [decision trees](ml-decision-tree.html).
+are ensembles of [decision 
trees](ml-classification-regression.html#decision-trees).
 Random forests combine many decision trees in order to reduce the risk of 
overfitting.
 The `spark.ml` implementation supports random forests for binary and 
multiclass classification and for regression,
 using both continuous and categorical features.
@@ -896,7 +896,7 @@ All output columns are optional; to exclude an output 
column, set its correspond
 ## Gradient-Boosted Trees (GBTs)
 
 [Gradient-Boosted Trees (GBTs)](http://en.wikipedia.org/wiki/Gradient_boosting)
-are ensembles of [decision trees](ml-decision-tree.html).
+are ensembles of [decision 
trees](ml-classification-regression.html#decision-trees).
 GBTs iteratively train decision trees in order to minimize a loss function.
 The `spark.ml` implementation supports GBTs for binary classification and for 
regression,
 using both continuous and categorical features.

http://git-wip-us.apache.org/repos/asf/spark/blob/ad102af1/docs/mllib-data-types.md
--
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index 2ffe0f1..ef56aeb 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -33,7 +33,7 @@ implementations: 
[`DenseVector`](api/scala/index.html#org.apache.spark.mllib.lin
 using the factory methods implemented in
 [`Vectors`](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) to 
create local vectors.
 
-Refer to the [`Vector` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and [`Vectors` 
Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors) for 
details on the API.
+Refer to the [`Vector` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and [`Vectors` 
Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) for 
details on the API.
 
 {% highlight scala %}
 import org.apache.spark.mllib.linalg.{Vector, Vectors}
@@ -199,7 +199,7 @@ After loading, the feature indices are converted to 
zero-based.
 
[`MLUtils.loadLibSVMFile`](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$)
 reads training
 examples stored in LIBSVM format.
 
-Refer to the [`MLUtils` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils) for details on 
the API.
+Refer to the [`MLUtils` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$) for details on 
the API.
 
 {% highlight scala %}
 imp

spark git commit: [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents

2016-06-11 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 4c29c55f2 -> 8cf33fb8a


[SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents

## What changes were proposed in this pull request?

This issue fixes all broken links in the Spark 2.0 preview MLlib documents. It also 
contains some editorial changes.

**Fix broken links**
  * mllib-data-types.md
  * mllib-decision-tree.md
  * mllib-ensembles.md
  * mllib-feature-extraction.md
  * mllib-pmml-model-export.md
  * mllib-statistics.md

**Fix malformed section header and scala coding style**
  * mllib-linear-methods.md

**Replace indirect forward links with direct one**
  * ml-classification-regression.md

## How was this patch tested?

Manual tests (with `cd docs; jekyll build`.)

Author: Dongjoon Hyun 

Closes #13608 from dongjoon-hyun/SPARK-15883.

(cherry picked from commit ad102af169c7344b30d3b84aa16452fcdc22542c)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8cf33fb8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8cf33fb8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8cf33fb8

Branch: refs/heads/branch-2.0
Commit: 8cf33fb8a945e8f76833f68fc99b1ad5dee13641
Parents: 4c29c55
Author: Dongjoon Hyun 
Authored: Sat Jun 11 12:55:38 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 11 12:55:48 2016 +0100

--
 docs/ml-classification-regression.md |  4 ++--
 docs/mllib-data-types.md | 16 ++--
 docs/mllib-decision-tree.md  |  6 +++---
 docs/mllib-ensembles.md  |  6 +++---
 docs/mllib-feature-extraction.md |  2 +-
 docs/mllib-linear-methods.md | 10 +-
 docs/mllib-pmml-model-export.md  |  2 +-
 docs/mllib-statistics.md |  8 
 8 files changed, 25 insertions(+), 29 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8cf33fb8/docs/ml-classification-regression.md
--
diff --git a/docs/ml-classification-regression.md 
b/docs/ml-classification-regression.md
index 88457d4..d7e5521 100644
--- a/docs/ml-classification-regression.md
+++ b/docs/ml-classification-regression.md
@@ -815,7 +815,7 @@ The main differences between this API and the [original 
MLlib ensembles API](mll
 ## Random Forests
 
 [Random forests](http://en.wikipedia.org/wiki/Random_forest)
-are ensembles of [decision trees](ml-decision-tree.html).
+are ensembles of [decision 
trees](ml-classification-regression.html#decision-trees).
 Random forests combine many decision trees in order to reduce the risk of 
overfitting.
 The `spark.ml` implementation supports random forests for binary and 
multiclass classification and for regression,
 using both continuous and categorical features.
@@ -896,7 +896,7 @@ All output columns are optional; to exclude an output 
column, set its correspond
 ## Gradient-Boosted Trees (GBTs)
 
 [Gradient-Boosted Trees (GBTs)](http://en.wikipedia.org/wiki/Gradient_boosting)
-are ensembles of [decision trees](ml-decision-tree.html).
+are ensembles of [decision 
trees](ml-classification-regression.html#decision-trees).
 GBTs iteratively train decision trees in order to minimize a loss function.
 The `spark.ml` implementation supports GBTs for binary classification and for 
regression,
 using both continuous and categorical features.

http://git-wip-us.apache.org/repos/asf/spark/blob/8cf33fb8/docs/mllib-data-types.md
--
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index 2ffe0f1..ef56aeb 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -33,7 +33,7 @@ implementations: 
[`DenseVector`](api/scala/index.html#org.apache.spark.mllib.lin
 using the factory methods implemented in
 [`Vectors`](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) to 
create local vectors.
 
-Refer to the [`Vector` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and [`Vectors` 
Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors) for 
details on the API.
+Refer to the [`Vector` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and [`Vectors` 
Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) for 
details on the API.
 
 {% highlight scala %}
 import org.apache.spark.mllib.linalg.{Vector, Vectors}
@@ -199,7 +199,7 @@ After loading, the feature indices are converted to 
zero-based.
 
[`MLUtils.loadLibSVMFile`](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$)
 reads training
 examples stored in LIBSVM format.
 
-Refer to the [`MLUtils` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils) for details on 
the API.
+Refer to the [`MLUtils` Scala 
docs](api/scala

spark git commit: [SPARK-15878][CORE][TEST] fix cleanup in EventLoggingListenerSuite and ReplayListenerSuite

2016-06-12 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 9e204c62c -> 8cc22b008


[SPARK-15878][CORE][TEST] fix cleanup in EventLoggingListenerSuite and 
ReplayListenerSuite

## What changes were proposed in this pull request?

These tests weren't properly using `LocalSparkContext`, so they weren't cleaning up 
the SparkContext correctly when tests failed.
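
For context, a minimal sketch of the pattern the fix relies on (simplified and 
hedged; the real `LocalSparkContext` test trait differs in details): the suite 
assigns to the trait's `sc` field instead of declaring a local `val sc`, so the 
trait can stop the context after each test even when the test body throws.

```scala
// Minimal sketch, not the actual Spark trait: a ScalaTest mix-in that owns
// the SparkContext and always stops it after each test.
import org.apache.spark.SparkContext
import org.scalatest.{BeforeAndAfterEach, Suite}

trait LocalSparkContextSketch extends BeforeAndAfterEach { self: Suite =>
  @transient var sc: SparkContext = _

  override def afterEach(): Unit = {
    try {
      if (sc != null) {
        sc.stop()  // runs even when the test body failed
      }
      sc = null
    } finally {
      super.afterEach()
    }
  }
}

// In a suite mixing this in, write `sc = new SparkContext(...)` rather than
// `val sc = new SparkContext(...)`, so the cleanup above sees the context.
```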

## How was this patch tested?

Jenkins.

Author: Imran Rashid 

Closes #13602 from squito/SPARK-15878_cleanup_replaylistener.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8cc22b00
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8cc22b00
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8cc22b00

Branch: refs/heads/master
Commit: 8cc22b0085475a188f229536b4f83988ae889a8e
Parents: 9e204c6
Author: Imran Rashid 
Authored: Sun Jun 12 12:54:57 2016 +0100
Committer: Sean Owen 
Committed: Sun Jun 12 12:54:57 2016 +0100

--
 .../org/apache/spark/scheduler/EventLoggingListenerSuite.scala | 2 +-
 .../scala/org/apache/spark/scheduler/ReplayListenerSuite.scala | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8cc22b00/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
 
b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
index 176d893..c4c80b5 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
@@ -181,7 +181,7 @@ class EventLoggingListenerSuite extends SparkFunSuite with 
LocalSparkContext wit
 // into SPARK-6688.
 val conf = getLoggingConf(testDirPath, compressionCodec)
   .set("spark.hadoop.fs.defaultFS", "unsupported://example.com")
-val sc = new SparkContext("local-cluster[2,2,1024]", "test", conf)
+sc = new SparkContext("local-cluster[2,2,1024]", "test", conf)
 assert(sc.eventLogger.isDefined)
 val eventLogger = sc.eventLogger.get
 val eventLogPath = eventLogger.logPath

http://git-wip-us.apache.org/repos/asf/spark/blob/8cc22b00/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala
index 35215c1..1732aca 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala
@@ -23,7 +23,7 @@ import java.net.URI
 import org.json4s.jackson.JsonMethods._
 import org.scalatest.BeforeAndAfter
 
-import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, 
SparkFunSuite}
 import org.apache.spark.deploy.SparkHadoopUtil
 import org.apache.spark.io.CompressionCodec
 import org.apache.spark.util.{JsonProtocol, JsonProtocolSuite, Utils}
@@ -31,7 +31,7 @@ import org.apache.spark.util.{JsonProtocol, 
JsonProtocolSuite, Utils}
 /**
  * Test whether ReplayListenerBus replays events from logs correctly.
  */
-class ReplayListenerSuite extends SparkFunSuite with BeforeAndAfter {
+class ReplayListenerSuite extends SparkFunSuite with BeforeAndAfter with 
LocalSparkContext {
   private val fileSystem = Utils.getHadoopFileSystem("/",
 SparkHadoopUtil.get.newConfiguration(new SparkConf()))
   private var testDir: File = _
@@ -101,7 +101,7 @@ class ReplayListenerSuite extends SparkFunSuite with 
BeforeAndAfter {
 fileSystem.mkdirs(logDirPath)
 
 val conf = EventLoggingListenerSuite.getLoggingConf(logDirPath, codecName)
-val sc = new SparkContext("local-cluster[2,1,1024]", "Test replay", conf)
+sc = new SparkContext("local-cluster[2,1,1024]", "Test replay", conf)
 
 // Run a few jobs
 sc.parallelize(1 to 100, 1).count()


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15878][CORE][TEST] fix cleanup in EventLoggingListenerSuite and ReplayListenerSuite

2016-06-12 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 d494a483a -> 879e8fd09


[SPARK-15878][CORE][TEST] fix cleanup in EventLoggingListenerSuite and 
ReplayListenerSuite

## What changes were proposed in this pull request?

These tests weren't properly using `LocalSparkContext`, so they weren't cleaning up 
the SparkContext correctly when tests failed.

## How was this patch tested?

Jenkins.

Author: Imran Rashid 

Closes #13602 from squito/SPARK-15878_cleanup_replaylistener.

(cherry picked from commit 8cc22b0085475a188f229536b4f83988ae889a8e)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/879e8fd0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/879e8fd0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/879e8fd0

Branch: refs/heads/branch-2.0
Commit: 879e8fd09477fc78d66c9da9e0e117a513b0b046
Parents: d494a48
Author: Imran Rashid 
Authored: Sun Jun 12 12:54:57 2016 +0100
Committer: Sean Owen 
Committed: Sun Jun 12 12:55:17 2016 +0100

--
 .../org/apache/spark/scheduler/EventLoggingListenerSuite.scala | 2 +-
 .../scala/org/apache/spark/scheduler/ReplayListenerSuite.scala | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/879e8fd0/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
 
b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
index 176d893..c4c80b5 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
@@ -181,7 +181,7 @@ class EventLoggingListenerSuite extends SparkFunSuite with 
LocalSparkContext wit
 // into SPARK-6688.
 val conf = getLoggingConf(testDirPath, compressionCodec)
   .set("spark.hadoop.fs.defaultFS", "unsupported://example.com")
-val sc = new SparkContext("local-cluster[2,2,1024]", "test", conf)
+sc = new SparkContext("local-cluster[2,2,1024]", "test", conf)
 assert(sc.eventLogger.isDefined)
 val eventLogger = sc.eventLogger.get
 val eventLogPath = eventLogger.logPath

http://git-wip-us.apache.org/repos/asf/spark/blob/879e8fd0/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala
index 35215c1..1732aca 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/ReplayListenerSuite.scala
@@ -23,7 +23,7 @@ import java.net.URI
 import org.json4s.jackson.JsonMethods._
 import org.scalatest.BeforeAndAfter
 
-import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, 
SparkFunSuite}
 import org.apache.spark.deploy.SparkHadoopUtil
 import org.apache.spark.io.CompressionCodec
 import org.apache.spark.util.{JsonProtocol, JsonProtocolSuite, Utils}
@@ -31,7 +31,7 @@ import org.apache.spark.util.{JsonProtocol, 
JsonProtocolSuite, Utils}
 /**
  * Test whether ReplayListenerBus replays events from logs correctly.
  */
-class ReplayListenerSuite extends SparkFunSuite with BeforeAndAfter {
+class ReplayListenerSuite extends SparkFunSuite with BeforeAndAfter with 
LocalSparkContext {
   private val fileSystem = Utils.getHadoopFileSystem("/",
 SparkHadoopUtil.get.newConfiguration(new SparkConf()))
   private var testDir: File = _
@@ -101,7 +101,7 @@ class ReplayListenerSuite extends SparkFunSuite with 
BeforeAndAfter {
 fileSystem.mkdirs(logDirPath)
 
 val conf = EventLoggingListenerSuite.getLoggingConf(logDirPath, codecName)
-val sc = new SparkContext("local-cluster[2,1,1024]", "Test replay", conf)
+sc = new SparkContext("local-cluster[2,1,1024]", "Test replay", conf)
 
 // Run a few jobs
 sc.parallelize(1 to 100, 1).count()


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15781][DOCUMENTATION] remove deprecated environment variable doc

2016-06-12 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 879e8fd09 -> 8c294f4ad


[SPARK-15781][DOCUMENTATION] remove deprecated environment variable doc

## What changes were proposed in this pull request?

As with `SPARK_JAVA_OPTS` and `SPARK_CLASSPATH`, we remove the documentation for 
`SPARK_WORKER_INSTANCES` to discourage users from using it. If it is actually set, 
SparkConf will still show a warning message as before.
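
A hedged sketch of the kind of check that keeps warning users even with the 
documentation removed (standalone code; the names and message text are 
assumptions, not the actual SparkConf implementation):

```scala
// Illustrative only: a standalone check in the spirit of SparkConf's
// deprecation warnings for discouraged environment variables.
object WorkerInstancesCheck {
  def warnIfSet(env: Map[String, String] = sys.env): Option[String] =
    env.get("SPARK_WORKER_INSTANCES").map { value =>
      s"SPARK_WORKER_INSTANCES ($value) is deprecated; prefer " +
        "spark.executor.instances or --num-executors instead."
    }

  def main(args: Array[String]): Unit =
    warnIfSet().foreach(msg => Console.err.println(s"WARN: $msg"))
}
```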

## How was this patch tested?

Manually tested.

Author: bomeng 

Closes #13533 from bomeng/SPARK-15781.

(cherry picked from commit 3fd3ee038b89821f51f30a4ecd4452b5b3bc6568)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8c294f4a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8c294f4a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8c294f4a

Branch: refs/heads/branch-2.0
Commit: 8c294f4ad95e95f6c8873d7b346394d34cc40975
Parents: 879e8fd
Author: bomeng 
Authored: Sun Jun 12 12:58:34 2016 +0100
Committer: Sean Owen 
Committed: Sun Jun 12 12:58:41 2016 +0100

--
 docs/spark-standalone.md | 9 -
 1 file changed, 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8c294f4a/docs/spark-standalone.md
--
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index fd94c34..40c7293 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -134,15 +134,6 @@ You can optionally configure the cluster further by 
setting environment variable
 Port for the worker web UI (default: 8081).
   
   
-SPARK_WORKER_INSTANCES
-
-  Number of worker instances to run on each machine (default: 1). You can 
make this more than 1 if
-  you have have very large machines and would like multiple Spark worker 
processes. If you do set
-  this, make sure to also set SPARK_WORKER_CORES explicitly 
to limit the cores per worker,
-  or else each worker will try to use all the cores.
-
-  
-  
 SPARK_WORKER_DIR
 Directory to run applications in, which will include both logs and 
scratch space (default: SPARK_HOME/work).
   


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15781][DOCUMENTATION] remove deprecated environment variable doc

2016-06-12 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 8cc22b008 -> 3fd3ee038


[SPARK-15781][DOCUMENTATION] remove deprecated environment variable doc

## What changes were proposed in this pull request?

As with `SPARK_JAVA_OPTS` and `SPARK_CLASSPATH`, we remove the documentation for 
`SPARK_WORKER_INSTANCES` to discourage users from using it. If it is actually set, 
SparkConf will still show a warning message as before.

## How was this patch tested?

Manually tested.

Author: bomeng 

Closes #13533 from bomeng/SPARK-15781.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3fd3ee03
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3fd3ee03
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3fd3ee03

Branch: refs/heads/master
Commit: 3fd3ee038b89821f51f30a4ecd4452b5b3bc6568
Parents: 8cc22b0
Author: bomeng 
Authored: Sun Jun 12 12:58:34 2016 +0100
Committer: Sean Owen 
Committed: Sun Jun 12 12:58:34 2016 +0100

--
 docs/spark-standalone.md | 9 -
 1 file changed, 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3fd3ee03/docs/spark-standalone.md
--
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index fd94c34..40c7293 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -134,15 +134,6 @@ You can optionally configure the cluster further by 
setting environment variable
 Port for the worker web UI (default: 8081).
   
   
-SPARK_WORKER_INSTANCES
-
-  Number of worker instances to run on each machine (default: 1). You can 
make this more than 1 if
-  you have have very large machines and would like multiple Spark worker 
processes. If you do set
-  this, make sure to also set SPARK_WORKER_CORES explicitly 
to limit the cores per worker,
-  or else each worker will try to use all the cores.
-
-  
-  
 SPARK_WORKER_DIR
 Directory to run applications in, which will include both logs and 
scratch space (default: SPARK_HOME/work).
   


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15806][DOCUMENTATION] update doc for SPARK_MASTER_IP

2016-06-12 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 3fd3ee038 -> 50248dcff


[SPARK-15806][DOCUMENTATION] update doc for SPARK_MASTER_IP

## What changes were proposed in this pull request?

SPARK_MASTER_IP is a deprecated environment variable. It is replaced by 
SPARK_MASTER_HOST according to MasterArguments.scala.
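
A condensed sketch of the resulting lookup order, distilled from the 
MasterArguments diff below rather than quoted from it: the deprecated variable is 
read first and then overridden by the new one, so `SPARK_MASTER_HOST` wins when 
both are set.

```scala
// Standalone sketch of the env-var precedence; not the Spark implementation.
object MasterHostResolution {
  def resolveHost(env: Map[String, String], default: String): String = {
    var host = default
    env.get("SPARK_MASTER_IP").foreach { ip =>
      Console.err.println(
        "WARN: SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
      host = ip
    }
    env.get("SPARK_MASTER_HOST").foreach(h => host = h)  // new variable wins
    host
  }

  def main(args: Array[String]): Unit =
    println(resolveHost(sys.env, java.net.InetAddress.getLocalHost.getHostName))
}
```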

## How was this patch tested?

Manually verified.

Author: bomeng 

Closes #13543 from bomeng/SPARK-15806.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/50248dcf
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/50248dcf
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/50248dcf

Branch: refs/heads/master
Commit: 50248dcfff3ba79b73323f3a804c1e19a8be6097
Parents: 3fd3ee0
Author: bomeng 
Authored: Sun Jun 12 14:25:48 2016 +0100
Committer: Sean Owen 
Committed: Sun Jun 12 14:25:48 2016 +0100

--
 conf/spark-env.sh.template   | 2 +-
 .../org/apache/spark/deploy/master/MasterArguments.scala | 8 +++-
 docs/spark-standalone.md | 4 ++--
 sbin/start-master.sh | 6 +++---
 sbin/start-slaves.sh | 6 +++---
 5 files changed, 16 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/50248dcf/conf/spark-env.sh.template
--
diff --git a/conf/spark-env.sh.template b/conf/spark-env.sh.template
index 9cffdc3..c750c72 100755
--- a/conf/spark-env.sh.template
+++ b/conf/spark-env.sh.template
@@ -42,7 +42,7 @@
 # - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)
 
 # Options for the daemons used in the standalone deploy mode
-# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
+# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
 # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for 
the master
 # - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. 
"-Dx=y")
 # - SPARK_WORKER_CORES, to set the number of cores to use on this machine

http://git-wip-us.apache.org/repos/asf/spark/blob/50248dcf/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala 
b/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala
index 585e083..c63793c 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master
 import scala.annotation.tailrec
 
 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}
 
 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) 
extends Logging {
   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null
 
   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+host = System.getenv("SPARK_MASTER_IP")
+  }
+
   if (System.getenv("SPARK_MASTER_HOST") != null) {
 host = System.getenv("SPARK_MASTER_HOST")
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/50248dcf/docs/spark-standalone.md
--
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index 40c7293..c864c90 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -94,8 +94,8 @@ You can optionally configure the cluster further by setting 
environment variable
 
   Environment VariableMeaning
   
-SPARK_MASTER_IP
-Bind the master to a specific IP address, for example a public 
one.
+SPARK_MASTER_HOST
+Bind the master to a specific hostname or IP address, for example a 
public one.
   
   
 SPARK_MASTER_PORT

http://git-wip-us.apache.org/repos/asf/spark/blob/50248dcf/sbin/start-master.sh
--
diff --git a/sbin/start-master.sh b/sbin/start-master.sh
index ce7f177..981cb15 100755
--- a/sbin/start-master.sh
+++ b/sbin/start-master.sh
@@ -47,8 +47,8 @@ if [ "$SPARK_MASTER_PORT" = "" ]; then
   SPARK_MASTER_PORT=7077
 fi
 
-if [ "$SPARK_MASTER_IP" = "" ]; then
-  SPARK_MASTER_IP=`hostname`
+if [ "$SPARK_MASTER_HOST" = "" ]; then
+  SPARK_MASTER_

spark git commit: [SPARK-15806][DOCUMENTATION] update doc for SPARK_MASTER_IP

2016-06-12 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 8c294f4ad -> b75d1c201


[SPARK-15806][DOCUMENTATION] update doc for SPARK_MASTER_IP

## What changes were proposed in this pull request?

SPARK_MASTER_IP is a deprecated environment variable. It is replaced by 
SPARK_MASTER_HOST according to MasterArguments.scala.

## How was this patch tested?

Manually verified.

Author: bomeng 

Closes #13543 from bomeng/SPARK-15806.

(cherry picked from commit 50248dcfff3ba79b73323f3a804c1e19a8be6097)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b75d1c20
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b75d1c20
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b75d1c20

Branch: refs/heads/branch-2.0
Commit: b75d1c20131b438999645d0be6ea5765a2f7da80
Parents: 8c294f4
Author: bomeng 
Authored: Sun Jun 12 14:25:48 2016 +0100
Committer: Sean Owen 
Committed: Sun Jun 12 14:25:56 2016 +0100

--
 conf/spark-env.sh.template   | 2 +-
 .../org/apache/spark/deploy/master/MasterArguments.scala | 8 +++-
 docs/spark-standalone.md | 4 ++--
 sbin/start-master.sh | 6 +++---
 sbin/start-slaves.sh | 6 +++---
 5 files changed, 16 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b75d1c20/conf/spark-env.sh.template
--
diff --git a/conf/spark-env.sh.template b/conf/spark-env.sh.template
index 9cffdc3..c750c72 100755
--- a/conf/spark-env.sh.template
+++ b/conf/spark-env.sh.template
@@ -42,7 +42,7 @@
 # - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)
 
 # Options for the daemons used in the standalone deploy mode
-# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
+# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
 # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for 
the master
 # - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. 
"-Dx=y")
 # - SPARK_WORKER_CORES, to set the number of cores to use on this machine

http://git-wip-us.apache.org/repos/asf/spark/blob/b75d1c20/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala 
b/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala
index 585e083..c63793c 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master
 import scala.annotation.tailrec
 
 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}
 
 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) 
extends Logging {
   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null
 
   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+host = System.getenv("SPARK_MASTER_IP")
+  }
+
   if (System.getenv("SPARK_MASTER_HOST") != null) {
 host = System.getenv("SPARK_MASTER_HOST")
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/b75d1c20/docs/spark-standalone.md
--
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index 40c7293..c864c90 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -94,8 +94,8 @@ You can optionally configure the cluster further by setting 
environment variable
 
   Environment VariableMeaning
   
-SPARK_MASTER_IP
-Bind the master to a specific IP address, for example a public 
one.
+SPARK_MASTER_HOST
+Bind the master to a specific hostname or IP address, for example a 
public one.
   
   
 SPARK_MASTER_PORT

http://git-wip-us.apache.org/repos/asf/spark/blob/b75d1c20/sbin/start-master.sh
--
diff --git a/sbin/start-master.sh b/sbin/start-master.sh
index ce7f177..981cb15 100755
--- a/sbin/start-master.sh
+++ b/sbin/start-master.sh
@@ -47,8 +47,8 @@ if [ "$SPARK_MASTER_PORT" = "" ]; then
   SPARK_MASTER_PORT=7077
 fi
 
-if [ "$SPARK_MASTER

spark git commit: [SPARK-15813] Improve Canceling log message to make it less ambiguous

2016-06-13 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 b96e7f6aa -> 41f309bfb


[SPARK-15813] Improve Canceling log message to make it less ambiguous

## What changes were proposed in this pull request?
Add the new desired total executor count to the log message to make it less ambiguous.

## How was this patch tested?
This is a trivial change

Author: Peter Ableda 

Closes #13552 from peterableda/patch-1.

(cherry picked from commit d681742b2d37bd68cf5d8d3161e0f48846f6f9d4)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/41f309bf
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/41f309bf
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/41f309bf

Branch: refs/heads/branch-2.0
Commit: 41f309bfbcefcc9612efb7c0571a4009147e5896
Parents: b96e7f6
Author: Peter Ableda 
Authored: Mon Jun 13 09:40:17 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 13 09:40:25 2016 +0100

--
 .../main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/41f309bf/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
--
diff --git 
a/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala 
b/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
index b110d82..1b80071 100644
--- a/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
+++ b/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
@@ -354,7 +354,8 @@ private[yarn] class YarnAllocator(
 
 } else if (missing < 0) {
   val numToCancel = math.min(numPendingAllocate, -missing)
-  logInfo(s"Canceling requests for $numToCancel executor containers")
+  logInfo(s"Canceling requests for $numToCancel executor container(s) to 
have a new desired " +
+s"total $targetNumExecutors executors.")
 
   val matchingRequests = amClient.getMatchingRequests(RM_REQUEST_PRIORITY, 
ANY_HOST, resource)
   if (!matchingRequests.isEmpty) {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15813] Improve Canceling log message to make it less ambiguous

2016-06-13 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master e2ab79d5e -> d681742b2


[SPARK-15813] Improve Canceling log message to make it less ambiguous

## What changes were proposed in this pull request?
Add the new desired total executor count to the log message to make it less ambiguous.

## How was this patch tested?
This is a trivial change

Author: Peter Ableda 

Closes #13552 from peterableda/patch-1.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d681742b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d681742b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d681742b

Branch: refs/heads/master
Commit: d681742b2d37bd68cf5d8d3161e0f48846f6f9d4
Parents: e2ab79d
Author: Peter Ableda 
Authored: Mon Jun 13 09:40:17 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 13 09:40:17 2016 +0100

--
 .../main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d681742b/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
--
diff --git 
a/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala 
b/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
index b110d82..1b80071 100644
--- a/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
+++ b/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
@@ -354,7 +354,8 @@ private[yarn] class YarnAllocator(
 
 } else if (missing < 0) {
   val numToCancel = math.min(numPendingAllocate, -missing)
-  logInfo(s"Canceling requests for $numToCancel executor containers")
+  logInfo(s"Canceling requests for $numToCancel executor container(s) to 
have a new desired " +
+s"total $targetNumExecutors executors.")
 
   val matchingRequests = amClient.getMatchingRequests(RM_REQUEST_PRIORITY, 
ANY_HOST, resource)
   if (!matchingRequests.isEmpty) {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [DOCUMENTATION] fixed typos in python programming guide

2016-06-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 688b6ef9d -> a87a56f5c


[DOCUMENTATION] fixed typos in python programming guide

## What changes were proposed in this pull request?

minor typo

## How was this patch tested?

Minor typo in the doc; the change should be self-explanatory.

Author: Mortada Mehyar 

Closes #13639 from mortada/typo.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a87a56f5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a87a56f5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a87a56f5

Branch: refs/heads/master
Commit: a87a56f5c70792eccbb57046f6b26d40494c380a
Parents: 688b6ef
Author: Mortada Mehyar 
Authored: Tue Jun 14 09:45:46 2016 +0100
Committer: Sean Owen 
Committed: Tue Jun 14 09:45:46 2016 +0100

--
 docs/programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a87a56f5/docs/programming-guide.md
--
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 3f081a0..97bcb51 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -491,7 +491,7 @@ for examples of using Cassandra / HBase ```InputFormat``` 
and ```OutputFormat```
 
 RDDs support two types of operations: *transformations*, which create a new 
dataset from an existing one, and *actions*, which return a value to the driver 
program after running a computation on the dataset. For example, `map` is a 
transformation that passes each dataset element through a function and returns 
a new RDD representing the results. On the other hand, `reduce` is an action 
that aggregates all the elements of the RDD using some function and returns the 
final result to the driver program (although there is also a parallel 
`reduceByKey` that returns a distributed dataset).
 
-All transformations in Spark are lazy, in that they do not compute 
their results right away. Instead, they just remember the transformations 
applied to some base dataset (e.g. a file). The transformations are only 
computed when an action requires a result to be returned to the driver program. 
This design enables Spark to run more efficiently -- for example, we can 
realize that a dataset created through `map` will be used in a `reduce` and 
return only the result of the `reduce` to the driver, rather than the larger 
mapped dataset.
+All transformations in Spark are lazy, in that they do not compute 
their results right away. Instead, they just remember the transformations 
applied to some base dataset (e.g. a file). The transformations are only 
computed when an action requires a result to be returned to the driver program. 
This design enables Spark to run more efficiently. For example, we can realize 
that a dataset created through `map` will be used in a `reduce` and return only 
the result of the `reduce` to the driver, rather than the larger mapped dataset.
 
 By default, each transformed RDD may be recomputed each time you run an action 
on it. However, you may also *persist* an RDD in memory using the `persist` (or 
`cache`) method, in which case Spark will keep the elements around on the 
cluster for much faster access the next time you query it. There is also 
support for persisting RDDs on disk, or replicated across multiple nodes.
 
@@ -618,7 +618,7 @@ class MyClass {
 }
 {% endhighlight %}
 
-Here, if we create a `new MyClass` and call `doStuff` on it, the `map` inside 
there references the
+Here, if we create a new `MyClass` instance and call `doStuff` on it, the 
`map` inside there references the
 `func1` method *of that `MyClass` instance*, so the whole object needs to be 
sent to the cluster. It is
 similar to writing `rdd.map(x => this.func1(x))`.
 
@@ -1156,7 +1156,7 @@ to disk, incurring the additional overhead of disk I/O 
and increased garbage col
 Shuffle also generates a large number of intermediate files on disk. As of 
Spark 1.3, these files
 are preserved until the corresponding RDDs are no longer used and are garbage 
collected.
 This is done so the shuffle files don't need to be re-created if the lineage 
is re-computed.
-Garbage collection may happen only after a long period time, if the 
application retains references
+Garbage collection may happen only after a long period of time, if the 
application retains references
 to these RDDs or if GC does not kick in frequently. This means that 
long-running Spark jobs may
 consume a large amount of disk space. The temporary storage directory is 
specified by the
 `spark.local.dir` configuration parameter when configuring the Spark context.
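
The paragraph reworded above describes Spark's lazy evaluation; a tiny 
illustration (assuming a running `SparkContext` named `sc`, e.g. in `spark-shell`):

```scala
// Nothing executes at the map step; the job only runs when reduce is called.
val doubled = sc.parallelize(1 to 1000).map(_ * 2)  // transformation (lazy)
val sum     = doubled.reduce(_ + _)                 // action (runs the job)
println(sum)  // 1001000: only the reduced value returns to the driver
```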


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional comma

spark git commit: [DOCUMENTATION] fixed typos in python programming guide

2016-06-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 974be6241 -> cf52375b9


[DOCUMENTATION] fixed typos in python programming guide

## What changes were proposed in this pull request?

minor typo

## How was this patch tested?

Minor typo in the doc; the change should be self-explanatory.

Author: Mortada Mehyar 

Closes #13639 from mortada/typo.

(cherry picked from commit a87a56f5c70792eccbb57046f6b26d40494c380a)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cf52375b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cf52375b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cf52375b

Branch: refs/heads/branch-2.0
Commit: cf52375b9f3da84d6aad31134d4f2859de7d447c
Parents: 974be62
Author: Mortada Mehyar 
Authored: Tue Jun 14 09:45:46 2016 +0100
Committer: Sean Owen 
Committed: Tue Jun 14 09:45:56 2016 +0100

--
 docs/programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/cf52375b/docs/programming-guide.md
--
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 3f081a0..97bcb51 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -491,7 +491,7 @@ for examples of using Cassandra / HBase ```InputFormat``` 
and ```OutputFormat```
 
 RDDs support two types of operations: *transformations*, which create a new 
dataset from an existing one, and *actions*, which return a value to the driver 
program after running a computation on the dataset. For example, `map` is a 
transformation that passes each dataset element through a function and returns 
a new RDD representing the results. On the other hand, `reduce` is an action 
that aggregates all the elements of the RDD using some function and returns the 
final result to the driver program (although there is also a parallel 
`reduceByKey` that returns a distributed dataset).
 
-All transformations in Spark are lazy, in that they do not compute 
their results right away. Instead, they just remember the transformations 
applied to some base dataset (e.g. a file). The transformations are only 
computed when an action requires a result to be returned to the driver program. 
This design enables Spark to run more efficiently -- for example, we can 
realize that a dataset created through `map` will be used in a `reduce` and 
return only the result of the `reduce` to the driver, rather than the larger 
mapped dataset.
+All transformations in Spark are lazy, in that they do not compute 
their results right away. Instead, they just remember the transformations 
applied to some base dataset (e.g. a file). The transformations are only 
computed when an action requires a result to be returned to the driver program. 
This design enables Spark to run more efficiently. For example, we can realize 
that a dataset created through `map` will be used in a `reduce` and return only 
the result of the `reduce` to the driver, rather than the larger mapped dataset.
 
 By default, each transformed RDD may be recomputed each time you run an action 
on it. However, you may also *persist* an RDD in memory using the `persist` (or 
`cache`) method, in which case Spark will keep the elements around on the 
cluster for much faster access the next time you query it. There is also 
support for persisting RDDs on disk, or replicated across multiple nodes.
 
@@ -618,7 +618,7 @@ class MyClass {
 }
 {% endhighlight %}
 
-Here, if we create a `new MyClass` and call `doStuff` on it, the `map` inside 
there references the
+Here, if we create a new `MyClass` instance and call `doStuff` on it, the 
`map` inside there references the
 `func1` method *of that `MyClass` instance*, so the whole object needs to be 
sent to the cluster. It is
 similar to writing `rdd.map(x => this.func1(x))`.
 
@@ -1156,7 +1156,7 @@ to disk, incurring the additional overhead of disk I/O 
and increased garbage col
 Shuffle also generates a large number of intermediate files on disk. As of 
Spark 1.3, these files
 are preserved until the corresponding RDDs are no longer used and are garbage 
collected.
 This is done so the shuffle files don't need to be re-created if the lineage 
is re-computed.
-Garbage collection may happen only after a long period time, if the 
application retains references
+Garbage collection may happen only after a long period of time, if the 
application retains references
 to these RDDs or if GC does not kick in frequently. This means that 
long-running Spark jobs may
 consume a large amount of disk space. The temporary storage directory is 
specified by the
 `spark.local.dir` configuration parameter when configuring the Spark context.


---

spark git commit: [SPARK-15821][DOCS] Include parallel build info

2016-06-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 96c3500c6 -> a431e3f1f


[SPARK-15821][DOCS] Include parallel build info

## What changes were proposed in this pull request?

We should mention that users can build Spark using multiple threads to decrease 
build times, either here or in "Building Spark".

## How was this patch tested?

Built on machines with between one and 192 cores using `mvn -T 1C` and observed 
faster build times with no loss in stability.

In response to the question at https://issues.apache.org/jira/browse/SPARK-15821, 
I think we should suggest this option, as we know it works for Spark and can 
result in faster builds.

Author: Adam Roberts 

Closes #13562 from a-roberts/patch-3.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a431e3f1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a431e3f1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a431e3f1

Branch: refs/heads/master
Commit: a431e3f1f8575e2498650ac767e69fbc903e9929
Parents: 96c3500
Author: Adam Roberts 
Authored: Tue Jun 14 13:59:01 2016 +0100
Committer: Sean Owen 
Committed: Tue Jun 14 13:59:01 2016 +0100

--
 README.md| 2 ++
 dev/make-distribution.sh | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a431e3f1/README.md
--
diff --git a/README.md b/README.md
index d5804d1..c77c429 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,8 @@ To build Spark and its example programs, run:
 build/mvn -DskipTests clean package
 
 (You do not need to do this if you downloaded a pre-built package.)
+
+You can build Spark using more than one thread by using the -T option with 
Maven, see ["Parallel builds in Maven 
3"](https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3).
 More detailed documentation is available from the project site, at
 ["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).
 For developing Spark using an IDE, see 
[Eclipse](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-Eclipse)

http://git-wip-us.apache.org/repos/asf/spark/blob/a431e3f1/dev/make-distribution.sh
--
diff --git a/dev/make-distribution.sh b/dev/make-distribution.sh
index 4f7544f..9be4fdf 100755
--- a/dev/make-distribution.sh
+++ b/dev/make-distribution.sh
@@ -53,7 +53,7 @@ while (( "$#" )); do
 --hadoop)
   echo "Error: '--hadoop' is no longer supported:"
   echo "Error: use Maven profiles and options -Dhadoop.version and 
-Dyarn.version instead."
-  echo "Error: Related profiles include hadoop-2.2, hadoop-2.3 and 
hadoop-2.4."
+  echo "Error: Related profiles include hadoop-2.2, hadoop-2.3, 
hadoop-2.4, hadoop-2.6 and hadoop-2.7."
   exit_with_usage
   ;;
 --with-yarn)
@@ -150,7 +150,7 @@ export MAVEN_OPTS="${MAVEN_OPTS:--Xmx2g 
-XX:MaxPermSize=512M -XX:ReservedCodeCac
 # Store the command as an array because $MVN variable might have spaces in it.
 # Normal quoting tricks don't work.
 # See: http://mywiki.wooledge.org/BashFAQ/050
-BUILD_COMMAND=("$MVN" clean package -DskipTests $@)
+BUILD_COMMAND=("$MVN" -T 1C clean package -DskipTests $@)
 
 # Actually build the jar
 echo -e "\nBuilding with..."


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-15821][DOCS] Include parallel build info

2016-06-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 d59859d38 -> 0d80bc291


[SPARK-15821][DOCS] Include parallel build info

## What changes were proposed in this pull request?

We should mention that users can build Spark using multiple threads to decrease 
build times, either here or in "Building Spark".

## How was this patch tested?

Built on machines with between one and 192 cores using `mvn -T 1C` and observed 
faster build times with no loss in stability.

In response to the question at https://issues.apache.org/jira/browse/SPARK-15821, 
I think we should suggest this option, as we know it works for Spark and can 
result in faster builds.

Author: Adam Roberts 

Closes #13562 from a-roberts/patch-3.

(cherry picked from commit a431e3f1f8575e2498650ac767e69fbc903e9929)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0d80bc29
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0d80bc29
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0d80bc29

Branch: refs/heads/branch-2.0
Commit: 0d80bc291f8c96359b22bda2df8cb7b835e31339
Parents: d59859d
Author: Adam Roberts 
Authored: Tue Jun 14 13:59:01 2016 +0100
Committer: Sean Owen 
Committed: Tue Jun 14 13:59:16 2016 +0100

--
 README.md| 2 ++
 dev/make-distribution.sh | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0d80bc29/README.md
--
diff --git a/README.md b/README.md
index d5804d1..c77c429 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,8 @@ To build Spark and its example programs, run:
 build/mvn -DskipTests clean package
 
 (You do not need to do this if you downloaded a pre-built package.)
+
+You can build Spark using more than one thread by using the -T option with 
Maven, see ["Parallel builds in Maven 
3"](https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3).
 More detailed documentation is available from the project site, at
 ["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).
 For developing Spark using an IDE, see 
[Eclipse](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-Eclipse)

http://git-wip-us.apache.org/repos/asf/spark/blob/0d80bc29/dev/make-distribution.sh
--
diff --git a/dev/make-distribution.sh b/dev/make-distribution.sh
index 4f7544f..9be4fdf 100755
--- a/dev/make-distribution.sh
+++ b/dev/make-distribution.sh
@@ -53,7 +53,7 @@ while (( "$#" )); do
 --hadoop)
   echo "Error: '--hadoop' is no longer supported:"
   echo "Error: use Maven profiles and options -Dhadoop.version and 
-Dyarn.version instead."
-  echo "Error: Related profiles include hadoop-2.2, hadoop-2.3 and 
hadoop-2.4."
+  echo "Error: Related profiles include hadoop-2.2, hadoop-2.3, 
hadoop-2.4, hadoop-2.6 and hadoop-2.7."
   exit_with_usage
   ;;
 --with-yarn)
@@ -150,7 +150,7 @@ export MAVEN_OPTS="${MAVEN_OPTS:--Xmx2g 
-XX:MaxPermSize=512M -XX:ReservedCodeCac
 # Store the command as an array because $MVN variable might have spaces in it.
 # Normal quoting tricks don't work.
 # See: http://mywiki.wooledge.org/BashFAQ/050
-BUILD_COMMAND=("$MVN" clean package -DskipTests $@)
+BUILD_COMMAND=("$MVN" -T 1C clean package -DskipTests $@)
 
 # Actually build the jar
 echo -e "\nBuilding with..."


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: doc fix of HiveThriftServer

2016-06-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master a431e3f1f -> 53bb03084


doc fix of HiveThriftServer

## What changes were proposed in this pull request?

Just minor doc fix.

\cc yhuai

Author: Jeff Zhang 

Closes #13659 from zjffdu/doc_fix.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/53bb0308
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/53bb0308
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/53bb0308

Branch: refs/heads/master
Commit: 53bb03084796231f724ff8369490df520e1ee33c
Parents: a431e3f
Author: Jeff Zhang 
Authored: Tue Jun 14 14:28:40 2016 +0100
Committer: Sean Owen 
Committed: Tue Jun 14 14:28:40 2016 +0100

--
 .../apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala | 2 +-
 .../spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala | 4 ++--
 .../apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/53bb0308/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala
--
diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala
index c82fa4e..2e0fa1e 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala
@@ -30,7 +30,7 @@ import org.apache.spark.ui._
 import org.apache.spark.ui.UIUtils._
 
 
-/** Page for Spark Web UI that shows statistics of a thrift server */
+/** Page for Spark Web UI that shows statistics of the thrift server */
 private[ui] class ThriftServerPage(parent: ThriftServerTab) extends 
WebUIPage("") with Logging {
 
   private val listener = parent.listener

http://git-wip-us.apache.org/repos/asf/spark/blob/53bb0308/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala
--
diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala
index 008108a..f39e9dc 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala
@@ -29,7 +29,7 @@ import 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.{ExecutionInfo,
 import org.apache.spark.ui._
 import org.apache.spark.ui.UIUtils._
 
-/** Page for Spark Web UI that shows statistics of a streaming job */
+/** Page for Spark Web UI that shows statistics of jobs running in the thrift 
server */
 private[ui] class ThriftServerSessionPage(parent: ThriftServerTab)
   extends WebUIPage("session") with Logging {
 
@@ -60,7 +60,7 @@ private[ui] class ThriftServerSessionPage(parent: 
ThriftServerTab)
 UIUtils.headerSparkPage("JDBC/ODBC Session", content, parent, Some(5000))
   }
 
-  /** Generate basic stats of the streaming program */
+  /** Generate basic stats of the thrift server program */
   private def generateBasicStats(): Seq[Node] = {
 val timeSinceStart = System.currentTimeMillis() - startTime.getTime
 

http://git-wip-us.apache.org/repos/asf/spark/blob/53bb0308/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala
--
diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala
index 923ba8a..db20660 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala
@@ -24,7 +24,7 @@ import 
org.apache.spark.sql.hive.thriftserver.ui.ThriftServerTab._
 import org.apache.spark.ui.{SparkUI, SparkUITab}
 
 /**
- * Spark Web UI tab that shows statistics of a streaming job.
+ * Spark Web UI tab that shows statistics of jobs running in the thrift server.
  * This assumes the given SparkContext has enabled its SparkUI.
  */
 private[thriftserver] class ThriftServerTab(sparkContext: SparkContext)


---

spark git commit: doc fix of HiveThriftServer

2016-06-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 0d80bc291 -> e90ba2287


doc fix of HiveThriftServer

## What changes were proposed in this pull request?

Just a minor doc fix.

\cc yhuai

Author: Jeff Zhang 

Closes #13659 from zjffdu/doc_fix.

(cherry picked from commit 53bb03084796231f724ff8369490df520e1ee33c)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e90ba228
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e90ba228
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e90ba228

Branch: refs/heads/branch-2.0
Commit: e90ba228787c0a8b50855bafb0bc16eddee8329b
Parents: 0d80bc2
Author: Jeff Zhang 
Authored: Tue Jun 14 14:28:40 2016 +0100
Committer: Sean Owen 
Committed: Tue Jun 14 14:28:54 2016 +0100

--
 .../apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala | 2 +-
 .../spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala | 4 ++--
 .../apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e90ba228/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala
--
diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala
index c82fa4e..2e0fa1e 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala
@@ -30,7 +30,7 @@ import org.apache.spark.ui._
 import org.apache.spark.ui.UIUtils._
 
 
-/** Page for Spark Web UI that shows statistics of a thrift server */
+/** Page for Spark Web UI that shows statistics of the thrift server */
 private[ui] class ThriftServerPage(parent: ThriftServerTab) extends 
WebUIPage("") with Logging {
 
   private val listener = parent.listener

http://git-wip-us.apache.org/repos/asf/spark/blob/e90ba228/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala
--
diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala
index 008108a..f39e9dc 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala
@@ -29,7 +29,7 @@ import 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.{ExecutionInfo,
 import org.apache.spark.ui._
 import org.apache.spark.ui.UIUtils._
 
-/** Page for Spark Web UI that shows statistics of a streaming job */
+/** Page for Spark Web UI that shows statistics of jobs running in the thrift 
server */
 private[ui] class ThriftServerSessionPage(parent: ThriftServerTab)
   extends WebUIPage("session") with Logging {
 
@@ -60,7 +60,7 @@ private[ui] class ThriftServerSessionPage(parent: 
ThriftServerTab)
 UIUtils.headerSparkPage("JDBC/ODBC Session", content, parent, Some(5000))
   }
 
-  /** Generate basic stats of the streaming program */
+  /** Generate basic stats of the thrift server program */
   private def generateBasicStats(): Seq[Node] = {
 val timeSinceStart = System.currentTimeMillis() - startTime.getTime
 

http://git-wip-us.apache.org/repos/asf/spark/blob/e90ba228/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala
--
diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala
index 923ba8a..db20660 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala
@@ -24,7 +24,7 @@ import 
org.apache.spark.sql.hive.thriftserver.ui.ThriftServerTab._
 import org.apache.spark.ui.{SparkUI, SparkUITab}
 
 /**
- * Spark Web UI tab that shows statistics of a streaming job.
+ * Spark Web UI tab that shows statistics of jobs running in the thrift server.
  * This assumes the given SparkContext has enabled its SparkUI.
  */
 private[thrif

spark git commit: [MINOR] Clean up several build warnings, mostly due to internal use of old accumulators

2016-06-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 e03c25193 -> 24539223b


[MINOR] Clean up several build warnings, mostly due to internal use of old 
accumulators

Another PR to clean up recent build warnings. This one particularly cleans up 
several uses of the old accumulator API in tests that are straightforward to 
update. I think this qualifies as "minor".
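
For readers updating their own tests, here is a minimal sketch of the new 
Spark 2.x accumulator API that the updated tests switch to (illustrative only, 
not part of this patch; the object, app and accumulator names are made up):

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the new accumulator API that replaces the deprecated
// Accumulator/AccumulatorParam classes used by the old tests.
object NewAccumulatorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("new-accumulator-sketch") // hypothetical app name
      .master("local[2]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Creates, names and registers a LongAccumulator in one call.
    val counter = sc.longAccumulator("records seen")

    sc.parallelize(1 to 100, 4).foreach(_ => counter.add(1))

    println(s"counter = ${counter.value}") // 100
    spark.stop()
  }
}
```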

Jenkins

Author: Sean Owen 

Closes #13642 from srowen/BuildWarnings.

(cherry picked from commit 6151d2641f91c8e3ec0c324e78afb46cdb2ef111)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/24539223
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/24539223
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/24539223

Branch: refs/heads/branch-2.0
Commit: 24539223b043b621a377251bdab206833af78d0c
Parents: e03c251
Author: Sean Owen 
Authored: Tue Jun 14 09:40:07 2016 -0700
Committer: Sean Owen 
Committed: Tue Jun 14 20:36:30 2016 +0100

--
 core/pom.xml|   6 +-
 .../spark/scheduler/DAGSchedulerSuite.scala |  12 +--
 .../spark/scheduler/TaskContextSuite.scala  |   9 +-
 .../spark/sql/execution/debug/package.scala |  34 +++---
 .../sql/execution/metric/SQLMetricsSuite.scala  | 105 +--
 .../spark/deploy/yarn/YarnAllocatorSuite.scala  |   1 +
 6 files changed, 31 insertions(+), 136 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/24539223/core/pom.xml
--
diff --git a/core/pom.xml b/core/pom.xml
index f5fdb40..90c8f97 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -356,12 +356,12 @@
 generate-resources
 
   
-  
+  
 
   
-  
+  
 
-  
+  
 
 
   run

http://git-wip-us.apache.org/repos/asf/spark/blob/24539223/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
index 5bcc8ff..ce4e7a2 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
@@ -1593,13 +1593,11 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with Timeou
   }
 
   test("misbehaved accumulator should not crash DAGScheduler and 
SparkContext") {
-val acc = new Accumulator[Int](0, new AccumulatorParam[Int] {
-  override def addAccumulator(t1: Int, t2: Int): Int = t1 + t2
-  override def zero(initialValue: Int): Int = 0
-  override def addInPlace(r1: Int, r2: Int): Int = {
-throw new DAGSchedulerSuiteDummyException
-  }
-})
+val acc = new LongAccumulator {
+  override def add(v: java.lang.Long): Unit = throw new 
DAGSchedulerSuiteDummyException
+  override def add(v: Long): Unit = throw new 
DAGSchedulerSuiteDummyException
+}
+sc.register(acc)
 
 // Run this on executors
 sc.parallelize(1 to 10, 2).foreach { item => acc.add(1) }

http://git-wip-us.apache.org/repos/asf/spark/blob/24539223/core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala
--
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala
index 368668b..9eda79a 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala
@@ -146,14 +146,13 @@ class TaskContextSuite extends SparkFunSuite with 
BeforeAndAfter with LocalSpark
   test("accumulators are updated on exception failures") {
 // This means use 1 core and 4 max task failures
 sc = new SparkContext("local[1,4]", "test")
-val param = AccumulatorParam.LongAccumulatorParam
 // Create 2 accumulators, one that counts failed values and another that 
doesn't
-val acc1 = new Accumulator(0L, param, Some("x"), countFailedValues = true)
-val acc2 = new Accumulator(0L, param, Some("y"), countFailedValues = false)
+val acc1 = AccumulatorSuite.createLongAccum("x", true)
+val acc2 = AccumulatorSuite.createLongAccum("y", false)
 // Fail first 3 attempts of every task. This means each task should be run 
4 times.
 sc.parallelize(1 to

spark git commit: [SPARK-15922][MLLIB] `toIndexedRowMatrix` should consider the case `cols < offset+colsPerBlock`

2016-06-16 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master f9bf15d9b -> 36110a830


[SPARK-15922][MLLIB] `toIndexedRowMatrix` should consider the case `cols < 
offset+colsPerBlock`

## What changes were proposed in this pull request?

SPARK-15922 reports the following scenario, which throws an exception due to 
mismatched vector sizes. This PR handles the exceptional case `cols < (offset 
+ colsPerBlock)`.

**Before**
```scala
scala> import org.apache.spark.mllib.linalg.distributed._
scala> import org.apache.spark.mllib.linalg._
scala> val rows = IndexedRow(0L, new DenseVector(Array(1,2,3))) :: 
IndexedRow(1L, new DenseVector(Array(1,2,3))):: IndexedRow(2L, new 
DenseVector(Array(1,2,3))):: Nil
scala> val rdd = sc.parallelize(rows)
scala> val matrix = new IndexedRowMatrix(rdd, 3, 3)
scala> val bmat = matrix.toBlockMatrix
scala> val imat = bmat.toIndexedRowMatrix
scala> imat.rows.collect
... // java.lang.IllegalArgumentException: requirement failed: Vectors must be 
the same length!
```

**After**
```scala
...
scala> imat.rows.collect
res0: Array[org.apache.spark.mllib.linalg.distributed.IndexedRow] = 
Array(IndexedRow(0,[1.0,2.0,3.0]), IndexedRow(1,[1.0,2.0,3.0]), 
IndexedRow(2,[1.0,2.0,3.0]))
```

## How was this patch tested?

Pass the Jenkins tests (including the above case)

Author: Dongjoon Hyun 

Closes #13643 from dongjoon-hyun/SPARK-15922.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/36110a83
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/36110a83
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/36110a83

Branch: refs/heads/master
Commit: 36110a8306608186696c536028d2776e022d305a
Parents: f9bf15d
Author: Dongjoon Hyun 
Authored: Thu Jun 16 23:02:46 2016 +0200
Committer: Sean Owen 
Committed: Thu Jun 16 23:02:46 2016 +0200

--
 .../org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala | 2 +-
 .../spark/mllib/linalg/distributed/BlockMatrixSuite.scala   | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/36110a83/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
index 7a24617..639295c 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
@@ -288,7 +288,7 @@ class BlockMatrix @Since("1.3.0") (
 
   vectors.foreach { case (blockColIdx: Int, vec: BV[Double]) =>
 val offset = colsPerBlock * blockColIdx
-wholeVector(offset until offset + colsPerBlock) := vec
+wholeVector(offset until Math.min(cols, offset + colsPerBlock)) := vec
   }
   new IndexedRow(rowIdx, Vectors.fromBreeze(wholeVector))
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/36110a83/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala
--
diff --git 
a/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala
 
b/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala
index e5a2cbb..61266f3 100644
--- 
a/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala
+++ 
b/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala
@@ -135,6 +135,11 @@ class BlockMatrixSuite extends SparkFunSuite with 
MLlibTestSparkContext {
 assert(rowMat.numCols() === n)
 assert(rowMat.toBreeze() === gridBasedMat.toBreeze())
 
+// SPARK-15922: BlockMatrix to IndexedRowMatrix throws an error"
+val bmat = rowMat.toBlockMatrix
+val imat = bmat.toIndexedRowMatrix
+imat.rows.collect
+
 val rows = 1
 val cols = 10
 





spark git commit: [SPARK-15922][MLLIB] `toIndexedRowMatrix` should consider the case `cols < offset+colsPerBlock`

2016-06-16 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 5b003c9bc -> 579268426


[SPARK-15922][MLLIB] `toIndexedRowMatrix` should consider the case `cols < 
offset+colsPerBlock`

## What changes were proposed in this pull request?

SPARK-15922 reports the following scenario, which throws an exception due to 
mismatched vector sizes. This PR handles the exceptional case `cols < (offset 
+ colsPerBlock)`.

**Before**
```scala
scala> import org.apache.spark.mllib.linalg.distributed._
scala> import org.apache.spark.mllib.linalg._
scala> val rows = IndexedRow(0L, new DenseVector(Array(1,2,3))) :: 
IndexedRow(1L, new DenseVector(Array(1,2,3))):: IndexedRow(2L, new 
DenseVector(Array(1,2,3))):: Nil
scala> val rdd = sc.parallelize(rows)
scala> val matrix = new IndexedRowMatrix(rdd, 3, 3)
scala> val bmat = matrix.toBlockMatrix
scala> val imat = bmat.toIndexedRowMatrix
scala> imat.rows.collect
... // java.lang.IllegalArgumentException: requirement failed: Vectors must be 
the same length!
```

**After**
```scala
...
scala> imat.rows.collect
res0: Array[org.apache.spark.mllib.linalg.distributed.IndexedRow] = 
Array(IndexedRow(0,[1.0,2.0,3.0]), IndexedRow(1,[1.0,2.0,3.0]), 
IndexedRow(2,[1.0,2.0,3.0]))
```

## How was this patch tested?

Pass the Jenkins tests (including the above case)

Author: Dongjoon Hyun 

Closes #13643 from dongjoon-hyun/SPARK-15922.

(cherry picked from commit 36110a8306608186696c536028d2776e022d305a)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/57926842
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/57926842
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/57926842

Branch: refs/heads/branch-2.0
Commit: 5792684268b273562e694855eb671c21c4044280
Parents: 5b003c9
Author: Dongjoon Hyun 
Authored: Thu Jun 16 23:02:46 2016 +0200
Committer: Sean Owen 
Committed: Thu Jun 16 23:03:00 2016 +0200

--
 .../org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala | 2 +-
 .../spark/mllib/linalg/distributed/BlockMatrixSuite.scala   | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/57926842/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
index 7a24617..639295c 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
@@ -288,7 +288,7 @@ class BlockMatrix @Since("1.3.0") (
 
   vectors.foreach { case (blockColIdx: Int, vec: BV[Double]) =>
 val offset = colsPerBlock * blockColIdx
-wholeVector(offset until offset + colsPerBlock) := vec
+wholeVector(offset until Math.min(cols, offset + colsPerBlock)) := vec
   }
   new IndexedRow(rowIdx, Vectors.fromBreeze(wholeVector))
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/57926842/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala
--
diff --git 
a/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala
 
b/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala
index e5a2cbb..61266f3 100644
--- 
a/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala
+++ 
b/mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrixSuite.scala
@@ -135,6 +135,11 @@ class BlockMatrixSuite extends SparkFunSuite with 
MLlibTestSparkContext {
 assert(rowMat.numCols() === n)
 assert(rowMat.toBreeze() === gridBasedMat.toBreeze())
 
+// SPARK-15922: BlockMatrix to IndexedRowMatrix throws an error"
+val bmat = rowMat.toBlockMatrix
+val imat = bmat.toIndexedRowMatrix
+imat.rows.collect
+
 val rows = 1
 val cols = 10
 





spark git commit: [SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config

2016-06-16 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 36110a830 -> 457126e42


[SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning 
old gen in JVM default config

## What changes were proposed in this pull request?

Reduce the `spark.memory.fraction` default to 0.6 so that it fits within the 
default JVM old generation size (2/3 of the heap). See the JIRA discussion. This 
means a full cache doesn't spill into the new gen. CC andrewor14
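
For jobs that were tuned against the old value, a minimal sketch of overriding 
the new default (illustrative only; the app name is made up, and the snippet 
assumes spark-shell or an application's main method):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch: a workload tuned against the old value can restore it explicitly;
// otherwise the new 0.6 default applies. With a 4 GB heap, execution plus
// storage then get roughly (4096 - 300) * 0.6 ~= 2278 MB.
val conf = new SparkConf()
  .setAppName("memory-fraction-sketch")       // hypothetical app name
  .set("spark.memory.fraction", "0.75")       // previous default
  .set("spark.memory.storageFraction", "0.5") // unchanged default

val spark = SparkSession.builder().config(conf).getOrCreate()
```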

## How was this patch tested?

Jenkins tests.

Author: Sean Owen 

Closes #13618 from srowen/SPARK-15796.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/457126e4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/457126e4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/457126e4

Branch: refs/heads/master
Commit: 457126e420e66228cc68def4bc3d87e7a282069a
Parents: 36110a8
Author: Sean Owen 
Authored: Thu Jun 16 23:04:10 2016 +0200
Committer: Sean Owen 
Committed: Thu Jun 16 23:04:10 2016 +0200

--
 .../spark/memory/UnifiedMemoryManager.scala   |  8 
 .../scala/org/apache/spark/DistributedSuite.scala |  2 +-
 docs/configuration.md |  7 ---
 docs/tuning.md| 18 +-
 4 files changed, 26 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/457126e4/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala 
b/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala
index ae747c1..c7b36be 100644
--- a/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala
+++ b/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala
@@ -25,9 +25,9 @@ import org.apache.spark.storage.BlockId
  * either side can borrow memory from the other.
  *
  * The region shared between execution and storage is a fraction of (the total 
heap space - 300MB)
- * configurable through `spark.memory.fraction` (default 0.75). The position 
of the boundary
+ * configurable through `spark.memory.fraction` (default 0.6). The position of 
the boundary
  * within this space is further determined by `spark.memory.storageFraction` 
(default 0.5).
- * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap 
space by default.
+ * This means the size of the storage region is 0.6 * 0.5 = 0.3 of the heap 
space by default.
  *
  * Storage can borrow as much execution memory as is free until execution 
reclaims its space.
  * When this happens, cached blocks will be evicted from memory until 
sufficient borrowed
@@ -187,7 +187,7 @@ object UnifiedMemoryManager {
   // Set aside a fixed amount of memory for non-storage, non-execution 
purposes.
   // This serves a function similar to `spark.memory.fraction`, but guarantees 
that we reserve
   // sufficient memory for the system even for small heaps. E.g. if we have a 
1GB JVM, then
-  // the memory used for execution and storage will be (1024 - 300) * 0.75 = 
543MB by default.
+  // the memory used for execution and storage will be (1024 - 300) * 0.6 = 
434MB by default.
   private val RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024
 
   def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = {
@@ -223,7 +223,7 @@ object UnifiedMemoryManager {
   }
 }
 val usableMemory = systemMemory - reservedMemory
-val memoryFraction = conf.getDouble("spark.memory.fraction", 0.75)
+val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6)
 (usableMemory * memoryFraction).toLong
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/457126e4/core/src/test/scala/org/apache/spark/DistributedSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/DistributedSuite.scala 
b/core/src/test/scala/org/apache/spark/DistributedSuite.scala
index 6e69fc4..0515e6e 100644
--- a/core/src/test/scala/org/apache/spark/DistributedSuite.scala
+++ b/core/src/test/scala/org/apache/spark/DistributedSuite.scala
@@ -223,7 +223,7 @@ class DistributedSuite extends SparkFunSuite with Matchers 
with LocalSparkContex
 
   test("compute when only some partitions fit in memory") {
 val size = 1
-val numPartitions = 10
+val numPartitions = 20
 val conf = new SparkConf()
   .set("spark.storage.unrollMemoryThreshold", "1024")
   .set("spark.testing.memory", size.toString)

http://git-wip-us.apache.org/repos/asf/spark/blob/457126e4/docs/configuration.md
--
diff

spark git commit: [SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config

2016-06-16 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 579268426 -> 095ddb4c9


[SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning 
old gen in JVM default config

## What changes were proposed in this pull request?

Reduce the `spark.memory.fraction` default to 0.6 so that it fits within the 
default JVM old generation size (2/3 of the heap). See the JIRA discussion. This 
means a full cache doesn't spill into the new gen. CC andrewor14

## How was this patch tested?

Jenkins tests.

Author: Sean Owen 

Closes #13618 from srowen/SPARK-15796.

(cherry picked from commit 457126e420e66228cc68def4bc3d87e7a282069a)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/095ddb4c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/095ddb4c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/095ddb4c

Branch: refs/heads/branch-2.0
Commit: 095ddb4c9e7ab9193c15c69eb057a9bb2dbdaed1
Parents: 5792684
Author: Sean Owen 
Authored: Thu Jun 16 23:04:10 2016 +0200
Committer: Sean Owen 
Committed: Thu Jun 16 23:04:19 2016 +0200

--
 .../spark/memory/UnifiedMemoryManager.scala   |  8 
 .../scala/org/apache/spark/DistributedSuite.scala |  2 +-
 docs/configuration.md |  7 ---
 docs/tuning.md| 18 +-
 4 files changed, 26 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/095ddb4c/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala 
b/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala
index ae747c1..c7b36be 100644
--- a/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala
+++ b/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala
@@ -25,9 +25,9 @@ import org.apache.spark.storage.BlockId
  * either side can borrow memory from the other.
  *
  * The region shared between execution and storage is a fraction of (the total 
heap space - 300MB)
- * configurable through `spark.memory.fraction` (default 0.75). The position 
of the boundary
+ * configurable through `spark.memory.fraction` (default 0.6). The position of 
the boundary
  * within this space is further determined by `spark.memory.storageFraction` 
(default 0.5).
- * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap 
space by default.
+ * This means the size of the storage region is 0.6 * 0.5 = 0.3 of the heap 
space by default.
  *
  * Storage can borrow as much execution memory as is free until execution 
reclaims its space.
  * When this happens, cached blocks will be evicted from memory until 
sufficient borrowed
@@ -187,7 +187,7 @@ object UnifiedMemoryManager {
   // Set aside a fixed amount of memory for non-storage, non-execution 
purposes.
   // This serves a function similar to `spark.memory.fraction`, but guarantees 
that we reserve
   // sufficient memory for the system even for small heaps. E.g. if we have a 
1GB JVM, then
-  // the memory used for execution and storage will be (1024 - 300) * 0.75 = 
543MB by default.
+  // the memory used for execution and storage will be (1024 - 300) * 0.6 = 
434MB by default.
   private val RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024
 
   def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = {
@@ -223,7 +223,7 @@ object UnifiedMemoryManager {
   }
 }
 val usableMemory = systemMemory - reservedMemory
-val memoryFraction = conf.getDouble("spark.memory.fraction", 0.75)
+val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6)
 (usableMemory * memoryFraction).toLong
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/095ddb4c/core/src/test/scala/org/apache/spark/DistributedSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/DistributedSuite.scala 
b/core/src/test/scala/org/apache/spark/DistributedSuite.scala
index 6e69fc4..0515e6e 100644
--- a/core/src/test/scala/org/apache/spark/DistributedSuite.scala
+++ b/core/src/test/scala/org/apache/spark/DistributedSuite.scala
@@ -223,7 +223,7 @@ class DistributedSuite extends SparkFunSuite with Matchers 
with LocalSparkContex
 
   test("compute when only some partitions fit in memory") {
 val size = 1
-val numPartitions = 10
+val numPartitions = 20
 val conf = new SparkConf()
   .set("spark.storage.unrollMemoryThreshold", "1024")
   .set("spark.testing.memory", size.toString)

http://git-w

spark git commit: [SPARK-15942][REPL] Unblock `:reset` command in REPL.

2016-06-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 001a58960 -> 1b3a9b966


[SPARK-15942][REPL] Unblock `:reset` command in REPL.

## What changes were proposed in this pull request?
(Pasted from the JIRA issue.)
As a follow-up to SPARK-15697, the `:reset` command has the following semantics.
On `:reset` we forget everything the user has done, but not the initialization of 
Spark. To avoid confusion, we print a message that `spark` and `sc` are not 
erased; in fact they remain in the same state the user's previous operations left 
them in.
While doing this, I felt that this is not what reset usually means, but an 
accidental shutdown of a cluster can be very costly, so in that sense this is 
less surprising and still useful.
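
An abbreviated, illustrative spark-shell session showing the intended semantics 
(a sketch, not captured output; exact REPL output will differ):

```scala
scala> val x = 42
x: Int = 42

scala> :reset
...
Note that after :reset, state of SparkSession and SparkContext is unchanged.

scala> spark.range(5).count()   // spark and sc still work; returns 5

scala> x                        // but user definitions such as x are forgotten
// error: not found: value x
```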

## How was this patch tested?

Manually, by calling the `:reset` command after both altering the state of the 
SparkContext and creating some local variables.

Author: Prashant Sharma 
Author: Prashant Sharma 

Closes #13661 from ScrapCodes/repl-reset-command.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1b3a9b96
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1b3a9b96
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1b3a9b96

Branch: refs/heads/master
Commit: 1b3a9b966a7813e2406dfb020e83605af22f9ef3
Parents: 001a589
Author: Prashant Sharma 
Authored: Sun Jun 19 20:12:00 2016 +0100
Committer: Sean Owen 
Committed: Sun Jun 19 20:12:00 2016 +0100

--
 .../scala/org/apache/spark/repl/SparkILoop.scala| 16 ++--
 .../scala/org/apache/spark/repl/ReplSuite.scala |  3 ++-
 2 files changed, 16 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1b3a9b96/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala
--
diff --git 
a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala 
b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala
index dcf3209..2707b08 100644
--- a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala
+++ b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala
@@ -36,7 +36,11 @@ class SparkILoop(in0: Option[BufferedReader], out: 
JPrintWriter)
   def initializeSpark() {
 intp.beQuietDuring {
   processLine("""
-@transient val spark = org.apache.spark.repl.Main.createSparkSession()
+@transient val spark = if (org.apache.spark.repl.Main.sparkSession != 
null) {
+org.apache.spark.repl.Main.sparkSession
+  } else {
+org.apache.spark.repl.Main.createSparkSession()
+  }
 @transient val sc = {
   val _sc = spark.sparkContext
   _sc.uiWebUrl.foreach(webUrl => println(s"Spark context Web UI 
available at ${webUrl}"))
@@ -50,6 +54,7 @@ class SparkILoop(in0: Option[BufferedReader], out: 
JPrintWriter)
   processLine("import spark.implicits._")
   processLine("import spark.sql")
   processLine("import org.apache.spark.sql.functions._")
+  replayCommandStack = Nil // remove above commands from session history.
 }
   }
 
@@ -70,7 +75,8 @@ class SparkILoop(in0: Option[BufferedReader], out: 
JPrintWriter)
 echo("Type :help for more information.")
   }
 
-  private val blockedCommands = Set[String]("reset")
+  /** Add repl commands that needs to be blocked. e.g. reset */
+  private val blockedCommands = Set[String]()
 
   /** Standard commands */
   lazy val sparkStandardCommands: List[SparkILoop.this.LoopCommand] =
@@ -88,6 +94,12 @@ class SparkILoop(in0: Option[BufferedReader], out: 
JPrintWriter)
 initializeSpark()
 super.loadFiles(settings)
   }
+
+  override def resetCommand(line: String): Unit = {
+super.resetCommand(line)
+initializeSpark()
+echo("Note that after :reset, state of SparkSession and SparkContext is 
unchanged.")
+  }
 }
 
 object SparkILoop {

http://git-wip-us.apache.org/repos/asf/spark/blob/1b3a9b96/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
--
diff --git 
a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala 
b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
index 2444e93..c10db94 100644
--- a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
+++ b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
@@ -49,7 +49,8 @@ class ReplSuite extends SparkFunSuite {
 
 val oldExecutorClasspath = System.getProperty(CONF_EXECUTOR_CLASSPATH)
 System.setProperty(CONF_EXECUTOR_CLASSPATH, classpath)
-
+Main.sparkContext = null
+Main.sparkSession = null // causes recreation of SparkContext for each 
test.
 Main.conf.set("spark.master", 

spark git commit: [SPARK-15942][REPL] Unblock `:reset` command in REPL.

2016-06-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 dc85bd0a0 -> 2c1c337ba


[SPARK-15942][REPL] Unblock `:reset` command in REPL.

## What changes were proposed in this pull request?
(Pasted from the JIRA issue.)
As a follow-up to SPARK-15697, the `:reset` command has the following semantics.
On `:reset` we forget everything the user has done, but not the initialization of 
Spark. To avoid confusion, we print a message that `spark` and `sc` are not 
erased; in fact they remain in the same state the user's previous operations left 
them in.
While doing this, I felt that this is not what reset usually means, but an 
accidental shutdown of a cluster can be very costly, so in that sense this is 
less surprising and still useful.

## How was this patch tested?

Manually, by calling the `:reset` command after both altering the state of the 
SparkContext and creating some local variables.

Author: Prashant Sharma 
Author: Prashant Sharma 

Closes #13661 from ScrapCodes/repl-reset-command.

(cherry picked from commit 1b3a9b966a7813e2406dfb020e83605af22f9ef3)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2c1c337b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2c1c337b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2c1c337b

Branch: refs/heads/branch-2.0
Commit: 2c1c337ba5984b9e495b4d02bf865e56fd83ab03
Parents: dc85bd0
Author: Prashant Sharma 
Authored: Sun Jun 19 20:12:00 2016 +0100
Committer: Sean Owen 
Committed: Sun Jun 19 20:12:08 2016 +0100

--
 .../scala/org/apache/spark/repl/SparkILoop.scala| 16 ++--
 .../scala/org/apache/spark/repl/ReplSuite.scala |  3 ++-
 2 files changed, 16 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2c1c337b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala
--
diff --git 
a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala 
b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala
index dcf3209..2707b08 100644
--- a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala
+++ b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala
@@ -36,7 +36,11 @@ class SparkILoop(in0: Option[BufferedReader], out: 
JPrintWriter)
   def initializeSpark() {
 intp.beQuietDuring {
   processLine("""
-@transient val spark = org.apache.spark.repl.Main.createSparkSession()
+@transient val spark = if (org.apache.spark.repl.Main.sparkSession != 
null) {
+org.apache.spark.repl.Main.sparkSession
+  } else {
+org.apache.spark.repl.Main.createSparkSession()
+  }
 @transient val sc = {
   val _sc = spark.sparkContext
   _sc.uiWebUrl.foreach(webUrl => println(s"Spark context Web UI 
available at ${webUrl}"))
@@ -50,6 +54,7 @@ class SparkILoop(in0: Option[BufferedReader], out: 
JPrintWriter)
   processLine("import spark.implicits._")
   processLine("import spark.sql")
   processLine("import org.apache.spark.sql.functions._")
+  replayCommandStack = Nil // remove above commands from session history.
 }
   }
 
@@ -70,7 +75,8 @@ class SparkILoop(in0: Option[BufferedReader], out: 
JPrintWriter)
 echo("Type :help for more information.")
   }
 
-  private val blockedCommands = Set[String]("reset")
+  /** Add repl commands that needs to be blocked. e.g. reset */
+  private val blockedCommands = Set[String]()
 
   /** Standard commands */
   lazy val sparkStandardCommands: List[SparkILoop.this.LoopCommand] =
@@ -88,6 +94,12 @@ class SparkILoop(in0: Option[BufferedReader], out: 
JPrintWriter)
 initializeSpark()
 super.loadFiles(settings)
   }
+
+  override def resetCommand(line: String): Unit = {
+super.resetCommand(line)
+initializeSpark()
+echo("Note that after :reset, state of SparkSession and SparkContext is 
unchanged.")
+  }
 }
 
 object SparkILoop {

http://git-wip-us.apache.org/repos/asf/spark/blob/2c1c337b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
--
diff --git 
a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala 
b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
index 2444e93..c10db94 100644
--- a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
+++ b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
@@ -49,7 +49,8 @@ class ReplSuite extends SparkFunSuite {
 
 val oldExecutorClasspath = System.getProperty(CONF_EXECUTOR_CLASSPATH)
 System.setProperty(CONF_EXECUTOR_CLASSPATH, classpath)
-
+Main.sparkContext = null
+Main.spar

spark git commit: [SPARK-16040][MLLIB][DOC] spark.mllib PIC document extra line of reference

2016-06-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 2c1c337ba -> 80c6d4e3a


[SPARK-16040][MLLIB][DOC] spark.mllib PIC document extra line of reference

## What changes were proposed in this pull request?

In the 2.0 documentation, the line "A full example that produces the experiment 
described in the PIC paper can be found under examples/." is redundant.

There is already "Find full example code at 
"examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala"
 in the Spark repo.".

We should remove the first line, which is consistent with the other documents.

## How was this patch tested?

Manual test

Author: wm...@hotmail.com 

Closes #13755 from wangmiao1981/doc.

(cherry picked from commit 5930d7a2e95b2fe4d470cf39546e5a12306553fe)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/80c6d4e3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/80c6d4e3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/80c6d4e3

Branch: refs/heads/branch-2.0
Commit: 80c6d4e3a49fad4dac46738fe5458641f21b96a1
Parents: 2c1c337
Author: wm...@hotmail.com 
Authored: Sun Jun 19 20:19:40 2016 +0100
Committer: Sean Owen 
Committed: Sun Jun 19 20:19:48 2016 +0100

--
 docs/mllib-clustering.md | 4 
 1 file changed, 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/80c6d4e3/docs/mllib-clustering.md
--
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index 6897ba4..073927c 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -170,10 +170,6 @@ which contains the computed clustering assignments.
 Refer to the [`PowerIterationClustering` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClustering)
 and [`PowerIterationClusteringModel` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClusteringModel)
 for details on the API.
 
 {% include_example 
scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala %}
-
-A full example that produces the experiment described in the PIC paper can be 
found under
-[`examples/`](https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala).
-
 
 
 





spark git commit: [SPARK-16040][MLLIB][DOC] spark.mllib PIC document extra line of reference

2016-06-19 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 1b3a9b966 -> 5930d7a2e


[SPARK-16040][MLLIB][DOC] spark.mllib PIC document extra line of reference

## What changes were proposed in this pull request?

In the 2.0 documentation, the line "A full example that produces the experiment 
described in the PIC paper can be found under examples/." is redundant.

There is already "Find full example code at 
"examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala"
 in the Spark repo.".

We should remove the first line, which is consistent with the other documents.

## How was this patch tested?

Manual test

Author: wm...@hotmail.com 

Closes #13755 from wangmiao1981/doc.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5930d7a2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5930d7a2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5930d7a2

Branch: refs/heads/master
Commit: 5930d7a2e95b2fe4d470cf39546e5a12306553fe
Parents: 1b3a9b9
Author: wm...@hotmail.com 
Authored: Sun Jun 19 20:19:40 2016 +0100
Committer: Sean Owen 
Committed: Sun Jun 19 20:19:40 2016 +0100

--
 docs/mllib-clustering.md | 4 
 1 file changed, 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5930d7a2/docs/mllib-clustering.md
--
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index 6897ba4..073927c 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -170,10 +170,6 @@ which contains the computed clustering assignments.
 Refer to the [`PowerIterationClustering` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClustering)
 and [`PowerIterationClusteringModel` Scala 
docs](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClusteringModel)
 for details on the API.
 
 {% include_example 
scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala %}
-
-A full example that produces the experiment described in the PIC paper can be 
found under
-[`examples/`](https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala).
-
 
 
 





spark git commit: [MINOR] Closing stale pull requests.

2016-06-20 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 359c2e827 -> 92514232e


[MINOR] Closing stale pull requests.

Closes #13114
Closes #10187
Closes #13432
Closes #13550

Author: Sean Owen 

Closes #13781 from srowen/CloseStalePR.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/92514232
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/92514232
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/92514232

Branch: refs/heads/master
Commit: 92514232e52af0f5f0413ed97b9571b1b9daaa90
Parents: 359c2e8
Author: Sean Owen 
Authored: Mon Jun 20 22:12:55 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 20 22:12:55 2016 +0100

--

--






spark git commit: [SPARK-16084][SQL] Minor comments update for "DESCRIBE" table

2016-06-21 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master a58f40239 -> f3a768b7b


[SPARK-16084][SQL] Minor comments update for "DESCRIBE" table

## What changes were proposed in this pull request?

1. FORMATTED is actually supported, but partition is not (a short illustration 
follows below);
2. Remove the parentheses, as they are not necessary, just as anywhere else.
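
A short illustration of the distinction (a sketch only; `my_table` is a 
hypothetical table and `spark` an existing SparkSession, e.g. in spark-shell):

```scala
// DESCRIBE FORMATTED is handled by this parser rule.
spark.sql("DESCRIBE FORMATTED my_table").show()

// Describing a specific partition or column is not supported yet, so the rule
// returns null and lets the parser decide what to do with e.g.:
// spark.sql("DESCRIBE FORMATTED my_table PARTITION (dt='2016-06-21')")
```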

## How was this patch tested?

Minor issue. I do not think it needs a test case!

Author: bomeng 

Closes #13791 from bomeng/SPARK-16084.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f3a768b7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f3a768b7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f3a768b7

Branch: refs/heads/master
Commit: f3a768b7b96f00f33d2fe4e6c0bf4acf373ad4f4
Parents: a58f402
Author: bomeng 
Authored: Tue Jun 21 08:51:43 2016 +0100
Committer: Sean Owen 
Committed: Tue Jun 21 08:51:43 2016 +0100

--
 .../scala/org/apache/spark/sql/execution/SparkSqlParser.scala  | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f3a768b7/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
index 154c25a..2ae8380 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
@@ -279,15 +279,15 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder {
* Create a [[DescribeTableCommand]] logical plan.
*/
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = 
withOrigin(ctx) {
-// FORMATTED and columns are not supported. Return null and let the parser 
decide what to do
-// with this (create an exception or pass it on to a different system).
+// Describe partition and column are not supported yet. Return null and 
let the parser decide
+// what to do with this (create an exception or pass it on to a different 
system).
 if (ctx.describeColName != null || ctx.partitionSpec != null) {
   null
 } else {
   DescribeTableCommand(
 visitTableIdentifier(ctx.tableIdentifier),
 ctx.EXTENDED != null,
-ctx.FORMATTED() != null)
+ctx.FORMATTED != null)
 }
   }
 





spark git commit: [SPARK-16084][SQL] Minor comments update for "DESCRIBE" table

2016-06-21 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 0499ed961 -> 34a8e23c7


[SPARK-16084][SQL] Minor comments update for "DESCRIBE" table

## What changes were proposed in this pull request?

1. FORMATTED is actually supported, but partition is not;
2. Remove the parentheses, as they are not necessary, just as anywhere else.

## How was this patch tested?

Minor issue. I do not think it needs a test case!

Author: bomeng 

Closes #13791 from bomeng/SPARK-16084.

(cherry picked from commit f3a768b7b96f00f33d2fe4e6c0bf4acf373ad4f4)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/34a8e23c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/34a8e23c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/34a8e23c

Branch: refs/heads/branch-2.0
Commit: 34a8e23c739532cd2cb059d9d4e785368d6d0a98
Parents: 0499ed9
Author: bomeng 
Authored: Tue Jun 21 08:51:43 2016 +0100
Committer: Sean Owen 
Committed: Tue Jun 21 08:51:57 2016 +0100

--
 .../scala/org/apache/spark/sql/execution/SparkSqlParser.scala  | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/34a8e23c/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
index 154c25a..2ae8380 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
@@ -279,15 +279,15 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder {
* Create a [[DescribeTableCommand]] logical plan.
*/
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = 
withOrigin(ctx) {
-// FORMATTED and columns are not supported. Return null and let the parser 
decide what to do
-// with this (create an exception or pass it on to a different system).
+// Describe partition and column are not supported yet. Return null and 
let the parser decide
+// what to do with this (create an exception or pass it on to a different 
system).
 if (ctx.describeColName != null || ctx.partitionSpec != null) {
   null
 } else {
   DescribeTableCommand(
 visitTableIdentifier(ctx.tableIdentifier),
 ctx.EXTENDED != null,
-ctx.FORMATTED() != null)
+ctx.FORMATTED != null)
 }
   }
 





spark git commit: [SPARK-6005][TESTS] Fix flaky test: o.a.s.streaming.kafka.DirectKafkaStreamSuite.offset recovery

2016-06-22 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.6 d98fb19c1 -> 4fdac3c27


[SPARK-6005][TESTS] Fix flaky test: 
o.a.s.streaming.kafka.DirectKafkaStreamSuite.offset recovery

## What changes were proposed in this pull request?

Because this test extracts data from `DStream.generatedRDDs` before stopping, 
it may get data before checkpointing. Then after recovering from the 
checkpoint, `recoveredOffsetRanges` may contain something not in 
`offsetRangesBeforeStop`, which will fail the test. Adding `Thread.sleep(1000)` 
before `ssc.stop()` will reproduce this failure.

This PR just moves the logic of `offsetRangesBeforeStop` (also renamed to 
`offsetRangesAfterStop`) after `ssc.stop()` to fix the flaky test.

## How was this patch tested?

Jenkins unit tests.

Author: Shixiong Zhu 

Closes #12903 from zsxwing/SPARK-6005.

(cherry picked from commit 9533f5390a3ad7ab96a7bea01cdb6aed89503a51)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4fdac3c2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4fdac3c2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4fdac3c2

Branch: refs/heads/branch-1.6
Commit: 4fdac3c271eccc5db69c45788af15e955752a163
Parents: d98fb19
Author: Shixiong Zhu 
Authored: Tue May 10 13:26:53 2016 -0700
Committer: Sean Owen 
Committed: Wed Jun 22 14:10:50 2016 +0100

--
 .../kafka/DirectKafkaStreamSuite.scala  | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4fdac3c2/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
--
diff --git 
a/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
 
b/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
index 02225d5..feea0ae 100644
--- 
a/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
+++ 
b/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
@@ -280,14 +280,20 @@ class DirectKafkaStreamSuite
   sendDataAndWaitForReceive(i)
 }
 
+ssc.stop()
+
 // Verify that offset ranges were generated
-val offsetRangesBeforeStop = getOffsetRanges(kafkaStream)
-assert(offsetRangesBeforeStop.size >= 1, "No offset ranges generated")
+// Since "offsetRangesAfterStop" will be used to compare with 
"recoveredOffsetRanges", we should
+// collect offset ranges after stopping. Otherwise, because new RDDs keep 
being generated before
+// stopping, we may not be able to get the latest RDDs, then 
"recoveredOffsetRanges" will
+// contain something not in "offsetRangesAfterStop".
+val offsetRangesAfterStop = getOffsetRanges(kafkaStream)
+assert(offsetRangesAfterStop.size >= 1, "No offset ranges generated")
 assert(
-  offsetRangesBeforeStop.head._2.forall { _.fromOffset === 0 },
+  offsetRangesAfterStop.head._2.forall { _.fromOffset === 0 },
   "starting offset not zero"
 )
-ssc.stop()
+
 logInfo("== RESTARTING ")
 
 // Recover context from checkpoints
@@ -297,12 +303,14 @@ class DirectKafkaStreamSuite
 // Verify offset ranges have been recovered
 val recoveredOffsetRanges = getOffsetRanges(recoveredStream)
 assert(recoveredOffsetRanges.size > 0, "No offset ranges recovered")
-val earlierOffsetRangesAsSets = offsetRangesBeforeStop.map { x => (x._1, 
x._2.toSet) }
+val earlierOffsetRangesAsSets = offsetRangesAfterStop.map { x => (x._1, 
x._2.toSet) }
 assert(
   recoveredOffsetRanges.forall { or =>
 earlierOffsetRangesAsSets.contains((or._1, or._2.toSet))
   },
-  "Recovered ranges are not the same as the ones generated"
+  "Recovered ranges are not the same as the ones generated\n" +
+s"recoveredOffsetRanges: $recoveredOffsetRanges\n" +
+s"earlierOffsetRangesAsSets: $earlierOffsetRangesAsSets"
 )
 // Restart context, give more data and verify the total at the end
 // If the total is write that means each records has been received only 
once





spark git commit: [SPARK-15660][CORE] Update RDD `variance/stdev` description and add popVariance/popStdev

2016-06-23 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 4374a46bf -> 5eef1e6c6


[SPARK-15660][CORE] Update RDD `variance/stdev` description and add 
popVariance/popStdev

## What changes were proposed in this pull request?

In SPARK-11490, `variance/stdev` were redefined as the **sample** 
`variance/stdev` instead of the population ones. This PR updates the remaining 
old documentation to prevent users from misunderstanding. It will update the 
following Scala/Java API docs.

- 
http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.api.java.JavaDoubleRDD
- 
http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.rdd.DoubleRDDFunctions
- 
http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter
- 
http://spark.apache.org/docs/2.0.0-preview/api/java/org/apache/spark/api/java/JavaDoubleRDD.html
- 
http://spark.apache.org/docs/2.0.0-preview/api/java/org/apache/spark/rdd/DoubleRDDFunctions.html
- 
http://spark.apache.org/docs/2.0.0-preview/api/java/org/apache/spark/util/StatCounter.html

Also, this PR explicitly adds the `popVariance` and `popStdev` functions.
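
For reference, a minimal, Spark-independent sketch of the distinction the new 
names make explicit (the sample data and object name are illustrative only, 
not part of the patch):

```scala
// Population vs. sample variance/stdev: the only difference is the divisor.
object VarianceSketch {
  def main(args: Array[String]): Unit = {
    val xs = Seq(1.0, 2.0, 3.0, 4.0)
    val n = xs.size
    val mean = xs.sum / n
    val sumSq = xs.map(x => (x - mean) * (x - mean)).sum

    val popVariance = sumSq / n          // popVariance/popStdev: divide by n
    val sampleVariance = sumSq / (n - 1) // sampleVariance/sampleStdev: divide by n - 1

    println(s"popStdev = ${math.sqrt(popVariance)}")
    println(s"sampleStdev = ${math.sqrt(sampleVariance)}")
  }
}
```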

## How was this patch tested?

Pass the updated Jenkins tests.

Author: Dongjoon Hyun 

Closes #13403 from dongjoon-hyun/SPARK-15660.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5eef1e6c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5eef1e6c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5eef1e6c

Branch: refs/heads/master
Commit: 5eef1e6c6a8b6202fc6db4a90c4caab5169e86c6
Parents: 4374a46
Author: Dongjoon Hyun 
Authored: Thu Jun 23 11:07:34 2016 +0100
Committer: Sean Owen 
Committed: Thu Jun 23 11:07:34 2016 +0100

--
 .../apache/spark/api/java/JavaDoubleRDD.scala   | 17 +--
 .../apache/spark/rdd/DoubleRDDFunctions.scala   | 21 +--
 .../org/apache/spark/util/StatCounter.scala | 22 
 .../java/org/apache/spark/JavaAPISuite.java |  2 ++
 .../org/apache/spark/PartitioningSuite.scala|  4 
 5 files changed, 58 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5eef1e6c/core/src/main/scala/org/apache/spark/api/java/JavaDoubleRDD.scala
--
diff --git a/core/src/main/scala/org/apache/spark/api/java/JavaDoubleRDD.scala 
b/core/src/main/scala/org/apache/spark/api/java/JavaDoubleRDD.scala
index 0d3a523..0026fc9 100644
--- a/core/src/main/scala/org/apache/spark/api/java/JavaDoubleRDD.scala
+++ b/core/src/main/scala/org/apache/spark/api/java/JavaDoubleRDD.scala
@@ -22,6 +22,7 @@ import java.lang.{Double => JDouble}
 import scala.language.implicitConversions
 import scala.reflect.ClassTag
 
+import org.apache.spark.annotation.Since
 import org.apache.spark.Partitioner
 import org.apache.spark.api.java.function.{Function => JFunction}
 import org.apache.spark.partial.{BoundedDouble, PartialResult}
@@ -184,10 +185,10 @@ class JavaDoubleRDD(val srdd: RDD[scala.Double])
   /** Compute the mean of this RDD's elements. */
   def mean(): JDouble = srdd.mean()
 
-  /** Compute the variance of this RDD's elements. */
+  /** Compute the population variance of this RDD's elements. */
   def variance(): JDouble = srdd.variance()
 
-  /** Compute the standard deviation of this RDD's elements. */
+  /** Compute the population standard deviation of this RDD's elements. */
   def stdev(): JDouble = srdd.stdev()
 
   /**
@@ -202,6 +203,18 @@ class JavaDoubleRDD(val srdd: RDD[scala.Double])
*/
   def sampleVariance(): JDouble = srdd.sampleVariance()
 
+  /**
+   * Compute the population standard deviation of this RDD's elements.
+   */
+  @Since("2.1.0")
+  def popStdev(): JDouble = srdd.popStdev()
+
+  /**
+   * Compute the population variance of this RDD's elements.
+   */
+  @Since("2.1.0")
+  def popVariance(): JDouble = srdd.popVariance()
+
   /** Return the approximate mean of the elements in this RDD. */
   def meanApprox(timeout: Long, confidence: JDouble): 
PartialResult[BoundedDouble] =
 srdd.meanApprox(timeout, confidence)

http://git-wip-us.apache.org/repos/asf/spark/blob/5eef1e6c/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
--
diff --git a/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
b/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
index 368916a..a05a770 100644
--- a/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
@@ -17,6 +17,7 @@
 
 package org.apache.spark.rdd
 
+import org.apache.spark.annotation.Since
 import org.apache.spark.TaskContext
 import org.apache.spark.internal.L

spark git commit: [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite

2016-06-24 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 2d2f607bf -> f4fd7432f


[SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite

## What changes were proposed in this pull request?

Since SPARK-13220 (Deprecate "yarn-client" and "yarn-cluster"), YarnClusterSuite 
no longer tests "yarn cluster" mode correctly.
This pull request fixes that.
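
As a rough sketch of the configuration this relies on (only the two keys shown 
in the diff below come from the patch; the object and app names are 
illustrative): with "yarn-client"/"yarn-cluster" deprecated, the deploy mode is 
chosen separately from the master.

```scala
import org.apache.spark.SparkConf

object YarnModeSketch {
  def main(args: Array[String]): Unit = {
    // Select YARN cluster mode the post-SPARK-13220 way; use "client" for client mode.
    val conf = new SparkConf()
      .setAppName("yarn-mode-demo")
      .set("spark.master", "yarn")
      .set("spark.submit.deployMode", "cluster")
    println(conf.get("spark.submit.deployMode"))
  }
}
```

On the command line the equivalent is `--master yarn --deploy-mode cluster`.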

## How was this patch tested?
Unit test


Author: peng.zhang 

Closes #13836 from renozhang/SPARK-16125-test-yarn-cluster-mode.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f4fd7432
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f4fd7432
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f4fd7432

Branch: refs/heads/master
Commit: f4fd7432fb9cf7b197ccada1378c4f2a6d427522
Parents: 2d2f607
Author: peng.zhang 
Authored: Fri Jun 24 08:28:32 2016 +0100
Committer: Sean Owen 
Committed: Fri Jun 24 08:28:32 2016 +0100

--
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala   | 3 ++-
 python/pyspark/context.py| 4 
 .../src/main/scala/org/apache/spark/repl/SparkILoop.scala| 2 --
 .../scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala| 2 +-
 4 files changed, 3 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f4fd7432/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala 
b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index e3a8e83..df279b5 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -754,7 +754,8 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 
   test("isDynamicAllocationEnabled") {
 val conf = new SparkConf()
-conf.set("spark.master", "yarn-client")
+conf.set("spark.master", "yarn")
+conf.set("spark.submit.deployMode", "client")
 assert(Utils.isDynamicAllocationEnabled(conf) === false)
 assert(Utils.isDynamicAllocationEnabled(
   conf.set("spark.dynamicAllocation.enabled", "false")) === false)

http://git-wip-us.apache.org/repos/asf/spark/blob/f4fd7432/python/pyspark/context.py
--
diff --git a/python/pyspark/context.py b/python/pyspark/context.py
index aec0215..7217a99 100644
--- a/python/pyspark/context.py
+++ b/python/pyspark/context.py
@@ -155,10 +155,6 @@ class SparkContext(object):
 self.appName = self._conf.get("spark.app.name")
 self.sparkHome = self._conf.get("spark.home", None)
 
-# Let YARN know it's a pyspark app, so it distributes needed libraries.
-if self.master == "yarn-client":
-self._conf.set("spark.yarn.isPython", "true")
-
 for (k, v) in self._conf.getAll():
 if k.startswith("spark.executorEnv."):
 varName = k[len("spark.executorEnv."):]

http://git-wip-us.apache.org/repos/asf/spark/blob/f4fd7432/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala
--
diff --git 
a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala 
b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala
index 8fcab38..e871004 100644
--- a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala
+++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala
@@ -943,8 +943,6 @@ class SparkILoop(
   })
 
   private def process(settings: Settings): Boolean = savingContextLoader {
-if (getMaster() == "yarn-client") System.setProperty("SPARK_YARN_MODE", 
"true")
-
 this.settings = settings
 createInterpreter()
 

http://git-wip-us.apache.org/repos/asf/spark/blob/f4fd7432/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
--
diff --git 
a/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala 
b/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
index 4ce33e0..6b20dea 100644
--- a/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
+++ b/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
@@ -312,7 +312,7 @@ private object YarnClusterDriver extends Logging with 
Matchers {
 
 // If we are running in yarn-cluster mode, verify that driver logs links 
and present and are
 // in the expected format.
-if (conf.get("spark.master") == "yarn-cluster") {
+if (con

spark git commit: [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite

2016-06-24 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 3ccdd6b9c -> b6420db9e


[SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite

## What changes were proposed in this pull request?

Since SPARK-13220 (Deprecate "yarn-client" and "yarn-cluster"), YarnClusterSuite 
no longer tests "yarn cluster" mode correctly.
This pull request fixes that.

## How was this patch tested?
Unit test


Author: peng.zhang 

Closes #13836 from renozhang/SPARK-16125-test-yarn-cluster-mode.

(cherry picked from commit f4fd7432fb9cf7b197ccada1378c4f2a6d427522)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b6420db9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b6420db9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b6420db9

Branch: refs/heads/branch-2.0
Commit: b6420db9ebc59c453a6a523aba68addf5762bb2c
Parents: 3ccdd6b
Author: peng.zhang 
Authored: Fri Jun 24 08:28:32 2016 +0100
Committer: Sean Owen 
Committed: Fri Jun 24 08:28:45 2016 +0100

--
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala   | 3 ++-
 python/pyspark/context.py| 4 
 .../src/main/scala/org/apache/spark/repl/SparkILoop.scala| 2 --
 .../scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala| 2 +-
 4 files changed, 3 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b6420db9/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala 
b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index e3a8e83..df279b5 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -754,7 +754,8 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 
   test("isDynamicAllocationEnabled") {
 val conf = new SparkConf()
-conf.set("spark.master", "yarn-client")
+conf.set("spark.master", "yarn")
+conf.set("spark.submit.deployMode", "client")
 assert(Utils.isDynamicAllocationEnabled(conf) === false)
 assert(Utils.isDynamicAllocationEnabled(
   conf.set("spark.dynamicAllocation.enabled", "false")) === false)

http://git-wip-us.apache.org/repos/asf/spark/blob/b6420db9/python/pyspark/context.py
--
diff --git a/python/pyspark/context.py b/python/pyspark/context.py
index aec0215..7217a99 100644
--- a/python/pyspark/context.py
+++ b/python/pyspark/context.py
@@ -155,10 +155,6 @@ class SparkContext(object):
 self.appName = self._conf.get("spark.app.name")
 self.sparkHome = self._conf.get("spark.home", None)
 
-# Let YARN know it's a pyspark app, so it distributes needed libraries.
-if self.master == "yarn-client":
-self._conf.set("spark.yarn.isPython", "true")
-
 for (k, v) in self._conf.getAll():
 if k.startswith("spark.executorEnv."):
 varName = k[len("spark.executorEnv."):]

http://git-wip-us.apache.org/repos/asf/spark/blob/b6420db9/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala
--
diff --git 
a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala 
b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala
index 8fcab38..e871004 100644
--- a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala
+++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala
@@ -943,8 +943,6 @@ class SparkILoop(
   })
 
   private def process(settings: Settings): Boolean = savingContextLoader {
-if (getMaster() == "yarn-client") System.setProperty("SPARK_YARN_MODE", 
"true")
-
 this.settings = settings
 createInterpreter()
 

http://git-wip-us.apache.org/repos/asf/spark/blob/b6420db9/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
--
diff --git 
a/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala 
b/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
index 4ce33e0..6b20dea 100644
--- a/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
+++ b/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
@@ -312,7 +312,7 @@ private object YarnClusterDriver extends Logging with 
Matchers {
 
 // If we are running in yarn-cluster mode, verify that driver logs links 
and present and

spark git commit: [SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor of commons-lang3

2016-06-24 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master f4fd7432f -> 158af162e


[SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor 
of commons-lang3

## What changes were proposed in this pull request?

Replace use of `commons-lang` with `commons-lang3` and forbid the former via 
scalastyle; replace uses of `commons-lang`'s `NotImplementedException` with the 
JDK's `UnsupportedOperationException`.
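
A minimal sketch of why the `asInstanceOf` cast in `SparkContext` can be 
dropped (not from the patch; the object name and property are illustrative): 
`SerializationUtils.clone` in Commons Lang 3 is generic.

```scala
import java.util.Properties

import org.apache.commons.lang3.SerializationUtils

object Lang3CloneSketch {
  def main(args: Array[String]): Unit = {
    val parent = new Properties()
    parent.setProperty("spark.app.name", "demo")

    // clone[T <: Serializable](obj: T): T -- the result is already a Properties.
    val child: Properties = SerializationUtils.clone(parent)
    println(child.getProperty("spark.app.name"))
  }
}
```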

## How was this patch tested?

Jenkins tests

Author: Sean Owen 

Closes #13843 from srowen/SPARK-16129.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/158af162
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/158af162
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/158af162

Branch: refs/heads/master
Commit: 158af162eac7348464c6751c8acd48fc6c117688
Parents: f4fd743
Author: Sean Owen 
Authored: Fri Jun 24 10:35:54 2016 +0100
Committer: Sean Owen 
Committed: Fri Jun 24 10:35:54 2016 +0100

--
 .../scala/org/apache/spark/SparkContext.scala   |  5 ++--
 scalastyle-config.xml   |  6 +
 .../sql/catalyst/expressions/TimeWindow.scala   |  2 +-
 .../spark/sql/catalyst/trees/TreeNode.scala |  2 +-
 .../parquet/VectorizedColumnReader.java | 25 ++--
 .../sql/execution/vectorized/ColumnVector.java  | 17 +++--
 .../execution/vectorized/ColumnVectorUtils.java |  6 ++---
 .../sql/execution/vectorized/ColumnarBatch.java | 12 --
 .../spark/sql/execution/ExistingRDD.scala   |  2 +-
 .../execution/columnar/InMemoryRelation.scala   |  2 +-
 .../service/cli/session/HiveSessionImpl.java|  2 +-
 .../spark/streaming/StreamingContext.scala  |  5 ++--
 .../streaming/scheduler/JobScheduler.scala  |  6 ++---
 13 files changed, 44 insertions(+), 48 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/158af162/core/src/main/scala/org/apache/spark/SparkContext.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index d870181..fe15052 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -24,7 +24,6 @@ import java.util.{Arrays, Locale, Properties, ServiceLoader, 
UUID}
 import java.util.concurrent.ConcurrentMap
 import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger, 
AtomicReference}
 
-import scala.annotation.tailrec
 import scala.collection.JavaConverters._
 import scala.collection.Map
 import scala.collection.generic.Growable
@@ -34,7 +33,7 @@ import scala.reflect.{classTag, ClassTag}
 import scala.util.control.NonFatal
 
 import com.google.common.collect.MapMaker
-import org.apache.commons.lang.SerializationUtils
+import org.apache.commons.lang3.SerializationUtils
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.Path
 import org.apache.hadoop.io.{ArrayWritable, BooleanWritable, BytesWritable, 
DoubleWritable,
@@ -334,7 +333,7 @@ class SparkContext(config: SparkConf) extends Logging with 
ExecutorAllocationCli
 override protected def childValue(parent: Properties): Properties = {
   // Note: make a clone such that changes in the parent properties aren't 
reflected in
   // the those of the children threads, which has confusing semantics 
(SPARK-10563).
-  SerializationUtils.clone(parent).asInstanceOf[Properties]
+  SerializationUtils.clone(parent)
 }
 override protected def initialValue(): Properties = new Properties()
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/158af162/scalastyle-config.xml
--
diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index 270104f..9a35183 100644
--- a/scalastyle-config.xml
+++ b/scalastyle-config.xml
@@ -210,6 +210,12 @@ This file is divided into 3 sections:
 scala.collection.JavaConverters._ and use .asScala / .asJava 
methods
   
 
+  
+org\.apache\.commons\.lang\.
+Use Commons Lang 3 classes (package 
org.apache.commons.lang3.*) instead
+of Commons Lang 2 (package org.apache.commons.lang.*)
+  
+
   
 
   java,scala,3rdParty,spark

http://git-wip-us.apache.org/repos/asf/spark/blob/158af162/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
index 83fa447..66c4bf2 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWind

spark git commit: [SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor of commons-lang3

2016-06-24 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 b6420db9e -> 201d5e8db


[SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor 
of commons-lang3

## What changes were proposed in this pull request?

Replace use of `commons-lang` with `commons-lang3` and forbid the former via 
scalastyle; replace uses of `commons-lang`'s `NotImplementedException` with the 
JDK's `UnsupportedOperationException`.

## How was this patch tested?

Jenkins tests

Author: Sean Owen 

Closes #13843 from srowen/SPARK-16129.

(cherry picked from commit 158af162eac7348464c6751c8acd48fc6c117688)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/201d5e8d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/201d5e8d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/201d5e8d

Branch: refs/heads/branch-2.0
Commit: 201d5e8db3fd29898a6cd69e015ca491e5721b08
Parents: b6420db
Author: Sean Owen 
Authored: Fri Jun 24 10:35:54 2016 +0100
Committer: Sean Owen 
Committed: Fri Jun 24 10:36:04 2016 +0100

--
 .../scala/org/apache/spark/SparkContext.scala   |  5 ++--
 scalastyle-config.xml   |  6 +
 .../sql/catalyst/expressions/TimeWindow.scala   |  2 +-
 .../spark/sql/catalyst/trees/TreeNode.scala |  2 +-
 .../parquet/VectorizedColumnReader.java | 25 ++--
 .../sql/execution/vectorized/ColumnVector.java  | 17 +++--
 .../execution/vectorized/ColumnVectorUtils.java |  6 ++---
 .../sql/execution/vectorized/ColumnarBatch.java | 12 --
 .../spark/sql/execution/ExistingRDD.scala   |  2 +-
 .../execution/columnar/InMemoryRelation.scala   |  2 +-
 .../service/cli/session/HiveSessionImpl.java|  2 +-
 .../spark/streaming/StreamingContext.scala  |  5 ++--
 .../streaming/scheduler/JobScheduler.scala  |  6 ++---
 13 files changed, 44 insertions(+), 48 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/201d5e8d/core/src/main/scala/org/apache/spark/SparkContext.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index d870181..fe15052 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -24,7 +24,6 @@ import java.util.{Arrays, Locale, Properties, ServiceLoader, 
UUID}
 import java.util.concurrent.ConcurrentMap
 import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger, 
AtomicReference}
 
-import scala.annotation.tailrec
 import scala.collection.JavaConverters._
 import scala.collection.Map
 import scala.collection.generic.Growable
@@ -34,7 +33,7 @@ import scala.reflect.{classTag, ClassTag}
 import scala.util.control.NonFatal
 
 import com.google.common.collect.MapMaker
-import org.apache.commons.lang.SerializationUtils
+import org.apache.commons.lang3.SerializationUtils
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.Path
 import org.apache.hadoop.io.{ArrayWritable, BooleanWritable, BytesWritable, 
DoubleWritable,
@@ -334,7 +333,7 @@ class SparkContext(config: SparkConf) extends Logging with 
ExecutorAllocationCli
 override protected def childValue(parent: Properties): Properties = {
   // Note: make a clone such that changes in the parent properties aren't 
reflected in
   // the those of the children threads, which has confusing semantics 
(SPARK-10563).
-  SerializationUtils.clone(parent).asInstanceOf[Properties]
+  SerializationUtils.clone(parent)
 }
 override protected def initialValue(): Properties = new Properties()
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/201d5e8d/scalastyle-config.xml
--
diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index 270104f..9a35183 100644
--- a/scalastyle-config.xml
+++ b/scalastyle-config.xml
@@ -210,6 +210,12 @@ This file is divided into 3 sections:
 scala.collection.JavaConverters._ and use .asScala / .asJava 
methods
   
 
+  
+org\.apache\.commons\.lang\.
+Use Commons Lang 3 classes (package 
org.apache.commons.lang3.*) instead
+of Commons Lang 2 (package org.apache.commons.lang.*)
+  
+
   
 
   java,scala,3rdParty,spark

http://git-wip-us.apache.org/repos/asf/spark/blob/201d5e8d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
index

spark git commit: [MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates ArrayIndexOutOfBoundsException. I have found the bug and tested the solution.

2016-06-25 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master a7d29499d -> a3c7b4187


[MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates 
ArrayIndexOutOfBoundsException. I have found the bug and tested the solution.

## What changes were proposed in this pull request?

Just adjust the size of an array in line 58 so it does not cause an 
ArrayIndexOutOfBoundsException in line 66.

## How was this patch tested?

Manual tests. I have recompiled the entire project with the fix, it has been 
built successfully and I have run the code, also with good results.

line 66: val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1) + 
rnd.nextGaussian() * 0.1
crashes because trueWeights has length "nfeatures + 1" while "x" has length 
"nfeatures", and they should have the same length.

To fix this, just make trueWeights the same length as x.
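
A minimal, Spark-independent sketch of the failure mode (the helper and names 
below are illustrative, not the mllib code): a ddot-style loop that reads n 
elements from both arrays fails when one of them is shorter than n.

```scala
object DotLengthSketch {
  // Dot product over the first n elements of x and y, like blas.ddot(n, x, 1, y, 1).
  def dot(n: Int, x: Array[Double], y: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < n) { s += x(i) * y(i); i += 1 }
    s
  }

  def main(args: Array[String]): Unit = {
    val nfeatures = 3
    val x = Array.fill(nfeatures)(1.0)               // length nfeatures
    val trueWeights = Array.fill(nfeatures + 1)(1.0) // length nfeatures + 1 (the bug)

    // dot(trueWeights.length, x, trueWeights) would read x(3) and throw
    // ArrayIndexOutOfBoundsException. With equal lengths it succeeds:
    println(dot(nfeatures, x, Array.fill(nfeatures)(0.5)))
  }
}
```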

I have recompiled the project with the change and it is working now:
[spark-1.6.1]$ spark-submit --master local[*] --class 
org.apache.spark.mllib.util.SVMDataGenerator 
mllib/target/spark-mllib_2.11-1.6.1.jar local /home/user/test

And it generates the data successfully now in the specified folder.

Author: José Antonio 

Closes #13895 from j4munoz/patch-2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a3c7b418
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a3c7b418
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a3c7b418

Branch: refs/heads/master
Commit: a3c7b4187bad00dad87df7e3b5929a44d29568ed
Parents: a7d2949
Author: José Antonio 
Authored: Sat Jun 25 09:11:25 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 25 09:11:25 2016 +0100

--
 .../main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a3c7b418/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
index cde5979..c946860 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
@@ -55,7 +55,7 @@ object SVMDataGenerator {
 val sc = new SparkContext(sparkMaster, "SVMGenerator")
 
 val globalRnd = new Random(94720)
-val trueWeights = Array.fill[Double](nfeatures + 
1)(globalRnd.nextGaussian())
+val trueWeights = Array.fill[Double](nfeatures)(globalRnd.nextGaussian())
 
 val data: RDD[LabeledPoint] = sc.parallelize(0 until nexamples, parts).map 
{ idx =>
   val rnd = new Random(42 + idx)





spark git commit: [MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates ArrayIndexOutOfBoundsException. I have found the bug and tested the solution.

2016-06-25 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 d079b5de7 -> cbfcdcfb6


[MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates 
ArrayIndexOutOfBoundsException. I have found the bug and tested the solution.

## What changes were proposed in this pull request?

Just adjust the size of an array in line 58 so it does not cause an 
ArrayIndexOutOfBoundsException in line 66.

## How was this patch tested?

Manual tests. I have recompiled the entire project with the fix, it has been 
built successfully and I have run the code, also with good results.

line 66: val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1) + 
rnd.nextGaussian() * 0.1
crashes because trueWeights has length "nfeatures + 1" while "x" has length 
"nfeatures", and they should have the same length.

To fix this, just make trueWeights the same length as x.

I have recompiled the project with the change and it is working now:
[spark-1.6.1]$ spark-submit --master local[*] --class 
org.apache.spark.mllib.util.SVMDataGenerator 
mllib/target/spark-mllib_2.11-1.6.1.jar local /home/user/test

And it generates the data successfully now in the specified folder.

Author: José Antonio 

Closes #13895 from j4munoz/patch-2.

(cherry picked from commit a3c7b4187bad00dad87df7e3b5929a44d29568ed)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cbfcdcfb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cbfcdcfb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cbfcdcfb

Branch: refs/heads/branch-2.0
Commit: cbfcdcfb60d41126e17cddda52922d6058f1a401
Parents: d079b5d
Author: José Antonio 
Authored: Sat Jun 25 09:11:25 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 25 09:11:35 2016 +0100

--
 .../main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/cbfcdcfb/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
index cde5979..c946860 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
@@ -55,7 +55,7 @@ object SVMDataGenerator {
 val sc = new SparkContext(sparkMaster, "SVMGenerator")
 
 val globalRnd = new Random(94720)
-val trueWeights = Array.fill[Double](nfeatures + 
1)(globalRnd.nextGaussian())
+val trueWeights = Array.fill[Double](nfeatures)(globalRnd.nextGaussian())
 
 val data: RDD[LabeledPoint] = sc.parallelize(0 until nexamples, parts).map 
{ idx =>
   val rnd = new Random(42 + idx)





spark git commit: [MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates ArrayIndexOutOfBoundsException. I have found the bug and tested the solution.

2016-06-25 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.6 b7acc1b71 -> 24d59fb64


[MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates 
ArrayIndexOutOfBoundsException. I have found the bug and tested the solution.

## What changes were proposed in this pull request?

Just adjust the size of an array in line 58 so it does not cause an 
ArrayIndexOutOfBoundsException in line 66.

## How was this patch tested?

Manual tests. I have recompiled the entire project with the fix, it has been 
built successfully and I have run the code, also with good results.

line 66: val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1) + 
rnd.nextGaussian() * 0.1
crashes because trueWeights has length "nfeatures + 1" while "x" has length 
"nfeatures", and they should have the same length.

To fix this, just make trueWeights the same length as x.

I have recompiled the project with the change and it is working now:
[spark-1.6.1]$ spark-submit --master local[*] --class 
org.apache.spark.mllib.util.SVMDataGenerator 
mllib/target/spark-mllib_2.11-1.6.1.jar local /home/user/test

And it generates the data successfully now in the specified folder.

Author: José Antonio 

Closes #13895 from j4munoz/patch-2.

(cherry picked from commit a3c7b4187bad00dad87df7e3b5929a44d29568ed)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/24d59fb6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/24d59fb6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/24d59fb6

Branch: refs/heads/branch-1.6
Commit: 24d59fb64770fb8951794df9ee6398329838359a
Parents: b7acc1b
Author: José Antonio 
Authored: Sat Jun 25 09:11:25 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 25 09:11:47 2016 +0100

--
 .../main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/24d59fb6/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
index cde5979..c946860 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
@@ -55,7 +55,7 @@ object SVMDataGenerator {
 val sc = new SparkContext(sparkMaster, "SVMGenerator")
 
 val globalRnd = new Random(94720)
-val trueWeights = Array.fill[Double](nfeatures + 
1)(globalRnd.nextGaussian())
+val trueWeights = Array.fill[Double](nfeatures)(globalRnd.nextGaussian())
 
 val data: RDD[LabeledPoint] = sc.parallelize(0 until nexamples, parts).map 
{ idx =>
   val rnd = new Random(42 + idx)





spark git commit: [SPARK-15958] Make initial buffer size for the Sorter configurable

2016-06-25 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master a3c7b4187 -> bf665a958


[SPARK-15958] Make initial buffer size for the Sorter configurable

## What changes were proposed in this pull request?

Currently the initial buffer size in the sorter is hard-coded and is too small 
for large workloads. As a result, the sorter spends significant time expanding 
the buffer and copying the data. It would be useful to make it configurable.
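
A sketch of how a job would opt in (the configuration key and its 4096 default 
come from the diff below; the value and names here are illustrative only):

```scala
import org.apache.spark.SparkConf

object SortBufferSketch {
  def main(args: Array[String]): Unit = {
    // Start the shuffle sorter with a larger buffer to avoid repeated growth and copying.
    val conf = new SparkConf()
      .setAppName("sort-buffer-demo")
      .set("spark.shuffle.sort.initialBufferSize", (64 * 4096).toString)
    println(conf.get("spark.shuffle.sort.initialBufferSize"))
  }
}
```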

## How was this patch tested?

Tested by running a job on the cluster.

Author: Sital Kedia 

Closes #13699 from sitalkedia/config_sort_buffer_upstream.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bf665a95
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bf665a95
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bf665a95

Branch: refs/heads/master
Commit: bf665a958631125a1670504ef5966ef1a0e14798
Parents: a3c7b41
Author: Sital Kedia 
Authored: Sat Jun 25 09:13:39 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 25 09:13:39 2016 +0100

--
 .../org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java| 7 +--
 .../apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java   | 4 ++--
 .../apache/spark/sql/execution/UnsafeExternalRowSorter.java   | 4 +++-
 .../apache/spark/sql/execution/UnsafeKVExternalSorter.java| 7 +--
 4 files changed, 15 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/bf665a95/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java
--
diff --git 
a/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java 
b/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java
index daa63d4..05fa04c 100644
--- a/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java
+++ b/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java
@@ -61,7 +61,7 @@ public class UnsafeShuffleWriter extends 
ShuffleWriter {
   private static final ClassTag OBJECT_CLASS_TAG = 
ClassTag$.MODULE$.Object();
 
   @VisibleForTesting
-  static final int INITIAL_SORT_BUFFER_SIZE = 4096;
+  static final int DEFAULT_INITIAL_SORT_BUFFER_SIZE = 4096;
 
   private final BlockManager blockManager;
   private final IndexShuffleBlockResolver shuffleBlockResolver;
@@ -74,6 +74,7 @@ public class UnsafeShuffleWriter extends 
ShuffleWriter {
   private final TaskContext taskContext;
   private final SparkConf sparkConf;
   private final boolean transferToEnabled;
+  private final int initialSortBufferSize;
 
   @Nullable private MapStatus mapStatus;
   @Nullable private ShuffleExternalSorter sorter;
@@ -122,6 +123,8 @@ public class UnsafeShuffleWriter extends 
ShuffleWriter {
 this.taskContext = taskContext;
 this.sparkConf = sparkConf;
 this.transferToEnabled = sparkConf.getBoolean("spark.file.transferTo", 
true);
+this.initialSortBufferSize = 
sparkConf.getInt("spark.shuffle.sort.initialBufferSize",
+  
DEFAULT_INITIAL_SORT_BUFFER_SIZE);
 open();
   }
 
@@ -187,7 +190,7 @@ public class UnsafeShuffleWriter extends 
ShuffleWriter {
   memoryManager,
   blockManager,
   taskContext,
-  INITIAL_SORT_BUFFER_SIZE,
+  initialSortBufferSize,
   partitioner.numPartitions(),
   sparkConf,
   writeMetrics);

http://git-wip-us.apache.org/repos/asf/spark/blob/bf665a95/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java
--
diff --git 
a/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java
 
b/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java
index 7dd61f8..daeb467 100644
--- 
a/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java
+++ 
b/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java
@@ -413,10 +413,10 @@ public class UnsafeShuffleWriterSuite {
   }
 
   private void writeEnoughRecordsToTriggerSortBufferExpansionAndSpill() throws 
Exception {
-memoryManager.limit(UnsafeShuffleWriter.INITIAL_SORT_BUFFER_SIZE * 16);
+memoryManager.limit(UnsafeShuffleWriter.DEFAULT_INITIAL_SORT_BUFFER_SIZE * 
16);
 final UnsafeShuffleWriter writer = createWriter(false);
 final ArrayList> dataToWrite = new ArrayList<>();
-for (int i = 0; i < UnsafeShuffleWriter.INITIAL_SORT_BUFFER_SIZE + 1; i++) 
{
+for (int i = 0; i < UnsafeShuffleWriter.DEFAULT_INITIAL_SORT_BUFFER_SIZE + 
1; i++) {
   dataToWrite.add(new Tuple2(i, i));
 }
 writer.write(dataToWrite.iterator());

http://git-wip-us.apache.org/repos/asf/spark/blob/bf665a95/sql/catalyst/src/main/java/org/apache/spark/sq

spark git commit: [SPARK-1301][WEB UI] Added anchor links to Accumulators and Tasks on StagePage

2016-06-25 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master bf665a958 -> 3ee9695d1


[SPARK-1301][WEB UI] Added anchor links to Accumulators and Tasks on StagePage

## What changes were proposed in this pull request?

Sometimes the "Aggregated Metrics by Executor" table on the Stage page can get 
very long so actor links to the Accumulators and Tasks tables below it have 
been added to the summary at the top of the page. This has been done in the 
same way as the Jobs and Stages pages. Note: the Accumulators link only 
displays when the table exists.

## How was this patch tested?

Manually Tested and dev/run-tests

![justtasks](https://cloud.githubusercontent.com/assets/13952758/15165269/6e8efe8c-16c9-11e6-9784-cffe966fdcf0.png)
![withaccumulators](https://cloud.githubusercontent.com/assets/13952758/15165270/7019ec9e-16c9-11e6-8649-db69ed7a317d.png)

Author: Alex Bozarth 

Closes #13037 from ajbozarth/spark1301.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3ee9695d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3ee9695d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3ee9695d

Branch: refs/heads/master
Commit: 3ee9695d1fcf3750cbf7896a56f8a1ba93f4e82f
Parents: bf665a9
Author: Alex Bozarth 
Authored: Sat Jun 25 09:27:22 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 25 09:27:22 2016 +0100

--
 .../org/apache/spark/ui/static/webui.css|  4 +-
 .../org/apache/spark/ui/static/webui.js | 47 
 .../scala/org/apache/spark/ui/UIUtils.scala |  1 +
 .../org/apache/spark/ui/jobs/StagePage.scala| 16 ++-
 4 files changed, 64 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3ee9695d/core/src/main/resources/org/apache/spark/ui/static/webui.css
--
diff --git a/core/src/main/resources/org/apache/spark/ui/static/webui.css 
b/core/src/main/resources/org/apache/spark/ui/static/webui.css
index 595e80a..b157f3e 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/webui.css
+++ b/core/src/main/resources/org/apache/spark/ui/static/webui.css
@@ -155,7 +155,7 @@ pre {
   display: none;
 }
 
-span.expand-additional-metrics, span.expand-dag-viz {
+span.expand-additional-metrics, span.expand-dag-viz, span.collapse-table {
   cursor: pointer;
 }
 
@@ -163,7 +163,7 @@ span.additional-metric-title {
   cursor: pointer;
 }
 
-.additional-metrics.collapsed {
+.additional-metrics.collapsed, .collapsible-table.collapsed {
   display: none;
 }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/3ee9695d/core/src/main/resources/org/apache/spark/ui/static/webui.js
--
diff --git a/core/src/main/resources/org/apache/spark/ui/static/webui.js 
b/core/src/main/resources/org/apache/spark/ui/static/webui.js
new file mode 100644
index 000..e37307a
--- /dev/null
+++ b/core/src/main/resources/org/apache/spark/ui/static/webui.js
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+function collapseTablePageLoad(name, table){
+  if (window.localStorage.getItem(name) == "true") {
+// Set it to false so that the click function can revert it
+window.localStorage.setItem(name, "false");
+collapseTable(name, table);
+  }
+}
+
+function collapseTable(thisName, table){
+var status = window.localStorage.getItem(thisName) == "true";
+status = !status;
+
+thisClass = '.' + thisName
+
+// Toggle visibility of the collapsible table.
+var tableDiv = $(thisClass).parent().find('.' + table);
+$(tableDiv).toggleClass('collapsed');
+
+// Switch the class of the arrow from open to closed.
+$(thisClass).find('.collapse-table-arrow').toggleClass('arrow-open');
+$(thisClass).find('.collapse-table-arrow').toggleClass('arrow-closed');
+
+window.localStorage.setItem(thisName, "" + status);
+}
+
+// Add a call to collapseTablePageLoad() on each collapsible table
+// to remember if it's collapsed on each page re

spark git commit: [SPARK-16193][TESTS] Address flaky ExternalAppendOnlyMapSuite spilling tests

2016-06-25 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 cbfcdcfb6 -> b03b0976f


[SPARK-16193][TESTS] Address flaky ExternalAppendOnlyMapSuite spilling tests

## What changes were proposed in this pull request?

Make spill tests wait until the job has completed before returning the number 
of stages that spilled.
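
A sketch of the latch-on-job-end pattern the patch applies (class and method 
names here are illustrative, not Spark's SpillListener itself):

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerStageCompleted}

class StageCountingListener extends SparkListener {
  private var stages = 0
  private val jobDone = new CountDownLatch(1)

  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit =
    synchronized { stages += 1 }

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = jobDone.countDown()

  // Block (with a generous timeout) until the job has ended before reporting,
  // so the count is not read while stages are still completing.
  def numStages: Int = {
    require(jobDone.await(10, TimeUnit.SECONDS), "job end was never notified")
    synchronized { stages }
  }
}
```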

## How was this patch tested?

Existing Jenkins tests.

Author: Sean Owen 

Closes #13896 from srowen/SPARK-16193.

(cherry picked from commit e87741589a24821b5fe73e5d9ee2164247998580)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b03b0976
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b03b0976
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b03b0976

Branch: refs/heads/branch-2.0
Commit: b03b0976fac878bf7e5d1721441179a4d4d9c317
Parents: cbfcdcf
Author: Sean Owen 
Authored: Sat Jun 25 12:14:14 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 25 12:14:24 2016 +0100

--
 core/src/main/scala/org/apache/spark/TestUtils.scala | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b03b0976/core/src/main/scala/org/apache/spark/TestUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/TestUtils.scala 
b/core/src/main/scala/org/apache/spark/TestUtils.scala
index 43c89b2..871b9d1 100644
--- a/core/src/main/scala/org/apache/spark/TestUtils.scala
+++ b/core/src/main/scala/org/apache/spark/TestUtils.scala
@@ -22,6 +22,7 @@ import java.net.{URI, URL}
 import java.nio.charset.StandardCharsets
 import java.nio.file.Paths
 import java.util.Arrays
+import java.util.concurrent.{CountDownLatch, TimeUnit}
 import java.util.jar.{JarEntry, JarOutputStream}
 
 import scala.collection.JavaConverters._
@@ -190,8 +191,14 @@ private[spark] object TestUtils {
 private class SpillListener extends SparkListener {
   private val stageIdToTaskMetrics = new mutable.HashMap[Int, 
ArrayBuffer[TaskMetrics]]
   private val spilledStageIds = new mutable.HashSet[Int]
+  private val stagesDone = new CountDownLatch(1)
 
-  def numSpilledStages: Int = spilledStageIds.size
+  def numSpilledStages: Int = {
+// Long timeout, just in case somehow the job end isn't notified.
+// Fails if a timeout occurs
+assert(stagesDone.await(10, TimeUnit.SECONDS))
+spilledStageIds.size
+  }
 
   override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
 stageIdToTaskMetrics.getOrElseUpdate(
@@ -206,4 +213,8 @@ private class SpillListener extends SparkListener {
   spilledStageIds += stageId
 }
   }
+
+  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
+stagesDone.countDown()
+  }
 }





spark git commit: [SPARK-16193][TESTS] Address flaky ExternalAppendOnlyMapSuite spilling tests

2016-06-25 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.6 24d59fb64 -> 60e095b9b


[SPARK-16193][TESTS] Address flaky ExternalAppendOnlyMapSuite spilling tests

## What changes were proposed in this pull request?

Make spill tests wait until the job has completed before returning the number 
of stages that spilled.

## How was this patch tested?

Existing Jenkins tests.

Author: Sean Owen 

Closes #13896 from srowen/SPARK-16193.

(cherry picked from commit e87741589a24821b5fe73e5d9ee2164247998580)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60e095b9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60e095b9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60e095b9

Branch: refs/heads/branch-1.6
Commit: 60e095b9bea3caa3e9d1e768d116f911a048d8ec
Parents: 24d59fb
Author: Sean Owen 
Authored: Sat Jun 25 12:14:14 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 25 12:14:40 2016 +0100

--
 core/src/main/scala/org/apache/spark/TestUtils.scala | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/60e095b9/core/src/main/scala/org/apache/spark/TestUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/TestUtils.scala 
b/core/src/main/scala/org/apache/spark/TestUtils.scala
index 43c89b2..871b9d1 100644
--- a/core/src/main/scala/org/apache/spark/TestUtils.scala
+++ b/core/src/main/scala/org/apache/spark/TestUtils.scala
@@ -22,6 +22,7 @@ import java.net.{URI, URL}
 import java.nio.charset.StandardCharsets
 import java.nio.file.Paths
 import java.util.Arrays
+import java.util.concurrent.{CountDownLatch, TimeUnit}
 import java.util.jar.{JarEntry, JarOutputStream}
 
 import scala.collection.JavaConverters._
@@ -190,8 +191,14 @@ private[spark] object TestUtils {
 private class SpillListener extends SparkListener {
   private val stageIdToTaskMetrics = new mutable.HashMap[Int, 
ArrayBuffer[TaskMetrics]]
   private val spilledStageIds = new mutable.HashSet[Int]
+  private val stagesDone = new CountDownLatch(1)
 
-  def numSpilledStages: Int = spilledStageIds.size
+  def numSpilledStages: Int = {
+// Long timeout, just in case somehow the job end isn't notified.
+// Fails if a timeout occurs
+assert(stagesDone.await(10, TimeUnit.SECONDS))
+spilledStageIds.size
+  }
 
   override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
 stageIdToTaskMetrics.getOrElseUpdate(
@@ -206,4 +213,8 @@ private class SpillListener extends SparkListener {
   spilledStageIds += stageId
 }
   }
+
+  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
+stagesDone.countDown()
+  }
 }





spark git commit: [SPARK-16193][TESTS] Address flaky ExternalAppendOnlyMapSuite spilling tests

2016-06-25 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 3ee9695d1 -> e87741589


[SPARK-16193][TESTS] Address flaky ExternalAppendOnlyMapSuite spilling tests

## What changes were proposed in this pull request?

Make spill tests wait until the job has completed before returning the number 
of stages that spilled.

## How was this patch tested?

Existing Jenkins tests.

Author: Sean Owen 

Closes #13896 from srowen/SPARK-16193.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e8774158
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e8774158
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e8774158

Branch: refs/heads/master
Commit: e87741589a24821b5fe73e5d9ee2164247998580
Parents: 3ee9695
Author: Sean Owen 
Authored: Sat Jun 25 12:14:14 2016 +0100
Committer: Sean Owen 
Committed: Sat Jun 25 12:14:14 2016 +0100

--
 core/src/main/scala/org/apache/spark/TestUtils.scala | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e8774158/core/src/main/scala/org/apache/spark/TestUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/TestUtils.scala 
b/core/src/main/scala/org/apache/spark/TestUtils.scala
index 43c89b2..871b9d1 100644
--- a/core/src/main/scala/org/apache/spark/TestUtils.scala
+++ b/core/src/main/scala/org/apache/spark/TestUtils.scala
@@ -22,6 +22,7 @@ import java.net.{URI, URL}
 import java.nio.charset.StandardCharsets
 import java.nio.file.Paths
 import java.util.Arrays
+import java.util.concurrent.{CountDownLatch, TimeUnit}
 import java.util.jar.{JarEntry, JarOutputStream}
 
 import scala.collection.JavaConverters._
@@ -190,8 +191,14 @@ private[spark] object TestUtils {
 private class SpillListener extends SparkListener {
   private val stageIdToTaskMetrics = new mutable.HashMap[Int, 
ArrayBuffer[TaskMetrics]]
   private val spilledStageIds = new mutable.HashSet[Int]
+  private val stagesDone = new CountDownLatch(1)
 
-  def numSpilledStages: Int = spilledStageIds.size
+  def numSpilledStages: Int = {
+// Long timeout, just in case somehow the job end isn't notified.
+// Fails if a timeout occurs
+assert(stagesDone.await(10, TimeUnit.SECONDS))
+spilledStageIds.size
+  }
 
   override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
 stageIdToTaskMetrics.getOrElseUpdate(
@@ -206,4 +213,8 @@ private class SpillListener extends SparkListener {
   spilledStageIds += stageId
 }
   }
+
+  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
+stagesDone.countDown()
+  }
 }





spark git commit: [SPARK-16214][EXAMPLES] fix the denominator of SparkPi

2016-06-27 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 e01776395 -> efce6e17c


[SPARK-16214][EXAMPLES] fix the denominator of SparkPi

## What changes were proposed in this pull request?

reduce the denominator of SparkPi by 1
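
A sketch of the rationale, assuming the example samples over `1 until n` (as in 
SparkPi, though that line is not shown in this diff): the range holds n - 1 
points, so the hit count should be divided by the number of points actually 
sampled.

```scala
object PiDenominatorSketch {
  def main(args: Array[String]): Unit = {
    val n = 100000
    val pointsSampled = (1 until n).size // `until` excludes n, so n - 1 points
    assert(pointsSampled == n - 1)
    println(s"points sampled = $pointsSampled, divisor = ${n - 1}")
  }
}
```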

## How was this patch tested?

  integration tests

Author: 杨浩 

Closes #13910 from yanghaogn/patch-1.

(cherry picked from commit b452026324da20f76f7d8b78e5ba1c007712e585)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/efce6e17
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/efce6e17
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/efce6e17

Branch: refs/heads/branch-2.0
Commit: efce6e17c3a7c2c63b9d40bd02fe4f4fec4085bd
Parents: e017763
Author: 杨浩 
Authored: Mon Jun 27 08:31:52 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 27 08:32:01 2016 +0100

--
 examples/src/main/scala/org/apache/spark/examples/SparkPi.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/efce6e17/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala
--
diff --git a/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala 
b/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala
index 42f6cef..272c1a4 100644
--- a/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala
+++ b/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala
@@ -36,7 +36,7 @@ object SparkPi {
   val y = random * 2 - 1
   if (x*x + y*y < 1) 1 else 0
 }.reduce(_ + _)
-println("Pi is roughly " + 4.0 * count / n)
+println("Pi is roughly " + 4.0 * count / (n - 1))
 spark.stop()
   }
 }





spark git commit: [SPARK-16214][EXAMPLES] fix the denominator of SparkPi

2016-06-27 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 30b182bcc -> b45202632


[SPARK-16214][EXAMPLES] fix the denominator of SparkPi

## What changes were proposed in this pull request?

reduce the denominator of SparkPi by 1

## How was this patch tested?

  integration tests

Author: 杨浩 

Closes #13910 from yanghaogn/patch-1.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b4520263
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b4520263
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b4520263

Branch: refs/heads/master
Commit: b452026324da20f76f7d8b78e5ba1c007712e585
Parents: 30b182b
Author: 杨浩 
Authored: Mon Jun 27 08:31:52 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 27 08:31:52 2016 +0100

--
 examples/src/main/scala/org/apache/spark/examples/SparkPi.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b4520263/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala
--
diff --git a/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala 
b/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala
index 42f6cef..272c1a4 100644
--- a/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala
+++ b/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala
@@ -36,7 +36,7 @@ object SparkPi {
   val y = random * 2 - 1
   if (x*x + y*y < 1) 1 else 0
 }.reduce(_ + _)
-println("Pi is roughly " + 4.0 * count / n)
+println("Pi is roughly " + 4.0 * count / (n - 1))
 spark.stop()
   }
 }





spark git commit: [SPARK-16214][EXAMPLES] fix the denominator of SparkPi

2016-06-27 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.6 60e095b9b -> 22a496d2a


[SPARK-16214][EXAMPLES] fix the denominator of SparkPi

## What changes were proposed in this pull request?

reduce the denominator of SparkPi by 1

## How was this patch tested?

  integration tests

Author: 杨浩 

Closes #13910 from yanghaogn/patch-1.

(cherry picked from commit b452026324da20f76f7d8b78e5ba1c007712e585)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/22a496d2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/22a496d2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/22a496d2

Branch: refs/heads/branch-1.6
Commit: 22a496d2a12e24f97977d324c38f5aa6ff260588
Parents: 60e095b
Author: 杨浩 
Authored: Mon Jun 27 08:31:52 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 27 08:32:12 2016 +0100

--
 examples/src/main/scala/org/apache/spark/examples/SparkPi.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/22a496d2/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala
--
diff --git a/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala 
b/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala
index 818d4f2..ead8f46 100644
--- a/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala
+++ b/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala
@@ -34,7 +34,7 @@ object SparkPi {
   val y = random * 2 - 1
   if (x*x + y*y < 1) 1 else 0
 }.reduce(_ + _)
-println("Pi is roughly " + 4.0 * count / n)
+println("Pi is roughly " + 4.0 * count / (n - 1))
 spark.stop()
   }
 }





spark git commit: [MINOR][CORE] Fix display wrong free memory size in the log

2016-06-27 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master b45202632 -> 52d4fe057


[MINOR][CORE] Fix display wrong free memory size in the log

## What changes were proposed in this pull request?

The free memory size displayed in the log is wrong (it actually shows the used 
memory); fix it so that the correct free size is logged.
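
A sketch of the corrected arithmetic (values are illustrative; the fixed log 
line in the diff below prints `maxMemory - blocksMemoryUsed`):

```scala
object FreeMemorySketch {
  def main(args: Array[String]): Unit = {
    val maxMemory = 512L * 1024 * 1024        // store capacity in bytes
    val blocksMemoryUsed = 200L * 1024 * 1024 // bytes currently used by blocks
    val free = maxMemory - blocksMemoryUsed   // what should be logged as "free"
    println(s"used = $blocksMemoryUsed, free = $free")
  }
}
```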

## How was this patch tested?

N/A

Author: jerryshao 

Closes #13804 from jerryshao/memory-log-fix.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/52d4fe05
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/52d4fe05
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/52d4fe05

Branch: refs/heads/master
Commit: 52d4fe057909e8d431ae36f538dc4cafb351cdb5
Parents: b452026
Author: jerryshao 
Authored: Mon Jun 27 09:23:58 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 27 09:23:58 2016 +0100

--
 .../main/scala/org/apache/spark/storage/memory/MemoryStore.scala  | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/52d4fe05/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala 
b/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala
index 99be4de..0349da0 100644
--- a/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala
+++ b/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala
@@ -377,7 +377,8 @@ private[spark] class MemoryStore(
 entries.put(blockId, entry)
   }
   logInfo("Block %s stored as bytes in memory (estimated size %s, free 
%s)".format(
-blockId, Utils.bytesToString(entry.size), 
Utils.bytesToString(blocksMemoryUsed)))
+blockId, Utils.bytesToString(entry.size),
+Utils.bytesToString(maxMemory - blocksMemoryUsed)))
   Right(entry.size)
 } else {
   // We ran out of space while unrolling the values for this block
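
As a hedged illustration of what the corrected message reports (the names below are simplified stand-ins, not Spark's internal API): free memory is the store's capacity minus the memory currently occupied by blocks, not the occupied amount itself.

```
// Simplified stand-in for the corrected log line; maxMemory and blocksMemoryUsed are
// assumed to be the store's capacity and current usage in bytes, as in MemoryStore.
def storeLogLine(blockId: String, entrySize: Long,
                 maxMemory: Long, blocksMemoryUsed: Long): String = {
  val freeBytes = maxMemory - blocksMemoryUsed  // "free", not the used amount logged before
  s"Block $blockId stored as bytes in memory (estimated size $entrySize B, free $freeBytes B)"
}
```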





spark git commit: [MINOR][CORE] Fix display wrong free memory size in the log

2016-06-27 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 efce6e17c -> ea8d419c1


[MINOR][CORE] Fix display wrong free memory size in the log

## What changes were proposed in this pull request?

The free memory size displayed in the log is wrong (it actually shows the used 
memory); fix it so the correct value is logged.

## How was this patch tested?

N/A

Author: jerryshao 

Closes #13804 from jerryshao/memory-log-fix.

(cherry picked from commit 52d4fe057909e8d431ae36f538dc4cafb351cdb5)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ea8d419c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ea8d419c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ea8d419c

Branch: refs/heads/branch-2.0
Commit: ea8d419c106ad90f8f5b48e6bf897b0ff3f49f1f
Parents: efce6e1
Author: jerryshao 
Authored: Mon Jun 27 09:23:58 2016 +0100
Committer: Sean Owen 
Committed: Mon Jun 27 09:24:06 2016 +0100

--
 .../main/scala/org/apache/spark/storage/memory/MemoryStore.scala  | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ea8d419c/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala 
b/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala
index 99be4de..0349da0 100644
--- a/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala
+++ b/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala
@@ -377,7 +377,8 @@ private[spark] class MemoryStore(
 entries.put(blockId, entry)
   }
   logInfo("Block %s stored as bytes in memory (estimated size %s, free 
%s)".format(
-blockId, Utils.bytesToString(entry.size), 
Utils.bytesToString(blocksMemoryUsed)))
+blockId, Utils.bytesToString(entry.size),
+Utils.bytesToString(maxMemory - blocksMemoryUsed)))
   Right(entry.size)
 } else {
   // We ran out of space while unrolling the values for this block





svn commit: r1750410 [2/2] - in /spark: ./ _plugins/ mllib/ releases/_posts/ site/ site/mllib/ site/news/ site/releases/ site/sql/ site/streaming/ sql/ streaming/

2016-06-27 Thread srowen
Modified: spark/site/releases/spark-release-1-1-0.html
URL: 
http://svn.apache.org/viewvc/spark/site/releases/spark-release-1-1-0.html?rev=1750410&r1=1750409&r2=1750410&view=diff
==
--- spark/site/releases/spark-release-1-1-0.html (original)
+++ spark/site/releases/spark-release-1-1-0.html Mon Jun 27 20:31:41 2016
@@ -197,7 +197,7 @@
 Spark SQL adds a number of new features and performance improvements in 
this release. A http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#running-the-thrift-jdbc-server";>JDBC/ODBC
 server allows users to connect to SparkSQL from many different 
applications and provides shared access to cached tables. A new module provides 
http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets";>support
 for loading JSON data directly into Spark’s SchemaRDD format, including 
automatic schema inference. Spark SQL introduces http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#other-configuration-options";>dynamic
 bytecode generation in this release, a technique which significantly 
speeds up execution for queries that perform complex expression evaluation.  
This release also adds support for registering Python, Scala, and Java lambda 
functions as UDFs, which can then be called directly in SQL. Spark 1.1 adds a 
http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#programmatically-specifying-the-schema";>public
 types API to allow users to create SchemaRDD’s from custom data sources. 
Finally, many optimizations have been added to the native Parquet support as 
well as throughout the engine.
 
 MLlib
-MLlib adds several new algorithms and optimizations in this release. 1.1 
introduces a https://issues.apache.org/jira/browse/SPARK-2359";>new 
library of statistical packages which provides exploratory analytic 
functions. These include stratified sampling, correlations, chi-squared tests 
and support for creating random datasets. This release adds utilities for 
feature extraction (https://issues.apache.org/jira/browse/SPARK-2510";>Word2Vec and https://issues.apache.org/jira/browse/SPARK-2511";>TF-IDF) and feature 
transformation (https://issues.apache.org/jira/browse/SPARK-2272";>normalization and 
standard scaling). Also new are support for https://issues.apache.org/jira/browse/SPARK-1553";>nonnegative matrix 
factorization and https://issues.apache.org/jira/browse/SPARK-1782";>SVD via Lanczos. 
The decision tree algorithm has been https://issues.apache.org/jira/browse/SPARK-2478";>added in Python and 
Java<
 /a>. A tree aggregation primitive has been added to help optimize many 
existing algorithms. Performance improves across the board in MLlib 1.1, with 
improvements of around 2-3X for many algorithms and up to 5X for large scale 
decision tree problems. 
+MLlib adds several new algorithms and optimizations in this release. 1.1 
introduces a https://issues.apache.org/jira/browse/SPARK-2359";>new 
library of statistical packages which provides exploratory analytic 
functions. These include stratified sampling, correlations, chi-squared tests 
and support for creating random datasets. This release adds utilities for 
feature extraction (https://issues.apache.org/jira/browse/SPARK-2510";>Word2Vec and https://issues.apache.org/jira/browse/SPARK-2511";>TF-IDF) and feature 
transformation (https://issues.apache.org/jira/browse/SPARK-2272";>normalization and 
standard scaling). Also new are support for https://issues.apache.org/jira/browse/SPARK-1553";>nonnegative matrix 
factorization and https://issues.apache.org/jira/browse/SPARK-1782";>SVD via Lanczos. 
The decision tree algorithm has been https://issues.apache.org/jira/browse/SPARK-2478";>added in Python and 
Java<
 /a>. A tree aggregation primitive has been added to help optimize many 
existing algorithms. Performance improves across the board in MLlib 1.1, with 
improvements of around 2-3X for many algorithms and up to 5X for large scale 
decision tree problems.
 
 GraphX and Spark Streaming
 Spark streaming adds a new data source https://issues.apache.org/jira/browse/SPARK-1981";>Amazon Kinesis. For 
the Apache Flume, a new mode is supported which https://issues.apache.org/jira/browse/SPARK-1729";>pulls data from 
Flume, simplifying deployment and providing high availability. The first of 
a set of https://issues.apache.org/jira/browse/SPARK-2438";>streaming 
machine learning algorithms is introduced with streaming linear regression. 
Finally, https://issues.apache.org/jira/browse/SPARK-1341";>rate 
limiting has been added for streaming inputs. GraphX adds https://issues.apache.org/jira/browse/SPARK-1991";>custom storage levels 
for vertices and edges along with https://issues.apache.org/jira/browse/SPARK-2748";>improved numerical 
precision across the board. Finally, GraphX adds a new label propagation 
algorithm.
@@ -215,7 +215,7 @@
 
 
   The default value of spark.io.compression.codec is now 
snappy f

svn commit: r1750410 [1/2] - in /spark: ./ _plugins/ mllib/ releases/_posts/ site/ site/mllib/ site/news/ site/releases/ site/sql/ site/streaming/ sql/ streaming/

2016-06-27 Thread srowen
Author: srowen
Date: Mon Jun 27 20:31:41 2016
New Revision: 1750410

URL: http://svn.apache.org/viewvc?rev=1750410&view=rev
Log:
Remove Spark site plugins (not used/working); fix jekyll build warning and one 
bad heading tag; remove inactive {% extra %} tag; commit current output of 
jekyll for consistency (mostly minor whitespace changes)

Removed:
spark/_plugins/
Modified:
spark/_config.yml
spark/index.md
spark/mllib/index.md
spark/releases/_posts/2016-01-04-spark-release-1-6-0.md
spark/site/documentation.html
spark/site/examples.html
spark/site/index.html
spark/site/mllib/index.html
spark/site/news/index.html
spark/site/news/spark-0-9-1-released.html
spark/site/news/spark-0-9-2-released.html
spark/site/news/spark-1-1-0-released.html
spark/site/news/spark-1-2-2-released.html
spark/site/news/spark-and-shark-in-the-news.html
spark/site/news/spark-summit-east-2015-videos-posted.html
spark/site/releases/spark-release-0-8-0.html
spark/site/releases/spark-release-0-9-1.html
spark/site/releases/spark-release-1-0-1.html
spark/site/releases/spark-release-1-0-2.html
spark/site/releases/spark-release-1-1-0.html
spark/site/releases/spark-release-1-2-0.html
spark/site/releases/spark-release-1-3-0.html
spark/site/releases/spark-release-1-3-1.html
spark/site/releases/spark-release-1-4-0.html
spark/site/releases/spark-release-1-5-0.html
spark/site/releases/spark-release-1-6-0.html
spark/site/sql/index.html
spark/site/streaming/index.html
spark/sql/index.md
spark/streaming/index.md

Modified: spark/_config.yml
URL: 
http://svn.apache.org/viewvc/spark/_config.yml?rev=1750410&r1=1750409&r2=1750410&view=diff
==
--- spark/_config.yml (original)
+++ spark/_config.yml Mon Jun 27 20:31:41 2016
@@ -1,12 +1,10 @@
-# pygments option has been renamed to highlighter.
-# pygments: true
 highlighter: pygments
 markdown: kramdown
 kramdown:
   entity_output: symbol
 permalink: none
 destination: site
-exclude: README.md
+exclude: ['README.md']
 keep_files: ['docs', '.svn']
 
 # The recommended way of viewing the website on your local machine is via 
jekyll using
@@ -16,5 +14,3 @@ keep_files: ['docs', '.svn']
 # E.g. on OS X this might be:
 #url: file:///Users/andyk/Development/spark/website/site/
 url: /
-
-shark_url: http://shark.cs.berkeley.edu

Modified: spark/index.md
URL: 
http://svn.apache.org/viewvc/spark/index.md?rev=1750410&r1=1750409&r2=1750410&view=diff
==
--- spark/index.md (original)
+++ spark/index.md Mon Jun 27 20:31:41 2016
@@ -123,9 +123,6 @@ navigation:
   
 
 
-{% extra %}
-
-
 
   
 Community
@@ -190,5 +187,3 @@ navigation:
 Download Apache Spark
   
 
-
-{% endextra %}

Modified: spark/mllib/index.md
URL: 
http://svn.apache.org/viewvc/spark/mllib/index.md?rev=1750410&r1=1750409&r2=1750410&view=diff
==
--- spark/mllib/index.md (original)
+++ spark/mllib/index.md Mon Jun 27 20:31:41 2016
@@ -76,9 +76,6 @@ subproject: MLlib
   
 
 
-{% extra %}
-
-
 
   
 Algorithms
@@ -148,5 +145,3 @@ subproject: MLlib
 
   
 
-
-{% endextra %}

Modified: spark/releases/_posts/2016-01-04-spark-release-1-6-0.md
URL: 
http://svn.apache.org/viewvc/spark/releases/_posts/2016-01-04-spark-release-1-6-0.md?rev=1750410&r1=1750409&r2=1750410&view=diff
==
--- spark/releases/_posts/2016-01-04-spark-release-1-6-0.md (original)
+++ spark/releases/_posts/2016-01-04-spark-release-1-6-0.md Mon Jun 27 20:31:41 
2016
@@ -82,7 +82,7 @@ You can consult JIRA for the [detailed c
- [SPARK-11337](https://issues.apache.org/jira/browse/SPARK-11337)  
**Testable example code** - Automated testing for code in user guide examples
 
 
-##Deprecations
+## Deprecations
 * In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
 * In spark.ml.classification.LogisticRegressionModel and 
spark.ml.regression.LinearRegressionModel, the "weights" field has been 
deprecated, in favor of the new name "coefficients."  This helps disambiguate 
from instance (row) weights given to algorithms.
 

Modified: spark/site/documentation.html
URL: 
http://svn.apache.org/viewvc/spark/site/documentation.html?rev=1750410&r1=1750409&r2=1750410&view=diff
==
--- spark/site/documentation.html (original)
+++ spark/site/documentation.html Mon Jun 27 20:31:41 2016
@@ -249,12 +249,13 @@
 
 
 Meetup Talk Videos
-In addition to the videos listed below, you can also view http://www.meetup.com/spark-users/files/";>

spark git commit: [SPARK-15858][ML] Fix calculating error by tree stack over flow prob…

2016-06-29 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 21385d02a -> 393db655c


[SPARK-15858][ML] Fix calculating error by tree stack over flow prob…

## What changes were proposed in this pull request?

Improve the evaluateEachIteration function in MLlib: it fails with a stack overflow 
when calculating the error by tree for a model that has more than 500 trees.

## How was this patch tested?

The patch was tested on a production data set (2K rows x 2K features): a gradient 
boosted model was trained without validation with a maxIterations setting of 1000, 
and then the error by tree was computed. With the new patch the calculation completes 
within 30 seconds, whereas previously it took hours and then failed.

**PS**: It would be better if this PR could be cherry-picked into the release 
branches 1.6.1 and 2.0.
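
A hedged sketch of the single-pass idea used in the patch, with plain Scala collections standing in for RDDs (the names here are illustrative, not Spark's API): for each point, accumulate the weighted tree predictions with `scanLeft`, turn every prefix into an error, then average each tree index across all points.

```
// Illustrative sketch only: Seq stands in for the RDD, predict(idx, features) for the
// weighted prediction of tree idx, and computeError for the loss function's error.
case class Point(features: Array[Double], label: Double)

def errorByTree(points: Seq[Point],
                numTrees: Int,
                predict: (Int, Array[Double]) => Double,
                computeError: (Double, Double) => Double): Array[Double] = {
  val summed = points.map { p =>
    (0 until numTrees).map(idx => predict(idx, p.features))
      .scanLeft(0.0)(_ + _).drop(1)               // cumulative prediction after trees 1..k
      .map(pred => computeError(pred, p.label))   // error for each prefix of trees
  }.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
  summed.map(_ / points.size).toArray             // mean error per tree index
}
```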

Author: Mahmoud Rawas 
Author: Mahmoud Rawas 

Closes #13624 from mhmoudr/SPARK-15858.master.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/393db655
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/393db655
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/393db655

Branch: refs/heads/master
Commit: 393db655c3c43155305fbba1b2f8c48a95f18d93
Parents: 21385d0
Author: Mahmoud Rawas 
Authored: Wed Jun 29 13:12:17 2016 +0100
Committer: Sean Owen 
Committed: Wed Jun 29 13:12:17 2016 +0100

--
 .../ml/tree/impl/GradientBoostedTrees.scala | 40 ++--
 .../mllib/tree/model/treeEnsembleModels.scala   | 37 --
 2 files changed, 34 insertions(+), 43 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/393db655/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala 
b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
index a0faff2..7bef899 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
@@ -205,31 +205,29 @@ private[spark] object GradientBoostedTrees extends 
Logging {
   case _ => data
 }
 
-val numIterations = trees.length
-val evaluationArray = Array.fill(numIterations)(0.0)
-val localTreeWeights = treeWeights
-
-var predictionAndError = computeInitialPredictionAndError(
-  remappedData, localTreeWeights(0), trees(0), loss)
-
-evaluationArray(0) = predictionAndError.values.mean()
-
 val broadcastTrees = sc.broadcast(trees)
-(1 until numIterations).foreach { nTree =>
-  predictionAndError = remappedData.zip(predictionAndError).mapPartitions 
{ iter =>
-val currentTree = broadcastTrees.value(nTree)
-val currentTreeWeight = localTreeWeights(nTree)
-iter.map { case (point, (pred, error)) =>
-  val newPred = updatePrediction(point.features, pred, currentTree, 
currentTreeWeight)
-  val newError = loss.computeError(newPred, point.label)
-  (newPred, newError)
-}
+val localTreeWeights = treeWeights
+val treesIndices = trees.indices
+
+val dataCount = remappedData.count()
+val evaluation = remappedData.map { point =>
+  treesIndices.map { idx =>
+val prediction = broadcastTrees.value(idx)
+  .rootNode
+  .predictImpl(point.features)
+  .prediction
+prediction * localTreeWeights(idx)
   }
-  evaluationArray(nTree) = predictionAndError.values.mean()
+  .scanLeft(0.0)(_ + _).drop(1)
+  .map(prediction => loss.computeError(prediction, point.label))
 }
+.aggregate(treesIndices.map(_ => 0.0))(
+  (aggregated, row) => treesIndices.map(idx => aggregated(idx) + row(idx)),
+  (a, b) => treesIndices.map(idx => a(idx) + b(idx)))
+.map(_ / dataCount)
 
-broadcastTrees.unpersist()
-evaluationArray
+broadcastTrees.destroy()
+evaluation.toArray
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/spark/blob/393db655/mllib/src/main/scala/org/apache/spark/mllib/tree/model/treeEnsembleModels.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/tree/model/treeEnsembleModels.scala
 
b/mllib/src/main/scala/org/apache/spark/mllib/tree/model/treeEnsembleModels.scala
index f7d9b22..657ed0a 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/tree/model/treeEnsembleModels.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/tree/model/treeEnsembleModels.scala
@@ -151,31 +151,24 @@ class GradientBoostedTreesModel @Since("1.2.0") (
   case _ => data
 }
 
-val numIterations = trees.length
-val evaluationArray = 

spark git commit: [SPARK-16257][BUILD] Update spark_ec2.py to support Spark 1.6.2 and 1.6.3.

2016-06-29 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.6 1ac830aca -> ccc7fa357


[SPARK-16257][BUILD] Update spark_ec2.py to support Spark 1.6.2 and 1.6.3.

## What changes were proposed in this pull request?

- Adds 1.6.2 and 1.6.3 as supported Spark versions within the bundled spark-ec2 
script.
- Makes the default Spark version 1.6.3 to keep in sync with the upcoming 
release.
- Does not touch the newer spark-ec2 scripts in the separate amplab repository.

## How was this patch tested?

- Manual script execution:

export AWS_SECRET_ACCESS_KEY=_snip_
export AWS_ACCESS_KEY_ID=_snip_
$SPARK_HOME/ec2/spark-ec2 \
--key-pair=_snip_ \
--identity-file=_snip_ \
--region=us-east-1 \
--vpc-id=_snip_ \
--slaves=1 \
--instance-type=t1.micro \
--spark-version=1.6.2 \
--hadoop-major-version=yarn \
launch test-cluster

- Result: Successful creation of a 1.6.2-based Spark cluster.

This contribution is my original work and I license the work to the project 
under the project's open source license.

Author: Brian Uri 

Closes #13947 from briuri/branch-1.6-bug-spark-16257.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ccc7fa35
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ccc7fa35
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ccc7fa35

Branch: refs/heads/branch-1.6
Commit: ccc7fa357099e0f621cfc02448ba20d3f6fabc14
Parents: 1ac830a
Author: Brian Uri 
Authored: Thu Jun 30 07:52:28 2016 +0100
Committer: Sean Owen 
Committed: Thu Jun 30 07:52:28 2016 +0100

--
 ec2/spark_ec2.py | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ccc7fa35/ec2/spark_ec2.py
--
diff --git a/ec2/spark_ec2.py b/ec2/spark_ec2.py
index 76c09f0..b28b4c5 100755
--- a/ec2/spark_ec2.py
+++ b/ec2/spark_ec2.py
@@ -51,7 +51,7 @@ else:
 raw_input = input
 xrange = range
 
-SPARK_EC2_VERSION = "1.6.1"
+SPARK_EC2_VERSION = "1.6.3"
 SPARK_EC2_DIR = os.path.dirname(os.path.realpath(__file__))
 
 VALID_SPARK_VERSIONS = set([
@@ -77,6 +77,8 @@ VALID_SPARK_VERSIONS = set([
 "1.5.2",
 "1.6.0",
 "1.6.1",
+"1.6.2",
+"1.6.3",
 ])
 
 SPARK_TACHYON_MAP = {
@@ -96,6 +98,8 @@ SPARK_TACHYON_MAP = {
 "1.5.2": "0.7.1",
 "1.6.0": "0.8.2",
 "1.6.1": "0.8.2",
+"1.6.2": "0.8.2",
+"1.6.3": "0.8.2",
 }
 
 DEFAULT_SPARK_VERSION = SPARK_EC2_VERSION
@@ -103,7 +107,7 @@ DEFAULT_SPARK_GITHUB_REPO = 
"https://github.com/apache/spark";
 
 # Default location to get the spark-ec2 scripts (and ami-list) from
 DEFAULT_SPARK_EC2_GITHUB_REPO = "https://github.com/amplab/spark-ec2";
-DEFAULT_SPARK_EC2_BRANCH = "branch-1.5"
+DEFAULT_SPARK_EC2_BRANCH = "branch-1.6"
 
 
 def setup_external_libs(libs):





spark git commit: [SPARK-16182][CORE] Utils.scala -- terminateProcess() should call Process.destroyForcibly() if and only if Process.destroy() fails

2016-07-01 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master fbfd0ab9d -> 2075bf8ef


[SPARK-16182][CORE] Utils.scala -- terminateProcess() should call 
Process.destroyForcibly() if and only if Process.destroy() fails

## What changes were proposed in this pull request?

Utils.terminateProcess should `destroy()` first and only fall back to 
`destroyForcibly()` if it fails. It's kind of bad that we're force-killing 
executors -- and only in Java 8. See JIRA for an example of the impact: no 
shutdown

While here: `Utils.waitForProcess` should use the Java 8 method if available 
instead of a custom implementation.
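
As a hedged sketch of the destroy-then-escalate pattern, assuming a Java 8+ runtime where `Process.destroyForcibly()` and `waitFor(timeout, unit)` can be called directly (the actual patch reaches them via reflection so the code still compiles on Java 7):

```
import java.util.concurrent.TimeUnit

// Sketch only: politely destroy first, escalate to destroyForcibly() on timeout.
// Assumes Java 8+, so no reflection is needed here.
def terminateSketch(process: Process, timeoutMs: Long): Option[Int] = {
  process.destroy()                                   // polite request first
  if (process.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
    Some(process.exitValue())
  } else {
    process.destroyForcibly()                         // only if destroy() was ignored
    if (process.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) Some(process.exitValue())
    else None
  }
}
```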

## How was this patch tested?

Existing tests, which cover the force-kill case, and Amplab tests, which will 
cover both Java 7 and Java 8 eventually. However I tested locally on Java 8 and 
the PR builder will try Java 7 here.

Author: Sean Owen 

Closes #13973 from srowen/SPARK-16182.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2075bf8e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2075bf8e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2075bf8e

Branch: refs/heads/master
Commit: 2075bf8ef6035fd7606bcf20dc2cd7d7b9cda446
Parents: fbfd0ab
Author: Sean Owen 
Authored: Fri Jul 1 09:22:27 2016 +0100
Committer: Sean Owen 
Committed: Fri Jul 1 09:22:27 2016 +0100

--
 .../scala/org/apache/spark/util/Utils.scala | 76 
 .../org/apache/spark/util/UtilsSuite.scala  |  2 +-
 2 files changed, 47 insertions(+), 31 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2075bf8e/core/src/main/scala/org/apache/spark/util/Utils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala 
b/core/src/main/scala/org/apache/spark/util/Utils.scala
index f77cc2f..0c23f3c 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -1772,50 +1772,66 @@ private[spark] object Utils extends Logging {
   }
 
   /**
-   * Terminates a process waiting for at most the specified duration. Returns 
whether
-   * the process terminated.
+   * Terminates a process waiting for at most the specified duration.
+   *
+   * @return the process exit value if it was successfully terminated, else 
None
*/
   def terminateProcess(process: Process, timeoutMs: Long): Option[Int] = {
-try {
-  // Java8 added a new API which will more forcibly kill the process. Use 
that if available.
-  val destroyMethod = process.getClass().getMethod("destroyForcibly");
-  destroyMethod.setAccessible(true)
-  destroyMethod.invoke(process)
-} catch {
-  case NonFatal(e) =>
-if (!e.isInstanceOf[NoSuchMethodException]) {
-  logWarning("Exception when attempting to kill process", e)
-}
-process.destroy()
-}
+// Politely destroy first
+process.destroy()
+
 if (waitForProcess(process, timeoutMs)) {
+  // Successful exit
   Option(process.exitValue())
 } else {
-  None
+  // Java 8 added a new API which will more forcibly kill the process. Use 
that if available.
+  try {
+classOf[Process].getMethod("destroyForcibly").invoke(process)
+  } catch {
+case _: NoSuchMethodException => return None // Not available; give up
+case NonFatal(e) => logWarning("Exception when attempting to kill 
process", e)
+  }
+  // Wait, again, although this really should return almost immediately
+  if (waitForProcess(process, timeoutMs)) {
+Option(process.exitValue())
+  } else {
+logWarning("Timed out waiting to forcibly kill process")
+None
+  }
 }
   }
 
   /**
* Wait for a process to terminate for at most the specified duration.
-   * Return whether the process actually terminated after the given timeout.
+   *
+   * @return whether the process actually terminated before the given timeout.
*/
   def waitForProcess(process: Process, timeoutMs: Long): Boolean = {
-var terminated = false
-val startTime = System.currentTimeMillis
-while (!terminated) {
-  try {
-process.exitValue()
-terminated = true
-  } catch {
-case e: IllegalThreadStateException =>
-  // Process not terminated yet
-  if (System.currentTimeMillis - startTime > timeoutMs) {
-return false
+try {
+  // Use Java 8 method if available
+  classOf[Process].getMethod("waitFor", java.lang.Long.TYPE, 
classOf[TimeUnit])
+.invoke(process, timeoutMs.asInstanceOf[java.lang.Long], 
TimeUnit.MILLISECONDS)
+.asInstanceOf[Boolean]
+} catch {
+   

spark git commit: [SPARK-16182][CORE] Utils.scala -- terminateProcess() should call Process.destroyForcibly() if and only if Process.destroy() fails

2016-07-01 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 1932bb683 -> 972106dd3


[SPARK-16182][CORE] Utils.scala -- terminateProcess() should call 
Process.destroyForcibly() if and only if Process.destroy() fails

## What changes were proposed in this pull request?

Utils.terminateProcess should `destroy()` first and only fall back to 
`destroyForcibly()` if it fails. It's kind of bad that we're force-killing 
executors -- and only in Java 8. See JIRA for an example of the impact: no 
shutdown

While here: `Utils.waitForProcess` should use the Java 8 method if available 
instead of a custom implementation.

## How was this patch tested?

Existing tests, which cover the force-kill case, and Amplab tests, which will 
cover both Java 7 and Java 8 eventually. However I tested locally on Java 8 and 
the PR builder will try Java 7 here.

Author: Sean Owen 

Closes #13973 from srowen/SPARK-16182.

(cherry picked from commit 2075bf8ef6035fd7606bcf20dc2cd7d7b9cda446)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/972106dd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/972106dd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/972106dd

Branch: refs/heads/branch-2.0
Commit: 972106dd3bdc40b0980949a09783d6d460e8d268
Parents: 1932bb6
Author: Sean Owen 
Authored: Fri Jul 1 09:22:27 2016 +0100
Committer: Sean Owen 
Committed: Fri Jul 1 09:22:36 2016 +0100

--
 .../scala/org/apache/spark/util/Utils.scala | 76 
 .../org/apache/spark/util/UtilsSuite.scala  |  2 +-
 2 files changed, 47 insertions(+), 31 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/972106dd/core/src/main/scala/org/apache/spark/util/Utils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala 
b/core/src/main/scala/org/apache/spark/util/Utils.scala
index f77cc2f..0c23f3c 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -1772,50 +1772,66 @@ private[spark] object Utils extends Logging {
   }
 
   /**
-   * Terminates a process waiting for at most the specified duration. Returns 
whether
-   * the process terminated.
+   * Terminates a process waiting for at most the specified duration.
+   *
+   * @return the process exit value if it was successfully terminated, else 
None
*/
   def terminateProcess(process: Process, timeoutMs: Long): Option[Int] = {
-try {
-  // Java8 added a new API which will more forcibly kill the process. Use 
that if available.
-  val destroyMethod = process.getClass().getMethod("destroyForcibly");
-  destroyMethod.setAccessible(true)
-  destroyMethod.invoke(process)
-} catch {
-  case NonFatal(e) =>
-if (!e.isInstanceOf[NoSuchMethodException]) {
-  logWarning("Exception when attempting to kill process", e)
-}
-process.destroy()
-}
+// Politely destroy first
+process.destroy()
+
 if (waitForProcess(process, timeoutMs)) {
+  // Successful exit
   Option(process.exitValue())
 } else {
-  None
+  // Java 8 added a new API which will more forcibly kill the process. Use 
that if available.
+  try {
+classOf[Process].getMethod("destroyForcibly").invoke(process)
+  } catch {
+case _: NoSuchMethodException => return None // Not available; give up
+case NonFatal(e) => logWarning("Exception when attempting to kill 
process", e)
+  }
+  // Wait, again, although this really should return almost immediately
+  if (waitForProcess(process, timeoutMs)) {
+Option(process.exitValue())
+  } else {
+logWarning("Timed out waiting to forcibly kill process")
+None
+  }
 }
   }
 
   /**
* Wait for a process to terminate for at most the specified duration.
-   * Return whether the process actually terminated after the given timeout.
+   *
+   * @return whether the process actually terminated before the given timeout.
*/
   def waitForProcess(process: Process, timeoutMs: Long): Boolean = {
-var terminated = false
-val startTime = System.currentTimeMillis
-while (!terminated) {
-  try {
-process.exitValue()
-terminated = true
-  } catch {
-case e: IllegalThreadStateException =>
-  // Process not terminated yet
-  if (System.currentTimeMillis - startTime > timeoutMs) {
-return false
+try {
+  // Use Java 8 method if available
+  classOf[Process].getMethod("waitFor", java.lang.Long.TYPE, 
classOf[TimeUnit])
+.invoke(process, timeou

spark git commit: [SPARK-16182][CORE] Utils.scala -- terminateProcess() should call Process.destroyForcibly() if and only if Process.destroy() fails

2016-07-01 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.6 ccc7fa357 -> 83f860448


[SPARK-16182][CORE] Utils.scala -- terminateProcess() should call 
Process.destroyForcibly() if and only if Process.destroy() fails

## What changes were proposed in this pull request?

Utils.terminateProcess should `destroy()` first and only fall back to 
`destroyForcibly()` if it fails. It's kind of bad that we're force-killing 
executors -- and only in Java 8. See JIRA for an example of the impact: no 
shutdown

While here: `Utils.waitForProcess` should use the Java 8 method if available 
instead of a custom implementation.

## How was this patch tested?

Existing tests, which cover the force-kill case, and Amplab tests, which will 
cover both Java 7 and Java 8 eventually. However I tested locally on Java 8 and 
the PR builder will try Java 7 here.

Author: Sean Owen 

Closes #13973 from srowen/SPARK-16182.

(cherry picked from commit 2075bf8ef6035fd7606bcf20dc2cd7d7b9cda446)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/83f86044
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/83f86044
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/83f86044

Branch: refs/heads/branch-1.6
Commit: 83f86044879b3c6bbfb0f3075cba552070b064cf
Parents: ccc7fa3
Author: Sean Owen 
Authored: Fri Jul 1 09:22:27 2016 +0100
Committer: Sean Owen 
Committed: Fri Jul 1 09:25:02 2016 +0100

--
 .../scala/org/apache/spark/util/Utils.scala | 76 
 .../org/apache/spark/util/UtilsSuite.scala  |  2 +-
 2 files changed, 47 insertions(+), 31 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/83f86044/core/src/main/scala/org/apache/spark/util/Utils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala 
b/core/src/main/scala/org/apache/spark/util/Utils.scala
index 36ab3ac..427b382 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -1732,50 +1732,66 @@ private[spark] object Utils extends Logging {
   }
 
   /**
-   * Terminates a process waiting for at most the specified duration. Returns 
whether
-   * the process terminated.
+   * Terminates a process waiting for at most the specified duration.
+   *
+   * @return the process exit value if it was successfully terminated, else 
None
*/
   def terminateProcess(process: Process, timeoutMs: Long): Option[Int] = {
-try {
-  // Java8 added a new API which will more forcibly kill the process. Use 
that if available.
-  val destroyMethod = process.getClass().getMethod("destroyForcibly");
-  destroyMethod.setAccessible(true)
-  destroyMethod.invoke(process)
-} catch {
-  case NonFatal(e) =>
-if (!e.isInstanceOf[NoSuchMethodException]) {
-  logWarning("Exception when attempting to kill process", e)
-}
-process.destroy()
-}
+// Politely destroy first
+process.destroy()
+
 if (waitForProcess(process, timeoutMs)) {
+  // Successful exit
   Option(process.exitValue())
 } else {
-  None
+  // Java 8 added a new API which will more forcibly kill the process. Use 
that if available.
+  try {
+classOf[Process].getMethod("destroyForcibly").invoke(process)
+  } catch {
+case _: NoSuchMethodException => return None // Not available; give up
+case NonFatal(e) => logWarning("Exception when attempting to kill 
process", e)
+  }
+  // Wait, again, although this really should return almost immediately
+  if (waitForProcess(process, timeoutMs)) {
+Option(process.exitValue())
+  } else {
+logWarning("Timed out waiting to forcibly kill process")
+None
+  }
 }
   }
 
   /**
* Wait for a process to terminate for at most the specified duration.
-   * Return whether the process actually terminated after the given timeout.
+   *
+   * @return whether the process actually terminated before the given timeout.
*/
   def waitForProcess(process: Process, timeoutMs: Long): Boolean = {
-var terminated = false
-val startTime = System.currentTimeMillis
-while (!terminated) {
-  try {
-process.exitValue()
-terminated = true
-  } catch {
-case e: IllegalThreadStateException =>
-  // Process not terminated yet
-  if (System.currentTimeMillis - startTime > timeoutMs) {
-return false
+try {
+  // Use Java 8 method if available
+  classOf[Process].getMethod("waitFor", java.lang.Long.TYPE, 
classOf[TimeUnit])
+.invoke(process, timeou

spark git commit: [SPARK-15761][MLLIB][PYSPARK] Load ipython when default python is Python3

2016-07-01 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.6 83f860448 -> 1026aba16


[SPARK-15761][MLLIB][PYSPARK] Load ipython when default python is Python3

## What changes were proposed in this pull request?

I would like to use IPython with Python 3.5. It is annoying when it fails with 
"IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" 
even though I have a version greater than 2.7.

## How was this patch tested?
It now works with IPython and Python 3.

Author: MechCoder 

Closes #13503 from MechCoder/spark-15761.

(cherry picked from commit 66283ee0b25de2a5daaa21d50a05a7fadec1de77)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1026aba1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1026aba1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1026aba1

Branch: refs/heads/branch-1.6
Commit: 1026aba16554f6c5b5a6a3fdc2b9bdb7911a9fcc
Parents: 83f8604
Author: MechCoder 
Authored: Fri Jul 1 09:27:34 2016 +0100
Committer: Sean Owen 
Committed: Fri Jul 1 09:27:54 2016 +0100

--
 bin/pyspark | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1026aba1/bin/pyspark
--
diff --git a/bin/pyspark b/bin/pyspark
index 5eaa17d..42af597 100755
--- a/bin/pyspark
+++ b/bin/pyspark
@@ -54,9 +54,11 @@ elif [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then
   PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"$DEFAULT_PYTHON"}"
 fi
 
+WORKS_WITH_IPYTHON=$($DEFAULT_PYTHON -c 'import sys; print(sys.version_info >= 
(2, 7, 0))')
+
 # Determine the Python executable to use for the executors:
 if [[ -z "$PYSPARK_PYTHON" ]]; then
-  if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && $DEFAULT_PYTHON != "python2.7" 
]]; then
+  if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && ! WORKS_WITH_IPYTHON ]]; then
 echo "IPython requires Python 2.7+; please install python2.7 or set 
PYSPARK_PYTHON" 1>&2
 exit 1
   else





spark git commit: [SPARK-15761][MLLIB][PYSPARK] Load ipython when default python is Python3

2016-07-01 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 2075bf8ef -> 66283ee0b


[SPARK-15761][MLLIB][PYSPARK] Load ipython when default python is Python3

## What changes were proposed in this pull request?

I would like to use IPython with Python 3.5. It is annoying when it fails with 
"IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" 
even though I have a version greater than 2.7.

## How was this patch tested?
It now works with IPython and Python 3.

Author: MechCoder 

Closes #13503 from MechCoder/spark-15761.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/66283ee0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/66283ee0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/66283ee0

Branch: refs/heads/master
Commit: 66283ee0b25de2a5daaa21d50a05a7fadec1de77
Parents: 2075bf8
Author: MechCoder 
Authored: Fri Jul 1 09:27:34 2016 +0100
Committer: Sean Owen 
Committed: Fri Jul 1 09:27:34 2016 +0100

--
 bin/pyspark | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/66283ee0/bin/pyspark
--
diff --git a/bin/pyspark b/bin/pyspark
index 396a07c..ac8aa04 100755
--- a/bin/pyspark
+++ b/bin/pyspark
@@ -50,9 +50,11 @@ if [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then
   PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"$DEFAULT_PYTHON"}"
 fi
 
+WORKS_WITH_IPYTHON=$($DEFAULT_PYTHON -c 'import sys; print(sys.version_info >= 
(2, 7, 0))')
+
 # Determine the Python executable to use for the executors:
 if [[ -z "$PYSPARK_PYTHON" ]]; then
-  if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && $DEFAULT_PYTHON != "python2.7" 
]]; then
+  if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && ! WORKS_WITH_IPYTHON ]]; then
 echo "IPython requires Python 2.7+; please install python2.7 or set 
PYSPARK_PYTHON" 1>&2
 exit 1
   else





spark git commit: [SPARK-15761][MLLIB][PYSPARK] Load ipython when default python is Python3

2016-07-01 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 972106dd3 -> 0b64543c5


[SPARK-15761][MLLIB][PYSPARK] Load ipython when default python is Python3

## What changes were proposed in this pull request?

I would like to use IPython with Python 3.5. It is annoying when it fails with 
"IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" 
even though I have a version greater than 2.7.

## How was this patch tested?
It now works with IPython and Python 3.

Author: MechCoder 

Closes #13503 from MechCoder/spark-15761.

(cherry picked from commit 66283ee0b25de2a5daaa21d50a05a7fadec1de77)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0b64543c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0b64543c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0b64543c

Branch: refs/heads/branch-2.0
Commit: 0b64543c5ba6a943294f189b7ca02e0debbfad9c
Parents: 972106d
Author: MechCoder 
Authored: Fri Jul 1 09:27:34 2016 +0100
Committer: Sean Owen 
Committed: Fri Jul 1 09:27:42 2016 +0100

--
 bin/pyspark | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0b64543c/bin/pyspark
--
diff --git a/bin/pyspark b/bin/pyspark
index 396a07c..ac8aa04 100755
--- a/bin/pyspark
+++ b/bin/pyspark
@@ -50,9 +50,11 @@ if [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then
   PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"$DEFAULT_PYTHON"}"
 fi
 
+WORKS_WITH_IPYTHON=$($DEFAULT_PYTHON -c 'import sys; print(sys.version_info >= 
(2, 7, 0))')
+
 # Determine the Python executable to use for the executors:
 if [[ -z "$PYSPARK_PYTHON" ]]; then
-  if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && $DEFAULT_PYTHON != "python2.7" 
]]; then
+  if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && ! WORKS_WITH_IPYTHON ]]; then
 echo "IPython requires Python 2.7+; please install python2.7 or set 
PYSPARK_PYTHON" 1>&2
 exit 1
   else





spark git commit: [SPARK-16222][SQL] JDBC Sources - Handling illegal input values for `fetchsize` and `batchsize`

2016-07-01 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 0b64543c5 -> 3665927c6


[SPARK-16222][SQL] JDBC Sources - Handling illegal input values for `fetchsize` 
and `batchsize`

 What changes were proposed in this pull request?
For JDBC data sources, users can specify `batchsize` for multi-row inserts and 
`fetchsize` for multi-row fetch. A few issues exist:

- The property keys are case sensitive. Thus, the existing test cases for 
`fetchsize` use incorrect names, `fetchSize`. Basically, the test cases are 
broken.
- No test case exists for `batchsize`.
- We do not detect the illegal input values for `fetchsize` and `batchsize`.

For example, when `batchsize` is zero, we got the following exception:
```
Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most 
recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): 
java.lang.ArithmeticException: / by zero
```
when `fetchsize` is less than zero, we got the exception from the underlying 
JDBC driver:
```
Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most 
recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): 
org.h2.jdbc.JdbcSQLException: Invalid value "-1" for parameter "rows" 
[90008-183]
```

This PR fixes all the above issues, and issues the appropriate exceptions when it 
detects illegal inputs for `fetchsize` and `batchsize`. It also updates the 
function descriptions.

 How was this patch tested?
Test cases are fixed and added.
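
A hedged usage sketch of the two options this patch validates. The connection URL, table names, and credentials below are placeholders (an H2 in-memory database, mirroring the test setup mentioned above); only the lowercase option keys matter.

```
import java.util.Properties
import org.apache.spark.sql.SparkSession

// Placeholder connection details; the lowercase keys "fetchsize" and "batchsize"
// are the properties validated by this patch.
val spark = SparkSession.builder().appName("jdbc-options-sketch").getOrCreate()

val props = new Properties()
props.setProperty("user", "sa")
props.setProperty("password", "")
props.setProperty("fetchsize", "1000")   // rows per round trip when reading

val people = spark.read.jdbc("jdbc:h2:mem:testdb", "PEOPLE", props)

props.setProperty("batchsize", "500")    // rows per INSERT batch when writing
people.write.mode("append").jdbc("jdbc:h2:mem:testdb", "PEOPLE_COPY", props)
```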

Author: gatorsmile 

Closes #13919 from gatorsmile/jdbcProperties.

(cherry picked from commit 0ad6ce7e54b1d8f5946dde652fa5341d15059158)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3665927c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3665927c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3665927c

Branch: refs/heads/branch-2.0
Commit: 3665927c6f5fa4794a59718fd2d339310c70a985
Parents: 0b64543
Author: gatorsmile 
Authored: Fri Jul 1 09:54:02 2016 +0100
Committer: Sean Owen 
Committed: Fri Jul 1 09:54:10 2016 +0100

--
 .../org/apache/spark/sql/DataFrameReader.scala  |  6 +-
 .../org/apache/spark/sql/DataFrameWriter.scala  |  3 +-
 .../execution/datasources/jdbc/JDBCRDD.scala|  6 +-
 .../execution/datasources/jdbc/JdbcUtils.scala  | 10 +++-
 .../apache/spark/sql/jdbc/PostgresDialect.scala |  2 +-
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala   | 62 
 .../apache/spark/sql/jdbc/JDBCWriteSuite.scala  | 54 -
 7 files changed, 98 insertions(+), 45 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3665927c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
index 35ba522..e8c2885 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -177,7 +177,8 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
*  clause expressions used to split the column 
`columnName` evenly.
* @param connectionProperties JDBC database connection arguments, a list of 
arbitrary string
* tag/value. Normally at least a "user" and 
"password" property
-   * should be included.
+   * should be included. "fetchsize" can be used 
to control the
+   * number of rows per fetch.
* @since 1.4.0
*/
   def jdbc(
@@ -207,7 +208,8 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
* @param predicates Condition in the where clause for each partition.
* @param connectionProperties JDBC database connection arguments, a list of 
arbitrary string
* tag/value. Normally at least a "user" and 
"password" property
-   * should be included.
+   * should be included. "fetchsize" can be used 
to control the
+   * number of rows per fetch.
* @since 1.4.0
*/
   def jdbc(

http://git-wip-us.apache.org/repos/asf/spark/blob/3665927c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index ca3972d..f77af76 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src

spark git commit: [SPARK-16222][SQL] JDBC Sources - Handling illegal input values for `fetchsize` and `batchsize`

2016-07-01 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 66283ee0b -> 0ad6ce7e5


[SPARK-16222][SQL] JDBC Sources - Handling illegal input values for `fetchsize` 
and `batchsize`

 What changes were proposed in this pull request?
For JDBC data sources, users can specify `batchsize` for multi-row inserts and 
`fetchsize` for multi-row fetch. A few issues exist:

- The property keys are case sensitive. Thus, the existing test cases for 
`fetchsize` use incorrect names, `fetchSize`. Basically, the test cases are 
broken.
- No test case exists for `batchsize`.
- We do not detect the illegal input values for `fetchsize` and `batchsize`.

For example, when `batchsize` is zero, we got the following exception:
```
Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most 
recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): 
java.lang.ArithmeticException: / by zero
```
when `fetchsize` is less than zero, we got the exception from the underlying 
JDBC driver:
```
Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most 
recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): 
org.h2.jdbc.JdbcSQLException: Invalid value "-1" for parameter "rows" 
[90008-183]
```

This PR fixes all the above issues, and issues the appropriate exceptions when it 
detects illegal inputs for `fetchsize` and `batchsize`. It also updates the 
function descriptions.

 How was this patch tested?
Test cases are fixed and added.

Author: gatorsmile 

Closes #13919 from gatorsmile/jdbcProperties.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0ad6ce7e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0ad6ce7e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0ad6ce7e

Branch: refs/heads/master
Commit: 0ad6ce7e54b1d8f5946dde652fa5341d15059158
Parents: 66283ee
Author: gatorsmile 
Authored: Fri Jul 1 09:54:02 2016 +0100
Committer: Sean Owen 
Committed: Fri Jul 1 09:54:02 2016 +0100

--
 .../org/apache/spark/sql/DataFrameReader.scala  |  6 +-
 .../org/apache/spark/sql/DataFrameWriter.scala  |  3 +-
 .../execution/datasources/jdbc/JDBCRDD.scala|  6 +-
 .../execution/datasources/jdbc/JdbcUtils.scala  | 10 +++-
 .../apache/spark/sql/jdbc/PostgresDialect.scala |  2 +-
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala   | 62 
 .../apache/spark/sql/jdbc/JDBCWriteSuite.scala  | 54 -
 7 files changed, 98 insertions(+), 45 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0ad6ce7e/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
index 35ba522..e8c2885 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -177,7 +177,8 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
*  clause expressions used to split the column 
`columnName` evenly.
* @param connectionProperties JDBC database connection arguments, a list of 
arbitrary string
* tag/value. Normally at least a "user" and 
"password" property
-   * should be included.
+   * should be included. "fetchsize" can be used 
to control the
+   * number of rows per fetch.
* @since 1.4.0
*/
   def jdbc(
@@ -207,7 +208,8 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
* @param predicates Condition in the where clause for each partition.
* @param connectionProperties JDBC database connection arguments, a list of 
arbitrary string
* tag/value. Normally at least a "user" and 
"password" property
-   * should be included.
+   * should be included. "fetchsize" can be used 
to control the
+   * number of rows per fetch.
* @since 1.4.0
*/
   def jdbc(

http://git-wip-us.apache.org/repos/asf/spark/blob/0ad6ce7e/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index ca3972d..f77af76 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -391,7 +391,8 @@ final class DataFrameWriter[T

spark git commit: [GRAPHX][EXAMPLES] move graphx test data directory and update graphx document

2016-07-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master bad0f7dbb -> 192d1f9cf


[GRAPHX][EXAMPLES] move graphx test data directory and update graphx document

## What changes were proposed in this pull request?

There are two test data files used by the GraphX examples that live in the 
"graphx/data" directory.
I move them into the "data/" directory, because the "graphx" directory is meant for 
code files, while the other test data files (such as the MLlib and Streaming test 
data) are all under "data/".

I also update the GraphX documentation wherever it references the data files that 
were moved.

## How was this patch tested?

N/A

Author: WeichenXu 

Closes #14010 from WeichenXu123/move_graphx_data_dir.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/192d1f9c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/192d1f9c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/192d1f9c

Branch: refs/heads/master
Commit: 192d1f9cf3463d050b87422939448f2acf86acc9
Parents: bad0f7d
Author: WeichenXu 
Authored: Sat Jul 2 08:40:23 2016 +0100
Committer: Sean Owen 
Committed: Sat Jul 2 08:40:23 2016 +0100

--
 data/graphx/followers.txt|  8 
 data/graphx/users.txt|  7 +++
 docs/graphx-programming-guide.md | 18 +-
 graphx/data/followers.txt|  8 
 graphx/data/users.txt|  7 ---
 5 files changed, 24 insertions(+), 24 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/data/graphx/followers.txt
--
diff --git a/data/graphx/followers.txt b/data/graphx/followers.txt
new file mode 100644
index 000..7bb8e90
--- /dev/null
+++ b/data/graphx/followers.txt
@@ -0,0 +1,8 @@
+2 1
+4 1
+1 2
+6 3
+7 3
+7 6
+6 7
+3 7

http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/data/graphx/users.txt
--
diff --git a/data/graphx/users.txt b/data/graphx/users.txt
new file mode 100644
index 000..982d19d
--- /dev/null
+++ b/data/graphx/users.txt
@@ -0,0 +1,7 @@
+1,BarackObama,Barack Obama
+2,ladygaga,Goddess of Love
+3,jeresig,John Resig
+4,justinbieber,Justin Bieber
+6,matei_zaharia,Matei Zaharia
+7,odersky,Martin Odersky
+8,anonsys

http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/docs/graphx-programming-guide.md
--
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 81cf174..e376b66 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -1007,15 +1007,15 @@ PageRank measures the importance of each vertex in a 
graph, assuming an edge fro
 
 GraphX comes with static and dynamic implementations of PageRank as methods on 
the [`PageRank` object][PageRank]. Static PageRank runs for a fixed number of 
iterations, while dynamic PageRank runs until the ranks converge (i.e., stop 
changing by more than a specified tolerance). [`GraphOps`][GraphOps] allows 
calling these algorithms directly as methods on `Graph`.
 
-GraphX also includes an example social network dataset that we can run 
PageRank on. A set of users is given in `graphx/data/users.txt`, and a set of 
relationships between users is given in `graphx/data/followers.txt`. We compute 
the PageRank of each user as follows:
+GraphX also includes an example social network dataset that we can run 
PageRank on. A set of users is given in `data/graphx/users.txt`, and a set of 
relationships between users is given in `data/graphx/followers.txt`. We compute 
the PageRank of each user as follows:
 
 {% highlight scala %}
 // Load the edges as a graph
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
 // Run PageRank
 val ranks = graph.pageRank(0.0001).vertices
 // Join the ranks with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
   val fields = line.split(",")
   (fields(0).toLong, fields(1))
 }
@@ -1032,11 +1032,11 @@ The connected components algorithm labels each 
connected component of the graph
 
 {% highlight scala %}
 // Load the graph as in the PageRank example
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
 // Find the connected components
 val cc = graph.connectedComponents().vertices
 // Join the connected components with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
   val fields = line.split(",")
   (fields(0).toLong, fields(1))
 }
@@ -1053,11 +1053,11 @@ A ver

spark git commit: [GRAPHX][EXAMPLES] move graphx test data directory and update graphx document

2016-07-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 ab4303800 -> f3a359939


[GRAPHX][EXAMPLES] move graphx test data directory and update graphx document

## What changes were proposed in this pull request?

There are two test data files used by the GraphX examples that live in the 
"graphx/data" directory.
I move them into the "data/" directory, because the "graphx" directory is meant for 
code files, while the other test data files (such as the MLlib and Streaming test 
data) are all under "data/".

I also update the GraphX documentation wherever it references the data files that 
were moved.

## How was this patch tested?

N/A

Author: WeichenXu 

Closes #14010 from WeichenXu123/move_graphx_data_dir.

(cherry picked from commit 192d1f9cf3463d050b87422939448f2acf86acc9)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f3a35993
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f3a35993
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f3a35993

Branch: refs/heads/branch-2.0
Commit: f3a359939afb25c8b91fabe5955e1cdf609be521
Parents: ab43038
Author: WeichenXu 
Authored: Sat Jul 2 08:40:23 2016 +0100
Committer: Sean Owen 
Committed: Sat Jul 2 08:40:31 2016 +0100

--
 data/graphx/followers.txt|  8 
 data/graphx/users.txt|  7 +++
 docs/graphx-programming-guide.md | 18 +-
 graphx/data/followers.txt|  8 
 graphx/data/users.txt|  7 ---
 5 files changed, 24 insertions(+), 24 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f3a35993/data/graphx/followers.txt
--
diff --git a/data/graphx/followers.txt b/data/graphx/followers.txt
new file mode 100644
index 000..7bb8e90
--- /dev/null
+++ b/data/graphx/followers.txt
@@ -0,0 +1,8 @@
+2 1
+4 1
+1 2
+6 3
+7 3
+7 6
+6 7
+3 7

http://git-wip-us.apache.org/repos/asf/spark/blob/f3a35993/data/graphx/users.txt
--
diff --git a/data/graphx/users.txt b/data/graphx/users.txt
new file mode 100644
index 000..982d19d
--- /dev/null
+++ b/data/graphx/users.txt
@@ -0,0 +1,7 @@
+1,BarackObama,Barack Obama
+2,ladygaga,Goddess of Love
+3,jeresig,John Resig
+4,justinbieber,Justin Bieber
+6,matei_zaharia,Matei Zaharia
+7,odersky,Martin Odersky
+8,anonsys

http://git-wip-us.apache.org/repos/asf/spark/blob/f3a35993/docs/graphx-programming-guide.md
--
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 81cf174..e376b66 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -1007,15 +1007,15 @@ PageRank measures the importance of each vertex in a 
graph, assuming an edge fro
 
 GraphX comes with static and dynamic implementations of PageRank as methods on 
the [`PageRank` object][PageRank]. Static PageRank runs for a fixed number of 
iterations, while dynamic PageRank runs until the ranks converge (i.e., stop 
changing by more than a specified tolerance). [`GraphOps`][GraphOps] allows 
calling these algorithms directly as methods on `Graph`.
 
-GraphX also includes an example social network dataset that we can run 
PageRank on. A set of users is given in `graphx/data/users.txt`, and a set of 
relationships between users is given in `graphx/data/followers.txt`. We compute 
the PageRank of each user as follows:
+GraphX also includes an example social network dataset that we can run 
PageRank on. A set of users is given in `data/graphx/users.txt`, and a set of 
relationships between users is given in `data/graphx/followers.txt`. We compute 
the PageRank of each user as follows:
 
 {% highlight scala %}
 // Load the edges as a graph
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
 // Run PageRank
 val ranks = graph.pageRank(0.0001).vertices
 // Join the ranks with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
   val fields = line.split(",")
   (fields(0).toLong, fields(1))
 }
@@ -1032,11 +1032,11 @@ The connected components algorithm labels each 
connected component of the graph
 
 {% highlight scala %}
 // Load the graph as in the PageRank example
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
 // Find the connected components
 val cc = graph.connectedComponents().vertices
 // Join the connected components with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { l

spark git commit: [SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming guide example snippets from source files instead of hard code them

2016-07-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 192d1f9cf -> 0bd7cd18b


[SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming guide 
example snippets from source files instead of hard code them

## What changes were proposed in this pull request?

I extracted 6 example programs from the GraphX programming guide and replaced the 
hard-coded snippets with the `include_example` label.

The 6 example programs are:
- AggregateMessagesExample.scala
- SSSPExample.scala
- TriangleCountingExample.scala
- ConnectedComponentsExample.scala
- ComprehensiveExample.scala
- PageRankExample.scala

All of the example programs can be run with
`bin/run-example graphx.EXAMPLE_NAME`
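
For orientation, a minimal sketch, not one of the committed files, of how such a standalone example is typically laid out so the docs build can pull the marked region into the guide. The `$example on$` / `$example off$` markers, the object name, and the printed output here are illustrative assumptions:

```scala
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.sql.SparkSession

object PageRankSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("PageRankSketch").getOrCreate()
    val sc = spark.sparkContext
    // $example on$
    // Load the edges as a graph and run PageRank until the ranks converge
    val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
    val ranks = graph.pageRank(0.0001).vertices
    // $example off$
    ranks.take(5).foreach(println)
    spark.stop()
  }
}
```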

## How was this patch tested?

Manual.

Author: WeichenXu 

Closes #14015 from WeichenXu123/graphx_example_plugin.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0bd7cd18
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0bd7cd18
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0bd7cd18

Branch: refs/heads/master
Commit: 0bd7cd18bc4d535b0c4499913f6747b3f6315ac2
Parents: 192d1f9
Author: WeichenXu 
Authored: Sat Jul 2 16:29:00 2016 +0100
Committer: Sean Owen 
Committed: Sat Jul 2 16:29:00 2016 +0100

--
 docs/graphx-programming-guide.md| 133 +--
 .../graphx/AggregateMessagesExample.scala   |  72 ++
 .../examples/graphx/ComprehensiveExample.scala  |  80 +++
 .../graphx/ConnectedComponentsExample.scala |  68 ++
 .../spark/examples/graphx/PageRankExample.scala |  61 +
 .../spark/examples/graphx/SSSPExample.scala |  69 ++
 .../graphx/TriangleCountingExample.scala|  70 ++
 7 files changed, 426 insertions(+), 127 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0bd7cd18/docs/graphx-programming-guide.md
--
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index e376b66..2e9966c 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -603,29 +603,7 @@ slightly unreliable and instead opted for more explicit 
user control.
 In the following example we use the 
[`aggregateMessages`][Graph.aggregateMessages] operator to
 compute the average age of the more senior followers of each user.
 
-{% highlight scala %}
-// Import random graph generation library
-import org.apache.spark.graphx.util.GraphGenerators
-// Create a graph with "age" as the vertex property.  Here we use a random 
graph for simplicity.
-val graph: Graph[Double, Int] =
-  GraphGenerators.logNormalGraph(sc, numVertices = 100).mapVertices( (id, _) 
=> id.toDouble )
-// Compute the number of older followers and their total age
-val olderFollowers: VertexRDD[(Int, Double)] = graph.aggregateMessages[(Int, 
Double)](
-  triplet => { // Map Function
-if (triplet.srcAttr > triplet.dstAttr) {
-  // Send message to destination vertex containing counter and age
-  triplet.sendToDst(1, triplet.srcAttr)
-}
-  },
-  // Add counter and age
-  (a, b) => (a._1 + b._1, a._2 + b._2) // Reduce Function
-)
-// Divide total age by number of older followers to get average age of older 
followers
-val avgAgeOfOlderFollowers: VertexRDD[Double] =
-  olderFollowers.mapValues( (id, value) => value match { case (count, 
totalAge) => totalAge / count } )
-// Display the results
-avgAgeOfOlderFollowers.collect.foreach(println(_))
-{% endhighlight %}
+{% include_example 
scala/org/apache/spark/examples/graphx/AggregateMessagesExample.scala %}
 
 > The `aggregateMessages` operation performs optimally when the messages (and 
 > the sums of
 > messages) are constant sized (e.g., floats and addition instead of lists and 
 > concatenation).
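
The note above is easiest to see by comparing the two reduce shapes. A minimal sketch, not part of this commit, with hypothetical value names: the first keeps every message a fixed-size pair, the second grows a collection on every merge, which is the pattern the guide warns against.

```scala
// Constant-sized messages: a (count, totalAge) pair, so each merge is O(1)
val mergeCountAndAge: ((Int, Double), (Int, Double)) => (Int, Double) =
  (a, b) => (a._1 + b._1, a._2 + b._2)

// Growing messages: list concatenation makes each merge cost proportional
// to the data accumulated so far
val mergeAgeLists: (List[Double], List[Double]) => List[Double] =
  (a, b) => a ++ b
```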
@@ -793,29 +771,7 @@ second argument list contains the user defined functions 
for receiving messages
 We can use the Pregel operator to express computation such as single source
 shortest path in the following example.
 
-{% highlight scala %}
-import org.apache.spark.graphx._
-// Import random graph generation library
-import org.apache.spark.graphx.util.GraphGenerators
-// A graph with edge attributes containing distances
-val graph: Graph[Long, Double] =
-  GraphGenerators.logNormalGraph(sc, numVertices = 100).mapEdges(e => 
e.attr.toDouble)
-val sourceId: VertexId = 42 // The ultimate source
-// Initialize the graph such that all vertices except the root have distance 
infinity.
-val initialGraph = graph.mapVertices((id, _) => if (id == sourceId) 0.0 else 
Double.PositiveInfinity)
-val sssp = initialGraph.pregel(Double.PositiveInfinity)(
-  (id, dist, newDist) => math.min(dist, newDist), // Vertex Program
-  triplet => {  // Send Message
-if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
-  Iterator((triplet.

spark git commit: [SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming guide example snippets from source files instead of hard code them

2016-07-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 f3a359939 -> 0d0b41609


[SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming guide 
example snippets from source files instead of hard code them

## What changes were proposed in this pull request?

I extracted 6 example programs from the GraphX programming guide and replaced the 
hard-coded snippets with the `include_example` label.

The 6 example programs are:
- AggregateMessagesExample.scala
- SSSPExample.scala
- TriangleCountingExample.scala
- ConnectedComponentsExample.scala
- ComprehensiveExample.scala
- PageRankExample.scala

All of the example programs can be run with
`bin/run-example graphx.EXAMPLE_NAME`

## How was this patch tested?

Manual.

Author: WeichenXu 

Closes #14015 from WeichenXu123/graphx_example_plugin.

(cherry picked from commit 0bd7cd18bc4d535b0c4499913f6747b3f6315ac2)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0d0b4160
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0d0b4160
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0d0b4160

Branch: refs/heads/branch-2.0
Commit: 0d0b416097a095fa771a7d5ae368546c26cb2d8b
Parents: f3a3599
Author: WeichenXu 
Authored: Sat Jul 2 16:29:00 2016 +0100
Committer: Sean Owen 
Committed: Sat Jul 2 16:29:26 2016 +0100

--
 docs/graphx-programming-guide.md| 133 +--
 .../graphx/AggregateMessagesExample.scala   |  72 ++
 .../examples/graphx/ComprehensiveExample.scala  |  80 +++
 .../graphx/ConnectedComponentsExample.scala |  68 ++
 .../spark/examples/graphx/PageRankExample.scala |  61 +
 .../spark/examples/graphx/SSSPExample.scala |  69 ++
 .../graphx/TriangleCountingExample.scala|  70 ++
 7 files changed, 426 insertions(+), 127 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0d0b4160/docs/graphx-programming-guide.md
--
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index e376b66..2e9966c 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -603,29 +603,7 @@ slightly unreliable and instead opted for more explicit 
user control.
 In the following example we use the 
[`aggregateMessages`][Graph.aggregateMessages] operator to
 compute the average age of the more senior followers of each user.
 
-{% highlight scala %}
-// Import random graph generation library
-import org.apache.spark.graphx.util.GraphGenerators
-// Create a graph with "age" as the vertex property.  Here we use a random 
graph for simplicity.
-val graph: Graph[Double, Int] =
-  GraphGenerators.logNormalGraph(sc, numVertices = 100).mapVertices( (id, _) 
=> id.toDouble )
-// Compute the number of older followers and their total age
-val olderFollowers: VertexRDD[(Int, Double)] = graph.aggregateMessages[(Int, 
Double)](
-  triplet => { // Map Function
-if (triplet.srcAttr > triplet.dstAttr) {
-  // Send message to destination vertex containing counter and age
-  triplet.sendToDst(1, triplet.srcAttr)
-}
-  },
-  // Add counter and age
-  (a, b) => (a._1 + b._1, a._2 + b._2) // Reduce Function
-)
-// Divide total age by number of older followers to get average age of older 
followers
-val avgAgeOfOlderFollowers: VertexRDD[Double] =
-  olderFollowers.mapValues( (id, value) => value match { case (count, 
totalAge) => totalAge / count } )
-// Display the results
-avgAgeOfOlderFollowers.collect.foreach(println(_))
-{% endhighlight %}
+{% include_example 
scala/org/apache/spark/examples/graphx/AggregateMessagesExample.scala %}
 
 > The `aggregateMessages` operation performs optimally when the messages (and 
 > the sums of
 > messages) are constant sized (e.g., floats and addition instead of lists and 
 > concatenation).
@@ -793,29 +771,7 @@ second argument list contains the user defined functions 
for receiving messages
 We can use the Pregel operator to express computation such as single source
 shortest path in the following example.
 
-{% highlight scala %}
-import org.apache.spark.graphx._
-// Import random graph generation library
-import org.apache.spark.graphx.util.GraphGenerators
-// A graph with edge attributes containing distances
-val graph: Graph[Long, Double] =
-  GraphGenerators.logNormalGraph(sc, numVertices = 100).mapEdges(e => 
e.attr.toDouble)
-val sourceId: VertexId = 42 // The ultimate source
-// Initialize the graph such that all vertices except the root have distance 
infinity.
-val initialGraph = graph.mapVertices((id, _) => if (id == sourceId) 0.0 else 
Double.PositiveInfinity)
-val sssp = initialGraph.pregel(Double.PositiveInfinity)(
-  (id, dist, newDist) => math.min(dist, newDist), // Vertex Program
-  triplet => {

spark git commit: [MINOR][BUILD] Fix Java linter errors

2016-07-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 0bd7cd18b -> 3000b4b29


[MINOR][BUILD] Fix Java linter errors

## What changes were proposed in this pull request?

This PR fixes the minor Java linter errors like the following.
```
-public int read(char cbuf[], int off, int len) throws IOException {
+public int read(char[] cbuf, int off, int len) throws IOException {
```

## How was this patch tested?

Manual.
```
$ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive 
-Phive-thriftserver install
$ dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks passed.
```

Author: Dongjoon Hyun 

Closes #14017 from dongjoon-hyun/minor_build_java_linter_error.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3000b4b2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3000b4b2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3000b4b2

Branch: refs/heads/master
Commit: 3000b4b29f9165f436f186a8c1ba818e24f90615
Parents: 0bd7cd1
Author: Dongjoon Hyun 
Authored: Sat Jul 2 16:31:06 2016 +0100
Committer: Sean Owen 
Committed: Sat Jul 2 16:31:06 2016 +0100

--
 .../shuffle/sort/ShuffleExternalSorter.java  |  3 ++-
 .../unsafe/sort/UnsafeExternalSorter.java| 12 ++--
 .../catalyst/expressions/xml/UDFXPathUtil.java   | 19 +++
 .../sql/execution/UnsafeExternalRowSorter.java   |  4 ++--
 .../UnsafeFixedWidthAggregationMap.java  |  4 ++--
 .../sql/execution/UnsafeKVExternalSorter.java|  3 ++-
 6 files changed, 25 insertions(+), 20 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3000b4b2/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java
--
diff --git 
a/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java 
b/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java
index 696ee73..cf38a04 100644
--- 
a/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java
+++ 
b/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java
@@ -376,7 +376,8 @@ final class ShuffleExternalSorter extends MemoryConsumer {
 // for tests
 assert(inMemSorter != null);
 if (inMemSorter.numRecords() >= numElementsForSpillThreshold) {
-  logger.info("Spilling data because number of spilledRecords crossed the 
threshold " + numElementsForSpillThreshold);
+  logger.info("Spilling data because number of spilledRecords crossed the 
threshold " +
+numElementsForSpillThreshold);
   spill();
 }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/3000b4b2/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
--
diff --git 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index d6a255e..8d596f8 100644
--- 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -27,7 +27,6 @@ import com.google.common.annotations.VisibleForTesting;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import org.apache.spark.SparkEnv;
 import org.apache.spark.TaskContext;
 import org.apache.spark.executor.ShuffleWriteMetrics;
 import org.apache.spark.memory.MemoryConsumer;
@@ -99,8 +98,8 @@ public final class UnsafeExternalSorter extends 
MemoryConsumer {
   long numElementsForSpillThreshold,
   UnsafeInMemorySorter inMemorySorter) throws IOException {
 UnsafeExternalSorter sorter = new UnsafeExternalSorter(taskMemoryManager, 
blockManager,
-  serializerManager, taskContext, recordComparator, prefixComparator, 
initialSize, numElementsForSpillThreshold,
-pageSizeBytes, inMemorySorter, false /* ignored */);
+  serializerManager, taskContext, recordComparator, prefixComparator, 
initialSize,
+numElementsForSpillThreshold, pageSizeBytes, inMemorySorter, false /* 
ignored */);
 sorter.spill(Long.MAX_VALUE, sorter);
 // The external sorter will be used to insert records, in-memory sorter is 
not needed.
 sorter.inMemSorter = null;
@@ -119,8 +118,8 @@ public final class UnsafeExternalSorter extends 
MemoryConsumer {
   long numElementsForSpillThreshold,
   boolean canUseRadixSort) {
 return new UnsafeExternalSorter(taskMemoryManager, blockManager, 
serializerManager,
-  taskContext, recordComparator, prefixComparator, initialSize, 
pageSizeBytes, numElementsForSpillThreshold, null,
-  canUseRadixSort);
+ 

spark git commit: [MINOR][BUILD] Fix Java linter errors

2016-07-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 0d0b41609 -> 0c6fd03fa


[MINOR][BUILD] Fix Java linter errors

This PR fixes the minor Java linter errors like the following.
```
-public int read(char cbuf[], int off, int len) throws IOException {
+public int read(char[] cbuf, int off, int len) throws IOException {
```

Manual.
```
$ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive 
-Phive-thriftserver install
$ dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks passed.
```

Author: Dongjoon Hyun 

Closes #14017 from dongjoon-hyun/minor_build_java_linter_error.

(cherry picked from commit 3000b4b29f9165f436f186a8c1ba818e24f90615)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0c6fd03f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0c6fd03f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0c6fd03f

Branch: refs/heads/branch-2.0
Commit: 0c6fd03fa763df4afb77ac4738c76f0b73e46ad0
Parents: 0d0b416
Author: Dongjoon Hyun 
Authored: Sat Jul 2 16:31:06 2016 +0100
Committer: Sean Owen 
Committed: Sat Jul 2 16:33:22 2016 +0100

--
 .../spark/shuffle/sort/ShuffleExternalSorter.java   |  3 ++-
 .../collection/unsafe/sort/UnsafeExternalSorter.java| 12 ++--
 .../spark/sql/execution/UnsafeExternalRowSorter.java|  4 ++--
 .../sql/execution/UnsafeFixedWidthAggregationMap.java   |  4 ++--
 .../spark/sql/execution/UnsafeKVExternalSorter.java |  3 ++-
 5 files changed, 14 insertions(+), 12 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0c6fd03f/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java
--
diff --git 
a/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java 
b/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java
index 696ee73..cf38a04 100644
--- 
a/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java
+++ 
b/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java
@@ -376,7 +376,8 @@ final class ShuffleExternalSorter extends MemoryConsumer {
 // for tests
 assert(inMemSorter != null);
 if (inMemSorter.numRecords() >= numElementsForSpillThreshold) {
-  logger.info("Spilling data because number of spilledRecords crossed the 
threshold " + numElementsForSpillThreshold);
+  logger.info("Spilling data because number of spilledRecords crossed the 
threshold " +
+numElementsForSpillThreshold);
   spill();
 }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/0c6fd03f/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
--
diff --git 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index 8a980d4..50f5b06 100644
--- 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -27,7 +27,6 @@ import com.google.common.annotations.VisibleForTesting;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import org.apache.spark.SparkEnv;
 import org.apache.spark.TaskContext;
 import org.apache.spark.executor.ShuffleWriteMetrics;
 import org.apache.spark.memory.MemoryConsumer;
@@ -99,8 +98,8 @@ public final class UnsafeExternalSorter extends 
MemoryConsumer {
   long numElementsForSpillThreshold,
   UnsafeInMemorySorter inMemorySorter) throws IOException {
 UnsafeExternalSorter sorter = new UnsafeExternalSorter(taskMemoryManager, 
blockManager,
-  serializerManager, taskContext, recordComparator, prefixComparator, 
initialSize, numElementsForSpillThreshold,
-pageSizeBytes, inMemorySorter, false /* ignored */);
+  serializerManager, taskContext, recordComparator, prefixComparator, 
initialSize,
+numElementsForSpillThreshold, pageSizeBytes, inMemorySorter, false /* 
ignored */);
 sorter.spill(Long.MAX_VALUE, sorter);
 // The external sorter will be used to insert records, in-memory sorter is 
not needed.
 sorter.inMemSorter = null;
@@ -119,8 +118,8 @@ public final class UnsafeExternalSorter extends 
MemoryConsumer {
   long numElementsForSpillThreshold,
   boolean canUseRadixSort) {
 return new UnsafeExternalSorter(taskMemoryManager, blockManager, 
serializerManager,
-  taskContext, recordComparator, prefixComparator, initialSize, 
pageSizeBytes, numElementsForSpillThreshold, null,
-  canUseRadixSort);
+  taskContext, record

spark git commit: [MINOR][DOCS] Remove unused images; crush PNGs that could use it for good measure

2016-07-04 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 3ecee573c -> ecbb44709


[MINOR][DOCS] Remove unused images; crush PNGs that could use it for good 
measure

## What changes were proposed in this pull request?

Coincidentally, I discovered that a couple of images in `docs/` were unused, then 
searched and found more, then realized some PNGs were quite large and could be 
crushed, and before I knew it I had done the same for the ASF site (not 
committed yet).

No functional change at all, just less superfluous image data.

## How was this patch tested?

`jekyll serve`

Author: Sean Owen 

Closes #14029 from srowen/RemoveCompressImages.

(cherry picked from commit 18fb57f58a04685823408f3a174a8722f155fd4d)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ecbb4470
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ecbb4470
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ecbb4470

Branch: refs/heads/branch-2.0
Commit: ecbb44709bfbaaf3412127dc4569732ade16a6ba
Parents: 3ecee57
Author: Sean Owen 
Authored: Mon Jul 4 09:21:58 2016 +0100
Committer: Sean Owen 
Committed: Mon Jul 4 09:22:09 2016 +0100

--
 .../spark/ui/static/spark-logo-77x50px-hd.png   | Bin 4182 -> 3077 bytes
 docs/img/cluster-overview.png   | Bin 33565 -> 22912 bytes
 docs/img/edge-cut.png   | Bin 12563 -> 0 bytes
 docs/img/edge_cut_vs_vertex_cut.png | Bin 79745 -> 51015 bytes
 docs/img/graph_parallel.png | Bin 92288 -> 0 bytes
 docs/img/graphx_logo.png| Bin 40324 -> 22875 bytes
 docs/img/graphx_performance_comparison.png  | Bin 166343 -> 0 bytes
 docs/img/ml-Pipeline.png| Bin 74030 -> 38536 bytes
 docs/img/ml-PipelineModel.png   | Bin 76019 -> 39228 bytes
 docs/img/property_graph.png | Bin 225151 -> 135699 bytes
 docs/img/spark-logo-hd.png  | Bin 16418 -> 11306 bytes
 docs/img/spark-webui-accumulators.png   | Bin 231065 -> 160167 bytes
 docs/img/streaming-arch.png | Bin 78954 -> 51972 bytes
 docs/img/streaming-dstream-ops.png  | Bin 48429 -> 33495 bytes
 docs/img/streaming-dstream-window.png   | Bin 40938 -> 26622 bytes
 docs/img/streaming-dstream.png  | Bin 26823 -> 17843 bytes
 docs/img/streaming-flow.png | Bin 31544 -> 20425 bytes
 docs/img/streaming-kinesis-arch.png | Bin 115277 -> 86336 bytes
 docs/img/structured-streaming-example-model.png | Bin 125504 -> 79409 bytes
 docs/img/structured-streaming-late-data.png | Bin 138226 -> 91513 bytes
 docs/img/structured-streaming-model.png | Bin 66098 -> 37321 bytes
 .../structured-streaming-stream-as-a-table.png  | Bin 82251 -> 47791 bytes
 docs/img/structured-streaming-window.png| Bin 132875 -> 88102 bytes
 docs/img/triplet.png| Bin 31489 -> 19255 bytes
 docs/img/vertex-cut.png | Bin 12246 -> 0 bytes
 docs/img/vertex_routing_edge_tables.png | Bin 570007 -> 323162 bytes
 26 files changed, 0 insertions(+), 0 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ecbb4470/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png
--
diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png
index ffe2550..cee2891 100644
Binary files 
a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
and 
b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
differ

http://git-wip-us.apache.org/repos/asf/spark/blob/ecbb4470/docs/img/cluster-overview.png
--
diff --git a/docs/img/cluster-overview.png b/docs/img/cluster-overview.png
index 317554c..b1b7c1a 100644
Binary files a/docs/img/cluster-overview.png and 
b/docs/img/cluster-overview.png differ

http://git-wip-us.apache.org/repos/asf/spark/blob/ecbb4470/docs/img/edge-cut.png
--
diff --git a/docs/img/edge-cut.png b/docs/img/edge-cut.png
deleted file mode 100644
index 698f4ff..000
Binary files a/docs/img/edge-cut.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/ecbb4470/docs/img/edge_cut_vs_vertex_cut.png
--
diff --git a/docs/img/edge_cut_vs_vertex_cut.png 
b/docs/img

spark git commit: [MINOR][DOCS] Remove unused images; crush PNGs that could use it for good measure

2016-07-04 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master a539b724c -> 18fb57f58


[MINOR][DOCS] Remove unused images; crush PNGs that could use it for good 
measure

## What changes were proposed in this pull request?

Coincidentally, I discovered that a couple of images in `docs/` were unused, then 
searched and found more, then realized some PNGs were quite large and could be 
crushed, and before I knew it I had done the same for the ASF site (not 
committed yet).

No functional change at all, just less superfluous image data.

## How was this patch tested?

`jekyll serve`

Author: Sean Owen 

Closes #14029 from srowen/RemoveCompressImages.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/18fb57f5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/18fb57f5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/18fb57f5

Branch: refs/heads/master
Commit: 18fb57f58a04685823408f3a174a8722f155fd4d
Parents: a539b72
Author: Sean Owen 
Authored: Mon Jul 4 09:21:58 2016 +0100
Committer: Sean Owen 
Committed: Mon Jul 4 09:21:58 2016 +0100

--
 .../spark/ui/static/spark-logo-77x50px-hd.png   | Bin 4182 -> 3077 bytes
 docs/img/cluster-overview.png   | Bin 33565 -> 22912 bytes
 docs/img/edge-cut.png   | Bin 12563 -> 0 bytes
 docs/img/edge_cut_vs_vertex_cut.png | Bin 79745 -> 51015 bytes
 docs/img/graph_parallel.png | Bin 92288 -> 0 bytes
 docs/img/graphx_logo.png| Bin 40324 -> 22875 bytes
 docs/img/graphx_performance_comparison.png  | Bin 166343 -> 0 bytes
 docs/img/ml-Pipeline.png| Bin 74030 -> 38536 bytes
 docs/img/ml-PipelineModel.png   | Bin 76019 -> 39228 bytes
 docs/img/property_graph.png | Bin 225151 -> 135699 bytes
 docs/img/spark-logo-hd.png  | Bin 16418 -> 11306 bytes
 docs/img/spark-webui-accumulators.png   | Bin 231065 -> 160167 bytes
 docs/img/streaming-arch.png | Bin 78954 -> 51972 bytes
 docs/img/streaming-dstream-ops.png  | Bin 48429 -> 33495 bytes
 docs/img/streaming-dstream-window.png   | Bin 40938 -> 26622 bytes
 docs/img/streaming-dstream.png  | Bin 26823 -> 17843 bytes
 docs/img/streaming-flow.png | Bin 31544 -> 20425 bytes
 docs/img/streaming-kinesis-arch.png | Bin 115277 -> 86336 bytes
 docs/img/structured-streaming-example-model.png | Bin 125504 -> 79409 bytes
 docs/img/structured-streaming-late-data.png | Bin 138226 -> 91513 bytes
 docs/img/structured-streaming-model.png | Bin 66098 -> 37321 bytes
 .../structured-streaming-stream-as-a-table.png  | Bin 82251 -> 47791 bytes
 docs/img/structured-streaming-window.png| Bin 132875 -> 88102 bytes
 docs/img/triplet.png| Bin 31489 -> 19255 bytes
 docs/img/vertex-cut.png | Bin 12246 -> 0 bytes
 docs/img/vertex_routing_edge_tables.png | Bin 570007 -> 323162 bytes
 26 files changed, 0 insertions(+), 0 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/18fb57f5/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png
--
diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png
index ffe2550..cee2891 100644
Binary files 
a/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
and 
b/core/src/main/resources/org/apache/spark/ui/static/spark-logo-77x50px-hd.png 
differ

http://git-wip-us.apache.org/repos/asf/spark/blob/18fb57f5/docs/img/cluster-overview.png
--
diff --git a/docs/img/cluster-overview.png b/docs/img/cluster-overview.png
index 317554c..b1b7c1a 100644
Binary files a/docs/img/cluster-overview.png and 
b/docs/img/cluster-overview.png differ

http://git-wip-us.apache.org/repos/asf/spark/blob/18fb57f5/docs/img/edge-cut.png
--
diff --git a/docs/img/edge-cut.png b/docs/img/edge-cut.png
deleted file mode 100644
index 698f4ff..000
Binary files a/docs/img/edge-cut.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/spark/blob/18fb57f5/docs/img/edge_cut_vs_vertex_cut.png
--
diff --git a/docs/img/edge_cut_vs_vertex_cut.png 
b/docs/img/edge_cut_vs_vertex_cut.png
index ae30396..5b1ed78 100644
Binary files a/docs/img/edge_cut_vs_vertex_cut.png and 
b/docs/img/edg

svn commit: r1751226 - in /spark: _includes/ images/ site/images/

2016-07-04 Thread srowen
Author: srowen
Date: Mon Jul  4 08:31:32 2016
New Revision: 1751226

URL: http://svn.apache.org/viewvc?rev=1751226&view=rev
Log:
Remove unused images from Spark site; crush large PNGs; remove obsolete .html 
_includes

Removed:
spark/_includes/footer.html
spark/_includes/navbar.html
spark/images/Summit-Logo-FINALtr-150x150px.png
spark/images/amplab-small.png
spark/images/download.png
spark/images/incubator-logo.png
spark/images/logistic-regression2.png
spark/images/scaling.png
spark/images/spark-lr.png
spark/images/spark-project-header1-cropped.png
spark/images/spark-project-header1.png
spark/images/spark-streaming-throughput.png
spark/site/images/Summit-Logo-FINALtr-150x150px.png
spark/site/images/amplab-small.png
spark/site/images/download.png
spark/site/images/incubator-logo.png
spark/site/images/logistic-regression2.png
spark/site/images/scaling.png
spark/site/images/spark-lr.png
spark/site/images/spark-project-header1-cropped.png
spark/site/images/spark-project-header1.png
spark/site/images/spark-streaming-throughput.png
Modified:
spark/images/0.8.0-ui-screenshot.png
spark/images/graphx-perf-comparison.png
spark/images/jdbc.png
spark/images/logistic-regression.png
spark/images/spark-logo-trademark.png
spark/images/spark-logo.png
spark/images/spark-runs-everywhere.png
spark/images/spark-stack.png
spark/images/spark-streaming-recovery.png
spark/images/sql-hive-arch.png
spark/site/images/0.8.0-ui-screenshot.png
spark/site/images/graphx-perf-comparison.png
spark/site/images/jdbc.png
spark/site/images/logistic-regression.png
spark/site/images/spark-logo-trademark.png
spark/site/images/spark-logo.png
spark/site/images/spark-runs-everywhere.png
spark/site/images/spark-stack.png
spark/site/images/spark-streaming-recovery.png
spark/site/images/sql-hive-arch.png

Modified: spark/images/0.8.0-ui-screenshot.png
URL: 
http://svn.apache.org/viewvc/spark/images/0.8.0-ui-screenshot.png?rev=1751226&r1=1751225&r2=1751226&view=diff
==
Binary files - no diff available.

Modified: spark/images/graphx-perf-comparison.png
URL: 
http://svn.apache.org/viewvc/spark/images/graphx-perf-comparison.png?rev=1751226&r1=1751225&r2=1751226&view=diff
==
Binary files - no diff available.

Modified: spark/images/jdbc.png
URL: 
http://svn.apache.org/viewvc/spark/images/jdbc.png?rev=1751226&r1=1751225&r2=1751226&view=diff
==
Binary files - no diff available.

Modified: spark/images/logistic-regression.png
URL: 
http://svn.apache.org/viewvc/spark/images/logistic-regression.png?rev=1751226&r1=1751225&r2=1751226&view=diff
==
Binary files - no diff available.

Modified: spark/images/spark-logo-trademark.png
URL: 
http://svn.apache.org/viewvc/spark/images/spark-logo-trademark.png?rev=1751226&r1=1751225&r2=1751226&view=diff
==
Binary files - no diff available.

Modified: spark/images/spark-logo.png
URL: 
http://svn.apache.org/viewvc/spark/images/spark-logo.png?rev=1751226&r1=1751225&r2=1751226&view=diff
==
Binary files - no diff available.

Modified: spark/images/spark-runs-everywhere.png
URL: 
http://svn.apache.org/viewvc/spark/images/spark-runs-everywhere.png?rev=1751226&r1=1751225&r2=1751226&view=diff
==
Binary files - no diff available.

Modified: spark/images/spark-stack.png
URL: 
http://svn.apache.org/viewvc/spark/images/spark-stack.png?rev=1751226&r1=1751225&r2=1751226&view=diff
==
Binary files - no diff available.

Modified: spark/images/spark-streaming-recovery.png
URL: 
http://svn.apache.org/viewvc/spark/images/spark-streaming-recovery.png?rev=1751226&r1=1751225&r2=1751226&view=diff
==
Binary files - no diff available.

Modified: spark/images/sql-hive-arch.png
URL: 
http://svn.apache.org/viewvc/spark/images/sql-hive-arch.png?rev=1751226&r1=1751225&r2=1751226&view=diff
==
Binary files - no diff available.

Modified: spark/site/images/0.8.0-ui-screenshot.png
URL: 
http://svn.apache.org/viewvc/spark/site/images/0.8.0-ui-screenshot.png?rev=1751226&r1=1751225&r2=1751226&view=diff
