Repository: spark
Updated Branches:
refs/heads/master 9b33dfc40 -> a6647ffbf
[SPARK-22587] Spark job fails if fs.defaultFS and application jar are different
## What changes were proposed in this pull request?
The filesystem comparison does not consider the authority of the URI. This
matters for the WASB file storage system, where userInfo is honored to
differentiate filesystems.
For example, wasbs://user1xyz.net and wasbs://user2xyz.net should be
considered two different filesystems.
Therefore, we have to include the authority when comparing two filesystems;
two filesystems with different authorities cannot be the same FS.
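A minimal Scala sketch of the intended check (illustrative only, not the committed code; see the diff below for the actual change). It assumes only the plain `java.net.URI` accessors, which already expose the authority that distinguishes the two WASB URIs above:
```
import java.net.URI

// Illustrative: URIs that differ in scheme or authority (case-insensitive)
// can never refer to the same filesystem.
def sameFileSystem(src: URI, dst: URI): Boolean = {
  val sameScheme = src.getScheme != null && src.getScheme == dst.getScheme
  val sameAuthority = (src.getAuthority, dst.getAuthority) match {
    case (null, null) => true
    case (a, b) => a != null && b != null && a.equalsIgnoreCase(b)
  }
  sameScheme && sameAuthority &&
    src.getHost == dst.getHost && src.getPort == dst.getPort
}

// new URI("wasbs://user1xyz.net/a").getAuthority  // "user1xyz.net"
// new URI("wasbs://user2xyz.net/a").getAuthority  // "user2xyz.net" -> different FS
```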
Author: Mingjie Tang
Closes #19885 from merlintang/EAR-7377.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a6647ffb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a6647ffb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a6647ffb
Branch: refs/heads/master
Commit: a6647ffbf7a312a3e119a9beef90880cc915aa60
Parents: 9b33dfc
Author: Mingjie Tang
Authored: Thu Jan 11 11:51:03 2018 +0800
Committer: jerryshao
Committed: Thu Jan 11 11:51:03 2018 +0800
--
.../org/apache/spark/deploy/yarn/Client.scala | 24 +++---
.../apache/spark/deploy/yarn/ClientSuite.scala | 33
2 files changed, 53 insertions(+), 4 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/a6647ffb/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
--
diff --git
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 15328d0..8cd3cd9 100644
---
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -1421,15 +1421,20 @@ private object Client extends Logging {
}
/**
- * Return whether the two file systems are the same.
+ * Return whether two URI represent file system are the same
*/
- private def compareFs(srcFs: FileSystem, destFs: FileSystem): Boolean = {
-val srcUri = srcFs.getUri()
-val dstUri = destFs.getUri()
+ private[spark] def compareUri(srcUri: URI, dstUri: URI): Boolean = {
+
if (srcUri.getScheme() == null || srcUri.getScheme() !=
dstUri.getScheme()) {
return false
}
+val srcAuthority = srcUri.getAuthority()
+val dstAuthority = dstUri.getAuthority()
+if (srcAuthority != null && !srcAuthority.equalsIgnoreCase(dstAuthority)) {
+ return false
+}
+
var srcHost = srcUri.getHost()
var dstHost = dstUri.getHost()
@@ -1447,6 +1452,17 @@ private object Client extends Logging {
}
Objects.equal(srcHost, dstHost) && srcUri.getPort() == dstUri.getPort()
+
+ }
+
+ /**
+ * Return whether the two file systems are the same.
+ */
+ protected def compareFs(srcFs: FileSystem, destFs: FileSystem): Boolean = {
+val srcUri = srcFs.getUri()
+val dstUri = destFs.getUri()
+
+compareUri(srcUri, dstUri)
}
/**
http://git-wip-us.apache.org/repos/asf/spark/blob/a6647ffb/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
--
diff --git
a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
index 9d5f5eb..7fa5971 100644
---
a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
+++
b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
@@ -357,6 +357,39 @@ class ClientSuite extends SparkFunSuite with Matchers {
sparkConf.get(SECONDARY_JARS) should be (Some(Seq(new
File(jar2.toURI).getName)))
}
+ private val matching = Seq(
+("files URI match test1", "file:///file1", "file:///file2"),
+("files URI match test2", "file:///c:file1", "file://c:file2"),
+("files URI match test3", "file://host/file1", "file://host/file2"),
+("wasb URI match test", "wasb://bucket1@user", "wasb://bucket1@user/"),
+("hdfs URI match test", "hdfs:/path1", "hdfs:/path1")
+ )
+
+ matching.foreach { t =>
+ test(t._1) {
+assert(Client.compareUri(new URI(t._2), new URI(t._3)),
+ s"No match between ${t._2} and ${t._3}")
+ }
+ }
+
+ private val unmatching = Seq(
+("files URI unmatch test1", "file:///file1", "file://host/file2"),
+("files URI unmatch test2", "file://host/file1", "file:///file2"),
+("files URI unmatch test3", "file://host/file1
Repository: spark
Updated Branches:
refs/heads/branch-2.3 551ccfba5 -> 317b0aaed
[SPARK-22587] Spark job fails if fs.defaultFS and application jar are different
## What changes were proposed in this pull request?
The filesystem comparison does not consider the authority of the URI. This
matters for the WASB file storage system, where userInfo is honored to
differentiate filesystems.
For example, wasbs://user1xyz.net and wasbs://user2xyz.net should be
considered two different filesystems.
Therefore, we have to include the authority when comparing two filesystems;
two filesystems with different authorities cannot be the same FS.
Author: Mingjie Tang
Closes #19885 from merlintang/EAR-7377.
(cherry picked from commit a6647ffbf7a312a3e119a9beef90880cc915aa60)
Signed-off-by: jerryshao
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/317b0aae
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/317b0aae
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/317b0aae
Branch: refs/heads/branch-2.3
Commit: 317b0aaed83e4bbf66f63ddc0d618da9f1f85085
Parents: 551ccfb
Author: Mingjie Tang
Authored: Thu Jan 11 11:51:03 2018 +0800
Committer: jerryshao
Committed: Thu Jan 11 11:51:34 2018 +0800
--
.../org/apache/spark/deploy/yarn/Client.scala | 24 +++---
.../apache/spark/deploy/yarn/ClientSuite.scala | 33
2 files changed, 53 insertions(+), 4 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/317b0aae/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
--
diff --git
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 15328d0..8cd3cd9 100644
---
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -1421,15 +1421,20 @@ private object Client extends Logging {
}
/**
- * Return whether the two file systems are the same.
+ * Return whether two URI represent file system are the same
*/
- private def compareFs(srcFs: FileSystem, destFs: FileSystem): Boolean = {
-val srcUri = srcFs.getUri()
-val dstUri = destFs.getUri()
+ private[spark] def compareUri(srcUri: URI, dstUri: URI): Boolean = {
+
if (srcUri.getScheme() == null || srcUri.getScheme() !=
dstUri.getScheme()) {
return false
}
+val srcAuthority = srcUri.getAuthority()
+val dstAuthority = dstUri.getAuthority()
+if (srcAuthority != null && !srcAuthority.equalsIgnoreCase(dstAuthority)) {
+ return false
+}
+
var srcHost = srcUri.getHost()
var dstHost = dstUri.getHost()
@@ -1447,6 +1452,17 @@ private object Client extends Logging {
}
Objects.equal(srcHost, dstHost) && srcUri.getPort() == dstUri.getPort()
+
+ }
+
+ /**
+ * Return whether the two file systems are the same.
+ */
+ protected def compareFs(srcFs: FileSystem, destFs: FileSystem): Boolean = {
+val srcUri = srcFs.getUri()
+val dstUri = destFs.getUri()
+
+compareUri(srcUri, dstUri)
}
/**
http://git-wip-us.apache.org/repos/asf/spark/blob/317b0aae/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
--
diff --git
a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
index 9d5f5eb..7fa5971 100644
---
a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
+++
b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
@@ -357,6 +357,39 @@ class ClientSuite extends SparkFunSuite with Matchers {
sparkConf.get(SECONDARY_JARS) should be (Some(Seq(new
File(jar2.toURI).getName)))
}
+ private val matching = Seq(
+("files URI match test1", "file:///file1", "file:///file2"),
+("files URI match test2", "file:///c:file1", "file://c:file2"),
+("files URI match test3", "file://host/file1", "file://host/file2"),
+("wasb URI match test", "wasb://bucket1@user", "wasb://bucket1@user/"),
+("hdfs URI match test", "hdfs:/path1", "hdfs:/path1")
+ )
+
+ matching.foreach { t =>
+ test(t._1) {
+assert(Client.compareUri(new URI(t._2), new URI(t._3)),
+ s"No match between ${t._2} and ${t._3}")
+ }
+ }
+
+ private val unmatching = Seq(
+("files URI unmatch test1", "file:///file1", "file://host/file2"),
+("files URI un
Repository: spark
Updated Branches:
refs/heads/master 602c6d82d -> 11daeb833
[SPARK-22976][CORE] Cluster mode driver dir removed while running
## What changes were proposed in this pull request?
The clean up logic on the worker previously determined the liveness of a
particular application based on whether or not it had running executors.
This would fail in the case that a directory was made for a driver
running in cluster mode if that driver had no running executors on the
same machine. To preserve driver directories, we consider both executors
and running drivers when checking directory liveness.
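A rough Scala sketch of the adjusted liveness check (illustrative; the worker's real field and directory names are assumptions, not the patched code):
```
import java.io.File

// Hypothetical stand-ins for the worker's bookkeeping of running work.
case class ExecutorInfo(appId: String)
case class DriverInfo(driverId: String)

def liveDirectories(
    workDir: File,
    executors: Iterable[ExecutorInfo],
    drivers: Iterable[DriverInfo]): Set[File] = {
  // Before the fix only executors contributed here, so a directory created
  // for a cluster-mode driver with no local executors looked "dead".
  executors.map(e => new File(workDir, e.appId)).toSet ++
    drivers.map(d => new File(workDir, d.driverId)).toSet
}

def shouldCleanUp(dir: File, live: Set[File]): Boolean = !live.contains(dir)
```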
## How was this patch tested?
Manually started up a two-node cluster with a single core on each node. Turned on
worker directory cleanup and set the interval to 1 second and liveness to one
second. Without the patch the driver directory is removed immediately after the
app is launched. With the patch it is not.
### Without Patch
```
INFO 2018-01-05 23:48:24,693 Logging.scala:54 - Asked to launch driver
driver-20180105234824-
INFO 2018-01-05 23:48:25,293 Logging.scala:54 - Changing view acls to:
cassandra
INFO 2018-01-05 23:48:25,293 Logging.scala:54 - Changing modify acls to:
cassandra
INFO 2018-01-05 23:48:25,294 Logging.scala:54 - Changing view acls groups to:
INFO 2018-01-05 23:48:25,294 Logging.scala:54 - Changing modify acls groups to:
INFO 2018-01-05 23:48:25,294 Logging.scala:54 - SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(cassandra); groups with view permissions: Set(); users with modify
permissions: Set(cassandra); groups with modify permissions: Set()
INFO 2018-01-05 23:48:25,330 Logging.scala:54 - Copying user jar
file:/home/automaton/writeRead-0.1.jar to
/var/lib/spark/worker/driver-20180105234824-/writeRead-0.1.jar
INFO 2018-01-05 23:48:25,332 Logging.scala:54 - Copying
/home/automaton/writeRead-0.1.jar to
/var/lib/spark/worker/driver-20180105234824-/writeRead-0.1.jar
INFO 2018-01-05 23:48:25,361 Logging.scala:54 - Launch Command:
"/usr/lib/jvm/jdk1.8.0_40//bin/java"
INFO 2018-01-05 23:48:56,577 Logging.scala:54 - Removing directory:
/var/lib/spark/worker/driver-20180105234824- ### << Cleaned up
--
One minute passes while app runs (app has 1 minute sleep built in)
--
WARN 2018-01-05 23:49:58,080 ShuffleSecretManager.java:73 - Attempted to
unregister application app-20180105234831- when it is not registered
INFO 2018-01-05 23:49:58,081 ExternalShuffleBlockResolver.java:163 -
Application app-20180105234831- removed, cleanupLocalDirs = false
INFO 2018-01-05 23:49:58,081 ExternalShuffleBlockResolver.java:163 -
Application app-20180105234831- removed, cleanupLocalDirs = false
INFO 2018-01-05 23:49:58,082 ExternalShuffleBlockResolver.java:163 -
Application app-20180105234831- removed, cleanupLocalDirs = true
INFO 2018-01-05 23:50:00,999 Logging.scala:54 - Driver
driver-20180105234824- exited successfully
```
### With Patch
```
INFO 2018-01-08 23:19:54,603 Logging.scala:54 - Asked to launch driver
driver-20180108231954-0002
INFO 2018-01-08 23:19:54,975 Logging.scala:54 - Changing view acls to:
automaton
INFO 2018-01-08 23:19:54,976 Logging.scala:54 - Changing modify acls to:
automaton
INFO 2018-01-08 23:19:54,976 Logging.scala:54 - Changing view acls groups to:
INFO 2018-01-08 23:19:54,976 Logging.scala:54 - Changing modify acls groups to:
INFO 2018-01-08 23:19:54,976 Logging.scala:54 - SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(automaton); groups with view permissions: Set(); users with modify
permissions: Set(automaton); groups with modify permissions: Set()
INFO 2018-01-08 23:19:55,029 Logging.scala:54 - Copying user jar
file:/home/automaton/writeRead-0.1.jar to
/var/lib/spark/worker/driver-20180108231954-0002/writeRead-0.1.jar
INFO 2018-01-08 23:19:55,031 Logging.scala:54 - Copying
/home/automaton/writeRead-0.1.jar to
/var/lib/spark/worker/driver-20180108231954-0002/writeRead-0.1.jar
INFO 2018-01-08 23:19:55,038 Logging.scala:54 - Launch Command: ..
INFO 2018-01-08 23:21:28,674 ShuffleSecretManager.java:69 - Unregistered
shuffle secret for application app-20180108232000-
INFO 2018-01-08 23:21:28,675 ExternalShuffleBlockResolver.java:163 -
Application app-20180108232000- removed, cleanupLocalDirs = false
INFO 2018-01-08 23:21:28,675 ExternalShuffleBlockResolver.java:163 -
Application app-20180108232000- removed, cleanupLocalDirs = false
INFO 2018-01-08 23:21:28,681 ExternalShuffleBlockResolver.java:163 -
Application app-20180108232000- removed, cleanupLocalDirs = true
INFO 2018-01-08 23:21:31,703 Logging.scala:54 - Driver
driver-20180108231954-0002 exited successfully
*
INFO 2018-01-08 23:21:32,346 Logging.scala:54 - Removing directory:
/var/lib/spark/worker/driver-20180108231954-0002 ### < Happening AFTER the Run
completes rather than during it
*
```
A
Repository: spark
Updated Branches:
refs/heads/branch-2.3 7520491bf -> 5781fa79e
[SPARK-22976][CORE] Cluster mode driver dir removed while running
## What changes were proposed in this pull request?
The clean up logic on the worker previously determined the liveness of a
particular application based on whether or not it had running executors.
This would fail in the case that a directory was made for a driver
running in cluster mode if that driver had no running executors on the
same machine. To preserve driver directories, we consider both executors
and running drivers when checking directory liveness.
## How was this patch tested?
Manually started up a two-node cluster with a single core on each node. Turned on
worker directory cleanup and set the interval to 1 second and liveness to one
second. Without the patch the driver directory is removed immediately after the
app is launched. With the patch it is not.
### Without Patch
```
INFO 2018-01-05 23:48:24,693 Logging.scala:54 - Asked to launch driver
driver-20180105234824-
INFO 2018-01-05 23:48:25,293 Logging.scala:54 - Changing view acls to:
cassandra
INFO 2018-01-05 23:48:25,293 Logging.scala:54 - Changing modify acls to:
cassandra
INFO 2018-01-05 23:48:25,294 Logging.scala:54 - Changing view acls groups to:
INFO 2018-01-05 23:48:25,294 Logging.scala:54 - Changing modify acls groups to:
INFO 2018-01-05 23:48:25,294 Logging.scala:54 - SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(cassandra); groups with view permissions: Set(); users with modify
permissions: Set(cassandra); groups with modify permissions: Set()
INFO 2018-01-05 23:48:25,330 Logging.scala:54 - Copying user jar
file:/home/automaton/writeRead-0.1.jar to
/var/lib/spark/worker/driver-20180105234824-/writeRead-0.1.jar
INFO 2018-01-05 23:48:25,332 Logging.scala:54 - Copying
/home/automaton/writeRead-0.1.jar to
/var/lib/spark/worker/driver-20180105234824-/writeRead-0.1.jar
INFO 2018-01-05 23:48:25,361 Logging.scala:54 - Launch Command:
"/usr/lib/jvm/jdk1.8.0_40//bin/java"
INFO 2018-01-05 23:48:56,577 Logging.scala:54 - Removing directory:
/var/lib/spark/worker/driver-20180105234824- ### << Cleaned up
--
One minute passes while app runs (app has 1 minute sleep built in)
--
WARN 2018-01-05 23:49:58,080 ShuffleSecretManager.java:73 - Attempted to
unregister application app-20180105234831- when it is not registered
INFO 2018-01-05 23:49:58,081 ExternalShuffleBlockResolver.java:163 -
Application app-20180105234831- removed, cleanupLocalDirs = false
INFO 2018-01-05 23:49:58,081 ExternalShuffleBlockResolver.java:163 -
Application app-20180105234831- removed, cleanupLocalDirs = false
INFO 2018-01-05 23:49:58,082 ExternalShuffleBlockResolver.java:163 -
Application app-20180105234831- removed, cleanupLocalDirs = true
INFO 2018-01-05 23:50:00,999 Logging.scala:54 - Driver
driver-20180105234824- exited successfully
```
### With Patch
```
INFO 2018-01-08 23:19:54,603 Logging.scala:54 - Asked to launch driver
driver-20180108231954-0002
INFO 2018-01-08 23:19:54,975 Logging.scala:54 - Changing view acls to:
automaton
INFO 2018-01-08 23:19:54,976 Logging.scala:54 - Changing modify acls to:
automaton
INFO 2018-01-08 23:19:54,976 Logging.scala:54 - Changing view acls groups to:
INFO 2018-01-08 23:19:54,976 Logging.scala:54 - Changing modify acls groups to:
INFO 2018-01-08 23:19:54,976 Logging.scala:54 - SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(automaton); groups with view permissions: Set(); users with modify
permissions: Set(automaton); groups with modify permissions: Set()
INFO 2018-01-08 23:19:55,029 Logging.scala:54 - Copying user jar
file:/home/automaton/writeRead-0.1.jar to
/var/lib/spark/worker/driver-20180108231954-0002/writeRead-0.1.jar
INFO 2018-01-08 23:19:55,031 Logging.scala:54 - Copying
/home/automaton/writeRead-0.1.jar to
/var/lib/spark/worker/driver-20180108231954-0002/writeRead-0.1.jar
INFO 2018-01-08 23:19:55,038 Logging.scala:54 - Launch Command: ..
INFO 2018-01-08 23:21:28,674 ShuffleSecretManager.java:69 - Unregistered
shuffle secret for application app-20180108232000-
INFO 2018-01-08 23:21:28,675 ExternalShuffleBlockResolver.java:163 -
Application app-20180108232000- removed, cleanupLocalDirs = false
INFO 2018-01-08 23:21:28,675 ExternalShuffleBlockResolver.java:163 -
Application app-20180108232000- removed, cleanupLocalDirs = false
INFO 2018-01-08 23:21:28,681 ExternalShuffleBlockResolver.java:163 -
Application app-20180108232000- removed, cleanupLocalDirs = true
INFO 2018-01-08 23:21:31,703 Logging.scala:54 - Driver
driver-20180108231954-0002 exited successfully
*
INFO 2018-01-08 23:21:32,346 Logging.scala:54 - Removing directory:
/var/lib/spark/worker/driver-20180108231954-0002 ### < Happening AFTER the Run
completes rather than during it
*
```
Repository: spark
Updated Branches:
refs/heads/master ec2289761 -> 60175e959
[MINOR][DOC] Fix the path to the examples jar
## What changes were proposed in this pull request?
The example jar file is now in ./examples/jars directory of Spark distribution.
Author: Arseniy Tashoyan
Closes #20349 from tashoyan/patch-1.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60175e95
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60175e95
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60175e95
Branch: refs/heads/master
Commit: 60175e959f275d2961798fbc5a9150dac9de51ff
Parents: ec22897
Author: Arseniy Tashoyan
Authored: Mon Jan 22 20:17:05 2018 +0800
Committer: jerryshao
Committed: Mon Jan 22 20:17:05 2018 +0800
--
docs/running-on-yarn.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/60175e95/docs/running-on-yarn.md
--
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index e4f5a0c..c010af3 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -35,7 +35,7 @@ For example:
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
-lib/spark-examples*.jar \
+examples/jars/spark-examples*.jar \
10
The above starts a YARN client program which starts the default Application
Master. Then SparkPi will be run as a child thread of Application Master. The
client will periodically poll the Application Master for status updates and
display them in the console. The client will exit once your application has
finished running. Refer to the "Debugging your Application" section below for
how to see driver and executor logs.
Repository: spark
Updated Branches:
refs/heads/branch-2.3 57c320a0d -> cf078a205
[MINOR][DOC] Fix the path to the examples jar
## What changes were proposed in this pull request?
The example jar file is now in ./examples/jars directory of Spark distribution.
Author: Arseniy Tashoyan
Closes #20349 from tashoyan/patch-1.
(cherry picked from commit 60175e959f275d2961798fbc5a9150dac9de51ff)
Signed-off-by: jerryshao
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cf078a20
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cf078a20
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cf078a20
Branch: refs/heads/branch-2.3
Commit: cf078a205a14d8709e2c4a9d9f23f6efa20b4fe7
Parents: 57c320a
Author: Arseniy Tashoyan
Authored: Mon Jan 22 20:17:05 2018 +0800
Committer: jerryshao
Committed: Mon Jan 22 20:20:45 2018 +0800
--
docs/running-on-yarn.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/cf078a20/docs/running-on-yarn.md
--
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index e4f5a0c..c010af3 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -35,7 +35,7 @@ For example:
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
-lib/spark-examples*.jar \
+examples/jars/spark-examples*.jar \
10
The above starts a YARN client program which starts the default Application
Master. Then SparkPi will be run as a child thread of Application Master. The
client will periodically poll the Application Master for status updates and
display them in the console. The client will exit once your application has
finished running. Refer to the "Debugging your Application" section below for
how to see driver and executor logs.
Repository: spark
Updated Branches:
refs/heads/master 70a68b328 -> d1721816d
[SPARK-23200] Reset Kubernetes-specific config on Checkpoint restore
## What changes were proposed in this pull request?
When using the Kubernetes cluster-manager and spawning a Streaming workload, it
is important to reset many spark.kubernetes.* properties that are generated by
spark-submit but which would get rewritten when restoring a Checkpoint. This is
so, because the spark-submit codepath creates Kubernetes resources, such as a
ConfigMap, a Secret and other variables, which have an autogenerated name and
the previous one will not resolve anymore.
In short, this change enables checkpoint restoration for streaming workloads,
and thus enables Spark Streaming workloads in Kubernetes, which were not
possible to restore from a checkpoint before if the workload went down.
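Conceptually, restoration drops the stale autogenerated values from the checkpointed SparkConf and re-reads the listed keys from the freshly generated configuration of the new submission. A simplified sketch (not the exact Checkpoint.scala code; only two keys shown):
```
import org.apache.spark.SparkConf

// Keys whose checkpointed values point at Kubernetes resources (pod names,
// ConfigMaps, Secrets) that no longer exist after a restart.
val propertiesToReload = Seq(
  "spark.kubernetes.driver.pod.name",
  "spark.kubernetes.executor.podNamePrefix")

def restoredConf(checkpointedPairs: Seq[(String, String)]): SparkConf = {
  val newSparkConf = new SparkConf(loadDefaults = false).setAll(checkpointedPairs)
  // Defaults loaded from the new spark-submit invocation.
  val newReloadConf = new SparkConf(loadDefaults = true)
  propertiesToReload.foreach { prop =>
    // Prefer the freshly generated value over the stale checkpointed one.
    newReloadConf.getOption(prop).foreach(value => newSparkConf.set(prop, value))
  }
  newSparkConf
}
```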
## How was this patch tested?
This patch was tested with the twitter-streaming example in AWS, using
checkpoints in s3 with the s3a:// protocol, as supported by Hadoop.
This is similar to the YARN-related code for resetting a Spark Streaming
workload, but for the Kubernetes scheduler. I'm adding the initcontainer
properties because, even though the discussion on the mailing list is not
completely settled, my understanding is that they are going forward for the
moment.
For a previous discussion, see the non-rebased work at:
https://github.com/apache-spark-on-k8s/spark/pull/516
Author: Santiago Saavedra
Closes #20383 from ssaavedra/fix-k8s-checkpointing.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d1721816
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d1721816
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d1721816
Branch: refs/heads/master
Commit: d1721816d26bedee3c72eeb75db49da500568376
Parents: 70a68b3
Author: Santiago Saavedra
Authored: Fri Jan 26 15:24:06 2018 +0800
Committer: jerryshao
Committed: Fri Jan 26 15:24:06 2018 +0800
--
.../org/apache/spark/streaming/Checkpoint.scala | 16
1 file changed, 16 insertions(+)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/d1721816/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
--
diff --git
a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
index aed67a5..ed2a896 100644
--- a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
+++ b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
@@ -53,6 +53,21 @@ class Checkpoint(ssc: StreamingContext, val checkpointTime:
Time)
"spark.driver.host",
"spark.driver.bindAddress",
"spark.driver.port",
+ "spark.kubernetes.driver.pod.name",
+ "spark.kubernetes.executor.podNamePrefix",
+ "spark.kubernetes.initcontainer.executor.configmapname",
+ "spark.kubernetes.initcontainer.executor.configmapkey",
+ "spark.kubernetes.initcontainer.downloadJarsResourceIdentifier",
+ "spark.kubernetes.initcontainer.downloadJarsSecretLocation",
+ "spark.kubernetes.initcontainer.downloadFilesResourceIdentifier",
+ "spark.kubernetes.initcontainer.downloadFilesSecretLocation",
+ "spark.kubernetes.initcontainer.remoteJars",
+ "spark.kubernetes.initcontainer.remoteFiles",
+ "spark.kubernetes.mountdependencies.jarsDownloadDir",
+ "spark.kubernetes.mountdependencies.filesDownloadDir",
+ "spark.kubernetes.initcontainer.executor.stagingServerSecret.name",
+ "spark.kubernetes.initcontainer.executor.stagingServerSecret.mountDir",
+ "spark.kubernetes.executor.limit.cores",
"spark.master",
"spark.yarn.jars",
"spark.yarn.keytab",
@@ -66,6 +81,7 @@ class Checkpoint(ssc: StreamingContext, val checkpointTime:
Time)
val newSparkConf = new SparkConf(loadDefaults =
false).setAll(sparkConfPairs)
.remove("spark.driver.host")
.remove("spark.driver.bindAddress")
+ .remove("spark.kubernetes.driver.pod.name")
.remove("spark.driver.port")
val newReloadConf = new SparkConf(loadDefaults = true)
propertiesToReload.foreach { prop =>
Repository: spark
Updated Branches:
refs/heads/master f235df66a -> 31bd1dab1
[SPARK-23088][CORE] History server not showing incomplete/running applications
## What changes were proposed in this pull request?
The history server does not show incomplete/running applications when the
spark.history.ui.maxApplications property is set to a value that is smaller
than the total number of applications.
## How was this patch tested?
Verified manually against master and 2.2.2 branch.
Author: Paul Mackles
Closes #20335 from pmackles/SPARK-23088.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/31bd1dab
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/31bd1dab
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/31bd1dab
Branch: refs/heads/master
Commit: 31bd1dab1301d27a16c9d5d1b0b3301d618b0516
Parents: f235df6
Author: Paul Mackles
Authored: Tue Jan 30 11:15:27 2018 +0800
Committer: jerryshao
Committed: Tue Jan 30 11:15:27 2018 +0800
--
.../main/resources/org/apache/spark/ui/static/historypage.js | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/31bd1dab/core/src/main/resources/org/apache/spark/ui/static/historypage.js
--
diff --git a/core/src/main/resources/org/apache/spark/ui/static/historypage.js
b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
index 2cde66b..f0b2a5a 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/historypage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
@@ -108,7 +108,12 @@ $(document).ready(function() {
requestedIncomplete = getParameterByName("showIncomplete", searchString);
requestedIncomplete = (requestedIncomplete == "true" ? true : false);
-$.getJSON("api/v1/applications?limit=" + appLimit,
function(response,status,jqXHR) {
+appParams = {
+ limit: appLimit,
+ status: (requestedIncomplete ? "running" : "completed")
+};
+
+$.getJSON("api/v1/applications", appParams,
function(response,status,jqXHR) {
var array = [];
var hasMultipleAttempts = false;
for (i in response) {
Repository: spark
Updated Branches:
refs/heads/master c2632edeb -> ca83526de
[SPARK-23644][CORE][UI] Use absolute path for REST call in SHS
## What changes were proposed in this pull request?
The SHS uses a relative path for the REST API call that retrieves the list of
applications. When the SHS is consumed through a proxy, this can be an issue
if the path doesn't end with a "/".
Therefore, we should use an absolute path for the REST call, as is done for
all the other resources.
## How was this patch tested?
manual tests
Before the change:
![screen shot 2018-03-10 at 4 22 02
pm](https://user-images.githubusercontent.com/8821783/37244190-8ccf9d40-2485-11e8-8fa9-345bc81472fc.png)
After the change:
![screen shot 2018-03-10 at 4 36 34 pm
1](https://user-images.githubusercontent.com/8821783/37244201-a1922810-2485-11e8-8856-eeab2bf5e180.png)
Author: Marco Gaido
Closes #20794 from mgaido91/SPARK-23644.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ca83526d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ca83526d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ca83526d
Branch: refs/heads/master
Commit: ca83526de55f0f8784df58cc8b7c0a7cb0c96e23
Parents: c2632ed
Author: Marco Gaido
Authored: Fri Mar 16 15:12:26 2018 +0800
Committer: jerryshao
Committed: Fri Mar 16 15:12:26 2018 +0800
--
.../src/main/resources/org/apache/spark/ui/static/historypage.js | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/ca83526d/core/src/main/resources/org/apache/spark/ui/static/historypage.js
--
diff --git a/core/src/main/resources/org/apache/spark/ui/static/historypage.js
b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
index f0b2a5a..abc2ec0 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/historypage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
@@ -113,7 +113,7 @@ $(document).ready(function() {
status: (requestedIncomplete ? "running" : "completed")
};
-$.getJSON("api/v1/applications", appParams,
function(response,status,jqXHR) {
+$.getJSON(uiRoot + "/api/v1/applications", appParams,
function(response,status,jqXHR) {
var array = [];
var hasMultipleAttempts = false;
for (i in response) {
@@ -151,7 +151,7 @@ $(document).ready(function() {
"showCompletedColumns": !requestedIncomplete,
}
- $.get("static/historypage-template.html", function(template) {
+ $.get(uiRoot + "/static/historypage-template.html", function(template) {
var sibling = historySummary.prev();
historySummary.detach();
var apps =
$(Mustache.render($(template).filter("#history-summary-template").html(),data));
Repository: spark
Updated Branches:
refs/heads/master ca83526de -> c95200048
[SPARK-23635][YARN] AM env variable should not overwrite same name env variable
set through spark.executorEnv.
## What changes were proposed in this pull request?
In the current Spark on YARN code, the AM always copies and overwrites its env
variables to executors, so we cannot set different values for executors.
To reproduce the issue, a user could start spark-shell like:
```
./bin/spark-shell --master yarn-client \
  --conf spark.executorEnv.SPARK_ABC=executor_val \
  --conf spark.yarn.appMasterEnv.SPARK_ABC=am_val
```
Then check executor env variables by
```
sc.parallelize(1 to 1).flatMap { i => sys.env.toSeq }.collect.foreach(println)
```
We will always get `am_val` instead of `executor_val`. So we should not let the
AM overwrite explicitly set executor env variables.
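The intended precedence can be sketched as follows (illustrative Scala, not the committed ExecutorRunnable code shown in the diff below): copy the AM's SPARK* system environment first, then apply spark.executorEnv.* on top, appending only for CLASSPATH and overwriting everything else.
```
import scala.collection.mutable

def buildExecutorEnv(
    amEnv: Map[String, String],              // System.getenv on the AM
    executorEnv: Seq[(String, String)]): mutable.Map[String, String] = {
  val env = new mutable.HashMap[String, String]()

  // AM environment first, so explicitly configured executor values win below.
  amEnv.filterKeys(_.startsWith("SPARK")).foreach { case (k, v) => env(k) = v }

  executorEnv.foreach { case (key, value) =>
    if (key == "CLASSPATH") {
      // Paths are appended rather than replaced, for Hadoop compatibility.
      env(key) = env.get(key).map(_ + java.io.File.pathSeparator + value).getOrElse(value)
    } else {
      env(key) = value   // explicitly set executor value overwrites the AM value
    }
  }
  env
}
```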
## How was this patch tested?
Added UT and tested in local cluster.
Author: jerryshao
Closes #20799 from jerryshao/SPARK-23635.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c9520004
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c9520004
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c9520004
Branch: refs/heads/master
Commit: c952000487ee003200221b3c4e25dcb06e359f0a
Parents: ca83526
Author: jerryshao
Authored: Fri Mar 16 16:22:03 2018 +0800
Committer: jerryshao
Committed: Fri Mar 16 16:22:03 2018 +0800
--
.../spark/deploy/yarn/ExecutorRunnable.scala| 22 +++-
.../spark/deploy/yarn/YarnClusterSuite.scala| 36
2 files changed, 50 insertions(+), 8 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/c9520004/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
--
diff --git
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
index 3f4d236..ab08698 100644
---
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
+++
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
@@ -220,12 +220,6 @@ private[yarn] class ExecutorRunnable(
val env = new HashMap[String, String]()
Client.populateClasspath(null, conf, sparkConf, env,
sparkConf.get(EXECUTOR_CLASS_PATH))
-sparkConf.getExecutorEnv.foreach { case (key, value) =>
- // This assumes each executor environment variable set here is a path
- // This is kept for backward compatibility and consistency with hadoop
- YarnSparkHadoopUtil.addPathToEnvironment(env, key, value)
-}
-
// lookup appropriate http scheme for container log urls
val yarnHttpPolicy = conf.get(
YarnConfiguration.YARN_HTTP_POLICY_KEY,
@@ -233,6 +227,20 @@ private[yarn] class ExecutorRunnable(
)
val httpScheme = if (yarnHttpPolicy == "HTTPS_ONLY") "https://"; else
"http://";
+System.getenv().asScala.filterKeys(_.startsWith("SPARK"))
+ .foreach { case (k, v) => env(k) = v }
+
+sparkConf.getExecutorEnv.foreach { case (key, value) =>
+ if (key == Environment.CLASSPATH.name()) {
+// If the key of env variable is CLASSPATH, we assume it is a path and
append it.
+// This is kept for backward compatibility and consistency with hadoop
+YarnSparkHadoopUtil.addPathToEnvironment(env, key, value)
+ } else {
+// For other env variables, simply overwrite the value.
+env(key) = value
+ }
+}
+
// Add log urls
container.foreach { c =>
sys.env.get("SPARK_USER").foreach { user =>
@@ -245,8 +253,6 @@ private[yarn] class ExecutorRunnable(
}
}
-System.getenv().asScala.filterKeys(_.startsWith("SPARK"))
- .foreach { case (k, v) => env(k) = v }
env
}
}
http://git-wip-us.apache.org/repos/asf/spark/blob/c9520004/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
--
diff --git
a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
index 33d400a..a129be7 100644
---
a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
+++
b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
@@ -225,6 +225,14 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
finalState should be (SparkAppHandle.State.FAILED)
}
+ test("executor env overwrite AM env in client mode") {
+testExecutorEnv(true)
Repository: spark
Updated Branches:
refs/heads/master 61487b308 -> 745c8c090
[SPARK-23708][CORE] Correct comment for function addShutDownHook in
ShutdownHookManager
## What changes were proposed in this pull request?
Minor modification. The comment below is not right.
```
/**
* Adds a shutdown hook with the given priority. Hooks with lower priority
values run
* first.
*
* param hook The code to run during shutdown.
* return A handle that can be used to unregister the shutdown hook.
*/
def addShutdownHook(priority: Int)(hook: () => Unit): AnyRef = {
shutdownHooks.add(priority, hook)
}
```
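For illustration, a tiny usage sketch of this Spark-internal helper under the corrected semantics (the priority values are arbitrary; only the signature quoted above is from the source):
```
import org.apache.spark.util.ShutdownHookManager

// Higher priority runs first, so the flush hook runs before the close hook.
val flushHook = ShutdownHookManager.addShutdownHook(100) { () =>
  println("flushing buffers")      // runs first
}
val closeHook = ShutdownHookManager.addShutdownHook(50) { () =>
  println("closing connections")   // runs second
}
// Each call returns a handle that can be used to unregister the hook.
```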
## How was this patch tested?
UT
Author: zhoukang
Closes #20845 from caneGuy/zhoukang/fix-shutdowncomment.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/745c8c09
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/745c8c09
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/745c8c09
Branch: refs/heads/master
Commit: 745c8c0901ac522ba92c1356ca74bd0dd7701496
Parents: 61487b3
Author: zhoukang
Authored: Mon Mar 19 13:31:21 2018 +0800
Committer: jerryshao
Committed: Mon Mar 19 13:31:21 2018 +0800
--
.../src/main/scala/org/apache/spark/util/ShutdownHookManager.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/745c8c09/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
--
diff --git
a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
index 4001fac..b702838 100644
--- a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
+++ b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
@@ -143,7 +143,7 @@ private[spark] object ShutdownHookManager extends Logging {
}
/**
- * Adds a shutdown hook with the given priority. Hooks with lower priority
values run
+ * Adds a shutdown hook with the given priority. Hooks with higher priority
values run
* first.
*
* @param hook The code to run during shutdown.
Repository: spark
Updated Branches:
refs/heads/branch-2.3 5c1c03d08 -> 2f82c037d
[SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path for REST call in SHS
## What changes were proposed in this pull request?
The SHS uses a relative path for the REST API call that retrieves the list of
applications. When the SHS is consumed through a proxy, this can be an issue
if the path doesn't end with a "/".
Therefore, we should use an absolute path for the REST call, as is done for
all the other resources.
## How was this patch tested?
manual tests
Before the change:
![screen shot 2018-03-10 at 4 22 02
pm](https://user-images.githubusercontent.com/8821783/37244190-8ccf9d40-2485-11e8-8fa9-345bc81472fc.png)
After the change:
![screen shot 2018-03-10 at 4 36 34 pm
1](https://user-images.githubusercontent.com/8821783/37244201-a1922810-2485-11e8-8856-eeab2bf5e180.png)
Author: Marco Gaido
Closes #20847 from mgaido91/SPARK-23644_2.3.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2f82c037
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2f82c037
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2f82c037
Branch: refs/heads/branch-2.3
Commit: 2f82c037d90114705c0d0bd0bd7f82215aecfe3b
Parents: 5c1c03d
Author: Marco Gaido
Authored: Tue Mar 20 10:07:27 2018 +0800
Committer: jerryshao
Committed: Tue Mar 20 10:07:27 2018 +0800
--
.../src/main/resources/org/apache/spark/ui/static/historypage.js | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/2f82c037/core/src/main/resources/org/apache/spark/ui/static/historypage.js
--
diff --git a/core/src/main/resources/org/apache/spark/ui/static/historypage.js
b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
index 2cde66b..16d59be 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/historypage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
@@ -108,7 +108,7 @@ $(document).ready(function() {
requestedIncomplete = getParameterByName("showIncomplete", searchString);
requestedIncomplete = (requestedIncomplete == "true" ? true : false);
-$.getJSON("api/v1/applications?limit=" + appLimit,
function(response,status,jqXHR) {
+$.getJSON(uiRoot + "/api/v1/applications?limit=" + appLimit,
function(response,status,jqXHR) {
var array = [];
var hasMultipleAttempts = false;
for (i in response) {
@@ -146,7 +146,7 @@ $(document).ready(function() {
"showCompletedColumns": !requestedIncomplete,
}
- $.get("static/historypage-template.html", function(template) {
+ $.get(uiRoot + "/static/historypage-template.html", function(template) {
var sibling = historySummary.prev();
historySummary.detach();
var apps =
$(Mustache.render($(template).filter("#history-summary-template").html(),data));
Repository: spark
Updated Branches:
refs/heads/master b2edc30db -> 5fa438471
[SPARK-23361][YARN] Allow AM to restart after initial tokens expire.
Currently, the Spark AM relies on the initial set of tokens created by
the submission client to be able to talk to HDFS and other services that
require delegation tokens. This means that after those tokens expire, a
new AM will fail to start (e.g. when there is an application failure and
re-attempts are enabled).
This PR makes it so that the first thing the AM does when the user provides
a principal and keytab is to create new delegation tokens for use. This
makes sure that the AM can be started irrespective of how old the original
token set is. It also allows all of the token management to be done by the
AM - there is no need for the submission client to set configuration values
to tell the AM when to renew tokens.
Note that even though in this case the AM will not be using the delegation
tokens created by the submission client, those tokens still need to be provided
to YARN, since they are used to do log aggregation.
To be able to re-use the code in the AMCredentialRenewer for the above
purposes, I refactored that class a bit so that it can fetch tokens into
a pre-defined UGI, instead of always logging in.
Another issue with re-attempts is that, after the fix that allows the AM
to restart correctly, new executors would get confused about when to
update credentials, because the credential updater used the update time
initially set up by the submission code. This could make the executor
fail to update credentials in time, since that value would be very out
of date in the situation described in the bug.
To fix that, I changed the YARN code to use the new RPC-based mechanism
for distributing tokens to executors. This allowed the old credential
updater code to be removed, and a lot of code in the renewer to be
simplified.
I also made two currently hardcoded values (the renewal time ratio, and
the retry wait) configurable; while this probably never needs to be set
by anyone in a production environment, it helps with testing; that's also
why they're not documented.
Tested on real cluster with a specially crafted application to test this
functionality: checked proper access to HDFS, Hive and HBase in cluster
mode with token renewal on and AM restarts. Tested things still work in
client mode too.
Author: Marcelo Vanzin
Closes #20657 from vanzin/SPARK-23361.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5fa43847
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5fa43847
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5fa43847
Branch: refs/heads/master
Commit: 5fa438471110afbf4e2174df449ac79e292501f8
Parents: b2edc30
Author: Marcelo Vanzin
Authored: Fri Mar 23 13:59:21 2018 +0800
Committer: jerryshao
Committed: Fri Mar 23 13:59:21 2018 +0800
--
.../main/scala/org/apache/spark/SparkConf.scala | 12 +-
.../apache/spark/deploy/SparkHadoopUtil.scala | 32 +-
.../executor/CoarseGrainedExecutorBackend.scala | 12 -
.../apache/spark/internal/config/package.scala | 12 +
.../MesosHadoopDelegationTokenManager.scala | 11 +-
.../spark/deploy/yarn/ApplicationMaster.scala | 117 +++-
.../org/apache/spark/deploy/yarn/Client.scala | 102 +++
.../spark/deploy/yarn/YarnSparkHadoopUtil.scala | 20 --
.../org/apache/spark/deploy/yarn/config.scala | 25 --
.../yarn/security/AMCredentialRenewer.scala | 291 ---
.../yarn/security/CredentialUpdater.scala | 131 -
.../YARNHadoopDelegationTokenManager.scala | 9 +-
.../cluster/YarnClientSchedulerBackend.scala| 9 +-
.../cluster/YarnSchedulerBackend.scala | 10 +-
.../YARNHadoopDelegationTokenManagerSuite.scala | 7 +-
.../org/apache/spark/streaming/Checkpoint.scala | 3 -
16 files changed, 238 insertions(+), 565 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/5fa43847/core/src/main/scala/org/apache/spark/SparkConf.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala
b/core/src/main/scala/org/apache/spark/SparkConf.scala
index f53b2be..129956e 100644
--- a/core/src/main/scala/org/apache/spark/SparkConf.scala
+++ b/core/src/main/scala/org/apache/spark/SparkConf.scala
@@ -603,13 +603,15 @@ private[spark] object SparkConf extends Logging {
"Please use spark.kryoserializer.buffer instead. The default value for
" +
"spark.kryoserializer.buffer.mb was previously specified as '0.064'.
Fractional values " +
"are no longer accepted. To specify the equivalent now, one may use
'64k'."),
- DeprecatedConfig("spark.rpc", "2.0", "Not used any more."),
+ DeprecatedConfig("spark.rpc",
Repository: spark
Updated Branches:
refs/heads/master 087fb3142 -> eb48edf9c
[SPARK-23787][TESTS] Fix file download test in SparkSubmitSuite for Hadoop 2.9.
This particular test assumed that Hadoop libraries did not support
http as a file system. Hadoop 2.9 does, so the test failed. The test
now forces a non-existent implementation for the http fs, which
forces the expected error.
There were also a couple of other issues in the same test: SparkSubmit
arguments in the wrong order, and the wrong check later when asserting,
which was being masked by the previous issues.
Author: Marcelo Vanzin
Closes #20895 from vanzin/SPARK-23787.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eb48edf9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eb48edf9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/eb48edf9
Branch: refs/heads/master
Commit: eb48edf9ca4f4b42c63f145718696472cb6a31ba
Parents: 087fb31
Author: Marcelo Vanzin
Authored: Mon Mar 26 14:01:04 2018 +0800
Committer: jerryshao
Committed: Mon Mar 26 14:01:04 2018 +0800
--
.../apache/spark/deploy/SparkSubmitSuite.scala | 36 +++-
1 file changed, 19 insertions(+), 17 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/eb48edf9/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
b/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
index 2d0c192..d86ef90 100644
--- a/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
+++ b/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
@@ -959,25 +959,28 @@ class SparkSubmitSuite
}
test("download remote resource if it is not supported by yarn service") {
-testRemoteResources(isHttpSchemeBlacklisted = false, supportMockHttpFs =
false)
+testRemoteResources(enableHttpFs = false, blacklistHttpFs = false)
}
test("avoid downloading remote resource if it is supported by yarn service")
{
-testRemoteResources(isHttpSchemeBlacklisted = false, supportMockHttpFs =
true)
+testRemoteResources(enableHttpFs = true, blacklistHttpFs = false)
}
test("force download from blacklisted schemes") {
-testRemoteResources(isHttpSchemeBlacklisted = true, supportMockHttpFs =
true)
+testRemoteResources(enableHttpFs = true, blacklistHttpFs = true)
}
- private def testRemoteResources(isHttpSchemeBlacklisted: Boolean,
- supportMockHttpFs: Boolean): Unit = {
+ private def testRemoteResources(
+ enableHttpFs: Boolean,
+ blacklistHttpFs: Boolean): Unit = {
val hadoopConf = new Configuration()
updateConfWithFakeS3Fs(hadoopConf)
-if (supportMockHttpFs) {
+if (enableHttpFs) {
hadoopConf.set("fs.http.impl", classOf[TestFileSystem].getCanonicalName)
- hadoopConf.set("fs.http.impl.disable.cache", "true")
+} else {
+ hadoopConf.set("fs.http.impl", getClass().getName() + ".DoesNotExist")
}
+hadoopConf.set("fs.http.impl.disable.cache", "true")
val tmpDir = Utils.createTempDir()
val mainResource = File.createTempFile("tmpPy", ".py", tmpDir)
@@ -986,20 +989,19 @@ class SparkSubmitSuite
val tmpHttpJar = TestUtils.createJarWithFiles(Map("test.resource" ->
"USER"), tmpDir)
val tmpHttpJarPath = s"http://${new
File(tmpHttpJar.toURI).getAbsolutePath}"
+val forceDownloadArgs = if (blacklistHttpFs) {
+ Seq("--conf", "spark.yarn.dist.forceDownloadSchemes=http")
+} else {
+ Nil
+}
+
val args = Seq(
"--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"),
"--name", "testApp",
"--master", "yarn",
"--deploy-mode", "client",
- "--jars", s"$tmpS3JarPath,$tmpHttpJarPath",
- s"s3a://$mainResource"
-) ++ (
- if (isHttpSchemeBlacklisted) {
-Seq("--conf", "spark.yarn.dist.forceDownloadSchemes=http,https")
- } else {
-Nil
- }
-)
+ "--jars", s"$tmpS3JarPath,$tmpHttpJarPath"
+) ++ forceDownloadArgs ++ Seq(s"s3a://$mainResource")
val appArgs = new SparkSubmitArguments(args)
val (_, _, conf, _) = SparkSubmit.prepareSubmitEnvironment(appArgs,
Some(hadoopConf))
@@ -1009,7 +1011,7 @@ class SparkSubmitSuite
// The URI of remote S3 resource should still be remote.
assert(jars.contains(tmpS3JarPath))
-if (supportMockHttpFs) {
+if (enableHttpFs && !blacklistHttpFs) {
// If Http FS is supported by yarn service, the URI of remote http
resource should
// still be remote.
assert(jars.contains(tmpHttpJarPath))
Repository: spark
Updated Branches:
refs/heads/master b34890119 -> df05fb63a
[SPARK-23743][SQL] Changed a comparison logic from containing 'slf4j' to
starting with 'org.slf4j'
## What changes were proposed in this pull request?
isSharedClass returns whether some classes can/should be shared or not. It
checks whether the class name contains certain keywords or starts with certain
prefixes. With that logic, unintended behavior can occur when a custom package
has `slf4j` inside the package or class name. My guess is that the original
intention was to match classes under `org.slf4j`, so it would be better to
change the comparison logic to `name.startsWith("org.slf4j")`.
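A small sketch of the difference (an illustrative predicate, not the full isSharedClass logic in the diff below; the user class name is hypothetical):
```
// Old check: substring match, so an unrelated user class can be shared with
// the wrong classloader, leading to ClassCastExceptions across classloaders.
def oldIsLoggingShared(name: String): Boolean =
  name.contains("slf4j") || name.contains("log4j")

// New check: prefix match on the actual logging packages.
def newIsLoggingShared(name: String): Boolean =
  name.startsWith("org.slf4j") ||
    name.startsWith("org.apache.log4j") ||        // log4j 1.x
    name.startsWith("org.apache.logging.log4j")   // log4j 2

// Hypothetical generated class whose package happens to contain "slf4j":
val userClass = "com.example.slf4jproto.SchemaMessage"
oldIsLoggingShared(userClass)  // true  -> treated as shared
newIsLoggingShared(userClass)  // false -> loaded by the isolated classloader
```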
## How was this patch tested?
This patch should pass all of the current tests and keep all of the current
behaviors. In my case, I'm using ProtobufDeserializer to get a table schema
from hive tables. Thus some Protobuf packages and names have `slf4j` inside.
Without this patch, it cannot be resolved because of ClassCastException from
different classloaders.
Author: Jongyoul Lee
Closes #20860 from jongyoul/SPARK-23743.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/df05fb63
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/df05fb63
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/df05fb63
Branch: refs/heads/master
Commit: df05fb63abe6018ccbe572c34cf65fc3ecbf1166
Parents: b348901
Author: Jongyoul Lee
Authored: Fri Mar 30 14:07:35 2018 +0800
Committer: jerryshao
Committed: Fri Mar 30 14:07:35 2018 +0800
--
.../org/apache/spark/sql/hive/client/IsolatedClientLoader.scala | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/df05fb63/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
--
diff --git
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
index 12975bc..c2690ec 100644
---
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
+++
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
@@ -179,8 +179,9 @@ private[hive] class IsolatedClientLoader(
val isHadoopClass =
name.startsWith("org.apache.hadoop.") &&
!name.startsWith("org.apache.hadoop.hive.")
-name.contains("slf4j") ||
-name.contains("log4j") ||
+name.startsWith("org.slf4j") ||
+name.startsWith("org.apache.log4j") || // log4j1.x
+name.startsWith("org.apache.logging.log4j") || // log4j2
name.startsWith("org.apache.spark.") ||
(sharesHadoopClasses && isHadoopClass) ||
name.startsWith("scala.") ||
Repository: spark
Updated Branches:
refs/heads/master 6f1d0dea1 -> dc2714da5
[SPARK-22290][CORE] Avoid creating Hive delegation tokens when not necessary.
Hive delegation tokens are only needed when the Spark driver has no access
to the kerberos TGT. That happens only in two situations:
- when using a proxy user
- when using cluster mode without a keytab
This change modifies the Hive provider so that it only generates delegation
tokens in those situations, and tweaks the YARN AM so that it makes the proper
user visible to the Hive code when running with keytabs, so that the TGT
can be used instead of a delegation token.
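The rule above can be summarized by a predicate like the following (an illustrative sketch, not the provider's actual code; parameter names are assumptions):
```
// Hive delegation tokens are only needed when the driver cannot rely on a
// kerberos TGT: either we are impersonating another user, or we run in
// cluster mode without a keytab to log in with.
def hiveTokensRequired(
    isProxyUser: Boolean,
    deployMode: String,           // "client" or "cluster"
    keytab: Option[String]): Boolean =
  isProxyUser || (deployMode == "cluster" && keytab.isEmpty)
```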
The effect of this change is that now it's possible to initialize multiple,
non-concurrent SparkContext instances in the same JVM. Before, the second
invocation would fail to fetch a new Hive delegation token, which then could
make the second (or third or...) application fail once the token expired.
With this change, the TGT will be used to authenticate to the HMS instead.
This change also avoids polluting the current logged in user's credentials
when launching applications. The credentials are copied only when running
applications as a proxy user. This makes it possible to implement SPARK-11035
later, where multiple threads might be launching applications, and each app
should have its own set of credentials.
Tested by verifying HDFS and Hive access in following scenarios:
- client and cluster mode
- client and cluster mode with proxy user
- client and cluster mode with principal / keytab
- long-running cluster app with principal / keytab
- pyspark app that creates (and stops) multiple SparkContext instances
through its lifetime
Author: Marcelo Vanzin
Closes #19509 from vanzin/SPARK-22290.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dc2714da
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dc2714da
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dc2714da
Branch: refs/heads/master
Commit: dc2714da50ecba1bf1fdf555a82a4314f763a76e
Parents: 6f1d0de
Author: Marcelo Vanzin
Authored: Thu Oct 19 14:56:48 2017 +0800
Committer: jerryshao
Committed: Thu Oct 19 14:56:48 2017 +0800
--
.../apache/spark/deploy/SparkHadoopUtil.scala | 17 +++--
.../security/HBaseDelegationTokenProvider.scala | 4 +-
.../security/HadoopDelegationTokenManager.scala | 2 +-
.../HadoopDelegationTokenProvider.scala | 2 +-
.../HadoopFSDelegationTokenProvider.scala | 4 +-
.../security/HiveDelegationTokenProvider.scala | 20 +-
docs/running-on-yarn.md | 9 +++
.../spark/deploy/yarn/ApplicationMaster.scala | 69
.../org/apache/spark/deploy/yarn/Client.scala | 5 +-
.../org/apache/spark/deploy/yarn/config.scala | 4 ++
.../spark/sql/hive/client/HiveClientImpl.scala | 6 --
11 files changed, 110 insertions(+), 32 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/dc2714da/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
index 53775db..1fa10ab 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
@@ -61,13 +61,17 @@ class SparkHadoopUtil extends Logging {
* do a FileSystem.closeAllForUGI in order to avoid leaking Filesystems
*/
def runAsSparkUser(func: () => Unit) {
+    createSparkUser().doAs(new PrivilegedExceptionAction[Unit] {
+      def run: Unit = func()
+    })
+  }
+
+  def createSparkUser(): UserGroupInformation = {
     val user = Utils.getCurrentUserName()
-    logDebug("running as user: " + user)
+    logDebug("creating UGI for user: " + user)
     val ugi = UserGroupInformation.createRemoteUser(user)
     transferCredentials(UserGroupInformation.getCurrentUser(), ugi)
-    ugi.doAs(new PrivilegedExceptionAction[Unit] {
-      def run: Unit = func()
-    })
+    ugi
}
  def transferCredentials(source: UserGroupInformation, dest: UserGroupInformation) {
@@ -417,6 +421,11 @@ class SparkHadoopUtil extends Logging {
creds.readTokenStorageStream(new DataInputStream(tokensBuf))
creds
}
+
+  def isProxyUser(ugi: UserGroupInformation): Boolean = {
+    ugi.getAuthenticationMethod() == UserGroupInformation.AuthenticationMethod.PROXY
+  }
+
}
object SparkHadoopUtil {
http://git-wip-us.apache.org/repos/asf/spark/blob/dc2714da/core/src/main/scala/org/apache/spark/deploy/security/HBaseDelegationTokenProvider.scala
--
diff --git
a/core/src/main/scala/org/apache/spa
Repository: spark
Updated Branches:
refs/heads/master ca2a780e7 -> 57accf6e3
[SPARK-22319][CORE] call loginUserFromKeytab before accessing hdfs
In `SparkSubmit`, call `loginUserFromKeytab` before attempting to make RPC
calls to the NameNode.
I manually tested this patch by:
1. Confirming that my Spark application failed to launch with the error
reported in https://issues.apache.org/jira/browse/SPARK-22319.
2. Applying this patch and confirming that the app no longer fails to launch,
even when I have not manually run `kinit` on the host.
Presumably we also want integration tests for secure clusters so that we catch
this sort of thing. I'm happy to take a shot at this if it's feasible and
someone can point me in the right direction.
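As a minimal, self-contained sketch of the ordering this fix enforces (the path and helper name are illustrative, not taken from the patch): log in from the keytab before the first filesystem call, so the NameNode RPC is made with valid Kerberos credentials.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

def accessHdfsWithKeytab(principal: String, keytab: String, hadoopConf: Configuration): Unit = {
  // 1. Authenticate first; without this, the RPC below fails when no TGT was obtained via kinit.
  UserGroupInformation.loginUserFromKeytab(principal, keytab)
  // 2. Only then touch HDFS (e.g. the glob resolution done by SparkSubmit).
  val fs = FileSystem.get(hadoopConf)
  fs.globStatus(new Path("hdfs:///user/*/app.jar"))
}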
Author: Steven Rand
Closes #19540 from sjrand/SPARK-22319.
Change-Id: Ic306bfe7181107fbcf92f61d75856afcb5b6f761
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/57accf6e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/57accf6e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/57accf6e
Branch: refs/heads/master
Commit: 57accf6e3965ff69adc4408623916c5003918235
Parents: ca2a780
Author: Steven Rand
Authored: Mon Oct 23 09:43:45 2017 +0800
Committer: jerryshao
Committed: Mon Oct 23 09:43:45 2017 +0800
--
.../org/apache/spark/deploy/SparkSubmit.scala | 32 ++--
1 file changed, 16 insertions(+), 16 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/57accf6e/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index 135bbe9..b7e6d0e 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -342,6 +342,22 @@ object SparkSubmit extends CommandLineUtils with Logging {
val hadoopConf =
conf.getOrElse(SparkHadoopUtil.newConfiguration(sparkConf))
val targetDir = Utils.createTempDir()
+    // assure a keytab is available from any place in a JVM
+    if (clusterManager == YARN || clusterManager == LOCAL || clusterManager == MESOS) {
+      if (args.principal != null) {
+        if (args.keytab != null) {
+          require(new File(args.keytab).exists(), s"Keytab file: ${args.keytab} does not exist")
+          // Add keytab and principal configurations in sysProps to make them available
+          // for later use; e.g. in spark sql, the isolated class loader used to talk
+          // to HiveMetastore will use these settings. They will be set as Java system
+          // properties and then loaded by SparkConf
+          sysProps.put("spark.yarn.keytab", args.keytab)
+          sysProps.put("spark.yarn.principal", args.principal)
+          UserGroupInformation.loginUserFromKeytab(args.principal, args.keytab)
+        }
+      }
+    }
+
// Resolve glob path for different resources.
args.jars = Option(args.jars).map(resolveGlobPaths(_, hadoopConf)).orNull
args.files = Option(args.files).map(resolveGlobPaths(_, hadoopConf)).orNull
@@ -641,22 +657,6 @@ object SparkSubmit extends CommandLineUtils with Logging {
}
}
-    // assure a keytab is available from any place in a JVM
-    if (clusterManager == YARN || clusterManager == LOCAL || clusterManager == MESOS) {
-      if (args.principal != null) {
-        if (args.keytab != null) {
-          require(new File(args.keytab).exists(), s"Keytab file: ${args.keytab} does not exist")
-          // Add keytab and principal configurations in sysProps to make them available
-          // for later use; e.g. in spark sql, the isolated class loader used to talk
-          // to HiveMetastore will use these settings. They will be set as Java system
-          // properties and then loaded by SparkConf
-          sysProps.put("spark.yarn.keytab", args.keytab)
-          sysProps.put("spark.yarn.principal", args.principal)
-          UserGroupInformation.loginUserFromKeytab(args.principal, args.keytab)
-        }
-      }
-    }
-
if (clusterManager == MESOS && UserGroupInformation.isSecurityEnabled) {
setRMPrincipal(sysProps)
}
-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
Repository: spark
Updated Branches:
refs/heads/branch-2.2 f8c83fdc5 -> bf8163f5b
[SPARK-22319][CORE][BACKPORT-2.2] call loginUserFromKeytab before accessing hdfs
In SparkSubmit, call loginUserFromKeytab before attempting to make RPC calls to
the NameNode.
Same as #https://github.com/apache/spark/pull/19540, but for branch-2.2.
Manually tested for master as described in
https://github.com/apache/spark/pull/19540.
Author: Steven Rand
Closes #19554 from sjrand/SPARK-22319-branch-2.2.
Change-Id: Ic550a818fd6a3f38b356ac48029942d463738458
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bf8163f5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bf8163f5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bf8163f5
Branch: refs/heads/branch-2.2
Commit: bf8163f5be55a94e02849ccbaf755702a2c6c68f
Parents: f8c83fd
Author: Steven Rand
Authored: Mon Oct 23 14:26:03 2017 +0800
Committer: jerryshao
Committed: Mon Oct 23 14:26:03 2017 +0800
--
.../org/apache/spark/deploy/SparkSubmit.scala | 38 ++--
1 file changed, 19 insertions(+), 19 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/bf8163f5/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index 86d578e..4f2f2c1 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -316,6 +316,25 @@ object SparkSubmit extends CommandLineUtils {
RPackageUtils.checkAndBuildRPackage(args.jars, printStream, args.verbose)
}
+    // assure a keytab is available from any place in a JVM
+    if (clusterManager == YARN || clusterManager == LOCAL) {
+      if (args.principal != null) {
+        require(args.keytab != null, "Keytab must be specified when principal is specified")
+        if (!new File(args.keytab).exists()) {
+          throw new SparkException(s"Keytab file: ${args.keytab} does not exist")
+        } else {
+          // Add keytab and principal configurations in sysProps to make them available
+          // for later use; e.g. in spark sql, the isolated class loader used to talk
+          // to HiveMetastore will use these settings. They will be set as Java system
+          // properties and then loaded by SparkConf
+          sysProps.put("spark.yarn.keytab", args.keytab)
+          sysProps.put("spark.yarn.principal", args.principal)
+
+          UserGroupInformation.loginUserFromKeytab(args.principal, args.keytab)
+        }
+      }
+    }
+
// In client mode, download remote files.
var localPrimaryResource: String = null
var localJars: String = null
@@ -582,25 +601,6 @@ object SparkSubmit extends CommandLineUtils {
}
}
-    // assure a keytab is available from any place in a JVM
-    if (clusterManager == YARN || clusterManager == LOCAL) {
-      if (args.principal != null) {
-        require(args.keytab != null, "Keytab must be specified when principal is specified")
-        if (!new File(args.keytab).exists()) {
-          throw new SparkException(s"Keytab file: ${args.keytab} does not exist")
-        } else {
-          // Add keytab and principal configurations in sysProps to make them available
-          // for later use; e.g. in spark sql, the isolated class loader used to talk
-          // to HiveMetastore will use these settings. They will be set as Java system
-          // properties and then loaded by SparkConf
-          sysProps.put("spark.yarn.keytab", args.keytab)
-          sysProps.put("spark.yarn.principal", args.principal)
-
-          UserGroupInformation.loginUserFromKeytab(args.principal, args.keytab)
-        }
-      }
-    }
-
// In yarn-cluster mode, use yarn.Client as a wrapper around the user class
if (isYarnCluster) {
childMainClass = "org.apache.spark.deploy.yarn.Client"
-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
Repository: spark
Updated Branches:
refs/heads/master 592cfeab9 -> 3073344a2
[SPARK-21840][CORE] Add trait that allows conf to be directly set in
application.
Currently SparkSubmit uses system properties to propagate configuration to
applications. This makes it hard to implement features such as SPARK-11035,
which would allow multiple applications to be started in the same JVM. The
current code would cause the config data from multiple apps to get mixed
up.
This change introduces a new trait, currently internal to Spark, that allows
the app configuration to be passed directly to the application, without
having to use system properties. The current "call main() method" behavior
is maintained as an implementation of this new trait. This will be useful
to allow multiple cluster mode apps to be submitted from the same JVM.
As part of this, SparkSubmit was modified to collect all configuration
directly into a SparkConf instance. Most of the changes are to tests so
they use SparkConf instead of an opaque map.
Tested with existing and added unit tests.
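For illustration, a hypothetical implementation of the new trait could look like the sketch below (the class name is made up, and it sits in the org.apache.spark.deploy package because the trait is private[spark]): the configuration is handed to the application directly, so nothing has to go through JVM-wide system properties.

package org.apache.spark.deploy

import org.apache.spark.SparkConf

// Hypothetical example application built on the new trait.
private[spark] class PrintConfApplication extends SparkApplication {
  override def start(args: Array[String], conf: SparkConf): Unit = {
    // The conf arrives as an argument instead of being read back from sys.props.
    conf.getAll.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }
  }
}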
Author: Marcelo Vanzin
Closes #19519 from vanzin/SPARK-21840.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3073344a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3073344a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3073344a
Branch: refs/heads/master
Commit: 3073344a2551fb198d63f2114a519ab97904cb55
Parents: 592cfea
Author: Marcelo Vanzin
Authored: Thu Oct 26 15:50:27 2017 +0800
Committer: jerryshao
Committed: Thu Oct 26 15:50:27 2017 +0800
--
.../apache/spark/deploy/SparkApplication.scala | 55 +
.../org/apache/spark/deploy/SparkSubmit.scala | 160 +++---
.../apache/spark/deploy/SparkSubmitSuite.scala | 213 +++
.../deploy/rest/StandaloneRestSubmitSuite.scala | 4 +-
4 files changed, 257 insertions(+), 175 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/3073344a/core/src/main/scala/org/apache/spark/deploy/SparkApplication.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkApplication.scala
b/core/src/main/scala/org/apache/spark/deploy/SparkApplication.scala
new file mode 100644
index 000..118b460
--- /dev/null
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkApplication.scala
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.lang.reflect.Modifier
+
+import org.apache.spark.SparkConf
+
+/**
+ * Entry point for a Spark application. Implementations must provide a no-argument constructor.
+ */
+private[spark] trait SparkApplication {
+
+  def start(args: Array[String], conf: SparkConf): Unit
+
+}
+
+/**
+ * Implementation of SparkApplication that wraps a standard Java class with a "main" method.
+ *
+ * Configuration is propagated to the application via system properties, so running multiple
+ * of these in the same JVM may lead to undefined behavior due to configuration leaks.
+ */
+private[deploy] class JavaMainApplication(klass: Class[_]) extends SparkApplication {
+
+  override def start(args: Array[String], conf: SparkConf): Unit = {
+    val mainMethod = klass.getMethod("main", new Array[String](0).getClass)
+    if (!Modifier.isStatic(mainMethod.getModifiers)) {
+      throw new IllegalStateException("The main method in the given main class must be static")
+    }
+
+    val sysProps = conf.getAll.toMap
+    sysProps.foreach { case (k, v) =>
+      sys.props(k) = v
+    }
+
+    mainMethod.invoke(null, args)
+  }
+
+}
http://git-wip-us.apache.org/repos/asf/spark/blob/3073344a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index b7e6d0e..73b956e 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+
Repository: spark
Updated Branches:
refs/heads/master 556b5d215 -> 96798d14f
[SPARK-22172][CORE] Worker hangs when the external shuffle service port is
already in use
## What changes were proposed in this pull request?
Handle NonFatal exceptions while starting the external shuffle service; if any
NonFatal exception occurs, it is logged and the worker continues without the
external shuffle service.
## How was this patch tested?
I verified it manually; when a BindException occurs, the exception is logged and
the worker continues to serve without the external shuffle service.
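A minimal sketch of the guard described above, using scala.util.control.NonFatal (the function and logger parameters are stand-ins; the actual hunk applied to Worker.scala appears in the diff below):

import scala.util.control.NonFatal

// Isolate a best-effort service start so a failure (e.g. a BindException on the
// shuffle service port) cannot leave the caller hanging.
def startOptionalService(start: () => Unit, logError: (String, Throwable) => Unit): Unit = {
  try {
    start()
  } catch {
    case NonFatal(e) => logError("Failed to start external shuffle service", e)
  }
}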
Author: Devaraj K
Closes #19396 from devaraj-kavali/SPARK-22172.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/96798d14
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/96798d14
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/96798d14
Branch: refs/heads/master
Commit: 96798d14f07208796fa0a90af0ab369879bacd6c
Parents: 556b5d2
Author: Devaraj K
Authored: Wed Nov 1 18:07:39 2017 +0800
Committer: jerryshao
Committed: Wed Nov 1 18:07:39 2017 +0800
--
.../scala/org/apache/spark/deploy/worker/Worker.scala | 12 +++-
1 file changed, 11 insertions(+), 1 deletion(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/96798d14/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
b/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
index ed5fa4b..3962d42 100755
--- a/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
@@ -199,7 +199,7 @@ private[deploy] class Worker(
logInfo(s"Running Spark version ${org.apache.spark.SPARK_VERSION}")
logInfo("Spark home: " + sparkHome)
createWorkDir()
-shuffleService.startIfEnabled()
+startExternalShuffleService()
webUi = new WorkerWebUI(this, workDir, webUiPort)
webUi.bind()
@@ -367,6 +367,16 @@ private[deploy] class Worker(
}
}
+  private def startExternalShuffleService() {
+    try {
+      shuffleService.startIfEnabled()
+    } catch {
+      case e: Exception =>
+        logError("Failed to start external shuffle service", e)
+        System.exit(1)
+    }
+  }
+
   private def sendRegisterMessageToMaster(masterEndpoint: RpcEndpointRef): Unit = {
masterEndpoint.send(RegisterWorker(
workerId,
-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
Author: jshao
Date: Mon Sep 17 12:13:30 2018
New Revision: 29438
Log:
Apache Spark v2.3.2-rc6 docs
[This commit notification would consist of 1447 parts,
which exceeds the limit of 50 ones, so it was shortened to the summary.]
-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website commit 04a27dbf: generated Spark 2.3.2 documentation added under
site/docs/2.3.2/api/ -- SparkR help pages (match.html, merge.html, write.jdbc.html,
write.json.html, spark.survreg.html, spark.svmLinear.html, spark.gbt.html, 00Index.html)
and JavaDoc pages (JavaSparkStatusTracker.html, JavaRDDLike.html, Accumulable.html).
The generated HTML diff content is omitted here.]
Author: jshao
Date: Sun Jul 8 07:45:04 2018
New Revision: 27983
Log:
Apache Spark v2.3.2-rc1 docs
[This commit notification would consist of 1446 parts,
which exceeds the limit of 50 ones, so it was shortened to the summary.]
-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
Repository: spark
Updated Branches:
refs/heads/master 79c668942 -> e2c7e09f7
[SPARK-24646][CORE] Minor change to spark.yarn.dist.forceDownloadSchemes to
support wildcard '*'
## What changes were proposed in this pull request?
When tokens are obtained through a customized `ServiceCredentialProvider`, the
provider must be available on the classpath of the local spark-submit process. In
that case, all the configured remote sources should be forced to be downloaded
locally.
To make this configuration easier to use, this change proposes adding wildcard '*'
support to `spark.yarn.dist.forceDownloadSchemes`, and also clarifies the usage of
this configuration.
## How was this patch tested?
New UT added.
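A hedged usage example (the values are illustrative): the wildcard makes every remote resource scheme eligible for local download before the resource is added to YARN's distributed cache.

import org.apache.spark.SparkConf

// Illustrative only: force resources of every scheme to be downloaded locally.
val conf = new SparkConf().set("spark.yarn.dist.forceDownloadSchemes", "*")
// Restricting to specific schemes still works, e.g. "http,https,ftp".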
Author: jerryshao
Closes #21633 from jerryshao/SPARK-21917-followup.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e2c7e09f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e2c7e09f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e2c7e09f
Branch: refs/heads/master
Commit: e2c7e09f742a7e522efd74fe8e14c2620afdb522
Parents: 79c6689
Author: jerryshao
Authored: Mon Jul 9 10:21:40 2018 +0800
Committer: jerryshao
Committed: Mon Jul 9 10:21:40 2018 +0800
--
.../org/apache/spark/deploy/SparkSubmit.scala | 5 ++--
.../apache/spark/internal/config/package.scala | 5 ++--
.../apache/spark/deploy/SparkSubmitSuite.scala | 29 +---
docs/running-on-yarn.md | 5 ++--
4 files changed, 28 insertions(+), 16 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/e2c7e09f/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index 2da778a..e7310ee 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -385,7 +385,7 @@ private[spark] class SparkSubmit extends Logging {
val forceDownloadSchemes = sparkConf.get(FORCE_DOWNLOAD_SCHEMES)
     def shouldDownload(scheme: String): Boolean = {
-      forceDownloadSchemes.contains(scheme) ||
+      forceDownloadSchemes.contains("*") || forceDownloadSchemes.contains(scheme) ||
         Try { FileSystem.getFileSystemClass(scheme, hadoopConf) }.isFailure
     }
@@ -578,7 +578,8 @@ private[spark] class SparkSubmit extends Logging {
}
     // Add the main application jar and any added jars to classpath in case YARN client
     // requires these jars.
-    // This assumes both primaryResource and user jars are local jars, otherwise it will not be
+    // This assumes both primaryResource and user jars are local jars, or already downloaded
+    // to local by configuring "spark.yarn.dist.forceDownloadSchemes", otherwise it will not be
     // added to the classpath of YARN client.
     if (isYarnCluster) {
       if (isUserJar(args.primaryResource)) {
http://git-wip-us.apache.org/repos/asf/spark/blob/e2c7e09f/core/src/main/scala/org/apache/spark/internal/config/package.scala
--
diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala
b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index bda9795..ba892bf 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -486,10 +486,11 @@ package object config {
   private[spark] val FORCE_DOWNLOAD_SCHEMES =
     ConfigBuilder("spark.yarn.dist.forceDownloadSchemes")
-      .doc("Comma-separated list of schemes for which files will be downloaded to the " +
+      .doc("Comma-separated list of schemes for which resources will be downloaded to the " +
         "local disk prior to being added to YARN's distributed cache. For use in cases " +
         "where the YARN service does not support schemes that are supported by Spark, like http, " +
-        "https and ftp.")
+        "https and ftp, or jars required to be in the local YARN client's classpath. Wildcard " +
+        "'*' is denoted to download resources for all the schemes.")
       .stringConf
       .toSequence
       .createWithDefault(Nil)
http://git-wip-us.apache.org/repos/asf/spark/blob/e2c7e09f/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
b/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
index 545c8d0..f829fec 100644
--- a/core/src/test/scala/org/apache/spark/