[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15131





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79843763
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1426,7 +1426,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new URI(path)
+    val uri = new Path(path).toUri
--- End diff --

ok thanks!





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-20 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79755939
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1426,7 +1426,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new URI(path)
+    val uri = new Path(path).toUri
--- End diff --

I agree with @HyukjinKwon. Thanks!





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79743521
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1426,7 +1426,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new URI(path)
+    val uri = new Path(path).toUri
--- End diff --

I just checked the documentation and tests. It seems Windows paths are tested in
https://github.com/apache/hadoop/blob/branch-2.7.2/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestPath.java
and, since `toString` is based on the `URI` instance inside `Path`, which `toUri` returns directly, I guess it'd be safe.

I could not find an explicit mention of Windows paths in the documentation:
https://hadoop.apache.org/docs/stable2/api/index.html
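
For illustration, a minimal sketch of the relationship described above; the HDFS host and path are hypothetical placeholders:

```scala
import org.apache.hadoop.fs.Path

// Path keeps a private java.net.URI; toUri hands that instance back,
// and toString is rendered from the same URI components.
val p = new Path("hdfs://namenode:8020/user/spark/data")
println(p.toUri)     // hdfs://namenode:8020/user/spark/data
println(p.toString)  // identical text, rebuilt from the URI components
assert(p.toString == p.toUri.toString)
```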





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79735513
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1426,7 +1426,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new URI(path)
+    val uri = new Path(path).toUri
--- End diff --

Meanwhile, we could alternatively use `Utils.resolveURI`, which we already test within Spark. However, this util does not seem to handle the `C:/a/b/c` case (as opposed to `C:\a\b\c`, which we should fix). So I suggested `Path(...).toUri` instead, but if you feel strongly about this, we could use that. I will try to find and share the docs and tests for `Path` once I get home, though.
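
A quick sketch of the `C:/a/b/c` pitfall mentioned above, using plain `java.net.URI` (the parsing that a resolveURI-style helper starts from); the paths are hypothetical:

```scala
import java.net.URI

// A forward-slash Windows path parses "successfully", but the drive
// letter is mistaken for a URI scheme, so scheme-based dispatch goes wrong.
val u = new URI("C:/a/b/c")
println(u.getScheme)  // prints "C", not null

// A backslash Windows path does not parse at all: the backslash is an
// illegal URI character, so construction throws.
// new URI("C:\\a\\b\\c")  // throws java.net.URISyntaxException
```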





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79732092
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1457,8 +1457,8 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
       logInfo(s"Added file $path at $key with timestamp $timestamp")
       // Fetch the file locally so that closures which are run on the driver can still use the
       // SparkFiles API to access files.
-      Utils.fetchFile(path, new File(SparkFiles.getRootDirectory()), conf, env.securityManager,
-        hadoopConfiguration, timestamp, useCache = false)
+      Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
--- End diff --

In my personal opinion, I think it's okay (I thought about this for a while) because `Utils.fetchFile` wants its first argument to be a URL, not a path. So this might be a valid correction even without tests, since we fixed the argument to be what the function expects in the first place. But no strong opinion.
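
A sketch of the round trip being proposed here; it mirrors the Windows REPL transcript quoted later in this thread and assumes a Windows machine, since Hadoop's `Path` only normalizes drive-letter paths and backslashes on Windows:

```scala
import java.net.URI
import org.apache.hadoop.fs.Path

// Hypothetical raw Windows input; new URI(raw) would throw
// URISyntaxException because of the backslashes, which is exactly
// what breaks inside fetchFile.
val raw = "C:\\a\\b\\c"

// Normalizing through Path first yields a string that java.net.URI
// accepts: "/C:/a/b/c" on Windows, per the REPL output in this thread.
val fetchArg = new Path(raw).toUri.toString
val reparsed = new URI(fetchArg)  // parses cleanly, as fetchFile needs
```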





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79731142
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1426,7 +1426,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new URI(path)
+    val uri = new Path(path).toUri
--- End diff --

I do understand your concern, @felixcheung. However, IMHO it'd be okay not to re-test the Hadoop library within Spark. I will try to find some tests/documentation related to Windows paths and share them to make sure.

FWIW, this case was verified for Windows paths by one of the committers before. So it'd be okay.





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79723200
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1426,7 +1426,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new URI(path)
+    val uri = new Path(path).toUri
--- End diff --

should there be some tests we can add for this change? 
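
A hypothetical sketch of what such a regression test might look like, not the PR's actual test; it assumes it sits in SparkContextSuite (package org.apache.spark, with the suite's usual `sc` variable), and guards on Windows because Hadoop's `Path` only rewrites drive-letter paths there:

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import com.google.common.io.Files
import org.apache.spark.util.Utils

test("SPARK-17577: addFile accepts Windows-style paths") {
  assume(Utils.isWindows)  // Path normalizes drive-letter paths only on Windows
  sc = new SparkContext("local", "test")
  val file = File.createTempFile("someprefix", "somesuffix")
  Files.write("somewords", file, StandardCharsets.UTF_8)
  sc.addFile(file.getAbsolutePath)  // contains backslashes on Windows
  assert(new File(SparkFiles.get(file.getName)).exists())
}
```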





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79723221
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1457,8 +1457,8 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
       logInfo(s"Added file $path at $key with timestamp $timestamp")
       // Fetch the file locally so that closures which are run on the driver can still use the
       // SparkFiles API to access files.
-      Utils.fetchFile(path, new File(SparkFiles.getRootDirectory()), conf, env.securityManager,
-        hadoopConfiguration, timestamp, useCache = false)
+      Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
--- End diff --

should there be some tests we can add for this change?





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79723039
  
--- Diff: R/pkg/R/context.R ---
@@ -225,6 +225,51 @@ setCheckpointDir <- function(sc, dirName) {
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(dirName))))
 }
 
+#' Add a file or directory to be downloaded with this Spark job on every node.
+#'
+#' The path passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs,
+#' use spark.getSparkFiles(fileName) to find its download location.
+#'
+#' @rdname spark.addFile
+#' @param path The path of the file to be added
+#' @examples
+#'\dontrun{
--- End diff --

add @export





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79722919
  
--- Diff: R/pkg/R/context.R ---
@@ -225,6 +225,51 @@ setCheckpointDir <- function(sc, dirName) {
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(dirName))))
 }
 
+#' Add a file or directory to be downloaded with this Spark job on every node.
+#'
+#' The path passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs,
+#' use spark.getSparkFiles(fileName) to find its download location.
+#'
+#' @rdname spark.addFile
+#' @param path The path of the file to be added
+#' @examples
+#'\dontrun{
+#' spark.addFile("~/myfile")
+#'}
+#' @note spark.addFile since 2.1.0
+spark.addFile <- function(path) {
+  sc <- getSparkContext()
+  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path))))
+}
+
+#' Get the root directory that contains files added through spark.addFile.
+#'
+#' @rdname spark.getSparkFilesRootDirectory
+#' @return the root directory that contains files added through spark.addFile
+#' @examples
+#'\dontrun{
+#' spark.getSparkFilesRootDirectory()
+#'}
+#' @note spark.getSparkFilesRootDirectory since 2.1.0
+spark.getSparkFilesRootDirectory <- function() {
+  callJStatic("org.apache.spark.SparkFiles", "getRootDirectory")
+}
+
+#' Get the absolute path of a file added through spark.addFile.
+#'
+#' @rdname spark.getSparkFiles
+#' @param fileName The name of the file added through spark.addFile
+#' @return the absolute path of a file added through spark.addFile.
+#' @examples
--- End diff --

add @export





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79722962
  
--- Diff: R/pkg/R/context.R ---
@@ -225,6 +225,51 @@ setCheckpointDir <- function(sc, dirName) {
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(dirName))))
 }
 
+#' Add a file or directory to be downloaded with this Spark job on every node.
+#'
+#' The path passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs,
+#' use spark.getSparkFiles(fileName) to find its download location.
+#'
+#' @rdname spark.addFile
+#' @param path The path of the file to be added
+#' @examples
+#'\dontrun{
+#' spark.addFile("~/myfile")
+#'}
+#' @note spark.addFile since 2.1.0
+spark.addFile <- function(path) {
+  sc <- getSparkContext()
+  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path))))
+}
+
+#' Get the root directory that contains files added through spark.addFile.
+#'
+#' @rdname spark.getSparkFilesRootDirectory
+#' @return the root directory that contains files added through spark.addFile
+#' @examples
--- End diff --

add @export





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-18 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79307283
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1426,7 +1426,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new URI(path)
+    val uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
       case null | "local" => new File(path).getCanonicalFile.toURI.toString
       case _ => path
--- End diff --

Updated, thanks!





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79306433
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1426,7 +1426,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new URI(path)
+    val uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
       case null | "local" => new File(path).getCanonicalFile.toURI.toString
       case _ => path
--- End diff --

@yanboliang Yeap, it passes the tests at least.





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79305494
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1426,7 +1426,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new URI(path)
+    val uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
       case null | "local" => new File(path).getCanonicalFile.toURI.toString
       case _ => path
--- End diff --

Just in case, I ran the tests after manually fixing this. Maybe we can wait 
for the result - 
https://ci.appveyor.com/project/HyukjinKwon/spark/build/108-pr-15131-path





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79305450
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1426,7 +1426,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new URI(path)
+    val uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
       case null | "local" => new File(path).getCanonicalFile.toURI.toString
       case _ => path
--- End diff --

In `Utils.fetchFile(path, ...)` below, it seems we can't pass `path` as it is, because `new URI(path)` is called internally, and that fails to parse a Windows path.

Could I please ask you to change this to `uri.toString`? It'd work fine as far as I know.

```scala
import java.net.URI
import org.apache.hadoop.fs.Path

scala> val a = new Path("C:\\a\\b\\c").toUri
a: java.net.URI = /C:/a/b/c

scala> val b = new URI(a.toString)
b: java.net.URI = /C:/a/b/c

scala> val a = new Path("C:/a/b/c").toUri
a: java.net.URI = /C:/a/b/c

scala> val b = new URI(a.toString)
b: java.net.URI = /C:/a/b/c

scala> val a = new Path("/a/b/c").toUri
a: java.net.URI = /a/b/c

scala> val b = new URI(a.toString)
b: java.net.URI = /a/b/c

scala> val a = new Path("file:///a/b/c").toUri
a: java.net.URI = file:///a/b/c

scala> val b = new URI(a.toString)
b: java.net.URI = file:///a/b/c

scala> new Path("http://localhost/a/b/c;).toUri
a: java.net.URI = http://localhost/a/b/c

scala> val b = new URI(a.toString)
b: java.net.URI = http://localhost/a/b/c
```










[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-18 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79303180
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1426,41 +1426,53 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new URI(path)
-    val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
-      case _ => path
-    }
-
-    val hadoopPath = new Path(schemeCorrectedPath)
-    val scheme = new URI(schemeCorrectedPath).getScheme
-    if (!Array("http", "https", "ftp").contains(scheme)) {
-      val fs = hadoopPath.getFileSystem(hadoopConfiguration)
-      val isDir = fs.getFileStatus(hadoopPath).isDirectory
-      if (!isLocal && scheme == "file" && isDir) {
-        throw new SparkException(s"addFile does not support local directories when not running " +
-          "local mode.")
+    if (path == null) {
+      logWarning("null specified as parameter to addFile")
+    } else {
+      var key = ""
+      if (path.contains("\\")) {
+        // For local paths with backslashes on Windows, URI throws an exception
+        key = env.rpcEnv.fileServer.addFile(new File(path))
+      } else {
+        val uri = new URI(path)
+        val schemeCorrectedPath = uri.getScheme match {
+          case null | "local" => new File(path).getCanonicalFile.toURI.toString
+          case _ => path
+        }
+
+        val hadoopPath = new Path(schemeCorrectedPath)
+        val scheme = new URI(schemeCorrectedPath).getScheme
+        if (!Array("http", "https", "ftp").contains(scheme)) {
+          val fs = hadoopPath.getFileSystem(hadoopConfiguration)
+          val isDir = fs.getFileStatus(hadoopPath).isDirectory
+          if (!isLocal && scheme == "file" && isDir) {
+            throw new SparkException(s"addFile does not support local directories when not " +
+              "running local mode.")
+          }
+          if (!recursive && isDir) {
+            throw new SparkException(s"Added file $hadoopPath is a directory and recursive " +
+              "is not turned on.")
+          }
+        }
+
+        key = if (!isLocal && scheme == "file") {
+          env.rpcEnv.fileServer.addFile(new File(uri.getPath))
+        } else {
+          schemeCorrectedPath
+        }
       }
-      if (!recursive && isDir) {
-        throw new SparkException(s"Added file $hadoopPath is a directory and recursive is not " +
-          "turned on.")
+      if (key != null) {
+        val timestamp = System.currentTimeMillis
+        if (addedFiles.putIfAbsent(key, timestamp).isEmpty) {
+          logInfo(s"Added file $path at $key with timestamp $timestamp")
+          // Fetch the file locally so that closures which are run on the driver can still use the
+          // SparkFiles API to access files.
+          Utils.fetchFile(path, new File(SparkFiles.getRootDirectory()), conf, env.securityManager,
--- End diff --

@HyukjinKwon Thanks for your suggestion, will update it soon.





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79303139
  
--- Diff: R/pkg/R/context.R ---
@@ -225,6 +225,42 @@ setCheckpointDir <- function(sc, dirName) {
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(dirName))))
 }
 
+#' Add a file or directory to be downloaded with this Spark job on every node.
+#'
+#' The path passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs,
+#' use sparkFiles.get(fileName) to find its download location.
+#'
+#' A directory can be given if the recursive option is set to true.
+#' Currently directories are only supported for Hadoop-supported filesystems.
+#'
+#' @param path The path of the file to be added
+#' @param recursive Recursive or not if the path is directory. Default is FALSE.
--- End diff --

this param is not in the function below?





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79302542
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1426,41 +1426,53 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new URI(path)
-    val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
-      case _ => path
-    }
-
-    val hadoopPath = new Path(schemeCorrectedPath)
-    val scheme = new URI(schemeCorrectedPath).getScheme
-    if (!Array("http", "https", "ftp").contains(scheme)) {
-      val fs = hadoopPath.getFileSystem(hadoopConfiguration)
-      val isDir = fs.getFileStatus(hadoopPath).isDirectory
-      if (!isLocal && scheme == "file" && isDir) {
-        throw new SparkException(s"addFile does not support local directories when not running " +
-          "local mode.")
+    if (path == null) {
+      logWarning("null specified as parameter to addFile")
+    } else {
+      var key = ""
+      if (path.contains("\\")) {
+        // For local paths with backslashes on Windows, URI throws an exception
+        key = env.rpcEnv.fileServer.addFile(new File(path))
+      } else {
+        val uri = new URI(path)
+        val schemeCorrectedPath = uri.getScheme match {
+          case null | "local" => new File(path).getCanonicalFile.toURI.toString
+          case _ => path
+        }
+
+        val hadoopPath = new Path(schemeCorrectedPath)
+        val scheme = new URI(schemeCorrectedPath).getScheme
+        if (!Array("http", "https", "ftp").contains(scheme)) {
+          val fs = hadoopPath.getFileSystem(hadoopConfiguration)
+          val isDir = fs.getFileStatus(hadoopPath).isDirectory
+          if (!isLocal && scheme == "file" && isDir) {
+            throw new SparkException(s"addFile does not support local directories when not " +
+              "running local mode.")
+          }
+          if (!recursive && isDir) {
+            throw new SparkException(s"Added file $hadoopPath is a directory and recursive " +
+              "is not turned on.")
+          }
+        }
+
+        key = if (!isLocal && scheme == "file") {
+          env.rpcEnv.fileServer.addFile(new File(uri.getPath))
+        } else {
+          schemeCorrectedPath
+        }
       }
-      if (!recursive && isDir) {
-        throw new SparkException(s"Added file $hadoopPath is a directory and recursive is not " +
-          "turned on.")
+      if (key != null) {
+        val timestamp = System.currentTimeMillis
+        if (addedFiles.putIfAbsent(key, timestamp).isEmpty) {
+          logInfo(s"Added file $path at $key with timestamp $timestamp")
+          // Fetch the file locally so that closures which are run on the driver can still use the
+          // SparkFiles API to access files.
+          Utils.fetchFile(path, new File(SparkFiles.getRootDirectory()), conf, env.securityManager,
--- End diff --

(@yanboliang BTW - I don't mind triggering builds manually. Please feel free to submit more commits for test purposes if you'd like.)





[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...

2016-09-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15131#discussion_r79302302
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1426,41 +1426,53 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new URI(path)
-    val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
-      case _ => path
-    }
-
-    val hadoopPath = new Path(schemeCorrectedPath)
-    val scheme = new URI(schemeCorrectedPath).getScheme
-    if (!Array("http", "https", "ftp").contains(scheme)) {
-      val fs = hadoopPath.getFileSystem(hadoopConfiguration)
-      val isDir = fs.getFileStatus(hadoopPath).isDirectory
-      if (!isLocal && scheme == "file" && isDir) {
-        throw new SparkException(s"addFile does not support local directories when not running " +
-          "local mode.")
+    if (path == null) {
+      logWarning("null specified as parameter to addFile")
+    } else {
+      var key = ""
+      if (path.contains("\\")) {
+        // For local paths with backslashes on Windows, URI throws an exception
+        key = env.rpcEnv.fileServer.addFile(new File(path))
+      } else {
+        val uri = new URI(path)
+        val schemeCorrectedPath = uri.getScheme match {
+          case null | "local" => new File(path).getCanonicalFile.toURI.toString
+          case _ => path
+        }
+
+        val hadoopPath = new Path(schemeCorrectedPath)
+        val scheme = new URI(schemeCorrectedPath).getScheme
+        if (!Array("http", "https", "ftp").contains(scheme)) {
+          val fs = hadoopPath.getFileSystem(hadoopConfiguration)
+          val isDir = fs.getFileStatus(hadoopPath).isDirectory
+          if (!isLocal && scheme == "file" && isDir) {
+            throw new SparkException(s"addFile does not support local directories when not " +
+              "running local mode.")
+          }
+          if (!recursive && isDir) {
+            throw new SparkException(s"Added file $hadoopPath is a directory and recursive " +
+              "is not turned on.")
+          }
+        }
+
+        key = if (!isLocal && scheme == "file") {
+          env.rpcEnv.fileServer.addFile(new File(uri.getPath))
+        } else {
+          schemeCorrectedPath
+        }
      }
-      if (!recursive && isDir) {
-        throw new SparkException(s"Added file $hadoopPath is a directory and recursive is not " +
-          "turned on.")
+      if (key != null) {
+        val timestamp = System.currentTimeMillis
+        if (addedFiles.putIfAbsent(key, timestamp).isEmpty) {
+          logInfo(s"Added file $path at $key with timestamp $timestamp")
+          // Fetch the file locally so that closures which are run on the driver can still use the
+          // SparkFiles API to access files.
+          Utils.fetchFile(path, new File(SparkFiles.getRootDirectory()), conf, env.securityManager,
--- End diff --

Ah, sorry, I should've checked the code further. It seems `Utils.fetchFile` expects a URL, not a path, as its first argument. How about changing `val uri = new URI(path)` to `val uri = new Path(path).toUri` rather than resembling `addJar`?

I remember we discussed this problem in https://github.com/apache/spark/pull/14960. If comma-separated multiple paths were allowed in `path`, this would be problematic, but it seems `path` only accepts a single path.

```scala
import java.net.URI
import org.apache.hadoop.fs.Path

scala> new Path("C:\\a\\b\\c").toUri
res1: java.net.URI = /C:/a/b/c

scala> new Path("C:/a/b/c").toUri
res2: java.net.URI = /C:/a/b/c

scala> new Path("/a/b/c").toUri
res3: java.net.URI = /a/b/c

scala> new Path("file:///a/b/c").toUri
res3: java.net.URI = file:///a/b/c

scala> new Path("http://localhost/a/b/c;).toUri
res3: java.net.URI = http://localhost/a/b/c
```

