[GitHub] spark pull request: SPARK-4687. [WIP] Add an addDirectory API

vanzin Wed, 28 Jan 2015 12:54:07 -0800

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3670#discussion_r23722554
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -996,12 +1004,49 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        * filesystems), or an HTTP, HTTPS or FTP URI.  To access the file in Spark jobs,
        * use `SparkFiles.get(fileName)` to find its download location.
        */
    -  def addFile(path: String) {
    +  def addFile(path: String): Unit = {
    +    addFile(path, false)
    +  }
    +
    +    /**
    +   * Add a file to be downloaded with this Spark job on every node.
    +   * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +   * filesystems), or an HTTP, HTTPS or FTP URI.  To access the file in Spark jobs,
    +   * use `SparkFiles.get(fileName)` to find its download location.
    +   *
    +   * A directory can be given if the recursive option is set to true. Currently directories are only
    +   * supported for Hadoop-supported filesystems.
    +   */
    +  def addFile(path: String, recursive: Boolean): Unit = {
    +    val isLocalMode = conf.get("spark.master").startsWith("local")
         val uri = new URI(path)
    -    val key = uri.getScheme match {
    -      case null | "file" => env.httpFileServer.addFile(new File(uri.getPath))
    -      case "local"       => "file:" + uri.getPath
    -      case _             => path
    +    val schemeCorrectedPath = uri.getScheme match {
    +      case null | "local" => "file:" + uri.getPath
    +      case _              => path
    +    }
    +
    +    val hadoopPath = new Path(schemeCorrectedPath)
    +    val scheme = new URI(schemeCorrectedPath).getScheme
    +    if (!Array("http", "https", "ftp").contains(scheme)) {
    +      val fs = hadoopPath.getFileSystem(hadoopConfiguration)
    +      if (!fs.exists(hadoopPath)) {
    +        throw new SparkException(s"Added file $hadoopPath does not exist.")
    +      }
    +      val isDir = fs.isDirectory(hadoopPath)
    --- End diff --
    
    I think we use `isDir` in other places for compatibility with Hadoop 1.x.
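
A minimal sketch of what a Hadoop 1.x-friendly directory check could look like, assuming the remark refers to `FileStatus.isDir` (which exists in Hadoop 1.x and is deprecated but still present in later releases). The `DirCheck` object and `isDirCompat` helper are hypothetical names used only for illustration; they are not part of the pull request.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper for illustration; not code from the pull request.
object DirCheck {
  // Looks up the FileStatus for the path and asks it whether it is a directory.
  // Assumption: FileStatus.isDir is the call the review comment has in mind.
  // It is available in Hadoop 1.x, while FileStatus.isDirectory is not.
  def isDirCompat(fs: FileSystem, path: Path): Boolean =
    fs.getFileStatus(path).isDir
}
```

`getFileStatus` throws a `FileNotFoundException` when the path does not exist, so an existence check like the one in the diff would still run first.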


