Yes, sc.addFile() is what you want:

    addFile(self, path)
        Add a file to be downloaded with this Spark job on every node.
        The C{path} passed can be either a local file, a file in HDFS
        (or other Hadoop-supported filesystems), or an HTTP, HTTPS or
        FTP URI.

        To access the file in Spark jobs, use
        L{SparkFiles.get(fileName)<pyspark.files.SparkFiles.get>} with the
        filename to find its download location.

        >>> from pyspark import SparkFiles
        >>> path = os.path.join(tempdir, "test.txt")
        >>> with open(path, "w") as testFile:
        ...    testFile.write("100")
        >>> sc.addFile(path)
        >>> def func(iterator):
        ...    with open(SparkFiles.get("test.txt")) as testFile:
        ...        fileVal = int(testFile.readline())
        ...    return [x * fileVal for x in iterator]
        >>> sc.parallelize([1, 2, 3, 4]).mapPartitions(func).collect()
        [100, 200, 300, 400]
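To adapt the doctest above to your case (shipping a plain text file and reading it inside tasks), the pattern looks roughly like this. This is a minimal sketch, assuming a running SparkContext named `sc`; the file name "lookup.txt" and the helper `load_multiplier` are illustrative names, not part of the pyspark API.

    # Helper that parses the shipped file; plain Python, no Spark needed.
    def load_multiplier(path):
        # Read a single integer from the first line of the file.
        with open(path) as f:
            return int(f.readline())

    # Driver side (needs a running SparkContext, so shown commented out):
    # from pyspark import SparkFiles
    # sc.addFile("/local/path/lookup.txt")   # downloaded to every node
    # def scale(iterator):
    #     # Resolve the local copy once per partition, not per element.
    #     factor = load_multiplier(SparkFiles.get("lookup.txt"))
    #     return [x * factor for x in iterator]
    # sc.parallelize([1, 2, 3, 4]).mapPartitions(scale).collect()

Note that the file is opened via SparkFiles.get() on the worker, never via the original driver-side path, since each node downloads its own copy.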
On Tue, Sep 16, 2014 at 7:02 PM, daijia <jia_...@intsig.com> wrote:
> Is there some way to ship textfile just like ship python libraries?
>
> Thanks in advance
> Daijia
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-ship-external-Python-libraries-in-PYSPARK-tp14074p14412.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.