Yes, sc.addFile() is what you want:

    addFile(self, path)
        Add a file to be downloaded with this Spark job on every node.
        The C{path} passed can be either a local file, a file in HDFS
        (or other Hadoop-supported filesystems), or an HTTP, HTTPS or
        FTP URI.

        To access the file in Spark jobs, use
        L{SparkFiles.get(fileName)<pyspark.files.SparkFiles.get>} with the
        filename to find its download location.

        >>> from pyspark import SparkFiles
        >>> path = os.path.join(tempdir, "test.txt")
        >>> with open(path, "w") as testFile:
        ...    testFile.write("100")
        >>> sc.addFile(path)
        >>> def func(iterator):
        ...    with open(SparkFiles.get("test.txt")) as testFile:
        ...        fileVal = int(testFile.readline())
        ...    return [x * fileVal for x in iterator]
        >>> sc.parallelize([1, 2, 3, 4]).mapPartitions(func).collect()
        [100, 200, 300, 400]
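To adapt the doctest above to your case (shipping a plain text file and reading it inside tasks), the pattern looks roughly like this. This is a minimal sketch, assuming a running SparkContext named `sc`; the file name "lookup.txt" and the helper `load_multiplier` are illustrative names, not part of the pyspark API.

    # Helper that parses the shipped file; plain Python, no Spark needed.
    def load_multiplier(path):
        # Read a single integer from the first line of the file.
        with open(path) as f:
            return int(f.readline())

    # Driver side (needs a running SparkContext, so shown commented out):
    # from pyspark import SparkFiles
    # sc.addFile("/local/path/lookup.txt")   # downloaded to every node
    # def scale(iterator):
    #     # Resolve the local copy once per partition, not per element.
    #     factor = load_multiplier(SparkFiles.get("lookup.txt"))
    #     return [x * factor for x in iterator]
    # sc.parallelize([1, 2, 3, 4]).mapPartitions(scale).collect()

Note that the file is opened via SparkFiles.get() on the worker, never via the original driver-side path, since each node downloads its own copy.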
On Tue, Sep 16, 2014 at 7:02 PM, daijia <jia_...@intsig.com> wrote:
> Is there some way to ship textfile just like ship python libraries?
>
> Thanks in advance
> Daijia
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-ship-external-Python-libraries-in-PYSPARK-tp14074p14412.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.