Hello,
You could try using the mapPartitions function if you can send partial data to your C++ program:

mapPartitions(func):
Similar to map, but runs separately on each partition (block) of the RDD, so /func/ must be of type Iterator<T> => Iterator<U> when running on an RDD of type T.

That way you can write the partition data to a temp file, call your C++ app, then delete the temp file. Of course, each call would only see the rows in one partition.
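As a rough sketch, the function you pass to mapPartitions could look like this (a minimal Python example of the temp-file pattern; the Unix `sort` command stands in for your C++ app, and the Spark call at the bottom is shown only as a comment):

```python
import os
import subprocess
import tempfile

def run_external_app(rows):
    """Runs once per partition: dump the rows to a local temp file,
    call the external program on it, and yield its output lines."""
    fd, in_path = tempfile.mkstemp(suffix=".txt")
    try:
        # Write the partition's rows to the local file system.
        with os.fdopen(fd, "w") as f:
            for row in rows:
                f.write(row + "\n")
        # Call the external program; 'sort' stands in for the C++ app.
        result = subprocess.run(
            ["sort", in_path], capture_output=True, text=True, check=True
        )
        for line in result.stdout.splitlines():
            yield line
    finally:
        os.remove(in_path)  # clean up the temp file

# With Spark you would then call something like:
# output_rdd = rdd.mapPartitions(run_external_app)
```

Since the function both creates and deletes the temp file, nothing is left behind on the worker even if the external call fails.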

Also, the latest release of Spark (2.4.0) introduced barrier execution mode:
https://issues.apache.org/jira/browse/SPARK-24374

Maybe you could combine the two: mapPartitions alone gives you single-partition data only, and your app will be invoked separately on each partition, not necessarily at the same time.

Spark's strong point is parallel execution, so what you're trying to do somewhat defeats that. But if you do not need to combine all the data before calling your app, then it could work.
Or you could split your job into a Spark -> app -> Spark chain.
Good luck,

Joe



On 11/11/2018 02:13 PM, Steve Lewis wrote:
I have a problem where a critical step needs to be performed by a third-party C++ application. I can send or install this program on the worker nodes. I can construct a function holding all the data this program needs to process. The problem is that the program is designed to read and write from the local file system. I can call the program from Java and read its output as a local file, then delete all temporary files, but I doubt that it is possible to get the program to read from HDFS or any shared file system. My question is: can a function running on a worker node create temporary files and pass their names to a local process, assuming everything is cleaned up after the call?

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com


