Hello,

I am trying to use the RDD pipe method to run external commands over each 
partition's data. My program looks roughly like this:

rdd.pipe(cmd1).pipe(cmd2)

The output of cmd1 and the input of cmd2 are raw binary data.
However, RDD.pipe converts every element to a String and reads the command's 
output back line by line, so arbitrary bytes can be corrupted or lost between 
the two command calls.
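
The only lossless workaround I see with the current API is to shuttle the 
bytes through a text-safe encoding, for example (a sketch; binaryRdd is a 
hypothetical RDD[Array[Byte]], Java 8+ assumed):

import java.util.Base64

// Base64-encode each element so that pipe()'s string transport cannot
// mangle the bytes. cmd1 and cmd2 would then have to Base64-decode
// their stdin and re-encode their stdout themselves.
val encoded = binaryRdd.map(bytes => Base64.getEncoder.encodeToString(bytes))
val piped   = encoded.pipe(cmd1).pipe(cmd2)
val decoded = piped.map(line => Base64.getDecoder.decode(line))

But that forces every external command to decode and re-encode Base64 and 
inflates the data by about a third, which is overhead I would like to avoid.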

I am now thinking of extending RDD.scala and PipedRDD.scala to give the 
end user direct access to the PrintWriter created in PipedRDD.
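
Alternatively, I could bypass pipe() entirely and hand-roll the process 
handling with mapPartitions, roughly like this (untested sketch; pipeBinary 
is a hypothetical helper, not an existing Spark API, and real code would 
need more care around failures):

import java.io.ByteArrayOutputStream
import org.apache.spark.rdd.RDD

// Pipe each partition's raw bytes through an external command with no
// string conversion anywhere.
def pipeBinary(rdd: RDD[Array[Byte]], cmd: Seq[String]): RDD[Array[Byte]] =
  rdd.mapPartitions { iter =>
    val proc = new ProcessBuilder(cmd: _*)
      .redirectError(ProcessBuilder.Redirect.INHERIT) // stderr -> executor log
      .start()

    // Feed stdin from a separate thread so a full stdout pipe buffer
    // in the child cannot deadlock us.
    val writer = new Thread(new Runnable {
      def run(): Unit = {
        val in = proc.getOutputStream
        iter.foreach(b => in.write(b))
        in.close()
      }
    })
    writer.start()

    // Drain the child's stdout fully as raw bytes.
    val stdout = proc.getInputStream
    val buf = new ByteArrayOutputStream()
    val chunk = new Array[Byte](8192)
    var n = stdout.read(chunk)
    while (n != -1) { buf.write(chunk, 0, n); n = stdout.read(chunk) }

    writer.join()
    val exit = proc.waitFor()
    require(exit == 0, s"${cmd.mkString(" ")} exited with $exit")
    Iterator(buf.toByteArray) // one byte blob per partition
  }

But that duplicates much of what PipedRDD already takes care of (environment 
setup, stderr handling, exit-status checks), which is why I would rather 
extend PipedRDD itself if that is the sensible route.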

Is there a better way to do this?
