How do you perform blocking IO in an Apache Spark job?

DrKhu Mon, 08 Sep 2014 08:32:25 -0700

What if, while traversing an RDD, I need to calculate values in the dataset by
calling an external (blocking) service? How could that be achieved?
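A minimal sketch of the simplest approach, assuming the service is reached through some client object: call it from inside `mapPartitions`, so the client is created once per partition rather than once per element, and let each call block the task's thread. `ServiceClient`, `lookup` and `enrichPartition` are hypothetical names, not Spark APIs. The function is plain Scala, so it can be tried on an ordinary `Iterator` without a cluster.

```scala
object BlockingPerPartition {

  // Hypothetical client for the external service; a real one would hold a
  // connection and perform network I/O in lookup().
  final class ServiceClient {
    def lookup(key: String): Double = key.length.toDouble // stand-in for a blocking call
    def close(): Unit = ()
  }

  // Intended for rdd.mapPartitions: one client per partition, one blocking
  // call per element, executed on the Spark task's own thread.
  def enrichPartition(keys: Iterator[String]): Iterator[Double] = {
    val client = new ServiceClient
    try {
      // toVector forces every call before close(), since Iterator.map is lazy.
      keys.map(client.lookup).toVector.iterator
    } finally {
      client.close()
    }
  }
}
```

On an `RDD[String]` this would be applied as `keys.mapPartitions(BlockingPerPartition.enrichPartition)`. The limitation is that each running task has only one call in flight, so per-node throughput is roughly the number of concurrently running tasks divided by the call latency.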


val values: Future[RDD[Double]] = Future sequence tasks

I've tried to create a list of Futures, but since an RDD is not Traversable,
Future.sequence is not applicable.
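Although `Future.sequence` cannot be applied to an RDD, it can be applied inside a single partition, where the elements arrive as a plain Scala `Iterator`. The sketch below assumes a hypothetical blocking `callService` function (simulated with a 10 ms sleep) and creates a fixed-size thread pool per partition. Each Spark task then blocks in `Await.result` until every call for its partition has returned.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AsyncPerPartition {

  // Hypothetical blocking call to the external service (simulated latency).
  def callService(key: String): Double = {
    Thread.sleep(10)
    key.length.toDouble
  }

  // Intended for rdd.mapPartitions(AsyncPerPartition.enrichPartition(_, 32)).
  // Inside one partition the data is an ordinary collection, so
  // Future.sequence works here even though it cannot take an RDD.
  def enrichPartition(keys: Iterator[String], threads: Int): Iterator[Double] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // toVector starts one Future per element of the partition.
      val futures: Vector[Future[Double]] = keys.map(k => Future(callService(k))).toVector
      // Block the Spark task until all calls finish; results keep input order.
      Await.result(Future.sequence(futures), 5.minutes).iterator
    } finally {
      pool.shutdown()
    }
  }
}
```

Used as `keys.mapPartitions(AsyncPerPartition.enrichPartition(_, 32))`, this yields an `RDD[Double]` directly, so no `Future[RDD[Double]]` is needed. The pool size of 32 is an arbitrary example value, and this version holds all of a partition's futures and results in memory at once.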

I just wonder whether anyone has had such a problem, and if so, how did you
solve it? What I'm trying to achieve is parallelism within a single worker
node, so that I can call the external service 3000 times per second.
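To reach a target rate such as 3000 calls per second, Little's law gives the number of calls that must be in flight: rate times latency. Assuming, purely for illustration, 50 ms per call, that is 3000 × 0.05 = 150 concurrent calls. The sketch below sizes one shared daemon thread pool per executor JVM accordingly and processes each partition in fixed-size batches so memory stays bounded. The 50 ms latency, the batch size and `callService` are assumptions, not measured values or Spark APIs.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ThrottledPerExecutor {

  // Little's law: calls in flight = call rate x latency.
  // 3000 calls/s at an assumed 50 ms per call needs 150 concurrent calls.
  def requiredConcurrency(callsPerSecond: Int, latencyMillis: Int): Int =
    math.ceil(callsPerSecond.toLong * latencyMillis / 1000.0).toInt

  // A Scala object is initialised once per class loader, i.e. typically once
  // per executor JVM, so all tasks on that executor share these threads.
  // Daemon threads, so the pool never keeps the JVM alive on its own.
  lazy val ec: ExecutionContext = {
    val factory = new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "external-service-io")
        t.setDaemon(true)
        t
      }
    }
    ExecutionContext.fromExecutorService(
      Executors.newFixedThreadPool(requiredConcurrency(3000, 50), factory))
  }

  // Hypothetical blocking call; replace with the real service client.
  def callService(key: String): Double = key.length.toDouble

  // For rdd.mapPartitions(ThrottledPerExecutor.enrichPartition(_, 500)):
  // at most batchSize futures per task are outstanding, and results are
  // emitted lazily, one batch at a time.
  def enrichPartition(keys: Iterator[String], batchSize: Int): Iterator[Double] = {
    implicit val executor: ExecutionContext = ec
    keys.grouped(batchSize).flatMap { batch =>
      val futures = batch.map(k => Future(callService(k)))
      Await.result(Future.sequence(futures), 1.minute)
    }
  }
}
```

Because every task on an executor shares the same 150 threads, concurrency on that executor stays near the target instead of multiplying with the number of running tasks.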

Perhaps there is another solution that is more suitable for Spark, such as
running multiple worker nodes on a single host.

I'd be interested to know how you cope with such a challenge. Thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-perform-blocking-IO-in-apache-spark-job-tp13704.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
