What if, while traversing an RDD, I need to calculate the values in the dataset by calling an external (blocking) service? How do you think that could be achieved?
I've tried to create a list of Futures, but since RDD is not Traversable, Future.sequence is not suitable:

    val values: Future[RDD[Double]] = Future sequence tasks

I just wonder whether anyone has had this problem, and how you solved it.

What I'm trying to achieve is parallelism on a single worker node, so that I can call the external service 3000 times per second. There is probably another solution more suitable for Spark, such as running multiple worker nodes on a single host.

I'd be interested to know how you cope with such a challenge.

Thanks.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-perform-blocking-IO-in-apache-spark-job-tp13704.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
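One possible direction, offered only as a minimal sketch and not as a confirmed solution: since Future.sequence cannot be applied to the RDD itself, it could be applied to the elements inside each partition instead. The function below has the shape Iterator[Int] => Iterator[Double], so it could presumably be passed to rdd.mapPartitions. The names callService and processPartition, the pool size of 32, the 10 ms simulated latency, and the 30 second timeout are all assumptions made up for illustration.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for the external blocking service.
def callService(x: Int): Double = {
  Thread.sleep(10) // simulate blocking I/O latency (assumed 10 ms)
  x * 2.0
}

// Processes the elements of one partition. Calls are issued concurrently
// in batches of `parallelism`, and each batch is awaited before the next starts.
def processPartition(items: Iterator[Int], parallelism: Int = 32): Iterator[Double] = {
  // Dedicated pool for blocking calls, so the global execution context is not starved.
  val pool = Executors.newFixedThreadPool(parallelism)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  try {
    items
      .grouped(parallelism)
      .flatMap { batch =>
        val futures = batch.map(x => Future(callService(x)))
        // Future.sequence works here because a batch is a plain Seq, not an RDD.
        Await.result(Future.sequence(futures), 30.seconds)
      }
      .toList // force evaluation before the pool is shut down (holds the partition's results in memory)
      .iterator
  } finally pool.shutdown()
}

// Assumed usage with Spark (not run here):
//   val values: RDD[Double] = rdd.mapPartitions(it => processPartition(it))
```

Under these assumptions, the throughput of one task is roughly the number of concurrent calls divided by the call latency, so at 10 ms per call about 30 calls in flight would be needed to approach 3000 calls per second. Results keep the input order because batches run one after another and Future.sequence preserves order within a batch.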