Hi,

What does the external service provide? Data? Calculations? Can the service push data to you via Kafka and Spark Streaming? Can you fetch the necessary data from the service beforehand? The solution to your question depends on your answers.
I would not recommend connecting to a blocking service during Spark job execution. What do you do if a node crashes? Is the order of the service calls relevant for you?

Best regards

On 8 Sept 2014 17:31, "DrKhu" <khudyakov....@gmail.com> wrote:

> What if, when I traverse an RDD, I need to calculate values in the dataset by
> calling an external (blocking) service? How do you think that could be
> achieved?
>
> val values: Future[RDD[Double]] = Future sequence tasks
>
> I've tried to create a list of Futures, but as RDD is not Traversable,
> Future.sequence is not suitable.
>
> I just wonder if anyone has had such a problem, and how you solved it. What
> I'm trying to achieve is parallelism on a single worker node, so that I
> can call that external service 3000 times per second.
>
> Probably there is another solution, more suitable for Spark, like having
> multiple worker nodes on a single host.
>
> It would be interesting to know how you cope with such a challenge. Thanks.
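P.S. One pattern that might fit, if you do decide to call the service from the job (a minimal sketch, not a tested solution): run the blocking calls inside mapPartitions with a bounded thread pool per partition, so each task issues a limited number of concurrent requests. Here callService, the pool size of 32, and the 10-minute timeout are hypothetical placeholders; you would substitute your real client and tune the bound to reach your target call rate:

  import java.util.concurrent.Executors
  import scala.concurrent.{Await, ExecutionContext, Future}
  import scala.concurrent.duration._
  import org.apache.spark.rdd.RDD

  // Hypothetical blocking client call -- replace with your real service client.
  def callService(x: Double): Double = ???

  def withBlockingCalls(rdd: RDD[Double]): RDD[Double] =
    rdd.mapPartitions { iter =>
      // One bounded pool per partition: caps the number of in-flight
      // blocking calls issued by this task.
      val pool = Executors.newFixedThreadPool(32)
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
      // Force the iterator to a List so all futures are submitted before awaiting.
      val futures = iter.map(x => Future(callService(x))).toList
      try Await.result(Future.sequence(futures), 10.minutes).iterator
      finally pool.shutdown()
    }

The point is to keep the concurrency on the worker where the partition runs, rather than wrapping the RDD itself in a Future: inside a partition you have an ordinary Scala collection, so Future.sequence applies there even though it does not apply to the RDD as a whole.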