You cannot guarantee that each key will forever be on the same executor. Designing an application around that assumption is flawed if you need to ensure fault tolerance against executor failures.
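What you can do within a single batch is pin all records with the same key to the same partition via a partitioner, but which executor that partition runs on is decided by the scheduler and can change from batch to batch (and certainly after an executor dies). A rough sketch of what I mean, with an illustrative Event record and partition count that are not from your code:

import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.dstream.DStream

case class Event(key: String, payload: String)  // illustrative record type

def process(stream: DStream[Event]): Unit = {
  stream.foreachRDD { rdd =>
    rdd
      .map(e => (e.key, e))                 // key the records
      .partitionBy(new HashPartitioner(8))  // same key -> same partition within this batch
      .mapPartitions { iter =>
        // all records for a given key land in this partition for this batch,
        // but the partition may be scheduled on a different executor next batch
        iter.map { case (_, e) => e.payload.toUpperCase }
      }
      .count()                              // some action to trigger the job
  }
}

If what you actually need is per-key state that carries over between batches, updateStateByKey is the supported way to express that, rather than relying on executor placement.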
On Thu, Jan 7, 2016 at 9:34 AM, Lin Zhao <l...@exabeam.com> wrote:
> I have a need to route the dstream through the streaming pipeline by some
> key, such that data with the same key always goes through the same
> executor.
>
> There doesn't seem to be a way to do manual routing with Spark Streaming.
> The closest I can come up with is:
>
> stream.foreachRDD { rdd =>
>   rdd.groupBy(rdd.key).flatMap { line => …}.map(…).map(…)
> }
>
> Does this do what I expect? How about between batches? Does it guarantee
> the same key goes to the same executor in all batches?
>
> Thanks,
>
> Lin