I need to route a DStream through the streaming pipeline by some key, so that data with the same key always goes through the same executor.
There doesn't seem to be a way to do manual routing with Spark Streaming. The
closest I can come up with is:
stream.foreachRDD { rdd =>
  rdd.groupBy(_.key).flatMap { case (key, records) => ... }.map(...).map(...)
}
Does this do what I expect? And what about between batches? Does it guarantee that
the same key goes to the same executor in all batches?
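To make the question concrete: my understanding is that `groupBy` without an explicit partitioner falls back to hash partitioning, which maps a key to a partition via a non-negative modulo of its `hashCode`. Here is a standalone sketch of that logic (a hypothetical helper mirroring what `HashPartitioner` does, not Spark code); what I am unsure about is whether the resulting partition is pinned to the same executor across batches:

```scala
// Hypothetical standalone helper mirroring HashPartitioner's logic:
// a key's hashCode, taken modulo numPartitions and made non-negative,
// deterministically picks the same partition for the same key.
def partitionFor(key: Any, numPartitions: Int): Int = {
  val mod = key.hashCode % numPartitions
  if (mod < 0) mod + numPartitions else mod
}
```

So within one RDD the grouping by key is deterministic; the part I can't confirm is the partition-to-executor assignment between batches.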
Thanks,
Lin
