Thanks TD, that is very useful.
On Tue, Jul 14, 2015 at 10:19 PM, Tathagata Das wrote:
> You can do this.
>
> // global variable to keep track of latest stuff
> var latestTime = _
> var latestRDD = _
>
>
> dstream.foreachRDD((rdd: RDD[..], time: Time) => {
> latestTime = time
> latestRDD
You can do this.
// Global variables that keep track of the latest batch.
// Scala needs an explicit type to use `_` as a default initializer.
var latestTime: Time = _
var latestRDD: RDD[..] = _

dstream.foreachRDD((rdd: RDD[..], time: Time) => {
  latestTime = time
  latestRDD = rdd
})
Now you can asynchronously access the latest RDD. However, if you are going
to run jobs on the la
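
Since the latest RDD is read from a different thread than the one that writes it, one way to publish the time and the RDD together is a small holder built on an atomic reference. This is only a sketch under assumptions: `LatestHolder` is a hypothetical helper, not part of Spark, and the `foreachRDD` call in the comment simply mirrors the snippet above.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical helper (not a Spark class): keeps the most recent value together
// with the batch time it belongs to, so readers never see a mismatched pair.
final class LatestHolder[T] {
  private val ref = new AtomicReference[Option[(Long, T)]](None)

  // Intended to be called from the streaming side, for example:
  //   dstream.foreachRDD((rdd: RDD[..], time: Time) => {
  //     holder.update(time.milliseconds, rdd)
  //   })
  def update(timeMs: Long, value: T): Unit = ref.set(Some((timeMs, value)))

  // Intended to be called from any other thread, such as an HTTP request handler.
  // Returns None until the first batch has been recorded.
  def get: Option[(Long, T)] = ref.get()
}
```

A request handler would call `holder.get` and handle the `None` case, which covers requests that arrive before the first batch completes.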
I have been working on a proof of concept (POC) that adds a REST service to a
Spark Streaming job. Say I create a stateful DStream X by using
updateStateByKey, and each time there is an HTTP request, I want to apply some
transformations/actions on the latest RDD of X and collect the results
immediately but not scheduled by
streaming