Hi,

As I understand it, a DStream consists of one or more RDDs, and foreachRDD will run a given function on each and every RDD inside the DStream.
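For reference, here is a minimal, self-contained sketch of the check I am describing (Spark 1.x Java API; the class name, the local[2] master, and the exact println are just for illustration, but the stream setup matches my program below):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class RDDCounter {
    public static void main(String[] args) throws InterruptedException {
        SparkConf confObj = new SparkConf().setAppName("RDDCounter").setMaster("local[2]");
        // 1-hour batch interval, same as in my real program
        JavaStreamingContext stcObj = new JavaStreamingContext(confObj, new Duration(60 * 60 * 1000));
        JavaDStream<String> obj = stcObj.textFileStream("/Users/path/to/Input");

        // foreachRDD is invoked once per RDD handed to it by the DStream;
        // this sysout is how I count how many RDDs I actually receive
        obj.foreachRDD(new Function<JavaRDD<String>, Void>() {
            @Override
            public Void call(JavaRDD<String> rdd) {
                System.out.println("foreachRDD invoked: " + rdd.count() + " lines in this RDD");
                return null;
            }
        });

        stcObj.start();
        stcObj.awaitTermination();
    }
}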
I created a simple program which reads log files from a folder every hour:

JavaStreamingContext stcObj = new JavaStreamingContext(confObj, new Duration(60 * 60 * 1000)); // 1 hour
JavaDStream<String> obj = stcObj.textFileStream("/Users/path/to/Input");

When the interval is reached, Spark reads all the files and creates one and only one RDD (as I verified from the sysout inside foreachRDD, shown above).

The streaming guide indicates in several places that many operations (e.g. flatMap) on a DStream are applied to each RDD individually, and that the resulting DStream consists of the mapped RDDs, the same number as in the input DStream.
ref: https://spark.apache.org/docs/latest/streaming-programming-guide.html#dstreams

If that is the case, how can I generate a scenario in which a single DStream contains multiple RDDs, given my example?

Regards,
Sanjay