See the inline response. On Wed, Sep 24, 2014 at 4:05 PM, Reddy Raja <areddyr...@gmail.com> wrote:
> Given this program.. I have the following queries.. > > val sparkConf = new SparkConf().setAppName("NetworkWordCount") > > sparkConf.set("spark.master", "local[2]") > > val ssc = new StreamingContext(sparkConf, Seconds(10)) > > val > ** > *lines* = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel. > MEMORY_AND_DISK_SER) > > > ** > *val words = lines.flatMap(_.split(" "))* > > > ** > * val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)* > > > ** > *wordCounts.print()* > > ssc.start() > > ssc.awaitTermination() > > > Q1) How do I know which part of the program is executing every 10 sec.. > > My requirements is that, I want to execute a method and insert data > into Cassandra every time a set of messages comes in > ==> Those highlighted lines will be executed in every 10 sec. Basically whatever operations that you are doing on *lines* will be executed in every 10 secs, So to solve your problem you need to have a map function on the lines which will do your data insertion to Cassandra. Eg: > *val dumdum = lines.map(x => { whatever you want to do with x (like > insert into Cassandra) })* Q2) Is there a function I can pass, so that, it gets executed when the next > set of messages comes in. > ==> Hope the first answer covers it. > Q3) If I have a method in-beween the following lines > > val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _) > > wordCounts.print() > > ** > * my_method(stread rdd)..* > > ssc.start() > > > ==> No!! my_method will only execute one time . > The method is not getting executed.. > > > Can some one answer these questions? > > --Reddy >