I am new to Spark, and I am using Spark Streaming with Kafka.

My batch duration is 1 second.

Assume I get 100 records in second 1, 120 records in second 2, and 80 records in second 3:

--> {sec 1: 1,2,...,100} --> {sec 2: 1,2,...,120} --> {sec 3: 1,2,...,80}
I apply my logic to the records of second 1 and get a result => result1

In second 2, I want to use result1 and combine it with the 120 records of
second 2 to get a combined result => result2.

I tried to cache the result, but I am not able to retrieve the cached result1
in second 2. Is this possible, or can someone shed some light on how to
achieve my goal here?
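To make the intent concrete, here is the accumulation I am after, sketched in plain Java without Spark (the word keys and counts below are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class CumulativeSketch {
    // Merge one second's word counts into the previous running result.
    static Map<String, Integer> merge(Map<String, Integer> previous,
                                      Map<String, Integer> batch) {
        Map<String, Integer> combined = new HashMap<>(previous);
        batch.forEach((word, count) -> combined.merge(word, count, Integer::sum));
        return combined;
    }

    public static void main(String[] args) {
        Map<String, Integer> sec1 = Map.of("spark", 60, "kafka", 40);  // 100 records in second 1
        Map<String, Integer> sec2 = Map.of("spark", 70, "storm", 50);  // 120 records in second 2

        Map<String, Integer> result1 = merge(new HashMap<>(), sec1);
        Map<String, Integer> result2 = merge(result1, sec2);           // result1 combined with second 2

        System.out.println(result2); // cumulative counts: spark=130, kafka=40, storm=50
    }
}
```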

 JavaPairReceiverInputDStream<String, String> messages =
     KafkaUtils.createStream(jssc, String.class, String.class,
         StringDecoder.class, StringDecoder.class, kafkaParams, topicMap,
         StorageLevel.MEMORY_AND_DISK_SER_2());
I process the messages and compute words, which holds the result for second 1:

if (resultCp != null) {
    resultCp.print();
    result = resultCp.union(words.mapValues(new Sum()));
} else {
    result = words.mapValues(new Sum());
}

resultCp = result.cache();
In second 2, resultCp should not be null, but it returns null, so at any given
time I only have that particular second's data, whereas I want the cumulative
result. Does anyone know how to do this?

I have learnt that once Spark Streaming is started with jssc.start(), control
is no longer at our end; it lies with Spark. So is it possible to send the
result of second 1 into second 2 to compute the accumulated value?
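In other words, what I hope can happen across batch intervals is the following, shown here as a plain-Java loop over hypothetical batches: keep the state from one second and update it with the next second's records.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RunningTotals {
    // Fold a sequence of per-second batches into one cumulative result,
    // carrying the state from each second into the next.
    static Map<String, Integer> accumulate(List<Map<String, Integer>> batches) {
        Map<String, Integer> state = new HashMap<>();
        for (Map<String, Integer> batch : batches) {
            batch.forEach((word, count) -> state.merge(word, count, Integer::sum));
        }
        return state;
    }

    public static void main(String[] args) {
        // Hypothetical batches: 100 records in second 1, 120 in second 2, 80 in second 3.
        List<Map<String, Integer>> batches = List.of(
                Map.of("spark", 100),
                Map.of("spark", 70, "kafka", 50),
                Map.of("kafka", 80));
        System.out.println(accumulate(batches)); // cumulative: spark=170, kafka=130
    }
}
```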

Any help is much appreciated. Thanks in advance.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/checkpoint-and-not-running-out-of-disk-space-tp1525p16790.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
