I have written a custom receiver that fetches the records matching a specific query from Elasticsearch, and I have implemented streaming RDD transformations to process the data the receiver produces.
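For context, a simplified sketch of the receiver is below (the fetchFromEs helper and the (String, Long) pair type are placeholders; the real code runs the Elasticsearch query and pushes each record via store()):

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    // Simplified: fetches all records matching a query from Elasticsearch
    // and pushes each (name, value) pair into Spark via store().
    class EsQueryReceiver(query: String)
        extends Receiver[(String, Long)](StorageLevel.MEMORY_ONLY) {

      def onStart(): Unit = {
        new Thread("es-query-receiver") {
          override def run(): Unit = {
            fetchFromEs(query).foreach(record => store(record))
            // Nothing more will arrive once the query results are exhausted.
            stop("Finished fetching records from Elasticsearch")
          }
        }.start()
      }

      def onStop(): Unit = {}

      // Placeholder for the actual Elasticsearch client call.
      private def fetchFromEs(q: String): Iterator[(String, Long)] = Iterator.empty
    }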
The final RDD is a sorted list of name/value pairs, and I want to read the top 20 results programmatically rather than write them to an external file. I currently use foreach on the RDD to copy the top 20 values into a list, but that foreach computation runs every time a new microbatch arrives from the receiver. What I want is for it to run only once, after the receiver has finished fetching all the records from Elasticsearch and before the streaming context is stopped, so that I can populate the results into a list and process them in my driver program. A simplified version of what I do now is below. I would appreciate any guidance on this.
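Current (simplified) driver-side code; sortedStream stands in for the DStream produced by my transformations, and I am assuming foreachRDD here, which is what my foreach call amounts to:

    import scala.collection.mutable.ListBuffer

    // Driver-side list that should eventually hold the final top-20 results.
    val top20 = ListBuffer.empty[(String, Long)]

    sortedStream.foreachRDD { rdd =>
      // This runs on every microbatch; I only want it once, at the end.
      top20.clear()
      top20 ++= rdd.take(20)
    }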