I have written a custom receiver that fetches the records matching a specific query from Elasticsearch, and I have implemented streaming RDD transformations to process the data the receiver produces.
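For context, a simplified sketch of the receiver is below (the fetchFromEs helper and the (String, Long) pair type are placeholders; the real code runs the Elasticsearch query and pushes each record via store()):

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    // Simplified: fetches all records matching a query from Elasticsearch
    // and pushes each (name, value) pair into Spark via store().
    class EsQueryReceiver(query: String)
        extends Receiver[(String, Long)](StorageLevel.MEMORY_ONLY) {

      def onStart(): Unit = {
        new Thread("es-query-receiver") {
          override def run(): Unit = {
            fetchFromEs(query).foreach(record => store(record))
            // Nothing more will arrive once the query results are exhausted.
            stop("Finished fetching records from Elasticsearch")
          }
        }.start()
      }

      def onStop(): Unit = {}

      // Placeholder for the actual Elasticsearch client call.
      private def fetchFromEs(q: String): Iterator[(String, Long)] = Iterator.empty
    }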
The final RDD is a sorted list of name/value pairs, and I want to read the top 20 results programmatically rather than write them to an external file. I currently use foreach on the RDD to copy the top 20 values into a list, but that foreach computation runs every time a new microbatch arrives from the receiver. What I want is for it to run only once, after the receiver has finished fetching all the records from Elasticsearch and before the streaming context is stopped, so that I can populate the results into a list and process them in my driver program. A simplified version of what I do now is below. I would appreciate any guidance on this.
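Current (simplified) driver-side code; sortedStream stands in for the DStream produced by my transformations, and I am assuming foreachRDD here, which is what my foreach call amounts to:

    import scala.collection.mutable.ListBuffer

    // Driver-side list that should eventually hold the final top-20 results.
    val top20 = ListBuffer.empty[(String, Long)]

    sortedStream.foreachRDD { rdd =>
      // This runs on every microbatch; I only want it once, at the end.
      top20.clear()
      top20 ++= rdd.take(20)
    }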