Hi,

Is there a way to keep DStreams / RDDs read from Kafka in memory so they can be processed at a later time? What we need to do is read data from Kafka, key it by an attribute present in the Kafka messages, and write out the data for each key only once we have accumulated enough for that key to produce a file close to the HDFS block size, say 64MB. We are looking for a way to avoid periodically writing out files containing the entire Kafka content and then running a second job to read those files and split them into another set of files per key.
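To make the idea concrete, here is a minimal, Spark-free sketch of the per-key accumulate-and-flush logic we have in mind. The class name `KeyedAccumulator` and the size-tracking scheme are hypothetical, just to illustrate the intent: buffer records per key and hand back a full batch (for the caller to write as one HDFS file) once the accumulated size reaches a target.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

// Hypothetical sketch: buffer records per key, flush a key's batch
// once its accumulated byte size reaches targetBytes (e.g. 64MB).
class KeyedAccumulator {
    private final long targetBytes;
    private final Map<String, List<String>> buffers = new HashMap<>();
    private final Map<String, Long> sizes = new HashMap<>();

    KeyedAccumulator(long targetBytes) {
        this.targetBytes = targetBytes;
    }

    // Add one record under its key. Returns the full batch when the
    // key's buffer reaches the target size (the caller would write it
    // out as a single HDFS file), otherwise null.
    List<String> add(String key, String record) {
        List<String> buf = buffers.computeIfAbsent(key, k -> new ArrayList<>());
        buf.add(record);
        long newSize = sizes.getOrDefault(key, 0L)
                     + record.getBytes(StandardCharsets.UTF_8).length;
        if (newSize >= targetBytes) {
            // Flush: drop the buffer and return it to the caller.
            buffers.remove(key);
            sizes.remove(key);
            return buf;
        }
        sizes.put(key, newSize);
        return null;
    }
}
```

The open question is essentially where this buffered state can live across Spark Streaming batches without being written to disk in the meantime.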
Thanks.
