Hi,

We want to create the model and keep it loaded in memory through the Spark batch context, since blocked matrix operations are required to optimize runtime.
The data is streamed in through Kafka / raw sockets and a Spark StreamingContext. We want to run prediction operations on the streaming data against the model that is loaded in memory through the batch context. Do I need to open up an API on top of the batch context, or is it possible to use an RDD created by the batch context from the streaming context? Most likely not, since a streaming context and a batch context can't exist in the same Spark job, but I am curious. For concreteness, I have appended a rough sketch of the flow I have in mind below my signature.

If I have to open up an API, does it make sense to come up with a generic serving API for MLlib and let all MLlib algorithms expose a serving API? The API could be spawned using Spark's actor system itself, especially since spray is merging into akka-http and Akka is already a dependency of Spark. Maybe it's not a good idea, though, since it would mean maintaining another actor system for the API. A sketch of what such a serving endpoint might look like is also appended below.

Thanks.
Deb
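
PS: Here is a minimal sketch of the streaming-prediction flow I am asking about, assuming the StreamingContext can reach the model through its underlying SparkContext. The model class (LogisticRegressionModel), the path, and the input parsing are placeholders rather than our actual setup:

import org.apache.spark.SparkConf
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingPrediction {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-prediction")
    val ssc = new StreamingContext(conf, Seconds(5))
    val sc = ssc.sparkContext // the underlying batch context

    // Load (or train) the model once through the batch context.
    val model = LogisticRegressionModel.load(sc, "/path/to/model")

    // Score each micro-batch; lines are assumed to hold comma-separated features.
    val lines = ssc.socketTextStream("localhost", 9999)
    val predictions = lines.map { line =>
      val features = Vectors.dense(line.split(',').map(_.toDouble))
      model.predict(features)
    }
    predictions.print()

    ssc.start()
    ssc.awaitTermination()
  }
}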
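
PS2: And here is a minimal sketch of the kind of generic serving endpoint I mean, using akka-http's routing DSL. The route, port, and request format are placeholders, and note that it spins up its own ActorSystem rather than reusing Spark's, which is exactly the maintenance concern above:

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors

object ServingApi {
  // The model is assumed to have been loaded through the batch context,
  // as in the previous sketch.
  def serve(model: LogisticRegressionModel): Unit = {
    implicit val system = ActorSystem("mllib-serving")
    implicit val materializer = ActorMaterializer()
    implicit val executionContext = system.dispatcher

    // POST /predict with a comma-separated feature vector in the body.
    val route =
      path("predict") {
        post {
          entity(as[String]) { body =>
            val features = Vectors.dense(body.split(',').map(_.toDouble))
            complete(model.predict(features).toString)
          }
        }
      }

    Http().bindAndHandle(route, "localhost", 8080)
  }
}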