Hello,

On Mon, Dec 23, 2013 at 3:23 PM, Jie Deng <[email protected]> wrote:
> I am using Java, and Spark has APIs for Java as well. Though there is a
> saying that Java in Spark is slower than Scala shell, well, depends on your
> requirement.
> I am not an expert in Spark, but as far as I know, Spark provide different
> level of storage including memory or disk. And for the disk part, HDFS is
> just a choice. I am not using hdfs myself, but you will loss the benefit of
> hdfs as well. In other words, it's also just based on your requirements.
> And MongoDB or S3 are also doable, at least with Java APIs, I suppose.

I guess that answers the question of whether it is doable. Where/how do I find out *how* it is doable? :)

I am guessing every pipeline is a "custom job" of sorts - hence it is the developer's job to write the "connectors" to 0mq or DynamoDB, for example? Or...? Is there some kind of a "plug-in" system for Spark?

Thanks!
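To illustrate what I mean by a hand-rolled "connector": my understanding (could be wrong!) is that there is no formal plug-in system, and people just open a client inside the job themselves. A minimal sketch of that pattern in Java is below - `DynamoClient` is a made-up stand-in for whatever real client library you'd use (DynamoDB SDK, 0mq socket, MongoDB driver, ...), and it assumes a Spark version whose Java API exposes `foreachPartition`:

```java
import java.util.Iterator;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

public class CustomSink {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local", "custom-sink");
        JavaRDD<String> records = sc.textFile("input.txt");

        // The usual pattern: open ONE client per partition (not per record),
        // write everything in that partition, then close the client.
        records.foreachPartition(new VoidFunction<Iterator<String>>() {
            public void call(Iterator<String> partition) {
                DynamoClient client = new DynamoClient(); // hypothetical client
                while (partition.hasNext()) {
                    client.put(partition.next());
                }
                client.close();
            }
        });

        sc.stop();
    }
}
```

The per-partition structure matters because the closure is serialized and run on the workers, so anything non-serializable (sockets, DB connections) has to be created inside the function, not in the driver.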
