Hi Philip,

Indeed, Spark's API allows direct creation of complex workflows, the same way Cascading does. Cascading built that functionality on top of MapReduce (translating user operations into a series of MapReduce jobs), whereas Spark's engine supports complex workflows natively, and its API maps directly onto that engine. So they are indeed alternatives in this respect. Of course, you can also mix the two in one deployment, because they can share data through HDFS.
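To make that contrast concrete, here is a minimal sketch in plain Python (hypothetical names, not either framework's actual API): a multi-stage workflow expressed as one chained expression, the style Spark's API encourages, rather than as a series of separately materialized jobs.

```python
# Minimal sketch of direct workflow composition (plain Python illustration;
# hypothetical class, not the actual Spark or Cascading API).

class Dataset:
    """A pipeline of records that transformations chain onto directly."""
    def __init__(self, records):
        self._records = list(records)

    def map(self, fn):
        return Dataset(fn(r) for r in self._records)

    def filter(self, pred):
        return Dataset(r for r in self._records if pred(r))

    def reduce(self, fn, init):
        out = init
        for r in self._records:
            out = fn(out, r)
        return out

# A multi-stage workflow as one chained expression -- no intermediate
# jobs or HDFS round-trips between the stages.
total = (Dataset(range(10))
         .map(lambda x: x * 2)
         .filter(lambda x: x > 5)
         .reduce(lambda acc, x: acc + x, 0))
print(total)  # -> 84
```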
There may be other differences as well -- for example, Cascading has a specific data model for interchange between the operators (each record has to be a tuple), while Spark works directly on Java objects, and Spark also has Python and Scala APIs.

Matei

On Oct 28, 2013, at 10:11 AM, Philip Ogren <[email protected]> wrote:

> My team is investigating a number of technologies in the Big Data space. A
> team member recently got turned on to Cascading as an application layer for
> orchestrating complex workflows/scenarios. He asked me if Spark had an
> "application layer"? My initial reaction is "no" -- that Spark would not have a
> separate orchestration/application layer. Instead, the core Spark API (along
> with Streaming) would compete directly with Cascading for this kind of
> functionality, and the two would not likely be all that complementary. I
> realize that I am exposing my ignorance here and could be way off. Is there
> anyone who knows a bit about both of these technologies who could speak to
> this in broad strokes?
>
> Thanks!
> Philip
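The data-model difference Matei mentions can be sketched in plain Python (an illustration only; the names are hypothetical, not the real Cascading or Spark APIs): a Cascading-style operator sees each record as a tuple of named fields, while a Spark-style operator works on whatever native objects you hand it.

```python
# Illustration of the two data models (plain Python; hypothetical names,
# not the actual Cascading or Spark APIs).

# Cascading-style: every record crossing an operator boundary is a tuple
# of fields, addressed by position via a field-name schema.
fields = ("user", "age")
records = [("alice", 30), ("bob", 25)]
ages = [rec[fields.index("age")] for rec in records]

# Spark-style: operators work directly on native objects of any class.
class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

users = [User("alice", 30), User("bob", 25)]
ages_from_objects = [u.age for u in users]

print(ages, ages_from_objects)  # -> [30, 25] [30, 25]
```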
