Hi Philip,

Indeed, Spark's API allows direct creation of complex workflows, the same way Cascading does. Cascading built that functionality on top of MapReduce (translating user operations into a series of MapReduce jobs), whereas Spark's engine supports complex workflows natively, and its API maps directly onto that engine. So they are indeed alternatives in this respect. Of course, you can also mix the two in one deployment, because they can share data through HDFS.
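To make that contrast concrete, here is a minimal sketch in plain Python (hypothetical names, not either framework's actual API): a multi-stage workflow expressed as one chained expression, the style Spark's API encourages, rather than as a series of separately materialized jobs.

```python
# Minimal sketch of direct workflow composition (plain Python illustration;
# hypothetical class, not the actual Spark or Cascading API).

class Dataset:
    """A pipeline of records that transformations chain onto directly."""
    def __init__(self, records):
        self._records = list(records)

    def map(self, fn):
        return Dataset(fn(r) for r in self._records)

    def filter(self, pred):
        return Dataset(r for r in self._records if pred(r))

    def reduce(self, fn, init):
        out = init
        for r in self._records:
            out = fn(out, r)
        return out

# A multi-stage workflow as one chained expression -- no intermediate
# jobs or HDFS round-trips between the stages.
total = (Dataset(range(10))
         .map(lambda x: x * 2)
         .filter(lambda x: x > 5)
         .reduce(lambda acc, x: acc + x, 0))
print(total)  # -> 84
```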
There may be other differences as well -- for example, Cascading has a specific data model for interchange between the operators (each record has to be a tuple), while Spark works directly on Java objects, and Spark also has Python and Scala APIs.

Matei

On Oct 28, 2013, at 10:11 AM, Philip Ogren <[email protected]> wrote:

> My team is investigating a number of technologies in the Big Data space. A
> team member recently got turned on to Cascading as an application layer for
> orchestrating complex workflows/scenarios. He asked me if Spark had an
> "application layer"? My initial reaction is "no" -- that Spark would not have a
> separate orchestration/application layer. Instead, the core Spark API (along
> with Streaming) would compete directly with Cascading for this kind of
> functionality, and the two would not likely be all that complementary. I
> realize that I am exposing my ignorance here and could be way off. Is there
> anyone who knows a bit about both of these technologies who could speak to
> this in broad strokes?
>
> Thanks!
> Philip
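The data-model difference Matei mentions can be sketched in plain Python (an illustration only; the names are hypothetical, not the real Cascading or Spark APIs): a Cascading-style operator sees each record as a tuple of named fields, while a Spark-style operator works on whatever native objects you hand it.

```python
# Illustration of the two data models (plain Python; hypothetical names,
# not the actual Cascading or Spark APIs).

# Cascading-style: every record crossing an operator boundary is a tuple
# of fields, addressed by position via a field-name schema.
fields = ("user", "age")
records = [("alice", 30), ("bob", 25)]
ages = [rec[fields.index("age")] for rec in records]

# Spark-style: operators work directly on native objects of any class.
class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

users = [User("alice", 30), User("bob", 25)]
ages_from_objects = [u.age for u in users]

print(ages, ages_from_objects)  # -> [30, 25] [30, 25]
```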
