Hi,

I'm fairly new to Crunch; my motivation for investigating it is to have a 
portable, higher-level Java API for constructing data pipelines.


Reading the docs I found this:


Minimal abstractions. Crunch pipelines provide a thin veneer on top of 
MapReduce. Developers have access to low-level MapReduce APIs whenever they 
need them. This minimalism also means that Crunch is extremely fast, only 
slightly slower than a hand-tuned pipeline developed with the MapReduce APIs, 
and the community is working on making it faster all the time. That said, one 
of the goals of the project is portability, and the abstractions that Crunch 
provides are designed to ease the transition from Hadoop 1.0 to Hadoop 2.0 and 
to provide transparent support for future data processing frameworks that run 
on Hadoop, including Apache Spark<http://spark.incubator.apache.org/> and 
Apache Tez<http://tez.incubator.apache.org/>.


This is exactly what I'm looking for. However, I'm also curious to know 
whether it's possible to use the Spark APIs directly to optimize performance. 
Does anyone have this use case and can share some experience with mixing 
Crunch and Spark APIs?


Thanks,

Shiv
