Henry mentored Crunch through incubation, so maybe he can give you more
context.
For me, Gora is essentially an extremely easy storage abstraction
framework. I do not currently use the Query API, meaning that the analysis
of data is delegated to the Gora data store.
This is my current usage of the code base.
On Saturday, March 21, 2015, Furkan KAMACI furkankam...@gmail.com wrote:
Hi Lewis,
I am talking in the context of GORA-418 and GORA-386, so we can say GSoC.
I've talked with Talat about the design of that implementation. I just
wanted to check whether any other project already has this kind of feature.
Here is what I have in mind for Spark support in Apache Gora: developing a
layer which abstracts the functionality of Spark, Tez, etc. (GORA-418).
There will be an implementation for each of them (and Spark will be one of
them: GORA-386).
That is, you will write a word count example in Gora style, pick one of the
implementations, and run it (just like storing data in Solr or Mongo via
Gora).
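To make the idea above concrete, here is a minimal sketch in plain Java of what such a layer could look like. Everything in it (the ExecutionEngine interface, LocalEngine, ParallelEngine, countByKey) is hypothetical and not part of the Gora API. The two engines are in-memory stand-ins for real backends such as MapReduce or Spark. The point is only that the word count job is written once and the engine is swapped underneath it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ExecutionEngineSketch {

    // Hypothetical engine-neutral contract (assumption, not a Gora interface).
    interface ExecutionEngine {
        String name();

        // Split every record into keys and count how often each key occurs.
        Map<String, Long> countByKey(List<String> records,
                                     Function<String, List<String>> tokenizer);
    }

    // Stand-in for one backend: sequential, in-memory.
    static class LocalEngine implements ExecutionEngine {
        public String name() { return "local"; }

        public Map<String, Long> countByKey(List<String> records,
                                            Function<String, List<String>> tokenizer) {
            return records.stream()
                    .flatMap(r -> tokenizer.apply(r).stream())
                    .collect(Collectors.groupingBy(
                            Function.identity(), TreeMap::new, Collectors.counting()));
        }
    }

    // Stand-in for a second backend: same contract, parallel execution.
    static class ParallelEngine implements ExecutionEngine {
        public String name() { return "parallel"; }

        public Map<String, Long> countByKey(List<String> records,
                                            Function<String, List<String>> tokenizer) {
            return records.parallelStream()
                    .flatMap(r -> tokenizer.apply(r).stream())
                    .collect(Collectors.groupingBy(
                            Function.identity(), TreeMap::new, Collectors.counting()));
        }
    }

    // The job is written once against the interface, not against an engine.
    static Map<String, Long> wordCount(ExecutionEngine engine, List<String> lines) {
        return engine.countByKey(lines, line ->
                Arrays.stream(line.toLowerCase().split("\\s+"))
                        .filter(word -> !word.isEmpty())
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("gora stores data", "gora runs jobs");
        for (ExecutionEngine engine :
                Arrays.asList(new LocalEngine(), new ParallelEngine())) {
            System.out.println(engine.name() + " -> " + wordCount(engine, lines));
        }
    }
}
```

Both engines should return the same counts for the same input. A real implementation would also have to deal with reading from and writing to a Gora data store, which this sketch leaves out.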
When I looked at Crunch, I noticed that:
*Every Crunch job begins with a Pipeline instance that manages the
execution lifecycle of your data pipeline. As of the 0.9.0 release, there
are three implementations of the Pipeline interface:*
*MRPipeline: Executes a pipeline as a series of MapReduce jobs that can
run locally or on a Hadoop cluster.*
*MemPipeline: Executes a pipeline in-memory on the client.*
*SparkPipeline: Executes a pipeline by running a series of Apache Spark
jobs, either locally or on a Hadoop cluster.*
So, I am curious whether supporting Crunch may help us get what we want
from Spark support in Gora. Actually, I am new to such projects; I want to
learn what should be achieved with GORA-386 and not get lost because of
overthinking :) As I see it, you can store your data in Gora style and run
jobs in Gora style, but keep the flexibility of using either HDFS, Solr,
MongoDB, etc. or MapReduce, Spark, Tez, etc.
PS: I know there is a similar issue at Apache Gora for Cascading support:
https://issues.apache.org/jira/browse/GORA-112
Kind Regards,
Furkan KAMACI
On Sat, Mar 21, 2015 at 8:14 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com
wrote:
Hi Furkan,
In what context are we talking here?
GSoC or just development?
I am very keen to work towards what we can release as Gora 1.0.
Thank you Furkan
On Saturday, March 21, 2015, Furkan KAMACI furkankam...@gmail.com
wrote:
As you know, there is an issue for integrating Apache Spark with
Apache Gora [1]. Apache Spark is a popular project and, in contrast to
Hadoop's two-stage disk-based MapReduce paradigm, Spark's in-memory
primitives provide performance up to 100 times faster for certain
applications [2]. There are also some alternatives to Apache Spark, e.g.
Apache Tez [3].
When implementing an integration for Spark, we should consider having an
abstraction over this kind of project as part of the architectural design;
there is a related issue for it [4].
There is another Apache project, Apache Crunch [5], which aims to provide a
framework for writing, testing, and running MapReduce pipelines.
Its goal is to make pipelines that are composed of many user-defined
functions simple to write, easy to test, and efficient to run. It is a
high-level tool for writing data pipelines, as opposed to developing
against the MapReduce, Spark, or Tez APIs directly [6].
I would like to learn how Apache Crunch fits with creating a
multi-execution engine for Gora [4]. What benefits would we get from
integrating Apache Gora with Apache Crunch instead of developing a custom
engine for our purpose, and what gaps would still remain?
Kind Regards,
Furkan KAMACI
[1] https://issues.apache.org/jira/browse/GORA-386
[2] Xin, Reynold; Rosen, Josh; Zaharia, Matei; Franklin, Michael;
Shenker, Scott; Stoica, Ion (June 2013).
[3] http://tez.apache.org/
[4] https://issues.apache.org/jira/browse/GORA-418
[5] https://crunch.apache.org/
[6] https://crunch.apache.org/user-guide.html#motivation
--
*Lewis*