+1 for removing. This interface does not bring us any value once we
decide to move closer to Hadoop. Writing a backend is almost like
writing half of Pig. I don't think this interface is attractive to most
developers. Instead, I +1 Milind's idea to make intermediate
artifacts available, or to provide some hooks for users to peek at or
morph the plan at different stages. This opens the door for developers
to visualize/debug/improve Pig without knowing every detail of Pig.
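To make the hook idea a bit more concrete, here is a rough sketch of
what I have in mind. Everything in it is hypothetical (PlanHooks,
Stage, PlanHook and fire are made-up names, and a plan is just a list
of operator names); nothing like this exists in Pig today, so treat it
as an illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names exist in Pig today.
public class PlanHooks {
    // Compilation stages at which a hook could be invoked.
    public enum Stage { LOGICAL, PHYSICAL, MAP_REDUCE }

    // A plan is modeled as a plain list of operator names, for illustration only.
    public interface PlanHook {
        List<String> apply(Stage stage, List<String> plan);
    }

    private final List<PlanHook> hooks = new ArrayList<>();

    public void register(PlanHook hook) {
        hooks.add(hook);
    }

    // Calls each hook in registration order; a hook may return the plan
    // unchanged (peek) or return a rewritten copy (morph).
    public List<String> fire(Stage stage, List<String> plan) {
        List<String> current = plan;
        for (PlanHook hook : hooks) {
            current = hook.apply(stage, current);
        }
        return current;
    }

    public static void main(String[] args) {
        PlanHooks hooks = new PlanHooks();
        // Peek: print the plan and pass it through untouched.
        hooks.register((stage, plan) -> {
            System.out.println(stage + ": " + plan);
            return plan;
        });
        // Morph: drop a no-op operator, but only at the logical stage.
        hooks.register((stage, plan) -> {
            if (stage != Stage.LOGICAL) {
                return plan;
            }
            List<String> copy = new ArrayList<>(plan);
            copy.remove("NOOP");
            return copy;
        });
        List<String> plan = Arrays.asList("LOAD", "NOOP", "STORE");
        System.out.println(hooks.fire(Stage.LOGICAL, plan));
    }
}
```

The point is that a tool author would only need to understand the plan
representation at one stage, not the whole compiler.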
Daniel
Alan Gates wrote:
A couple of years ago we had this concept that Pig as is should be
able to run on other backends (like say Dryad if it were open
source). So we built this whole backend interface and (mostly) kept
Hadoop specific objects out of the front end.
Recently we have modified that stance and said that this implementation
of Pig is Hadoop specific. Pig Latin itself will still stay Hadoop
independent. So the ability to have multiple backends is fine. But
the ability to have non-Hadoop backends is not really interesting now.
So I at least see the proposal here as getting rid of generic code
that tries to hide the fact that we are working on top of Hadoop
(things like DataStorage and ExecutionEngine).
Alan.
On Apr 22, 2010, at 4:14 PM, Arun C Murthy wrote:
I read it as getting rid of concepts parallel to hadoop in
src/org/apache/pig/backend/hadoop/datastorage.
Is that true?
thanks,
Arun
On Apr 22, 2010, at 1:34 PM, Dmitriy Ryaboy wrote:
I kind of dig the concept of being able to plug in a different backend,
though I definitely think we should get rid of the dead localmode code.
Can you give an example of how this will simplify the codebase? Is it
more than just GenericClass foo = new SpecificClass(), and the
associated extra files?
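
For concreteness, the pattern I mean looks roughly like the sketch
below. The names are hypothetical (ExecutionBackend, HadoopBackend and
the run methods are invented for illustration, not Pig's actual
classes); it only shows the same call made through a generic interface
and made directly.

```java
// Hypothetical names throughout; these are not Pig's actual classes.
public class BackendIndirection {
    // The generic layer: an interface every backend would implement.
    public interface ExecutionBackend {
        String submit(String script);
    }

    // The only implementation that exists in practice.
    public static class HadoopBackend implements ExecutionBackend {
        @Override
        public String submit(String script) {
            return "hadoop-job:" + script;
        }
    }

    // With the layer: callers see only the generic type.
    public static String runGeneric(String script) {
        ExecutionBackend backend = new HadoopBackend();
        return backend.submit(script);
    }

    // Without the layer: callers use the Hadoop class directly.
    public static String runDirect(String script) {
        HadoopBackend backend = new HadoopBackend();
        return backend.submit(script);
    }

    public static void main(String[] args) {
        System.out.println(runGeneric("a.pig"));
        System.out.println(runDirect("a.pig"));
    }
}
```

If that is all the removal amounts to, the saving is one interface file
per concept and not much else.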
-D
On Thu, Apr 22, 2010 at 1:25 PM, Arun C Murthy wrote:
+1
Arun
On Apr 22, 2010, at 11:35 AM, Richard Ding wrote:
Pig has an abstraction layer (interfaces and abstract classes) to
support multiple execution engines. After PIG-1053, Hadoop is the only
execution engine supported by Pig. I wonder if we should remove this
layer of code, and make Hadoop THE execution engine for Pig. This will
simplify the backend code a lot.
Thanks,
-Richard