Re: Executing Pig on another framework

Alan Gates Thu, 15 Dec 2011 07:52:22 -0800

On Dec 15, 2011, at 7:24 AM, Tharindu Mathew wrote:

> Thanks folks. You've been really helpful.
> 
> Just to clear my doubts, does the local mode in [1] refer to the Hadoop
> local mode?
Yes. Since 0.6 or 0.7 we have used Hadoop's LocalJobRunner to execute
local-mode queries.


Alan.

> 
> [1] - http://pig.apache.org/docs/r0.9.1/cont.html#embed-java
> 
> On Thu, Dec 15, 2011 at 12:36 AM, Dmitriy Ryaboy <[email protected]> wrote:
> 
>> To add to what Alan said -- one other implementation existed (local,
>> in-memory mode), and it was consistently just different enough from MR mode
>> to cause all kinds of confusion and bugs for users. We also noticed that
>> our abstractions wound up being partial wrappers of Hadoop's abstractions,
>> and less extensible at that. My feeling is that Hadoop already has quite a
>> few abstraction layers that allow one to implement one's own input splits,
>> file systems, etc., so sneaking in a different system could be done at that
>> layer.
>> 
>> You could try to add a new mode and intercept the output of the logical
>> planner, when the job is still represented as a DAG of logical operators;
>> you'd have to write all of your own physical operators, but it could be
>> done. It'd be a fair amount of work.
>> 
>> D
>> 
>> On Wed, Dec 14, 2011 at 10:19 AM, Alan Gates <[email protected]>
>> wrote:
>> 
>>> The issue was that trying to be non-backend specific forced us to write
>>> abstractions for all of Hadoop's features rather than use them directly.
>>> For example, we had to have an abstract notion of a file system rather
>>> than use HDFS, an abstract notion of input rather than use InputFormats.
>>> This caused efficiency issues for the runtime system and made it much
>>> harder for developers to understand what was going on in the code.  Given
>>> that no other implementations existed, it did not make sense to pay this
>>> price.
>>> 
>>> Alan.
>>> 
>>> On Dec 14, 2011, at 8:13 AM, Tharindu Mathew wrote:
>>> 
>>>> Hi Gianmarco,
>>>> 
>>>> Thanks for the response...
>>>> 
>>>> I do understand that Pig Latin is not coupled with anything; hence the
>>>> question about using it in another framework.
>>>> 
>>>> I was referring to your second point. It would be great to run Pig as
>>>> a Pig Latin executor without Hadoop.
>>>> 
>>>> Why was such an extension point a burden? It seems like the right thing
>>>> to do in terms of design.
>>>> 
>>>> On Wed, Dec 14, 2011 at 7:29 PM, Gianmarco De Francisci Morales <
>>>> [email protected]> wrote:
>>>> 
>>>>> Hi,
>>>>> Pig Latin (the language) is not coupled to Hadoop and can be used on any
>>>>> other framework.
>>>>> Pig (the system), however, is tightly coupled to Hadoop.
>>>>> Extension points existed but were removed because they were a major burden.
>>>>> That said, it is theoretically possible to build another Pig Latin executor
>>>>> that runs on a different framework.
>>>>> 
>>>>> Cheers,
>>>>> --
>>>>> Gianmarco
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Dec 14, 2011 at 13:17, Tharindu Mathew <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> I understand that Pig Latin is a data flow language. In that sense it
>>>>>> should be theoretically possible to execute Pig Latin in any framework,
>>>>>> though currently it is meant to be executed in a Hadoop environment.
>>>>>> How hard would it be to switch Pig Latin to run on a different framework?
>>>>>> Are there any extension points for this, or is Pig Latin tightly coupled
>>>>>> to Hadoop?
>>>>>> 
>>>>>> --
>>>>>> Regards,
>>>>>> 
>>>>>> Tharindu
>>>>>> 
>>>>>> blog: http://mackiemathew.com/
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Regards,
>>>> 
>>>> Tharindu
>>>> 
>>>> blog: http://mackiemathew.com/
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Regards,
> 
> Tharindu
> 
> blog: http://mackiemathew.com/
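
As a rough illustration of the approach Dmitriy describes above (take the
logical plan and supply your own physical operators), here is a toy sketch in
plain Java. Every name in it (ToyBackend, LogicalOp, Load, Filter, Foreach,
PhysicalOp, compile) is hypothetical and made up for this example; none of
them are Pig classes, and the plan is a simple linear chain rather than a full
DAG. It only shows the shape of the work: one physical operator per logical
operator, written against whatever backend you want to target (here, plain
in-memory lists).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Toy illustration only; none of these types are Pig classes. */
final class ToyBackend {

    /** Marker for a node in a (linear, for brevity) logical plan. */
    interface LogicalOp { }

    /** Logical LOAD: the source rows are given directly (assumption: in-memory data). */
    static final class Load implements LogicalOp {
        final List<String> rows;
        Load(List<String> rows) { this.rows = rows; }
    }

    /** Logical FILTER over a single input. */
    static final class Filter implements LogicalOp {
        final LogicalOp in;
        final Predicate<String> cond;
        Filter(LogicalOp in, Predicate<String> cond) { this.in = in; this.cond = cond; }
    }

    /** Logical FOREACH: applies a function to every row of its input. */
    static final class Foreach implements LogicalOp {
        final LogicalOp in;
        final Function<String, String> fn;
        Foreach(LogicalOp in, Function<String, String> fn) { this.in = in; this.fn = fn; }
    }

    /** Physical operator for the hypothetical backend: produces rows in memory. */
    interface PhysicalOp {
        List<String> execute();
    }

    /** Translates each logical operator into a backend-specific physical operator. */
    static PhysicalOp compile(LogicalOp op) {
        if (op instanceof Load) {
            final List<String> rows = ((Load) op).rows;
            return () -> new ArrayList<>(rows);
        }
        if (op instanceof Filter) {
            final Filter f = (Filter) op;
            final PhysicalOp child = compile(f.in);
            return () -> child.execute().stream().filter(f.cond).collect(Collectors.toList());
        }
        if (op instanceof Foreach) {
            final Foreach fe = (Foreach) op;
            final PhysicalOp child = compile(fe.in);
            return () -> child.execute().stream().map(fe.fn).collect(Collectors.toList());
        }
        throw new IllegalArgumentException(
                "no physical operator for " + op.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        // Roughly: A = LOAD ...; B = FILTER A BY length > 4; C = FOREACH B GENERATE UPPER($0);
        LogicalOp plan = new Foreach(
                new Filter(new Load(Arrays.asList("apple", "kiwi", "banana")),
                           s -> s.length() > 4),
                String::toUpperCase);
        System.out.println(compile(plan).execute()); // prints [APPLE, BANANA]
    }
}
```

A real backend would also need joins, grouping, multi-input operators, and
storage, which is where the "fair amount of work" comes in.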
