Re: Revisit Pig Philosophy?

Milind A Bhandarkar Fri, 18 Sep 2009 20:03:05 -0700

It's Friday evening, so I have some time to discuss philosophy ;-)

Before we discuss any question about revisiting pig philosophy, the
first question that needs to be answered is "what is pig?" (This
corresponds to the basic argument of Hindu philosophy that any deep
personal philosophical investigation needs to start with the question
"koham?", which in Sanskrit means "who am I?")

So, coming back to approx 4000 years after the origin of that  
philosophy, we need to ask "what is pig?" (incidentally, pig, or  
varaaha in Sanskrit, was the second incarnation of Lord Vishnu in
Hindu scriptures, but that's not relevant here.)

What we need to decide is: is pig a dataflow language? I think
not. "Pig Latin" is the language. Pig is referred to in countless
slide decks (aka pig scriptures; btw, I own 50% of these scriptures)
as a runtime system that interprets Pig Latin, kind of like Java and
the JVM. (Duality of nature, called "dwaita" philosophy in Sanskrit, is
applicable here. But I won't go deeper than that.)

So, the stance of pig-Latin-the-language could still be that it can be
implemented on any runtime. But the philosophy of pig-the-runtime could
be that it is a thin layer on top of Hadoop. And all the world could
breathe a sigh of relief (mostly by not having to answer these
philosophical questions).

So, 'koham' is the 4000-year-old question this project needs to
answer. That's all.

AUM...... (it's Friday.)

- (swami) Milind ;-)

On Sep 18, 2009, at 19:05, "Jeff Hammerbacher" <ham...@cloudera.com> wrote:

> Hey,
>
>> 2. Local mode and other parallel frameworks
>>
>> <snip>
>> Pigs Live Anywhere
>>
>> Pig is intended to be a language for parallel data processing. It is not
>> tied to one particular parallel framework. It has been implemented first
>> on hadoop, but we do not intend that to be only on hadoop.
>> </snip>
>>
>> Are we still holding onto this? What about local mode? Local mode is not
>> being treated on equal footing with that of Hadoop for practical
>> reasons. However, users expect things that work on local mode to work
>> without any hitches on Hadoop.
>>
>> Are we still designing the system assuming that Pig will be stacked on
>> top of other parallel frameworks?
>>
>
> FWIW, I appreciate this philosophical stance from Pig. Allowing locally
> tested scripts to be migrated to the cluster without breakage is a noble
> goal, and keeping the option of (one day) developing an alternative
> execution environment for Pig that runs over HDFS but uses a richer
> physical set of operators than MapReduce would be great.
>
> Of course, those of you who are running Pig in production will have a much
> better sense of the feasibility, rather than desirability, of this
> philosophical stance.
>
> Later,
> Jeff
