RE: pig needed?

Olga Natkovich Tue, 16 Nov 2010 10:38:03 -0800

Functions, or rather inline macros, are coming in Pig 0.9:
http://wiki.apache.org/pig/TuringCompletePig


Olga

-----Original Message-----
From: Anze [mailto:[email protected]] 
Sent: Tuesday, November 16, 2010 9:34 AM
To: [email protected]
Subject: Re: pig needed?


My 0.02 EUR: 
I don't think it's the learning curve that makes Pig a better tool for some 
applications. In my experience the learning curve is even *steeper* for Pig 
than for raw MR. MR can be learned very easily from Tom White's book, while 
Pig, well, Pig is in there too, but it's quite a short chapter and lacks good 
examples. Online tutorials, however, are almost non-existent, or at least I 
couldn't find any.

Where Pig excels is the power with which you can manipulate data. You can 
write complex queries in just a few lines whereas with MR you end up writing 
hundreds of lines of code.

The major drawback of Pig, however (in my limited experience), is its lack of 
functions (or objects :), which makes any larger piece of code spaghetti-like. 
Also, it is still very much evolving, so if you are dealing with anything other 
than raw HDFS files... well, good luck. :)

While we are at it, I am curious how other users use Pig. I write scripts in 
the PigPen Eclipse plugin and then copy and paste them into the Pig shell (I 
wasn't able to make PigPen work with the cluster directly), which is pretty 
cumbersome. So this is another downside for me. 

But I still love Pig, as it lets me control the data much more easily and 
makes writing ad-hoc queries much easier. And it will only get better with 
time.

But if your code works in MR, why rewrite it? Let it be, unless you have 
problems with the code and it needs to be rewritten anyway.

Enjoy!

Anze


On Tuesday 16 November 2010, Renato Marroquín Mogrovejo wrote:
> Pig has some clear advantages over raw MapReduce code, but IMHO the most
> important is the learning curve. If you are just loading data, though, you
> probably don't want to translate it into Pig, well, maybe just for the fun
> of it (: But if you are planning to do other operations like joining or
> grouping, it would be a lot simpler to do them in Pig.
> 
> Give this a look, it will help you better understand the bigger picture.
> http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig
> 
> Renato M.
> 
> 
> If you already have it as a Hadoop job, why would you want to port it to Pig?
> 
> 2010/11/15 Gerrit van Vuuren <[email protected]>
> 
> > Is this a bot?
> > 
> > And if not: yes, you can use Pig, although I advise you to reuse what has
> > already been developed and not rewrite UDFs if they already exist :)
> > 
> > 
> > ----- Original Message -----
> > From: Cornelio Iñigo <[email protected]>
> > To: [email protected] <[email protected]>
> > Sent: Mon Nov 15 20:48:35 2010
> > Subject: pig needed?
> > 
> > Hi
> > 
> > My name is Cornelio Iñigo and I'm a developer just getting started with
> > Hadoop and Pig.
> > I have a question about developing an application in Pig. I already have my
> > program in Hadoop; it takes just one column from a dataset (a CSV file)
> > and processes that data with some functions (such as language analysis and
> > analysis of the content).
> > 
> > Note that in processing the file I don't use FILTERs, COUNTs, or any other
> > built-in function of Pig, so I think all the functions would have to be
> > User Defined Functions.
> > 
> > So is it a good idea (does it make sense) to develop this program in Pig?
> > 
> > Thanks in advance
> > --
> > *Cornelio*
