Thanks Nihal for the explanation. Most of the tasks/processing possible through Pig can be achieved with Spark in far less, easier-to-understand code, and since Spark works in memory it can be up to 100x faster than Hadoop MapReduce jobs.
But I don't think that's a reason not to have a Pig interpreter. Pig's syntax and execution engine are different, and that's enough to justify an interpreter, I think.

Thanks,
moon

On Thu, Oct 1, 2015 at 2:15 PM Nihal Bhagchandani <nihal_bhagchand...@yahoo.com> wrote:

> Hi,
> so as per my understanding:
>
> *PIG*: Uses a scripting language called Pig Latin, which is more workflow
> driven. It is an abstraction layer on top of MapReduce. Pig uses a
> batch-oriented framework, which means your analytic jobs will run for
> minutes or maybe hours depending on the volume of data. Think of Pig as
> step-by-step SQL execution.
>
> *Spark SQL*: Allows us to do SQL-like actions on HDFS or a file system,
> up to 100x faster than MapReduce when the SQL is performed in memory;
> on disk it is about ten times faster.
>
> Pig is a *SQL-like language* that gracefully tolerates inconsistent
> schemas, and that runs on Hadoop.
>
> The basic concepts in SQL map pretty well onto Pig. There are analogues
> for the major SQL keywords, and as a result you can write a query in your
> head as SQL and then translate it into Pig Latin without undue mental
> gymnastics.
>
> WHERE → FILTER
> The syntax is different, but conceptually this is still putting your data
> into a funnel to create a smaller dataset.
>
> HAVING → FILTER
> Because a FILTER is done in a separate step from a GROUP or an
> aggregation, the distinction between HAVING and WHERE doesn't exist in Pig.
>
> ORDER BY → ORDER
> This keyword behaves pretty much the same in Pig as in SQL.
>
> JOIN
> In Pig, joins can have their execution specified, and they look a little
> different, but in essence these are the same joins you know from SQL, and
> you can think about them in the same way. There are INNER and OUTER joins,
> RIGHT and LEFT specifications, and even CROSS for those rare moments when
> you actually want a Cartesian product.
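A minimal Pig Latin sketch of those keyword mappings (the relations, files, and field names here are hypothetical, just to illustrate the SQL-to-Pig translation):

```pig
-- hypothetical input: users(name, age, country), orders(name, amount)
users  = LOAD 'users.tsv'  USING PigStorage('\t')
         AS (name:chararray, age:int, country:chararray);
orders = LOAD 'orders.tsv' USING PigStorage('\t')
         AS (name:chararray, amount:double);

-- WHERE -> FILTER:
-- SELECT * FROM users WHERE age > 21
adults = FILTER users BY age > 21;

-- HAVING -> FILTER (after the GROUP/aggregation step):
-- SELECT country, COUNT(*) FROM users GROUP BY country HAVING COUNT(*) > 10
by_country = GROUP users BY country;
counts     = FOREACH by_country GENERATE group AS country, COUNT(users) AS n;
big        = FILTER counts BY n > 10;

-- ORDER BY -> ORDER
sorted = ORDER counts BY n DESC;

-- JOIN (here a LEFT OUTER join):
-- SELECT ... FROM users u LEFT JOIN orders o ON u.name = o.name
joined = JOIN users BY name LEFT OUTER, orders BY name;
```

Note how HAVING really does disappear: it is just another FILTER that happens to run after the GROUP, which is why Pig needs only one filtering keyword.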
> Because Pig is most appropriately used for data pipelines, there are often
> fewer distinct relations or tables than you would expect to see in a
> traditional normalized relational database.
>
> Control over Execution
> SQL performance tuning generally involves some fiddling with indexes,
> punctuated by the occasional yelling at an explain plan that has
> inexplicably decided to join the two largest tables first. It can mean
> getting a different plan the second time you run a query, or having the
> plan suddenly change after several weeks of use because the statistics
> have evolved, throwing your query's performance into the proverbial
> toilet. Various SQL implementations offer hints to combat this problem:
> you can use a hint to tell your SQL optimizer that it should use an
> index, or to force a given table to be first in the join order.
> Unfortunately, because hints are dependent on the particular SQL
> implementation, what you actually have at your disposal varies by
> platform.
>
> Pig offers a few different ways to control the execution plan. The first
> is just the explicit ordering of operations. You can write your FILTER
> before your JOIN (the reverse of SQL's order), be clever about
> eliminating unused fields along the way, and have confidence that the
> executed order will not be worse.
>
> Secondly, the philosophy of Pig is to let users choose implementations
> where multiple ones are possible. As a result, there are three
> specialized joins that can be used when the features of the data are
> known and a regular join is less appropriate. For regular joins, the
> order of the arguments dictates execution: the larger dataset should
> appear last in this type of join.
>
> As with SQL, in Pig you can pretty much ignore the performance tweaks
> until you can't.
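A small sketch of that explicit control, using hypothetical relations: the FILTER is written before the JOIN, and one of Pig's specialized joins ('replicated', a fragment-replicate join) is chosen because one side is known to be small:

```pig
-- hypothetical inputs: a large event log and a small lookup table
events = LOAD 'events.tsv' USING PigStorage('\t')
         AS (user_id:long, url:chararray);
lookup = LOAD 'lookup.tsv' USING PigStorage('\t')
         AS (user_id:long, segment:chararray);

-- FILTER before JOIN (the reverse of SQL's order): shrink the big
-- relation first, and keep only the fields the join actually needs
clicks = FILTER events BY url IS NOT NULL;
slim   = FOREACH clicks GENERATE user_id;

-- 'replicated' tells Pig to load the second (small) relation into
-- memory on every map task, avoiding a reduce-side join entirely
joined = JOIN slim BY user_id, lookup BY user_id USING 'replicated';
```

For a regular (reduce-side) join with no USING clause, the last relation listed is streamed rather than buffered, which is why the text advises putting the larger dataset last.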
> Because of the explicit control of ordering, it can be useful to have a
> general sense of the "good" order to do things in, though Pig's optimizer
> will also try to push up FILTERs and LIMITs, taking some of the pressure
> off.
>
> Here is Denny Lee's post, where you can find Spark vs. Pig:
> http://dennyglee.com/2013/08/19/why-all-this-interest-in-spark/
>
> Most of the tasks/processing possible through Pig can be achieved with
> Spark in far less, easier-to-understand code, and since Spark works in
> memory it can be up to 100x faster than Hadoop MapReduce jobs.
>
> Regards
> Nihal
>
> On Thursday, 1 October 2015 3:35 PM, moon soo Lee <m...@apache.org> wrote:
>
> I don't know Pig very well, but it's a little difficult to see how
> Spark SQL can help Pig users. Can you explain more?
>
> Thanks,
> moon
> On Thu, Oct 1, 2015 at 11:39 AM Nihal Bhagchandani
> <nihal_bhagchand...@yahoo.com> wrote:
>
> Is there any extra advantage to having a Pig interpreter when Zeppelin
> already supports Spark SQL?
>
> Nihal
>
> Sent from my iPhone
>
> On 01-Oct-2015, at 12:54, moon soo Lee <m...@apache.org> wrote:
>
> Hi,
>
> As far as I know, there is no ongoing work on a Pig interpreter. But
> there's no reason not to have one. How about filing an issue for it?
>
> Thanks,
> moon
> On Wed, Sep 23, 2015 at 11:23 PM Michael Parco
> <33pa...@cardinalmail.cua.edu> wrote:
>
> Is there any current work or plans for a Pig interpreter in Zeppelin?