One core segment that frequently asks for systems like Pig & Hive is analysts who want to work with data. The key place I see Pig fitting in is letting non-developers work with data at scale and freeing up developers to focus on code and UDFs rather than managing day-to-day dataflow changes & updates. A byproduct of this is that big data computation becomes available to folks beyond those who know what Maven & sbt are :)
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>


On Sat, Apr 26, 2014 at 12:04 AM, Bharath Mundlapudi <mundlap...@gmail.com> wrote:

> >> I've only had a quick look at Pig, but it seems that a declarative
> >> layer on top of Spark couldn't be anything other than a big win, as it
> >> allows developers to declare *what* they want, permitting the compiler
> >> to determine how best to poke at the RDD API to implement it.
>
> The devil is in the details. Allowing developers to declare *what* they
> want seems impractical in a declarative world, since we are bound by the
> DSL's constructs. The workaround, or rather hack, is to use UDFs to get at
> full language constructs. Some problems are hard, and you have to twist
> your mind to solve them in a restrictive way; that is when we wish we had
> the complete power of the language.
>
> Having been in the Big Data world for a short time (7 years), I have seen
> enough problems with Hive/Pig. All I am offering here is a thought to
> spark the Spark community to think beyond declarative constructs.
>
> I am sure there is a place for Pig and Hive.
>
> -Bharath
>
>
> On Fri, Apr 25, 2014 at 10:21 AM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> On Fri, Apr 25, 2014 at 6:30 AM, Mark Baker <dist...@acm.org> wrote:
>>
>>> I've only had a quick look at Pig, but it seems that a declarative
>>> layer on top of Spark couldn't be anything other than a big win, as it
>>> allows developers to declare *what* they want, permitting the compiler
>>> to determine how best to poke at the RDD API to implement it.
>>>
>>
>> Having Pig too would certainly be a win, but Spark SQL
>> <http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html>
>> is also a declarative layer on top of Spark. Since the optimization is
>> lazy, you can chain multiple SQL statements in a row and still optimize
>> them holistically (similar to a Pig job). Alpha version coming soon to a
>> Spark 1.0 release near you!
>>
>> Spark SQL also lets you drop back into functional Scala when that is more
>> natural for a particular task.
>>
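
To make Michael's point above a bit more concrete, here is a minimal sketch of mixing declarative SQL with plain functional Scala, written against the Spark 1.0-era SQLContext alpha API he mentions; the input file, table name, and case class are made up for illustration:

// Minimal sketch, assuming the Spark 1.0-era Spark SQL alpha API.
// "people.txt", the Person case class, and the "people" table are hypothetical.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

object SqlAndScalaMix {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sql-and-scala"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit conversion: RDD[Person] -> SchemaRDD

    // Declarative side: register an RDD of case classes as a table and query it with SQL.
    val people = sc.textFile("people.txt")
      .map(_.split(","))
      .map(fields => Person(fields(0), fields(1).trim.toInt))
    people.registerAsTable("people")

    val adults = sqlContext.sql("SELECT name, age FROM people WHERE age >= 18")

    // Functional side: the query result is still an RDD, so we can drop back into
    // ordinary Scala transformations where that is more natural than SQL.
    val greetings = adults.map(row => s"Hi ${row(0)}, you are ${row(1)} years old")
    greetings.collect().foreach(println)

    sc.stop()
  }
}

Because the SQL result is just another (Schema)RDD, the optimizer sees the whole chain lazily, which is what allows several SQL statements plus Scala transformations to be planned together rather than one statement at a time.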