>> I've only had a quick look at Pig, but it seems that a declarative
>> layer on top of Spark couldn't be anything other than a big win, as it
>> allows developers to declare *what* they want, permitting the compiler
>> to determine how best to poke at the RDD API to implement it.
The devil is in the details. Allowing developers to declare *what* they
want is not always practical in a declarative world, since we are bound by
the DSL's constructs. The workaround, or rather hack, is to use UDFs to
get full language power back. Some problems are hard; you have to twist
your mind to solve them in such a restrictive way, and at those times you
wish you had a complete language at hand. In my short time in the Big Data
world (7 years) I have seen enough such problems with Hive/Pig. All I am
offering here is a thought to spark the Spark community into thinking
beyond declarative constructs. I am sure there is still a place for Pig
and Hive.

-Bharath

On Fri, Apr 25, 2014 at 10:21 AM, Michael Armbrust
<mich...@databricks.com> wrote:

> On Fri, Apr 25, 2014 at 6:30 AM, Mark Baker <dist...@acm.org> wrote:
>
>> I've only had a quick look at Pig, but it seems that a declarative
>> layer on top of Spark couldn't be anything other than a big win, as it
>> allows developers to declare *what* they want, permitting the compiler
>> to determine how best to poke at the RDD API to implement it.
>
> Having Pig too would certainly be a win, but Spark SQL
> <http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html>
> is also a declarative layer on top of Spark. Since the optimization is
> lazy, you can chain multiple SQL statements in a row and still optimize
> them holistically (similar to a Pig job). Alpha version coming soon to a
> Spark 1.0 release near you!
>
> Spark SQL also lets you drop back into functional Scala when that is
> more natural for a particular task.
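
In case a concrete example helps, here is a minimal sketch of the pattern
Michael describes, written against the alpha-era Spark SQL API from the
guide linked above. The Person case class, the table names, and the
people.txt path are made up for illustration:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Illustrative schema; not from the thread.
    case class Person(name: String, age: Int)

    object ChainedSql {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("chained-sql"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.createSchemaRDD  // implicit RDD[Person] -> SchemaRDD

        // "people.txt" is a placeholder: lines like "Alice,30"
        val people = sc.textFile("people.txt")
          .map(_.split(","))
          .map(f => Person(f(0), f(1).trim.toInt))
        people.registerAsTable("people")

        // Two declarative steps; both are lazy, so the optimizer can
        // plan them together rather than one statement at a time.
        sqlContext.sql("SELECT name, age FROM people WHERE age >= 18")
          .registerAsTable("adults")
        val names = sqlContext.sql("SELECT name FROM adults")

        // Drop back into functional Scala where the DSL gets awkward.
        names.map(row => row(0).toString.toUpperCase)
          .collect()
          .foreach(println)

        sc.stop()
      }
    }

Because sql() returns a lazy SchemaRDD, the two SELECTs above can be fused
into one plan before anything executes, and the final .map is ordinary
Scala over the result.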