One core segment that frequently asks for systems like Pig & Hive is analysts who want to work with data. The key place I see Pig fitting in is letting non-developers work with data at scale and freeing up developers to focus on code and UDFs rather than managing day-to-day dataflow changes & updates. A byproduct of this is that big data computation becomes available to folks beyond those who know what Maven & sbt are :)
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>


On Sat, Apr 26, 2014 at 12:04 AM, Bharath Mundlapudi <mundlap...@gmail.com> wrote:

> >> I've only had a quick look at Pig, but it seems that a declarative
> >> layer on top of Spark couldn't be anything other than a big win, as it
> >> allows developers to declare *what* they want, permitting the compiler
> >> to determine how best to poke at the RDD API to implement it.
>
> The devil is in the details. Allowing developers to declare *what* they
> want seems impractical in a declarative world, since we are bound by the
> DSL's constructs. The workaround, or rather hack, is to use UDFs to get at
> full language constructs. Some problems are hard, and you have to twist
> your mind to solve them in a restrictive way; that is when we wish we had
> the complete power of the language.
>
> Having been in the Big Data world for a short time (7 years), I have seen
> enough problems with Hive/Pig. All I am offering here is a thought to
> spark the Spark community to think beyond declarative constructs.
>
> I am sure there is a place for Pig and Hive.
>
> -Bharath
>
>
> On Fri, Apr 25, 2014 at 10:21 AM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> On Fri, Apr 25, 2014 at 6:30 AM, Mark Baker <dist...@acm.org> wrote:
>>
>>> I've only had a quick look at Pig, but it seems that a declarative
>>> layer on top of Spark couldn't be anything other than a big win, as it
>>> allows developers to declare *what* they want, permitting the compiler
>>> to determine how best to poke at the RDD API to implement it.
>>>
>>
>> Having Pig too would certainly be a win, but Spark SQL
>> <http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html>
>> is also a declarative layer on top of Spark. Since the optimization is
>> lazy, you can chain multiple SQL statements in a row and still optimize
>> them holistically (similar to a Pig job). Alpha version coming soon to a
>> Spark 1.0 release near you!
>>
>> Spark SQL also lets you drop back into functional Scala when that is more
>> natural for a particular task.
>>
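
To make Michael's point above a bit more concrete, here is a minimal sketch of mixing declarative SQL with plain functional Scala, written against the Spark 1.0-era SQLContext alpha API he mentions; the input file, table name, and case class are made up for illustration:

// Minimal sketch, assuming the Spark 1.0-era Spark SQL alpha API.
// "people.txt", the Person case class, and the "people" table are hypothetical.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

object SqlAndScalaMix {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sql-and-scala"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit conversion: RDD[Person] -> SchemaRDD

    // Declarative side: register an RDD of case classes as a table and query it with SQL.
    val people = sc.textFile("people.txt")
      .map(_.split(","))
      .map(fields => Person(fields(0), fields(1).trim.toInt))
    people.registerAsTable("people")

    val adults = sqlContext.sql("SELECT name, age FROM people WHERE age >= 18")

    // Functional side: the query result is still an RDD, so we can drop back into
    // ordinary Scala transformations where that is more natural than SQL.
    val greetings = adults.map(row => s"Hi ${row(0)}, you are ${row(1)} years old")
    greetings.collect().foreach(println)

    sc.stop()
  }
}

Because the SQL result is just another (Schema)RDD, the optimizer sees the whole chain lazily, which is what allows several SQL statements plus Scala transformations to be planned together rather than one statement at a time.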