>> I've only had a quick look at Pig, but it seems that a declarative
>> layer on top of Spark couldn't be anything other than a big win, as it
>> allows developers to declare *what* they want, permitting the compiler
>> to determine how best to poke at the RDD API to implement it.
The devil is in the details. Allowing developers to declare *what* they
want is not always practical in a declarative world, since we are bound by
the DSL's constructs. The workaround, or rather hack, is to use UDFs to
get full language power back. Some problems are hard; you have to twist
your mind to solve them in such a restrictive way, and at those times you
wish you had a complete language at hand. In my short time in the Big Data
world (7 years) I have seen enough such problems with Hive/Pig. All I am
offering here is a thought to spark the Spark community into thinking
beyond declarative constructs. I am sure there is still a place for Pig
and Hive.

-Bharath

On Fri, Apr 25, 2014 at 10:21 AM, Michael Armbrust
<mich...@databricks.com> wrote:

> On Fri, Apr 25, 2014 at 6:30 AM, Mark Baker <dist...@acm.org> wrote:
>
>> I've only had a quick look at Pig, but it seems that a declarative
>> layer on top of Spark couldn't be anything other than a big win, as it
>> allows developers to declare *what* they want, permitting the compiler
>> to determine how best to poke at the RDD API to implement it.
>
> Having Pig too would certainly be a win, but Spark SQL
> <http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html>
> is also a declarative layer on top of Spark. Since the optimization is
> lazy, you can chain multiple SQL statements in a row and still optimize
> them holistically (similar to a Pig job). Alpha version coming soon to a
> Spark 1.0 release near you!
>
> Spark SQL also lets you drop back into functional Scala when that is
> more natural for a particular task.
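
In case a concrete example helps, here is a minimal sketch of the pattern
Michael describes, written against the alpha-era Spark SQL API from the
guide linked above. The Person case class, the table names, and the
people.txt path are made up for illustration:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Illustrative schema; not from the thread.
    case class Person(name: String, age: Int)

    object ChainedSql {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("chained-sql"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.createSchemaRDD  // implicit RDD[Person] -> SchemaRDD

        // "people.txt" is a placeholder: lines like "Alice,30"
        val people = sc.textFile("people.txt")
          .map(_.split(","))
          .map(f => Person(f(0), f(1).trim.toInt))
        people.registerAsTable("people")

        // Two declarative steps; both are lazy, so the optimizer can
        // plan them together rather than one statement at a time.
        sqlContext.sql("SELECT name, age FROM people WHERE age >= 18")
          .registerAsTable("adults")
        val names = sqlContext.sql("SELECT name FROM adults")

        // Drop back into functional Scala where the DSL gets awkward.
        names.map(row => row(0).toString.toUpperCase)
          .collect()
          .foreach(println)

        sc.stop()
      }
    }

Because sql() returns a lazy SchemaRDD, the two SELECTs above can be fused
into one plan before anything executes, and the final .map is ordinary
Scala over the result.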