Re: Pig on Spark

Xiangrui Meng Mon, 10 Mar 2014 23:01:25 -0700

Hi Sameer,

Lin (cc'ed) could also give you some updates about Pig on Spark
development on her side.


Best,
Xiangrui

On Mon, Mar 10, 2014 at 12:52 PM, Sameer Tilak <ssti...@live.com> wrote:
> Hi Mayur,
> We are planning to upgrade our distribution from MR1 to MR2 (YARN), and the
> goal is to get Spork set up next month. I will keep you posted. Can you
> please keep me informed about your progress as well?
>
> ________________________________
> From: mayur.rust...@gmail.com
> Date: Mon, 10 Mar 2014 11:47:56 -0700
>
> Subject: Re: Pig on Spark
> To: user@spark.apache.org
>
>
> Hi Sameer,
> Did you make any progress on this? My team is also trying it out and would
> love to know some details of your progress.
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi
>
>
>
> On Thu, Mar 6, 2014 at 2:20 PM, Sameer Tilak <ssti...@live.com> wrote:
>
> Hi Aniket,
> Many thanks! I will check this out.
>
> ________________________________
> Date: Thu, 6 Mar 2014 13:46:50 -0800
> Subject: Re: Pig on Spark
> From: aniket...@gmail.com
> To: user@spark.apache.org; tgraves...@yahoo.com
>
>
> There is some work to make this work on yarn at
> https://github.com/aniket486/pig. (So, compile pig with ant
> -Dhadoopversion=23)
>
> You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to
> find out what sort of env variables you need (sorry, I haven't been able to
> clean this up; it is in progress). There are a few known issues with this; I
> will work on fixing them soon.
>
> Known issues-
> 1. Limit does not work (spork-fix)
> 2. Foreach requires turning off schema-tuple-backend (should be a pig-jira)
> 3. Algebraic UDFs don't work (spork-fix in-progress)
> 4. Group by rework (to avoid OOMs)
> 5. UDF Classloader issue (requires SPARK-1053, then you can put
> pig-withouthadoop.jar as SPARK_JARS in SparkContext along with udf jars)
>
> ~Aniket
>
>
>
>
> On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <tgraves...@yahoo.com> wrote:
>
> I had asked a similar question on the dev mailing list a while back (Jan
> 22nd).
>
> See the archives:
> http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser ->
> look for spork.
>
> Basically Matei said:
>
> Yup, that was it, though I believe people at Twitter picked it up again
> recently. I'd suggest asking Dmitriy if you know him. I've seen interest in
> this from several other groups, and if there's enough of it, maybe we can
> start another open source repo to track it. The work in that repo you
> pointed to was done over one week, and already had most of Pig's operators
> working. (I helped out with this prototype over Twitter's hack week.) That
> work also calls the Scala API directly, because it was done before we had a
> Java API; it should be easier with the Java one.
>
>
> Tom
>
>
>
> On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <ssti...@live.com> wrote:
> Hi everyone,
>
> We are using Pig to build our data pipeline. I came across Spork -- Pig
> on Spark at: https://github.com/dvryaboy/pig and am not sure if it is still
> active.
>
> Can someone please let me know the status of Spork or any other effort that
> will let us run Pig on Spark? We can significantly benefit by using Spark,
> but we would like to keep using the existing Pig scripts.
>
>
>
>
>
> --
> "...:::Aniket:::... Quetzalco@tl"
>
>
