Awesome!

On Sun, Jun 16, 2013 at 3:15 PM, Connor Woodson <[email protected]> wrote:

> I mentioned a few months ago that I was interested in creating a new
> Scripting Engine for Pig based off of the R language. I have finally gotten
> that project to a point where I feel comfortable sharing it with the Pig
> community.
>
> This project can be found at: http://www.github.com/cd-wood/pigaddons
>
> RScriptEngine is a scripting engine for Apache Pig that interprets the R
> language <http://www.r-project.org/>. The goal behind this scripting engine
> is compatibility and ease of use of the R language in Amazon EMR jobs.
> Included in /scripts is the rpig-bootstrap.sh script, which is meant as a
> bootstrap script for Amazon EMR instances; it can also be used on personal
> instances to set up an environment compatible with the scripting engine.
> This interpreter makes use of JRI <http://www.rforge.net/JRI/> to allow an
> instance of R to run inside of the Java process.
>
> By combining R with Pig, I feel that a large number of new analyses are
> possible that cannot be done natively in Pig; while there are already
> other languages for creating UDFs, the more options the better.
>
> A cool feature that is possible by including R in a big-data analysis
> package is the ease of generating images / plotting data provided by R.
> While not currently implemented, one upcoming feature is the integration of
> JavaGD, which will allow all images generated by the R script to be rendered
> into a Java class, from which it might be possible to save, email, or do
> other things with those saved images.
>
> To showcase using R with Pig, I've included a (contrived) Naive Bayes
> example that is a simplistic form of classifying emails as spam based on
> the presence of certain words.
>
> I have tested this scripting engine on Pig 0.9.2 with the aim of making sure
> it works in Amazon EMR; however, I haven't had a chance to test it in EMR
> itself yet.
> If someone does, please let me know how it goes, and if anyone has more cool
> examples of using R, I'd be happy to include them.
>
> And of course, please let me know of any bugs you find or any other
> suggestions you may have.
>
> Thanks,
>
> - Connor

--
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
