Very cool!
2013/6/17 Russell Jurney <[email protected]>

> Awesome!
>
>
> On Sun, Jun 16, 2013 at 3:15 PM, Connor Woodson <[email protected]> wrote:
>
> > I mentioned a few months ago that I was interested in creating a new
> > scripting engine for Pig based on the R language. I have finally
> > gotten that project to a point where I feel comfortable sharing it
> > with the Pig community.
> >
> > This project can be found at: http://www.github.com/cd-wood/pigaddons
> >
> > RScriptEngine is a scripting engine for Apache Pig that interprets the
> > R language <http://www.r-project.org/>. The goal behind this scripting
> > engine is compatibility and ease of use of the R language in Amazon EMR
> > jobs. Included in /scripts is rpig-bootstrap.sh, which is meant as a
> > bootstrap script for Amazon EMR instances; it can also be used on
> > personal instances to set up an environment compatible with the
> > scripting engine. This interpreter uses JRI <http://www.rforge.net/JRI/>
> > to run an instance of R inside the Java process.
> >
> > By combining R with Pig, I feel that a large number of new analyses
> > are possible that cannot be done natively in Pig; while there are
> > already other languages for creating UDFs, the more options the better.
> >
> > A cool feature made possible by including R in a big-data analysis
> > package is how easily R generates images and plots data. While not
> > currently implemented, one upcoming feature is the integration of
> > JavaGD, which will allow all images generated by the R script to be
> > rendered into a Java class, from which it might be possible to save,
> > email, or otherwise process those images.
> >
> > To showcase using R with Pig, I've included a (contrived) Naive Bayes
> > example that classifies emails as spam in a simplistic way, based on
> > the presence of certain words.
> >
> > I have tested this scripting engine on Pig 0.9.2 so that it should
> > work in Amazon EMR; however, I haven't had a chance to test it on EMR
> > itself yet. If someone does, please let me know how it goes, and if
> > anyone has more cool examples of using R, I'd be happy to include them.
> >
> > And of course, please let me know of any bugs you find or any other
> > suggestions you may have.
> >
> > Thanks,
> >
> > - Connor
>
>
> --
> Russell Jurney twitter.com/rjurney [email protected]
> datasyndrome.com
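The Naive Bayes example mentioned in the announcement labels a message as spam from the presence or absence of certain words. The pigaddons version is written in R and run from Pig; the sketch below is only a language-neutral illustration of that technique, not the repository's code. It assumes a Bernoulli (word-presence) Naive Bayes model with Laplace (add-one) smoothing, and the vocabulary and training messages are hypothetical toy data.

```python
import math
from collections import Counter


def train(docs, labels, vocab):
    """Estimate class priors and per-word presence probabilities.

    Uses Laplace (add-one) smoothing so no probability is ever 0 or 1.
    """
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for words, label in zip(docs, labels):
        # Bernoulli model: count each vocabulary word at most once per message.
        for w in set(words) & vocab:
            word_counts[label][w] += 1
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    likelihood = {
        c: {w: (word_counts[c][w] + 1) / (class_counts[c] + 2) for w in vocab}
        for c in class_counts
    }
    return priors, likelihood


def classify(words, priors, likelihood, vocab):
    """Return the class with the highest log-posterior for a message."""
    present = set(words)
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for w in vocab:
            p = likelihood[c][w]
            # Present words contribute p; absent vocabulary words contribute 1 - p.
            score += math.log(p if w in present else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data.
vocab = {"free", "winner", "meeting", "report"}
docs = [
    ["free", "winner", "prize"],
    ["free", "offer"],
    ["meeting", "report"],
    ["project", "meeting"],
]
labels = ["spam", "spam", "ham", "ham"]
priors, likelihood = train(docs, labels, vocab)
```

With this toy data, `classify(["free", "prize"], priors, likelihood, vocab)` returns `"spam"`, while a message containing "meeting" and "report" comes out as `"ham"`. Working in log space avoids underflow when the vocabulary grows.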
