Hi Pig users,

I wanted to share that we recently open-sourced a library we have been developing at LinkedIn. It collects many of the useful UDFs we have written for products such as "People You May Know" and "Skills". There are UDFs for median, quantiles, set operations, bag operations, PageRank, etc. All the UDFs are pretty well documented and unit tested, and we track code coverage as well.

It would be great to get people's feedback on it. Also, if anyone would like to contribute, please let us know :)
Project page: http://sna-projects.com/datafu/
GitHub: https://github.com/linkedin/datafu

Thanks,
Matt Hayes
