Re: PySpark / scikit-learn integration sprint at Cloudera - Strata Conference Friday 14th Feb 2014

2013-12-05 Thread Olivier Grisel
2013/12/4 Josh Rosen: > Thanks for organizing this! I'll definitely be attending. Great. Looking forward to meeting you too. Uri, you might want to register on the wiki as well :) -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel

PySpark - Dill serialization

2013-12-05 Thread Nick Pentreath
Hi devs, I came across Dill (http://trac.mystic.cacr.caltech.edu/project/pathos/wiki/dill) for Python serialization. I was wondering whether it could replace the cloudpickle code (and so remove a piece of code we have to maintain within PySpark). Josh, have you looked into Dill? Any th
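For context on why PySpark carries cloudpickle at all: the standard library's `pickle` serializes functions by reference (module plus qualified name), so lambdas and closures usually cannot be pickled. The following is a minimal check using only the Python 3 standard library. Dill itself is not exercised here; with it installed, `dill.dumps` would be the call to compare against. `can_pickle` and `make_adder` are hypothetical helpers for illustration.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # Lambdas and locally defined functions have no importable name,
        # so pickling them by reference fails.
        return False


def make_adder(n):
    # Returns a closure over n; pickle cannot look it up by name.
    def add(x):
        return x + n
    return add


print(can_pickle(len))               # True: builtins are pickled by reference
print(can_pickle(lambda x: x * x))   # False
print(can_pickle(make_adder(2)))     # False
```

Closures like `make_adder(2)` are exactly what Spark users pass to `map` and `filter`, which is why a by-value function serializer (cloudpickle today, possibly Dill) is needed on the PySpark side.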

Re: Spark streaming quantile?

2013-12-05 Thread Sam Bessalah
As stated before, Algebird has many data structures to compute those, like QTree, or Ted's t-digest. Or you can look at the stream-lib q-digest: https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/quantile/QDigest.java Or another one Frugal Streaming w
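Of the options listed, Frugal Streaming is the simplest to show. Below is a minimal sketch of the Frugal-1U update rule, which keeps a single running estimate and nudges it by one unit per item. It assumes an integer-valued stream and a starting estimate of 0; it is a coarse estimator and not a substitute for QTree, t-digest, or q-digest. `frugal_1u` is a hypothetical name.

```python
import random


def frugal_1u(stream, q=0.5, seed=0):
    """Estimate the q-quantile of an integer stream using one unit of memory.

    Moves up by 1 with probability q when an item is above the estimate,
    and down by 1 with probability 1 - q when an item is below it.
    """
    rng = random.Random(seed)
    estimate = 0
    for value in stream:
        r = rng.random()
        if value > estimate and r > 1 - q:
            estimate += 1
        elif value < estimate and r > q:
            estimate -= 1
    return estimate


data_rng = random.Random(42)
stream = [data_rng.randint(0, 1000) for _ in range(200_000)]
print(frugal_1u(stream, q=0.5))  # should land near 500 (approximate)
print(frugal_1u(stream, q=0.9))  # should land near 900 (approximate)
```

The estimate settles where upward and downward moves balance, i.e. where P(item < estimate) / P(item > estimate) = q / (1 - q), which is the q-quantile. The price of constant memory is slow convergence when the true quantile is far from the starting point.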

IntelliJ Scala Import Organizer Plugin

2013-12-05 Thread Aaron Davidson
Hi guys, just wanted to share a little plugin I wrote for IntelliJ to help auto-organize Scala imports. Anyone who has submitted a patch to Spark has probably felt the exhilaration of manually sorting and bucketing your imports. Well, now you can let your IDE have some fun! It's in the plugin repo
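To make "sorting and bucketing" concrete, here is a rough sketch of what such a pass does to a list of Scala import lines. The bucket order (java/javax, scala, other third-party, org.apache.spark) is an assumption for illustration, not taken from the plugin, and `bucket` and `organize_imports` are hypothetical helpers. It does not handle selector imports like `{A, B}` or renames.

```python
# Bucket order assumed for illustration: java/javax, scala, third-party, Spark.
def bucket(import_line):
    """Return the (assumed) group index for a Scala import line."""
    path = import_line[len("import "):]
    if path.startswith(("java.", "javax.")):
        return 0
    if path.startswith("scala."):
        return 1
    if path.startswith("org.apache.spark."):
        return 3
    return 2


def organize_imports(lines):
    """Group import lines into buckets, sort each bucket, drop duplicates."""
    groups = {}
    for line in lines:
        line = line.strip()
        if line.startswith("import "):
            groups.setdefault(bucket(line), set()).add(line)
    # Emit buckets in index order, separated by a blank line.
    return "\n\n".join(
        "\n".join(sorted(groups[key])) for key in sorted(groups)
    )


print(organize_imports([
    "import org.apache.spark.SparkContext",
    "import scala.collection.mutable",
    "import java.util.Random",
    "import com.google.common.io.Files",
    "import java.io.File",
]))
```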

Re: PySpark - Dill serialization

2013-12-05 Thread Josh Rosen
Thanks for the link! I wasn't aware of Dill, but it looks like a nice library. I like that it's being actively developed: https://github.com/uqfoundation/dill It also seems to work correctly for a few edge-cases that cloudpickle didn't handle properly, such as serializing operator.itemgetter ins
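Whether an `operator.itemgetter` survives serialization depends on the Python version and the serializer. The following is offered only as a point of comparison, assuming a recent Python 3 release (the 2013 thread predates this): there, the standard library's `pickle` round-trips itemgetter objects directly. Neither cloudpickle nor Dill is exercised here, and `roundtrip` is a hypothetical helper.

```python
import operator
import pickle


def roundtrip(obj):
    """Serialize obj with the standard-library pickle and load it back."""
    return pickle.loads(pickle.dumps(obj))


first_and_third = roundtrip(operator.itemgetter(0, 2))
print(first_and_third(("a", "b", "c")))  # ('a', 'c')
```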

Re: PySpark - Dill serialization

2013-12-05 Thread Matei Zaharia
Looks cool! Josh, if you replace CloudPickle with this, make sure to also update the LICENSE file, which is supposed to contain third-party licenses. Matei On Dec 5, 2013, at 8:02 PM, Josh Rosen wrote: > Thanks for the link! I wasn't aware of Dill, but it looks like a nice > library. I like

Re: IntelliJ Scala Import Organizer Plugin

2013-12-05 Thread Imran Rashid
Awesome, thanks. I've been wanting this for a while, even for all my Scala projects. On Thu, Dec 5, 2013 at 7:00 PM, Aaron Davidson wrote: > Hi guys, just wanted to share a little plugin I wrote for IntelliJ to help > auto-organize Scala imports. Anyone who has submitted a patch to Spark has > p