Re: What is the best way to migrate existing scikit-learn code to PySpark?

2015-09-13 Thread Nick Pentreath
I should point out that I'm not sure what the performance of that project (sparkit-learn) is. I'd expect native DataFrames in PySpark to be significantly more efficient than their DictRDD. It would be interesting to see a performance comparison of their pipelines relative to native Spark ML

Re: What is the best way to migrate existing scikit-learn code to PySpark?

2015-09-12 Thread Jörn Franke
I fear you have to do all the plumbing yourself. This is the same for all commercial and non-commercial libraries and analytics packages. How you distribute the work often also depends on the functional requirements. On Sat, Sep 12, 2015 at 20:18, Rex X wrote: > Hi everyone, > >

Re: What is the best way to migrate existing scikit-learn code to PySpark?

2015-09-12 Thread Rex X
Jorn and Nick, Thanks for answering. Nick, the sparkit-learn project looks interesting. Thanks for mentioning it. Rex On Sat, Sep 12, 2015 at 12:05 PM, Nick Pentreath wrote: > You might want to check out https://github.com/lensacom/sparkit-learn >

What is the best way to migrate existing scikit-learn code to PySpark?

2015-09-12 Thread Rex X
Hi everyone, What is the best way to migrate existing scikit-learn code to a PySpark cluster? Then we can bring together the full power of both scikit-learn and Spark to do scalable machine learning. (I know we have MLlib. But the existing code base is big, and some functions are not fully

Re: What is the best way to migrate existing scikit-learn code to PySpark?

2015-09-12 Thread Nick Pentreath
You might want to check out https://github.com/lensacom/sparkit-learn Though it's true that for random forests / trees you will need to use MLlib. On Sat, Sep 12, 2015 at 9:00 PM, Jörn Franke wrote: > I fear you have to do the plumbing all yourself.