Re: [Scikit-learn-general] Using sklearn in Hadoop

2013-02-04 Thread Peter Prettenhofer
Cool example - thanks Nick!

Re: [Scikit-learn-general] Using sklearn in Hadoop

2013-02-04 Thread Robert Kern
On Mon, Feb 4, 2013 at 2:50 PM, Nick Pentreath wrote: > @Robert sorry for the delay in responding, I was away on vacation. > > Here's a link to a gist of a very simple implementation of parallelized SGD > using Spark (https://gist.github.com/4707012). It basically replicates the > existing Spark l

Re: [Scikit-learn-general] Using sklearn in Hadoop

2013-02-04 Thread Nick Pentreath
@Robert sorry for the delay in responding, I was away on vacation. Here's a link to a gist of a very simple implementation of parallelized SGD using Spark (https://gist.github.com/4707012). It basically replicates the existing Spark logistic regression example, but using sklearn's linear_model mod

Re: [Scikit-learn-general] Using sklearn in Hadoop

2013-01-27 Thread Robert Kern
On Thu, Jan 24, 2013 at 10:06 AM, Nick Pentreath wrote: > May I suggest you look at Spark (http://spark-project.org/ and > https://github.com/mesos/spark). > > It is written in Scala, has a Java API and the current master branch has the > new Python API (0.7.0 release when it happens). I've been d

Re: [Scikit-learn-general] Using sklearn in Hadoop

2013-01-24 Thread Nick Pentreath
May I suggest you look at Spark (http://spark-project.org/ and https://github.com/mesos/spark). It is written in Scala, has a Java API and the current master branch has the new Python API (0.7.0 release when it happens). I've been doing some testing, including using sklearn together with Spark, an

Re: [Scikit-learn-general] Using sklearn in Hadoop

2013-01-23 Thread JAGANADH G
Hi Peter, Thanks for sharing the experience and code. I will try the same. @Jaques: Thanks for the link. My plan is to use sklearn only. If I have to use Mahout, the entire project has to be converted to Java. I would like to accomplish it in Python only! Best regards jaganadh On Wed,

Re: [Scikit-learn-general] Using sklearn in Hadoop

2013-01-23 Thread Peter Prettenhofer
Hi Jaganadh, I once used Hadoop streaming to implement grid search / multi-task learning. The setup was fairly simple: I put the serialized dataset (joblib dump) on HDFS and created an input file with one line for each grid-search parameter setting. The map script deserialized the dat
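
The setup described in this message could look roughly like the mapper below. This is only a sketch under assumptions, not Peter's actual script: the JSON-per-line input format, the file name dataset.pkl (assumed to be present in the task's working directory), the choice of SGDClassifier with 3-fold cross-validation, and all function names are hypothetical.

```python
import json
import sys


def map_settings(lines, evaluate):
    """Yield one 'params<TAB>score' record per non-empty input line.

    Each line is assumed (hypothetically) to hold one parameter setting
    encoded as a JSON object. `evaluate` is any callable that takes the
    decoded dict and returns a float score.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        params = json.loads(line)
        score = evaluate(params)
        yield "%s\t%.6f" % (json.dumps(params, sort_keys=True), score)


def make_sklearn_evaluator(dataset_path):
    """Build an evaluator that cross-validates an estimator on a joblib dump.

    The estimator and the (X, y) tuple layout of the dump are assumptions.
    Imports are local so the parsing logic above works without sklearn.
    """
    import joblib
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import cross_val_score

    # Deserialize the dataset once per map task, not once per setting.
    X, y = joblib.load(dataset_path)

    def evaluate(params):
        clf = SGDClassifier(**params)
        return float(cross_val_score(clf, X, y, cv=3).mean())

    return evaluate


def main():
    # "dataset.pkl" is a hypothetical local copy of the dump stored on HDFS.
    evaluate = make_sklearn_evaluator("dataset.pkl")
    for record in map_settings(sys.stdin, evaluate):
        print(record)

# When used as a streaming mapper, the script would end with a call to main().
```

Since every input line is independent, no reducer logic is needed beyond collecting the emitted records and picking the best score.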

Re: [Scikit-learn-general] Using sklearn in Hadoop

2013-01-23 Thread Jaques Grobler
2013/1/23 JAGANADH G > Hadoop/Dumbo or hadoop This thread may be of some interest : http://news.ycombinator.com/item?id=4968609 Regards J

[Scikit-learn-general] Using sklearn in Hadoop

2013-01-23 Thread JAGANADH G
Hi All, Has anybody tried using sklearn with Hadoop/Dumbo or Hadoop streaming? Please share your thoughts and experience. Best regards -- JAGANADH G http://jaganadhg.in ILUGCBE http://ilugcbe.org.in