Hi Jaganadh,

I once implemented grid search / multi-task learning on Hadoop using Hadoop streaming. The setup was fairly simple: I put the serialized dataset (a joblib dump) on HDFS and created an input file with one line per parameter setting of the grid search. The map script deserialized the dataset from HDFS once (in the init of the script), and for each map task (= parameter setting) it trained a model, computed the prediction error, and emitted it. You can find some of the code here [1].
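To make the pattern concrete, here is a minimal sketch of it. This is not the code from [1]; it only illustrates the idea under some assumptions. The names write_grid, run_mapper and ridge_1d_error are hypothetical, the one-JSON-object-per-line input format is an assumption, and a tiny closed-form 1-D ridge regression on an in-memory toy dataset stands in for a scikit-learn estimator trained on the joblib dump loaded from HDFS.

```python
import io
import itertools
import json


def write_grid(param_grid, out):
    """Write one JSON line per parameter setting (the job's input file)."""
    names = sorted(param_grid)
    for values in itertools.product(*(param_grid[n] for n in names)):
        out.write(json.dumps(dict(zip(names, values)), sort_keys=True) + "\n")


def ridge_1d_error(params, dataset):
    """Stand-in for model training: 1-D ridge regression, returns test MSE.

    In the setup described above, this is where a scikit-learn estimator
    would be fitted on the deserialized dataset.
    """
    (x_train, y_train), (x_test, y_test) = dataset
    alpha = params["alpha"]
    # Closed-form ridge solution for a single weight without intercept.
    w = sum(x * y for x, y in zip(x_train, y_train)) / (
        sum(x * x for x in x_train) + alpha)
    return sum((w * x - y) ** 2 for x, y in zip(x_test, y_test)) / len(x_test)


def run_mapper(lines, dataset, fit_and_score, out):
    """Streaming-style mapper: one input line = one parameter setting."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        params = json.loads(line)
        error = fit_and_score(params, dataset)
        # Streaming convention: key and value separated by a tab.
        out.write("%s\t%.6f\n" % (json.dumps(params, sort_keys=True), error))


# Toy (train, test) split standing in for the dataset loaded once at start-up.
toy = (([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), ([4.0], [8.0]))

# Step 1: create the input file, one line per parameter setting.
grid_file = io.StringIO()
write_grid({"alpha": [0.0, 14.0]}, grid_file)

# Step 2: run the mapper over those lines and collect what it emits.
results = io.StringIO()
run_mapper(grid_file.getvalue().splitlines(), toy, ridge_1d_error, results)
print(results.getvalue(), end="")
```

In a real streaming job the mapper would read its lines from sys.stdin and write to sys.stdout instead of the in-memory buffers used here, and the dataset would be loaded once at start-up rather than defined inline.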

I used Hadoop because I had a Hadoop cluster at my disposal. Nowadays I'd use IPython.parallel and StarCluster instead, which is much simpler IMHO.

best,
Peter

[1] https://github.com/pprett/nut/blob/master/nut/structlearn/dumbomapper.py
(this is the mapper script; the code that creates the input files and puts everything onto HDFS is in the auxstrategy.py file)

2013/1/23 JAGANADH G <[email protected]>:
> Hi All,
>
> Has anybody tried using sklearn with Hadoop/Dumbo or Hadoop streaming?
> Please share your thoughts and experience.
>
> Best regards
>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.in
> ILUGCBE
> http://ilugcbe.org.in
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

--
Peter Prettenhofer
