May I suggest you look at Spark (http://spark-project.org/ and
https://github.com/mesos/spark).

It is written in Scala, has a Java API, and the current master branch has
the new Python API (slated for the 0.7.0 release, when it happens). I've been
doing some testing, including using sklearn together with Spark, and so far
it looks good. The bonus is that no Hadoop MapReduce is involved (Spark is
still fully HDFS compatible if you need the filesystem), and you can write
all your code directly in Python.
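
To give a rough idea of what that could look like, here is a minimal sketch of a grid search that runs one Spark task per parameter setting. It is only an illustration under a few assumptions: the helper names (expand_grid, cv_error, grid_search) are made up for this example, the SVC model and 3-fold cross-validation are arbitrary choices, the sklearn import path is the one used by recent scikit-learn releases, and `sc` is an already created SparkContext that you pass in.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per parameter combination, in a stable order."""
    names = sorted(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]


def cv_error(params, X, y):
    """Train one model for `params` and return (params, mean CV error)."""
    # Imported inside the function so the import happens on the worker
    # that runs the task (assumes sklearn is installed on every worker).
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    scores = cross_val_score(SVC(**params), X, y, cv=3)
    return params, 1.0 - scores.mean()


def grid_search(sc, grid, X, y, evaluate=cv_error):
    """Run one task per parameter setting on the SparkContext `sc`."""
    data = sc.broadcast((X, y))  # ship the dataset to each worker once
    settings = expand_grid(grid)
    results = (
        sc.parallelize(settings, len(settings))  # one partition per setting
        .map(lambda p: evaluate(p, *data.value))
        .collect()
    )
    # Return the (params, error) pair with the lowest error.
    return min(results, key=lambda r: r[1])
```

With a local context, e.g. `sc = SparkContext("local[2]", "gridsearch")`, a call such as `grid_search(sc, {"C": [1, 10], "gamma": [0.01, 0.1]}, X, y)` should return the best parameter setting together with its error. The dataset is broadcast rather than captured in the closure so that it is not re-serialized for every task.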

N


On Thu, Jan 24, 2013 at 7:01 AM, JAGANADH G <[email protected]> wrote:

> Hi Peter,
>
> Thanks for sharing the experience and code. I will try the same.
>
> @Jaques: Thanks for the link. My plan is to use sklearn only. If I have
> to use Mahout, the entire project has to be converted to Java. I would
> like to accomplish it in Python only!
>
> Best regards
>
> jaganadh
>
>
>
> On Wed, Jan 23, 2013 at 6:43 PM, Peter Prettenhofer <
> [email protected]> wrote:
>
>> Hi Jaganadh,
>>
>> I once used Hadoop streaming to implement grid search / multi-task
>> learning. The setup was fairly simple: I put the serialized
>> dataset (joblib dump) on HDFS and created an input file with one line
>> for each parameter setting of the grid search. The map script
>> deserialized the dataset from HDFS (in the init of the script), and for
>> each map task (= one parameter setting) it trained a model, computed the
>> prediction error, and emitted it. You can find some of the code here [1].
>>
>> I used Hadoop because I had a Hadoop cluster at my disposal - nowadays
>> I'd use IPython.parallel and starcluster instead - much simpler IMHO.
>>
>> best,
>>  Peter
>>
>> [1]
>> https://github.com/pprett/nut/blob/master/nut/structlearn/dumbomapper.py
>>  (this is the mapper script; the code which creates the input files
>> and puts everything onto HDFS is in the auxstrategy.py file)
>>
>> 2013/1/23 JAGANADH G <[email protected]>:
>> > Hi All,
>> >
>> > Has anybody tried using sklearn with Hadoop/Dumbo or Hadoop streaming?
>> > Please share your thoughts and experience.
>> >
>> > Best regards
>> >
>> > --
>> > **********************************
>> > JAGANADH G
>> > http://jaganadhg.in
>> > ILUGCBE
>> > http://ilugcbe.org.in
>> >
>> > ------------------------------------------------------------------------------
>> > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
>> > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
>> > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
>> > MVPs and experts. ON SALE this month only -- learn more at:
>> > http://p.sf.net/sfu/learnnow-d2d
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > [email protected]
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >
>>
>>
>>
>> --
>> Peter Prettenhofer
>>
>>
>>
>
>
>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.in
> *ILUGCBE*
> http://ilugcbe.org.in
>
>
>
>
