This project looks interesting: https://github.com/lensacom/sparkit-learn <https://github.com/lensacom/sparkit-learn/blob/master/README.rst>
and it has a nice coded project name :)

On Sat, Sep 12, 2015 at 11:24 AM, Jacob Schreiber <jmschreibe...@gmail.com> wrote:
> As a side note, multithreaded training of a single decision tree is something
> on our radar. Afterwards we may work towards supporting distributed
> training, but I wouldn't count on it for a while.
>
> On Sat, Sep 12, 2015 at 10:18 AM, Gilles Louppe <g.lou...@gmail.com> wrote:
>> Hi,
>>
>> > But the question is how to make the scikit-learn code, DecisionTreeRegressor
>> > for example, run in distributed computing mode, to benefit from the power
>> > of Spark?
>>
>> I am sorry, but you can't. The tree implementation in scikit-learn was
>> not designed for this use case.
>>
>> Maybe you should have a look at MLlib
>> (https://spark.apache.org/mllib/), which implements a bunch of machine
>> learning algorithms (including forests) on top of Spark.
>>
>> Best,
>> Gilles
>>
>> On 12 September 2015 at 20:11, Rex X <dnsr...@gmail.com> wrote:
>> > What is the best way to migrate existing scikit-learn code to a PySpark
>> > cluster? Then we can bring together the full power of both scikit-learn
>> > and Spark to do scalable machine learning.
>> >
>> > Currently I use Python's multiprocessing module to boost the speed. But
>> > this only works on a single node, and only while the data set is small.
>> >
>> > For many real cases, we may need to deal with gigabytes or even terabytes
>> > of data, with thousands of raw categorical attributes, which can lead to
>> > millions of discrete features using a 1-of-k representation.
>> >
>> > For these cases, one solution is to use distributed memory. That's why I am
>> > considering Spark. And Spark supports Python!
>> > With PySpark, we can import scikit-learn.
>> >
>> > But the question is how to make the scikit-learn code, DecisionTreeRegressor
>> > for example, run in distributed computing mode, to benefit from the power
>> > of Spark?
>> >
>> > Best,
>> > Rex
>> >
>> > ------------------------------------------------------------------------------
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > Scikit-learn-general@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
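Rex's point that thousands of raw categorical attributes can turn into millions of 1-of-k features is easy to check with illustrative numbers: 1,000 attributes with roughly 1,000 distinct values each give about 1,000 x 1,000 = 1,000,000 columns, yet each row has only 1,000 nonzeros, so a sparse representation matters more than the raw dimension. The stdlib-only sketch below is a hypothetical illustration, not code from this thread. The function one_hot_sparse builds an explicit (attribute, value) vocabulary, while hashed_one_hot swaps in the hashing trick (with zlib.crc32 as an assumed hash function) to cap the number of columns without building or sharing a vocabulary. scikit-learn's DictVectorizer and FeatureHasher in sklearn.feature_extraction provide production versions of these two ideas.

```python
import zlib


def one_hot_sparse(rows):
    """1-of-k encode categorical rows into sparse vectors (illustrative sketch).

    Each row is a dict mapping attribute name -> category value. Every distinct
    (attribute, value) pair gets its own column, so the column count grows with
    the total number of distinct categories, not with the number of attributes.
    Vectors are stored as {column_index: 1} dicts, i.e. only the nonzeros.
    """
    vocab = {}  # (attribute, value) -> column index
    encoded = []
    for row in rows:
        vec = {}
        for attr, value in row.items():
            key = (attr, value)
            if key not in vocab:
                vocab[key] = len(vocab)  # unseen category: allocate the next column
            vec[vocab[key]] = 1
        encoded.append(vec)
    return encoded, vocab


def hashed_one_hot(row, n_features=2**20):
    """Hashing-trick variant (illustrative sketch): fold (attribute, value)
    pairs into a fixed number of columns. No vocabulary is needed, but distinct
    pairs may collide on the same column; colliding counts are summed."""
    vec = {}
    for attr, value in row.items():
        # zlib.crc32 is deterministic across runs, unlike Python's salted hash() on str
        idx = zlib.crc32(f"{attr}={value}".encode("utf-8")) % n_features
        vec[idx] = vec.get(idx, 0) + 1
    return vec


# Hypothetical example data: two categorical attributes.
rows = [
    {"country": "FR", "device": "mobile"},
    {"country": "US", "device": "mobile"},
    {"country": "FR", "device": "desktop"},
]
encoded, vocab = one_hot_sparse(rows)
print(len(vocab), encoded)  # 4 [{0: 1, 1: 1}, {2: 1, 1: 1}, {0: 1, 3: 1}]
print(hashed_one_hot(rows[0], n_features=16))
```

The explicit vocabulary gives exact, collision-free columns but must be built in one place (or merged across workers), which is awkward for distributed data; the hashed variant trades a small collision risk for a fixed dimension that every worker can compute independently.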