As a side note, multithreaded training of a single decision tree is on our radar. After that, we may work towards supporting distributed training, but I wouldn't count on it for a while.
On Sat, Sep 12, 2015 at 10:18 AM, Gilles Louppe <g.lou...@gmail.com> wrote:
> Hi,
>
> > But the question is how to make the scikit-learn code,
> > DecisionTreeRegressor for example, run in distributed computing mode,
> > to benefit from the power of Spark?
>
> I am sorry, but you can't. The tree implementation in scikit-learn was
> not designed for this use case.
>
> Maybe you should have a look at MLlib
> (https://spark.apache.org/mllib/), which implements a bunch of machine
> learning algorithms (including forests) on top of Spark.
>
> Best,
> Gilles
>
> On 12 September 2015 at 20:11, Rex X <dnsr...@gmail.com> wrote:
> > What is the best way to migrate existing scikit-learn code to a PySpark
> > cluster? Then we can bring together the full power of both scikit-learn
> > and Spark to do scalable machine learning.
> >
> > Currently I use the multiprocessing module of Python to boost the speed.
> > But this only works on one node, while the data set is small.
> >
> > For many real cases, we may need to deal with gigabytes or even
> > terabytes of data, with thousands of raw categorical attributes, which
> > can lead to millions of discrete features using a 1-of-k representation.
> >
> > For these cases, one solution is to use distributed memory. That's why I
> > am considering Spark. And Spark supports Python!
> > With PySpark, we can import scikit-learn.
> >
> > But the question is how to make the scikit-learn code,
> > DecisionTreeRegressor for example, run in distributed computing mode,
> > to benefit from the power of Spark?
> >
> > Best,
> > Rex
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general