As a side note, multithreaded training of a single decision tree is on our
radar. We may work towards supporting distributed training after that, but I
wouldn't count on it for a while.



On Sat, Sep 12, 2015 at 10:18 AM, Gilles Louppe <g.lou...@gmail.com> wrote:

> Hi,
>
> > But the question is how to make the scikit-learn code,
> > DecisionTreeRegressor for example, run in distributed computing mode, to
> > benefit from the power of Spark?
>
> I am sorry, but you can't. The tree implementation in scikit-learn was
> not designed for this use case.
>
> Maybe you should have a look at MLlib
> (https://spark.apache.org/mllib/), which implements a bunch of machine
> learning algorithms (including forests) on top of Spark.
>
> Best,
> Gilles
>
> On 12 September 2015 at 20:11, Rex X <dnsr...@gmail.com> wrote:
> > What is the best way to migrate existing scikit-learn code to a PySpark
> > cluster? Then we could bring together the full power of both scikit-learn
> > and Spark to do scalable machine learning.
> >
> > Currently I use Python's multiprocessing module to speed things up. But
> > this only works on one node, and only while the data set is small.
> >
> > For many real cases, we may need to deal with gigabytes or even terabytes
> > of data, with thousands of raw categorical attributes, which can lead to
> > millions of discrete features when using a 1-of-k representation.
> >
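To make the 1-of-k blow-up concrete, here is a minimal pure-Python sketch. It is not scikit-learn's implementation; the helper names `fit_vocabulary` and `encode_sparse` and the toy rows are made up for illustration. Each distinct (attribute, value) pair gets its own column, and each row is stored only as the indices of its non-zero columns, which is why data like this is normally kept sparse:

```python
def fit_vocabulary(rows):
    """Assign a column index to every (attribute, value) pair seen in rows."""
    vocab = {}
    for row in rows:
        # Sort items so the column order does not depend on dict ordering.
        for attr, value in sorted(row.items()):
            vocab.setdefault((attr, value), len(vocab))
    return vocab


def encode_sparse(row, vocab):
    """Return the sorted indices of the columns set to 1 for this row.

    Every other column is implicitly 0; unseen pairs are ignored.
    """
    return sorted(vocab[(a, v)] for a, v in row.items() if (a, v) in vocab)


# Hypothetical toy data: 2 raw categorical attributes.
rows = [
    {"city": "Paris", "browser": "Firefox"},
    {"city": "Tokyo", "browser": "Chrome"},
    {"city": "Paris", "browser": "Chrome"},
]

vocab = fit_vocabulary(rows)
encoded = [encode_sparse(row, vocab) for row in rows]
n_features = len(vocab)  # one column per distinct (attribute, value) pair
```

The number of columns grows with the number of distinct values, not the number of attributes, while each row still has only one non-zero entry per attribute. In scikit-learn itself, `DictVectorizer` performs this kind of encoding and returns a SciPy sparse matrix by default.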
> > For these cases, one solution is to use distributed memory. That's why I
> > am considering Spark. And Spark supports Python!
> > With PySpark, we can import scikit-learn.
> >
> > But the question is how to make the scikit-learn code,
> > DecisionTreeRegressor for example, run in distributed computing mode, to
> > benefit from the power of Spark?
> >
> >
> > Best,
> > Rex
> >
> >
> ------------------------------------------------------------------------------
> >
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >
>
