This project looks interesting
https://github.com/lensacom/sparkit-learn
<https://github.com/lensacom/sparkit-learn/blob/master/README.rst>

and a nicely coined project name :)
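
For anyone following along, here is a minimal sketch of the 1-of-k representation Rex mentions in the quoted message below, and of why the feature count grows so quickly. It uses only the Python standard library, not scikit-learn or Spark. The helper name `one_of_k` and the toy rows are made up for illustration.

```python
def one_of_k(rows):
    """Encode rows of categorical values as sparse 1-of-k vectors.

    Every distinct (column, value) pair gets its own feature index.
    Each row is returned as the sorted list of indices that are set to 1,
    so the zeros are never stored.
    """
    index = {}    # (column, value) -> feature index
    encoded = []
    for row in rows:
        active = []
        for col, value in enumerate(row):
            key = (col, value)
            if key not in index:
                index[key] = len(index)  # assign the next free index
            active.append(index[key])
        encoded.append(sorted(active))
    return encoded, index


# Hypothetical toy data: three raw categorical attributes per row.
rows = [
    ("red", "US", "mobile"),
    ("blue", "FR", "desktop"),
    ("red", "DE", "mobile"),
]

encoded, index = one_of_k(rows)
n_features = len(index)  # one feature per distinct (column, value) pair
```

Each row keeps exactly as many active indices as it has raw attributes, while the total number of features grows with the number of distinct values per attribute. That is how thousands of attributes can turn into millions of features.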


On Sat, Sep 12, 2015 at 11:24 AM, Jacob Schreiber <jmschreibe...@gmail.com>
wrote:

> As a side note, multithreaded single decision tree training is something
> on our radar. It may be possible that afterwards we work towards supporting
> distributed training, but I wouldn't count on it for a while.
>
>
>
> On Sat, Sep 12, 2015 at 10:18 AM, Gilles Louppe <g.lou...@gmail.com>
> wrote:
>
>> Hi,
>>
>> > But the question is how to make scikit-learn code, DecisionTreeRegressor
>> > for example, run in distributed computing mode, to benefit from the
>> > power of Spark?
>>
>> I am sorry, but you can't. The tree implementation in scikit-learn was
>> not designed for this use case.
>>
>> Maybe you should have a look at MLlib
>> (https://spark.apache.org/mllib/), which implements a bunch of machine
>> learning algorithms (including forests) on top of Spark.
>>
>> Best,
>> Gilles
>>
>> On 12 September 2015 at 20:11, Rex X <dnsr...@gmail.com> wrote:
>> > What is the best way to migrate existing scikit-learn code to a PySpark
>> > cluster? Then we could combine the full power of both scikit-learn and
>> > Spark to do scalable machine learning.
>> >
>> > Currently I use Python's multiprocessing module to speed things up. But
>> > this only works on one node, and only while the data set is small.
>> >
>> > For many real cases, we may need to deal with gigabytes or even terabytes
>> > of data, with thousands of raw categorical attributes, which can lead to
>> > millions of discrete features, using 1-of-k representation.
>> >
>> > For these cases, one solution is to use distributed memory. That's why I
>> > am considering Spark. And Spark supports Python!
>> > With PySpark, we can import scikit-learn.
>> >
>> > But the question is how to make scikit-learn code, DecisionTreeRegressor
>> > for example, run in distributed computing mode, to benefit from the
>> > power of Spark?
>> >
>> >
>> > Best,
>> > Rex
>> >
>> >
>> ------------------------------------------------------------------------------
>> >
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > Scikit-learn-general@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >
>>
>