While I think collaborative filtering / recommendations may have a place in
sklearn, it is true that the problem setting is a little different from
most of the sklearn models.
You may want to take a look at mrec (https://github.com/mendeley/mrec),
where many well-established CF approaches are implemented with an
sklearn-friendly API. The package also supports parallel training of
models.
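As a rough illustration of what an "sklearn-friendly" CF estimator could look like (this is NOT mrec's actual API; the class name `BiasBaseline` and its `reg`/`n_iter` parameters are hypothetical), here is a minimal sketch. It assumes `X` is an (n, 2) array of (user, item) index pairs and `y` holds the observed ratings, so only observed entries are ever passed in and the usual fit/predict convention is kept:

```python
import numpy as np


class BiasBaseline:
    """Hypothetical sklearn-style CF baseline: rating ~ mu + b_user + b_item.

    X is an (n_samples, 2) integer array of (user_index, item_index) pairs;
    y holds the observed rating for each pair. Unobserved cells never appear.
    """

    def __init__(self, reg=1.0, n_iter=10):
        self.reg = reg          # L2 shrinkage on the bias terms (keep > 0 in general)
        self.n_iter = n_iter    # number of alternating update sweeps

    def fit(self, X, y):
        X = np.asarray(X, dtype=int)
        y = np.asarray(y, dtype=float)
        users, items = X[:, 0], X[:, 1]
        n_users, n_items = users.max() + 1, items.max() + 1
        self.mu_ = y.mean()
        self.user_bias_ = np.zeros(n_users)
        self.item_bias_ = np.zeros(n_items)
        for _ in range(self.n_iter):
            # User biases given item biases: shrunken mean of the residuals.
            resid = y - self.mu_ - self.item_bias_[items]
            self.user_bias_ = (np.bincount(users, weights=resid, minlength=n_users)
                               / (np.bincount(users, minlength=n_users) + self.reg))
            # Item biases given user biases, same update.
            resid = y - self.mu_ - self.user_bias_[users]
            self.item_bias_ = (np.bincount(items, weights=resid, minlength=n_items)
                               / (np.bincount(items, minlength=n_items) + self.reg))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=int)
        return self.mu_ + self.user_bias_[X[:, 0]] + self.item_bias_[X[:, 1]]
```

With `reg > 0`, a user or item index that has no ratings simply gets a zero bias; with `reg=0` it would divide by zero. Users or items beyond the range seen in `fit` are not handled in this sketch.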
Furthermore, if you're working with very large-scale data, Spark's new Python
bindings to MLlib let you use its efficient cluster-parallel ALS
implementation from Python:
https://github.com/apache/incubator-spark/pull/283
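For intuition about what ALS (alternating least squares) does, here is a single-machine NumPy sketch; it is not Spark's code or MLlib's API, and the function name `als` and its parameters are hypothetical. It factors R ~ U V^T using only the observed entries (given by a boolean mask): with V fixed, each user's factors are a small ridge regression over the items that user rated, and vice versa. MLlib's version distributes these independent per-row solves across a cluster.

```python
import numpy as np


def als(R, mask, rank=2, reg=0.1, n_iter=20, seed=0):
    """Hypothetical sketch: factor R ~ U @ V.T using only cells where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iter):
        # V fixed: solve (Vo^T Vo + reg*I) u = Vo^T r for each user,
        # where Vo are the factors of the items that user actually rated.
        for u in range(n_users):
            Vo = V[mask[u]]
            U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[u, mask[u]])
        # U fixed: the same ridge solve for each item over the users who rated it.
        for i in range(n_items):
            Uo = U[mask[:, i]]
            V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[mask[:, i], i])
    return U, V
```

Predictions for the missing cells are then read off `U @ V.T` wherever `mask` is False.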
On Wed, Jan 15, 2014 at 8:42 PM, Kyle Kastner <kastnerk...@gmail.com> wrote:
> I looked into this once upon a time, and one of the key problems (from
> talking to Jake IIRC) is how to handle the "missing values" in the input
> array. You would either need a mask, or some kind of indexing system for
> describing which value goes where in the input matrix. Either way, this
> extra argument would be a requirement for CF, but not for the existing
> algorithms in sklearn.
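The two representations mentioned above (a mask, or an indexing system) can be sketched in NumPy as follows. Using `np.nan` as the "missing" marker in the dense array is an assumption made purely for illustration:

```python
import numpy as np

# Dense ratings matrix; np.nan marks missing (unrated) cells.
R = np.array([[5.0, np.nan, 3.0],
              [np.nan, 4.0, np.nan],
              [1.0, np.nan, 2.0]])

# Option 1: the dense values plus a boolean mask of observed entries.
mask = ~np.isnan(R)

# Option 2: an indexing system, i.e. (row, col, value) triplets of observed entries.
rows, cols = np.nonzero(mask)
vals = R[rows, cols]

# Round trip: rebuild the dense matrix from the triplets.
R2 = np.full(R.shape, np.nan)
R2[rows, cols] = vals
```

Either way the estimator needs that extra piece of information (the mask, or the row/column indices) alongside the values, which is the extra argument that existing sklearn estimators don't take.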
>
> Maybe it would only operate on sparse arrays, and infer that the values
> which are missing are the ones to be imputed ("recommended")? But not
> supporting dense arrays would be the opposite of other modules in
> sklearn, where dense input is the default. Maybe someone can comment on this?
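A minimal sketch of the sparse-only convention, assuming `scipy.sparse` and treating every unstored cell as "missing" (the variable names are illustrative). One pitfall worth noting: ordinary scipy.sparse arithmetic treats unstored cells as 0, not as missing, so statistics have to be computed over the stored values only:

```python
import numpy as np
import scipy.sparse as sp

# Observed ratings as (user, item, rating) triplets; any cell not stored
# in the sparse matrix is interpreted as missing, i.e. to be predicted.
users = np.array([0, 0, 1, 2, 2])
items = np.array([0, 2, 1, 0, 2])
ratings = np.array([5.0, 3.0, 4.0, 1.0, 2.0])
R = sp.csr_matrix((ratings, (users, items)), shape=(3, 3))

# Pitfall: the matrix-wide mean counts unstored cells as zeros (15 / 9)...
naive_mean = R.mean()
# ...whereas the mean of the observed ratings uses stored values only (15 / 5).
observed_mean = R.data.mean()

# Cells a CF model would need to fill in ("recommend"): everything not stored.
coo = R.tocoo()
observed = np.zeros(R.shape, dtype=bool)
observed[coo.row, coo.col] = True
missing = np.argwhere(~observed)
```

Under this convention a genuine rating of 0 has to be stored explicitly, and operations such as `eliminate_zeros()` would silently turn it into "missing".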
>
> I don't know how well this lines up with the existing API/functionality
> and the future directions there, but how to deal with the missing values is
> probably the primary concern for implementing CF algorithms in sklearn IMO.
>
>
> On Wed, Jan 15, 2014 at 12:07 PM, Manoj Kumar <
> manojkumarsivaraj...@gmail.com> wrote:
>
>> Hello,
>>
>> First of all, thanks to the scikit-learn community for guiding new
>> developers. I'm grateful for all the help I've received on my pull
>> requests so far.
>>
>> I hope this is the right place to discuss GSoC-related ideas (I've
>> idled in the scikit-learn IRC channel on quite a few occasions, but I
>> haven't managed to reach any core developers). While browsing through last
>> year's threads, I found this idea related to collaborative filtering (CF)
>> quite interesting:
>> http://sourceforge.net/mailarchive/message.php?msg_id=30725712
>> Sadly, it was not accepted.
>>
>> If the scikit-learn community is still enthusiastic about a recsys module
>> with CF algorithms, I would love to make this my GSoC proposal, and we
>> could discuss the algorithms, how they would mesh with the current
>> sklearn API, how much we could realistically cover in a 3-month period, etc.
>>
>> Awaiting a reply.
>>
>> --
>> Regards,
>> Manoj Kumar,
>> Mech Undergrad
>> http://manojbits.wordpress.com
>>
>>
>> ------------------------------------------------------------------------------
>> CenturyLink Cloud: The Leader in Enterprise Cloud Services.
>> Learn Why More Businesses Are Choosing CenturyLink Cloud For
>> Critical Workloads, Development Environments & Everything In Between.
>> Get a Quote or Start a Free Trial Today.
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>