I looked into this once upon a time, and one of the key problems (from
talking to Jake IIRC) is how to handle the "missing values" in the input
array. You would either need a mask, or some kind of indexing system for
describing which value goes where in the input matrix. Either way, this
extra argument would be a requirement for CF, but not for the existing
algorithms in sklearn.
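
To make the two options concrete, here is a rough sketch on a toy 3x4
ratings matrix. The data and variable names are made up for illustration,
and neither form is an existing sklearn argument; it only shows what a
"mask" versus an "indexing system" could look like with plain NumPy.

```python
import numpy as np

# Hypothetical 3 users x 4 items ratings; np.nan marks "not rated".
ratings = np.array([
    [5.0, np.nan, 3.0, np.nan],
    [np.nan, 4.0, np.nan, 1.0],
    [2.0, np.nan, np.nan, 5.0],
])

# Option 1: a boolean mask with the same shape as the input,
# True where a rating was actually observed.
observed_mask = ~np.isnan(ratings)

# Option 2: index triplets (user, item, value) for observed entries only.
user_idx, item_idx = np.nonzero(observed_mask)
values = ratings[user_idx, item_idx]
```

Either way, the estimator would need something beyond the usual X (the
mask, or the index arrays), which is the extra argument mentioned above.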

Maybe it would only operate on sparse arrays, and infer that the values
which are missing are the ones to be imputed ("recommended")? But not
supporting dense arrays would basically be the opposite of other modules in
sklearn, where dense input is the default. Maybe someone can comment on this?
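
A minimal sketch of the sparse-only idea, assuming a scipy.sparse CSR
matrix in which any entry that is not stored counts as "missing". The toy
data and the helper unrated_items are hypothetical, not part of sklearn or
SciPy.

```python
import numpy as np
from scipy import sparse

# Hypothetical observed ratings as (user, item, value) triplets.
users = np.array([0, 0, 1, 1, 2, 2])
items = np.array([0, 2, 1, 3, 0, 3])
vals = np.array([5.0, 3.0, 4.0, 1.0, 2.0, 5.0])

# Only observed ratings are stored; everything else is implicitly missing.
R = sparse.csr_matrix((vals, (users, items)), shape=(3, 4))


def unrated_items(R, user):
    """Return the item indices with no stored entry for `user`."""
    rated = R.indices[R.indptr[user]:R.indptr[user + 1]]
    return np.setdiff1d(np.arange(R.shape[1]), rated)
```

Here unrated_items(R, 0) gives the candidates to be "recommended" for user
0. One caveat with this convention is that a genuine rating of zero would
have to be stored explicitly, or it becomes indistinguishable from a
missing value.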

I don't know how well this lines up with the existing API/functionality and
the future directions there, but how to deal with the missing values is
probably the primary concern for implementing CF algorithms in sklearn IMO.


On Wed, Jan 15, 2014 at 12:07 PM, Manoj Kumar <
manojkumarsivaraj...@gmail.com> wrote:

> Hello,
>
> First of all, thanks to the scikit-learn community for guiding new
> developers. I'm thankful for all the help that I've received with my Pull
> Requests so far.
>
> I hope that this is the right place to discuss GSoC-related ideas (I've
> idled in the scikit-learn IRC channel on quite a few occasions, but I
> could not meet any core developer). I was browsing through last year's
> threads when I found this idea related to collaborative filtering (CF)
> quite interesting:
> http://sourceforge.net/mailarchive/message.php?msg_id=30725712 , though
> it was sadly not accepted.
>
> If the scikit-learn community is still enthusiastic about a recsys module
> with CF algorithms implemented, I would love for this to be my GSoC
> proposal, and we could discuss the algorithms, how they fit with the present
> sklearn API, how much we could realistically fit in a 3-month period, etc.
>
> Awaiting a reply.
>
> --
> Regards,
> Manoj Kumar,
> Mech Undergrad
> http://manojbits.wordpress.com
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>