Thanks for your responses.

@Kyle:
At the risk of sounding really naive, I'd like to make a few comments.
I'm referring to the paper Sukru posted,
http://www.stat.osu.edu/~dmsl/Sarwar_2001.pdf, which covers item-based
collaborative filtering. I don't think there is any real need here to
mask the items that the target user (the user whose item ratings you
want to predict) has not selected. I believe it would work for dense
cases too. Let's look at a sample session.

    from sklearn.recsys import item_cf  # Tentative names.
    clf = item_cf()  # Arguments such as the similarity criterion and the
                     # number of recommendations would be given in __init__.
    # Say there are n users who have already rated some items.
    # X is a 2-D array whose first dimension is n; the length of each row
    # varies with the number of items that user has rated.
    # y holds the ratings they gave, either binary (+1 or -1) or continuous.
    clf.fit(X, y)
    # After clf.fit(X, y), the attribute clf.items_ would hold all the
    # items seen during fitting.
    clf.predict(x)  # Returns the top n recommendations for the user x.
    # For each item in clf.items_ that is not in x, a predicted score is
    # computed from the top k items in x that are most similar to it.
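For reference, the prediction step behind clf.predict(x) can be sketched in plain NumPy. This is a rough sketch under my own assumptions, not the proposed API: ratings sit in a dense user-item array R with 0 meaning "not rated", similarity is plain cosine similarity between item columns (one of the measures discussed in the paper), and the score of item i for user u is the weighted sum over the k rated items N most similar to i, P(u, i) = sum_{j in N} s(i, j) * R(u, j) / sum_{j in N} |s(i, j)|. The function names are hypothetical. Note that no per-item mask is needed: the candidates are simply the items the user has not rated.

```python
import numpy as np


def item_cosine_similarity(R):
    """Cosine similarity between the item columns of a user-item matrix.

    Assumption: R has shape (n_users, n_items) and 0 means "not rated",
    so unrated entries contribute nothing to the dot products.
    """
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for never-rated items
    return (R.T @ R) / np.outer(norms, norms)


def predict_rating(R, sim, user, item, k=2):
    """Weighted-sum prediction of R[user, item] from the k items most
    similar to `item` among those the user has already rated."""
    rated = np.flatnonzero(R[user] != 0)
    rated = rated[rated != item]
    if rated.size == 0:
        return np.nan
    # Keep the k rated items with the highest similarity to the target.
    top = rated[np.argsort(sim[item, rated])[::-1][:k]]
    weights = sim[item, top]
    denom = np.abs(weights).sum()
    if denom == 0:
        return np.nan
    return float(weights @ R[user, top] / denom)


def recommend(R, user, n=2, k=2):
    """Score every item `user` has not rated and return the top n item ids."""
    sim = item_cosine_similarity(R)
    scores = {}
    for item in np.flatnonzero(R[user] == 0):
        p = predict_rating(R, sim, user, item, k)
        if not np.isnan(p):
            scores[int(item)] = p
    return sorted(scores, key=scores.get, reverse=True)[:n]


R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
print(recommend(R, user=1))  # → [1, 2]
```

Here user 1 has not rated items 1 and 2; item 1 scores about 2.72 and item 2 scores 1.0, so item 1 is ranked first.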

For user-based CF, yes, we need to provide a mask for the item whose
rating we want to predict, but I suppose that could be passed to
__init__, couldn't it?
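By contrast, user-based CF has to know the target item up front, which is where the mask comes in. A minimal NumPy sketch, again assuming 0 means "not rated" and plain cosine similarity between user rows; `predict_user_based` is a hypothetical name, and here the target item is simply a function argument rather than an __init__ parameter.

```python
import numpy as np


def user_cosine_similarity(R):
    """Cosine similarity between the user rows of R (0 = not rated)."""
    norms = np.linalg.norm(R, axis=1)
    norms[norms == 0] = 1.0  # avoid dividing by zero for users with no ratings
    return (R @ R.T) / np.outer(norms, norms)


def predict_user_based(R, user, item, k=2):
    """Predict R[user, item] from the k users most similar to `user` who
    have rated `item`. The target item acts as the mask."""
    sim = user_cosine_similarity(R)
    raters = np.flatnonzero(R[:, item] != 0)
    raters = raters[raters != user]
    if raters.size == 0:
        return np.nan
    # Keep the k raters with the highest similarity to the target user.
    top = raters[np.argsort(sim[user, raters])[::-1][:k]]
    weights = sim[user, top]
    denom = np.abs(weights).sum()
    if denom == 0:
        return np.nan
    return float(weights @ R[top, item] / denom)


R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
# With k=1 only user 0 (the rater of item 1 most similar to user 1) is
# used, so the prediction is, up to rounding, user 0's rating of 3.
print(predict_user_based(R, user=1, item=1, k=1))
```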

@Alex and Nick: Thanks for the references; I'll have a look right away.

However, one point I don't intuitively understand is what
clf.transform() / clf.fit_transform() should do in these cases. Any
pointers? As for the mentor problem, I don't think it would be an issue
if the community is genuinely interested in this project. If I do get a
+1, I can start thinking about the timeline, the algorithms I'd like to
implement, etc. I'm really looking forward to extending my (so far very
minor) scikit-learn work as part of GSoC.
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
