>
> On Jun 10, 2014, at 6:13 PM, Sahil Sharma <[email protected]> wrote:
>
> Hi,
>
> Better yet (maybe controversial since I don’t know the mathematical
> justification for this) but you could cluster the indicator matrix of items
> by similar items. This is at least clustering “important” similar items.
>
> I'm sorry if what I said was interpreted as user-based clustering. I meant
> item-based clustering, like you pointed out!
The indicator matrix is item by item, so this will cluster items.
If you want to cluster the input matrix by item, just transpose it and cluster.
The rows will be items and the columns users, so you will get clustered items.
I’m dubious of this because of how sparse it is, and you’ll also have
uninteresting interactions because it hasn’t been scrubbed by the process that
creates the indicator matrix (RowSimilarityJob).
Still, I'd shy away from clustering. It’s hard to get right. I’d try the
similarity approach below first.
> But it is even easier than clustering: if you know a couple of items the user
> has preferred, just get the most similar to those directly from the indicator
> matrix. The indicator matrix is organized by an item per row and each row has
> similar items by strength of similarity. Add all the rows the user has
> interacted with (using the strength values), sort, and recommend the top n.
> The in-memory item-based recommender will give you the similar items for each
> item the user preferred; all you need to do is add and sort.
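For what it's worth, the add-and-sort step can be sketched in plain Java roughly
like this. It is a minimal sketch and not Mahout code: the class name
IndicatorRowSum, the nested-map layout of the indicator matrix, and the choice
to drop items the user already has are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout.
final class IndicatorRowSum {

    /**
     * indicator: itemID -> (similar itemID -> similarity strength), one row per item.
     * Returns up to n item IDs ranked by summed strength over the user's rows.
     */
    static List<Long> recommend(Map<Long, Map<Long, Double>> indicator,
                                long[] userItems, int n) {
        // Add up the rows for every item the user has interacted with.
        Map<Long, Double> scores = new HashMap<>();
        for (long itemID : userItems) {
            Map<Long, Double> row = indicator.get(itemID);
            if (row == null) {
                continue; // no row for this item, nothing to add
            }
            for (Map.Entry<Long, Double> e : row.entrySet()) {
                scores.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        // Assumption: do not recommend items the user already has.
        for (long itemID : userItems) {
            scores.remove(itemID);
        }
        // Sort by summed strength, strongest first, and keep the top n.
        List<Map.Entry<Long, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<Long> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, ranked.size()); i++) {
            top.add(ranked.get(i).getKey());
        }
        return top;
    }
}
```

With only two or three items per user this is a handful of map lookups, so it
should stay cheap even when the full recommender is slow.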
>
> I did try out the item-based recommender, but maybe it took a lot of
> computational time because I tried out the Boolean indicator matrix based
> recommender (GenericBooleanPrefItemBasedRecommender).
Using the in-memory recommender may be a problem if your data is very large. If
you do use it, you could implement a check of how much data the user has and
either ask the recommender for recs given a userID or collect up similar items
using recommender.mostSimilarItems(long[] itemIDs, int howMany) from the
interactions you do have. There are other ways to do this with the Hadoop
version of the item-based recommender.
/**
 * @param itemIDs
 *          IDs of item for which to find most similar other items
 * @param howMany
 *          desired number of most similar items to find
 * @return items most similar to the given items, ordered from most similar
 *         to least
 * @throws TasteException
 *           if an error occurs while accessing the
 *           {@link org.apache.mahout.cf.taste.model.DataModel}
 */
List<RecommendedItem> mostSimilarItems(long[] itemIDs, int howMany)
    throws TasteException;
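The check described above might look roughly like the sketch below. To keep it
runnable without Mahout on the classpath it uses a hypothetical stand-in
interface, RecSource, whose two methods only mirror the shape of
recommend(userID, howMany) and mostSimilarItems(itemIDs, howMany); the real
calls return List<RecommendedItem> rather than List<Long>. The class name and
the threshold of 5 interactions are also assumptions, not anything Mahout
prescribes.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the two recommender calls discussed above.
interface RecSource {
    List<Long> recommend(long userID, int howMany);
    List<Long> mostSimilarItems(long[] itemIDs, int howMany);
}

// Hypothetical helper, not part of Mahout.
final class SparseUserFallback {

    // Assumed cutoff; tune it against your own data.
    static final int MIN_INTERACTIONS = 5;

    static List<Long> recommendFor(RecSource source, long userID,
                                   long[] knownItemIDs, int howMany) {
        if (knownItemIDs.length >= MIN_INTERACTIONS) {
            // Enough history: ask for recs by userID.
            return source.recommend(userID, howMany);
        }
        if (knownItemIDs.length > 0) {
            // Sparse history: collect similar items from what we do have.
            return source.mostSimilarItems(knownItemIDs, howMany);
        }
        // No interactions at all: nothing to base similarity on.
        return Collections.emptyList();
    }
}
```

The empty-list branch is where something like popular or trending items would
have to take over.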
>
> You are certainly welcome here but questions like this usually go to the
> [email protected] list.
>
> Thanks for pointing it out! I'll be careful from now on.
>
> On Wed, Jun 11, 2014 at 4:37 AM, Pat Ferrel <[email protected]> wrote:
> There are simple ways to do this without maintaining a separate recommender.
>
> First you can simply cluster the input matrix of users by items. Then
> recommend items closest to the centroid of the cluster the user’s couple of
> items were in. But this seems dubious for several reasons.
>
> Better yet (maybe controversial since I don’t know the mathematical
> justification for this) but you could cluster the indicator matrix of items
> by similar items. This is at least clustering “important” similar items.
>
> But it is even easier than clustering: if you know a couple of items the user
> has preferred, just get the most similar to those directly from the indicator
> matrix. The indicator matrix is organized by an item per row and each row has
> similar items by strength of similarity. Add all the rows the user has
> interacted with (using the strength values), sort, and recommend the top n.
> The in-memory item-based recommender will give you the similar items for each
> item the user preferred; all you need to do is add and sort.
>
> A true cold start problem means you have items and/or users with no
> interactions. This calls for a metadata recommender and some context. If a
> user is on a page of a product with no interactions, the metadata must tell
> which items are similar. In the case where you have a user with no
> interactions and no context, you have to rely on things like the time-worn
> popular and trending items.
>
> You are certainly welcome here but questions like this usually go to the
> [email protected] list.
>
> On Jun 10, 2014, at 4:50 AM, Sahil Sharma <[email protected]> wrote:
>
> Hi,
>
> One place where tree-based recommenders (that is, ones using hierarchical
> clustering) might be useful is the cold start problem. Suppose a
> user has only bought a few items (say 2 or 3). It's kind of hard to
> capture that user's interests using a user-based collaborative filtering
> recommender.
> Also, the item-based collaborative filtering recommender turns out to
> be time consuming.
> In such a setting it makes sense to cluster the items together (using some
> clustering algorithm) and then use the user's purchased items to
> recommend (based on which cluster those purchased items belong to).
> On Jun 10, 2014 4:41 PM, "Sebastian Schelter" <[email protected]> wrote:
>
> > Hi Sahil,
> >
> > don't worry, you're not breaking any rules. We removed the tree-based
> > recommenders because we have never heard of anyone using them over the
> > years.
> >
> > --sebastian
> >
> > On 06/10/2014 09:01 AM, Sahil Sharma wrote:
> >
> >> Hi,
> >>
> >> Firstly I apologize if I'm breaking certain rules by mailing this way, I'm
> >> new to this and would appreciate any help I could get.
> >>
> >> I was just playing around with the tree-based Recommender (which seems to
> >> be deprecated in the current version "for the lack of use").
> >>
> >> Why was it deprecated?
> >>
> >> Also, I just looked at the code, and it seems to be doing a lot of
> >> redundant computation. For example, we could store a matrix of
> >> cluster-cluster distances (and hence avoid recomputing the closest
> >> clusters every time, by updating the matrix whenever we merge two
> >> clusters). Also, when determining the farthest-distance-based similarity
> >> between two clusters, the pair which realizes this distance could be
> >> stored and updated upon merging, so that this computation need not be
> >> repeated again and again.
> >>
> >> Just wondering if this repeated computation was a reason for
> >> deprecating the class (since people might have found a slow recommender
> >> "lacking use").
> >>
> >> Would be glad to hear the thoughts of others on this, and also to implement
> >> an efficient version if the community agrees.
> >>
> >>
> >
>
>
>
>
> --
> Best,
> Sahil
>