You're implementing userSimilarity(), but you appear to be computing item-item similarity. Halfway through, you pass the item IDs as user IDs, in the calls to getItemIDsFromUser(). As a result I can't tell what this is intended to do.
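To make that concrete: the sales-data half of an item-item similarity should count *users*, not items. Below is a minimal self-contained sketch of the contingency counts an item-item log-likelihood needs; `contingency` is a hypothetical helper written for illustration, not Mahout API. (In Mahout itself, `DataModel.getNumUsersWithPreferenceFor(itemID1, itemID2)` gives the co-occurrence count directly.)

```java
import java.util.*;

public class ItemCooccurrence {

    // Hypothetical helper: derive the 2x2 contingency counts for two items
    // from a user -> purchased-items map. k11 = users who bought both,
    // k12 = bought only A, k21 = bought only B, k22 = bought neither.
    static long[] contingency(Map<Long, Set<Long>> userPrefs, long itemA, long itemB) {
        long k11 = 0, k12 = 0, k21 = 0, k22 = 0;
        for (Set<Long> items : userPrefs.values()) {
            boolean hasA = items.contains(itemA);
            boolean hasB = items.contains(itemB);
            if (hasA && hasB)  k11++;
            else if (hasA)     k12++;
            else if (hasB)     k21++;
            else               k22++;
        }
        return new long[] {k11, k12, k21, k22};
    }

    public static void main(String[] args) {
        // Tiny made-up dataset: users 1-4, items 100/200/300.
        Map<Long, Set<Long>> prefs = new HashMap<>();
        prefs.put(1L, new HashSet<>(Arrays.asList(100L, 200L)));
        prefs.put(2L, new HashSet<>(Arrays.asList(100L)));
        prefs.put(3L, new HashSet<>(Arrays.asList(200L)));
        prefs.put(4L, new HashSet<>(Arrays.asList(300L)));

        // Note the loop runs over users; the item IDs are never used as user IDs.
        System.out.println(Arrays.toString(contingency(prefs, 100L, 200L)));
        // → [1, 1, 1, 1]
    }
}
```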
On Thu, Mar 22, 2012 at 9:33 PM, Ahmed Abdeen Hamed <[email protected]> wrote:
> Hello Sean,
>
> I am trying to cluster not only based on the sales data in a data model,
> but also based on content. Below is the function that does that:
>
> @Override
> public double userSimilarity(long itemID1, long itemID2) throws TasteException {
>
>   // converting the item IDs from long to String
>   String itemOneID = String.valueOf(itemID1);
>   String itemTwoID = String.valueOf(itemID2);
>
>   // looking up the IDs in the hash map
>   String itemOneValue = productIdAttributesMap.get(itemOneID);
>   String itemTwoValue = productIdAttributesMap.get(itemTwoID);
>
>   // load the TF-IDF object with many documents
>   for (String s : productIdAttributesMap.values())
>     tfIdf.handle(s);
>
>   // compute the distance and return it...
>   double proximity = 0;
>   if (itemOneValue != null && itemTwoValue != null) {
>     proximity = tfIdf.proximity(itemOneValue, itemTwoValue);
>   }
>
>   // now computing similarity between items from sales data
>   DataModel dataModel = getDataModel();
>   FastIDSet prefs1 = dataModel.getItemIDsFromUser(itemID1);
>   FastIDSet prefs2 = dataModel.getItemIDsFromUser(itemID2);
>
>   long prefs1Size = prefs1.size();
>   long prefs2Size = prefs2.size();
>   long intersectionSize = prefs1Size < prefs2Size
>       ? prefs2.intersectionSize(prefs1)
>       : prefs1.intersectionSize(prefs2);
>   if (intersectionSize == 0) {
>     return Double.NaN;
>   }
>   long numItems = dataModel.getNumItems();
>   double logLikelihood = LogLikelihood.logLikelihoodRatio(
>       intersectionSize,
>       prefs2Size - intersectionSize,
>       prefs1Size - intersectionSize,
>       numItems - prefs1Size - prefs2Size + intersectionSize);
>
>   // merging the distance and the log-likelihood similarity
>   return ExperimentParams.LOGLIKELIHOOD_WEIGHT * (1.0 - 1.0 / (1.0 + logLikelihood))
>       + ExperimentParams.PROXIMITY_WEIGHT * proximity;
> }
>
> Please let me know if this is clearer now.
> Thanks very much,
> -Ahmed
>
> On Thu, Mar 22, 2012 at 5:26 PM, Sean Owen <[email protected]> wrote:
>> What do you mean that you have a user-item association from a
>> log-likelihood metric?
>>
>> Combining two values is easy in the sense that you can average them or
>> something, but only if they are in the same "units". Log likelihood
>> may be viewed as a probability. The distance function you derive from
>> it -- and your own TF-IDF distance -- it's not clear whether these are
>> comparable.
>>
>> Rather than get into this, I wonder whether you need any of this at
>> all, since I'm not sure what the user-item value is to begin with.
>> That's your output, not an input.
>>
>> On Thu, Mar 22, 2012 at 9:18 PM, Ahmed Abdeen Hamed
>> <[email protected]> wrote:
>> > Hello,
>> >
>> > I developed a recommender that computes the distance between two items
>> > based on content. However, I also need to include the user-item
>> > association. When I do that, I end up with one similarity score from
>> > the content-based item-item comparison and another from the user-item
>> > association (log-likelihood). I am now designing experiments to try
>> > different weights for each approach before adding them together.
>> > Here is the mathematical model I have in mind:
>> >
>> > LOGLIKELIHOOD_WEIGHT * (1.0 - 1.0 / (1.0 + logLikelihood)) +
>> > CONTENT_WEIGHT * content-proximity, such that:
>> >
>> > [1] LOGLIKELIHOOD_WEIGHT is a weight between 0 and 1 (e.g., 0.6)
>> > [2] CONTENT_WEIGHT is a weight between 0 and 1 (e.g., 0.4)
>> > [3] logLikelihood is a variable populated by a log-likelihood
>> > similarity metric based on the user-item association
>> > [4] content-proximity is a variable populated by a content-based
>> > similarity algorithm (TF-IDF)
>> >
>> > My question now is: does this mathematical model make sense?
>> > Can we add the two scores even though they come from two different
>> > distributions, the way I did above, or will the outcome be skewed?
>> >
>> > Please let me know if you have an answer for me.
>> >
>> > Thanks very much,
>> > -Ahmed
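For reference, the proposed blend can be sketched end to end in plain Java. The log-likelihood ratio is re-implemented here from the standard entropy formulation of the 2x2 contingency test (the same formulation Mahout's LogLikelihood class uses), so the example compiles without Mahout; the weights and the TF-IDF proximity value are made-up examples. Note that `1.0 - 1.0 / (1.0 + llr)` maps the unbounded ratio into [0, 1), so the blend stays in [0, 1] only if the two weights sum to 1 and the proximity itself lies in [0, 1] -- which goes to Sean's point about making sure the two terms are in comparable units before weighting them.

```java
public class BlendSketch {

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Entropy of an unnormalized count vector.
    private static double entropy(long... counts) {
        long sum = 0;
        double partial = 0.0;
        for (long c : counts) {
            partial += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - partial;
    }

    // Standard log-likelihood ratio for a 2x2 contingency table
    // (k11 = both, k12 = only A, k21 = only B, k22 = neither).
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        if (rowEntropy + columnEntropy < matrixEntropy) {
            return 0.0; // guard against tiny negative values from rounding
        }
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }

    // The proposed blend: the unbounded LLR is squashed into [0, 1)
    // before weighting, so the weights control the mix directly.
    static double blend(double llrWeight, double contentWeight,
                        double llr, double proximity) {
        return llrWeight * (1.0 - 1.0 / (1.0 + llr)) + contentWeight * proximity;
    }

    public static void main(String[] args) {
        // Made-up counts: 10 users bought both items, 5 only A, 5 only B, 80 neither.
        double llr = logLikelihoodRatio(10, 5, 5, 80);
        // Example weights (0.6 / 0.4) and a made-up TF-IDF proximity of 0.5.
        System.out.println(blend(0.6, 0.4, llr, 0.5));
    }
}
```

With independent counts (e.g., all four cells equal to 1), the ratio is 0 and the first term vanishes, so only the content term contributes -- a useful sanity check on the formula.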
