Hello, I developed a recommender that computes the distance between two items based on their content. However, I also need to incorporate the user-item association. When I do that, I end up with two similarity scores: one from the item-item content-based approach and another from the user-item association (log-likelihood). I am now designing experiments that apply a different weight to each approach before adding the two scores together. Here is the mathematical model I have in mind:
LOGLIKELIHOOD_WEIGHT * (1.0 - 1.0 / (1.0 + logLikelihood)) + (CONTENT_WEIGHT * content-proximity)

such that:
[1] LOGLIKELIHOOD_WEIGHT is a weight between 0 and 1 (e.g., 0.6)
[2] CONTENT_WEIGHT is a weight between 0 and 1 (e.g., 0.4)
[3] logLikelihood is a variable populated by a log-likelihood similarity metric based on the user-item association
[4] content-proximity is a variable populated by a content-based similarity algorithm (TF-IDF)

My question is: does this mathematical model make sense? Can the two scores be added the way I did above even though they come from two different distributions, or will the outcome be skewed?

Please let me know if you have an answer for me.

Thanks very much,
-Ahmed
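
P.S. To make the formula concrete, here is a minimal Python sketch of the combination described above. The function names (normalized_llr, combined_score) and the default weights are only illustrative, and the sketch assumes that logLikelihood is non-negative and that content-proximity is already scaled to [0, 1]:

```python
def normalized_llr(log_likelihood: float) -> float:
    """Squash a non-negative log-likelihood value into [0, 1).

    This is the 1.0 - 1.0 / (1.0 + logLikelihood) term from the formula.
    """
    if log_likelihood < 0:
        # Assumption: the similarity metric never returns a negative value.
        raise ValueError("log_likelihood must be non-negative")
    return 1.0 - 1.0 / (1.0 + log_likelihood)


def combined_score(
    log_likelihood: float,
    content_proximity: float,
    loglikelihood_weight: float = 0.6,  # example value from the post
    content_weight: float = 0.4,        # example value from the post
) -> float:
    """Weighted sum of the squashed log-likelihood and the content score."""
    return (
        loglikelihood_weight * normalized_llr(log_likelihood)
        + content_weight * content_proximity
    )


if __name__ == "__main__":
    # Try a few weight pairs for one hypothetical item pair.
    for w in (0.2, 0.4, 0.6, 0.8):
        print(w, combined_score(3.0, 0.5, loglikelihood_weight=w, content_weight=1.0 - w))
```

If the two weights sum to 1 and both inputs satisfy the assumptions above, the combined score also stays within [0, 1), which at least keeps the two terms on the same scale, even though their distributions may still differ.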
