A velocity measure of sorts makes a lot of sense for a “what’s hot” list.
The particular thing I’m looking at now is how to rank a list of items by some measure of popularity when you don’t have a velocity. There is an introduction date, though, so another way to look at popularity might be to decay it with something like e^-t, where t is its age. You can see the decay in the views histogram.

On Feb 6, 2014, at 4:35 PM, Ted Dunning wrote:

Rising popularity is often a better match to what people want to see on a "most popular" page. The best measure for that in my experience is

log((new_count + offset) / (old_count + offset))

where new_count and old_count are the number of views during the periods in question, and offset is used partly to avoid log(0) or x/0 problems, but also to give a Bayesian grounding to the measure.

On Thu, Feb 6, 2014 at 5:33 PM, Sean Owen wrote:

> Agree - I thought by asking for most popular you meant to look for apple pie.
>
> Agree with you and Ted that the sum of similarity says something interesting even if it is not popularity exactly.
>
> On Feb 6, 2014 11:16 AM, "Pat Ferrel" wrote:
>
>> The problem with the usual preference count is that big hit items can be overwhelmingly popular. If you want to know which ones the most people saw and are likely to have an opinion about, then this seems a good measure. But these hugely popular items may not differentiate taste.
>>
>> So we calculate the “important” taste indicators with LLR. The benefit of the similarity matrix is that it attempts to model the “important” cooccurrences.
>>
>> There is an effect of hugely popular items where they really say nothing about similarity of taste. Everyone likes motherhood and apple pie, so it doesn’t say much about us if we both do too. This is usually accounted for with something like TFIDF, so I suppose another weighted popularity measure would be to run the preference matrix through TFIDF to de-weight non-differentiating preferences.
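Ted's rising-popularity score and Pat's age decay can each be sketched in a few lines of Python. This is a minimal illustration under assumptions: the offset of 5 and the decay rate of 0.1 per day are hypothetical tuning values (the thread gives no numbers), and the view counts are made up. The function names are likewise hypothetical.

```python
import math

def trend_score(new_count, old_count, offset=5.0):
    """Rising popularity: log((new_count + offset) / (old_count + offset)).

    The offset avoids log(0) and x/0, and acts like a prior pseudo-count
    that keeps tiny counts from producing extreme scores.
    """
    return math.log((new_count + offset) / (old_count + offset))

def decayed_popularity(count, age_days, rate=0.1):
    """Age-decayed popularity: count * e^(-rate * age).

    rate is a hypothetical per-day scale; the thread only says "something like e^-t".
    """
    return count * math.exp(-rate * age_days)

# Hypothetical view counts: (views in the recent period, views in the earlier period).
views = {
    "steady_hit": (1000, 1000),
    "rising": (120, 20),
    "new_item": (3, 0),
}
hot = sorted(views, key=lambda item: trend_score(*views[item]), reverse=True)
print(hot)  # ['rising', 'new_item', 'steady_hit']

# An old hit versus a recent item when only total views and ages are known.
print(decayed_popularity(1000, age_days=30) < decayed_popularity(200, age_days=2))  # True
```

The steady hit scores exactly zero because its views did not change, which is what a "what's hot" list wants. Raising the offset pulls low-traffic items further toward zero, in the spirit of the Bayesian grounding Ted mentions.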
>>
>> On Feb 6, 2014, at 7:14 AM, Ted Dunning wrote:
>>
>> If you look at the indicator matrix (cooccurrence reduced by LLR), you will usually have asymmetry due to limitations on the number of indicators per row.
>>
>> This will give you some interesting results when you look at the column sums. I wouldn't call it popularity, but it is an interesting measure.
>>
>> On Thu, Feb 6, 2014 at 2:15 PM, Sean Owen wrote:
>>
>>> I have always defined popularity as just the number of ratings/prefs, yes. You could rank on some kind of 'net promoter score' -- good ratings minus bad ratings -- though that becomes more like 'most liked'.
>>>
>>> How do you get popularity from similarity -- similarity to what? Ranking by sum of similarities seems more like a measure of how much the item is the 'centroid' of all items. Not necessarily most popular, but 'least eccentric'.
>>>
>>> On Thu, Feb 6, 2014 at 7:41 AM, Tevfik Aytekin wrote:
>>>
>>>> Well, I think what you are suggesting is to define popularity as being similar to other items. So in this way the most popular items will be those which are most similar to all other items, like the centroids in K-means.
>>>>
>>>> I would first check the correlation between this definition and the standard one (that is, the definition of popularity as having the highest number of ratings). But my intuition is that they are different things. For example, an item might lie at the center in the similarity space but might not be a popular item. However, there might still be some correlation; it would be interesting to check it.
>>>>
>>>> hope it helps
>>>>
>>>> On Wed, Feb 5, 2014 at 3:27 AM, Pat Ferrel wrote:
>>>>
>>>>> Trying to come up with a relative measure of popularity for items in a recommender. Something that could be used to rank items.
>>>>>
>>>>> The user-item preference matrix would be the obvious thought: just add up the number of preferences per item. Maybe transpose the preference matrix (the temp DRM created by the recommender), then for each row vector (now that a row = item) grab the number of non-zero preferences. This corresponds to the number of preferences and would give one measure of popularity. In the case where the preferences are not boolean, you'd sum the weights.
>>>>>
>>>>> However, it might be a better idea to look at the item-item similarity matrix. It doesn't need to be transposed and contains the "important" similarities, as calculated by LLR for example. Here similarity means similarity in which users preferred an item. So summing the non-zero weights would perhaps give an even better relative "popularity" measure. For the same reason, clustering the similarity matrix would yield "important" clusters.
>>>>>
>>>>> Anyone have intuition about this?
>>>>>
>>>>> I started to think about this because transposing the user-item matrix seems to yield a format that cannot be sent directly into clustering.
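The per-item measures compared in this thread can be sketched side by side in Python, using plain dicts rather than Mahout DRMs. The sketch computes Pat's non-zero preference count and weight sum on the transposed matrix, the TFIDF-style de-weighting he suggests, row sums of an item-item matrix, Ted's column sums after capping indicators per row, and the correlation check Tevfik proposes. All names and the toy data are hypothetical, and raw cooccurrence counts stand in for LLR scores, so this is an illustration of the ideas, not Mahout's API or its LLR computation.

```python
import math
from collections import defaultdict

def transpose(user_prefs):
    """Turn {user: {item: weight}} into {item: {user: weight}}, so a row is an item."""
    by_item = defaultdict(dict)
    for user, items in user_prefs.items():
        for item, w in items.items():
            if w != 0:
                by_item[item][user] = w
    return dict(by_item)

def pref_counts(by_item):
    """Number of non-zero preferences per item (the boolean-data measure)."""
    return {item: len(users) for item, users in by_item.items()}

def weight_sums(by_item):
    """Sum of preference weights per item (for non-boolean data)."""
    return {item: sum(users.values()) for item, users in by_item.items()}

def idf_weighted_sums(by_item, n_users):
    """TFIDF-style de-weighting with users as documents and items as terms.

    idf(item) = log(n_users / users_who_prefer(item)), so an item that
    every user prefers ("apple pie") gets a weight of zero.
    """
    return {item: sum(users.values()) * math.log(n_users / len(users))
            for item, users in by_item.items()}

def cooccurrence(user_prefs):
    """Item-item cooccurrence counts; a hypothetical stand-in for LLR scores."""
    cooc = defaultdict(lambda: defaultdict(float))
    for items in user_prefs.values():
        liked = [i for i, w in items.items() if w != 0]
        for i in liked:
            for j in liked:
                if i != j:
                    cooc[i][j] += 1.0
    return {i: dict(row) for i, row in cooc.items()}

def keep_top_k(sim, k):
    """Keep at most k indicators per row (ties broken by item id); breaks symmetry."""
    return {i: dict(sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))[:k])
            for i, row in sim.items()}

def row_sums(sim):
    """Sum of similarity weights in each item's row."""
    return {i: sum(row.values()) for i, row in sim.items()}

def column_sums(sim):
    """Sum of similarity weights in each item's column."""
    sums = defaultdict(float)
    for row in sim.values():
        for j, w in row.items():
            sums[j] += w
    return dict(sums)

def pearson(xs, ys):
    """Plain Pearson correlation, to check how two popularity measures agree."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical boolean preferences; item "a" is the "apple pie" everyone likes.
prefs = {
    "u1": {"a": 1.0, "b": 1.0, "c": 1.0},
    "u2": {"a": 1.0, "b": 1.0},
    "u3": {"a": 1.0, "d": 1.0},
    "u4": {"a": 1.0, "c": 1.0},
}
by_item = transpose(prefs)
counts = pref_counts(by_item)
idf = idf_weighted_sums(by_item, len(prefs))
sim = cooccurrence(prefs)
sim_sums = row_sums(sim)
indicator_col_sums = column_sums(keep_top_k(sim, 1))
items = sorted(counts)
r = pearson([counts[i] for i in items], [sim_sums[i] for i in items])
```

On this toy data, item "a" (preferred by every user) leads on both raw count and similarity row sum, while its TFIDF-weighted score drops to zero. Capping each row at one indicator concentrates the column sums on "a" and "b", the kind of asymmetry Ted describes. Real LLR scores would down-weight "a" differently than raw cooccurrence does, so the correlation here says little about real data.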
