Regarding overfitting, don't forget dithering. That can be the most
important single step you take in building a good recommender.

The amount of dithering can be made inversely proportional to the number of
exposures so far if you want to give novel items more exposure.

This doesn't have to be very fancy. I have had very good results by
generating a long list of recommendations, computing a pseudo score based
on rank, adding a bit of noise and re-sorting. I also scanned down the list
and penalized items that showed insufficient diversity. Then I re-sorted
again. Typically, the pseudo score was something like exp(-r), where r is
the original rank.

The noise scale is adjusted to leave a good proportion of originally
recommended items in the first page. It could have easily been scaled by
1/sqrt(exposures) to let the newbies move around more.

The parameters here should be adjusted a bit based on experiments, but a
heuristic first hack works pretty well as a start.
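
A minimal Python sketch of the dithering procedure described above. The function name `dither`, the choice of Gaussian noise, and the default `noise_scale` are assumptions made for illustration and are not taken from the original message; the diversity penalty pass is left out.

```python
import math
import random


def dither(ranked_items, noise_scale=0.3, exposures=None, seed=None):
    """Re-sort a ranked list after adding noise to a rank-based pseudo score.

    ranked_items: items in their original recommended order (best first).
    noise_scale:  standard deviation of the added noise (assumed default).
    exposures:    optional dict item -> number of times already shown; when
                  given, the noise is scaled by 1/sqrt(exposures) so that
                  rarely shown items move around more.
    """
    rng = random.Random(seed)
    noisy = []
    for r, item in enumerate(ranked_items):
        pseudo_score = math.exp(-r)  # pseudo score based on rank only
        sigma = noise_scale
        if exposures is not None:
            # Treat unseen items as having one exposure to avoid dividing by zero.
            sigma = noise_scale / math.sqrt(max(1, exposures.get(item, 0)))
        noisy.append((pseudo_score + rng.gauss(0.0, sigma), item))
    # Highest noisy score first.
    noisy.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in noisy]


recommendations = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
print(dither(recommendations, noise_scale=0.3, seed=1))
```

With `noise_scale=0` the original order comes back unchanged; raising it shuffles the tail first, so the scale can be tuned by checking how many of the original first-page items survive.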

On Sun, Nov 12, 2017 at 10:34 PM, Pat Ferrel wrote:

> Part of what Ted is talking about can be seen in the carousels on Netflix
> or Amazon. Some are not recommendations, such as “trending” videos, “new”
> videos, or “prime” videos (substitute your own promotions here). They have
> nothing to do with recommender-created items but are presented along with
> recommender-based carousels. They are based on analytics or business rules
> and ideally have some randomness built in. The reason for this is that 1) it
> works by exposing users to items that they would not see in recommendations
> and 2) it provides data to build the recommender model from.
> A recommender cannot work in an app that has no non-recommended items
> displayed, or there will be no unbiased data to create recommendations
> from. This would lead to crippling overfitting. Most apps have placements
> like the ones mentioned above and also have search and browse. However you
> do it, it must be prominent and always available. The moral of this
> paragraph is: don’t try to make everything a recommendation; it will be
> self-defeating. In fact, make sure not every video watch comes from a
> recommendation.
> Likewise think of placements (reflecting a particular recommender use) as
> experimentation grounds. Try things like finding a recommended category and
> then recommending items in that category all based on user behavior. Or try
> a placement based on a single thing a user watched like “because you
> watched xyz you might like these”. Don’t just show the most popular
> categories for the user and recommend items in them. This would be a type
> of overfitting too.
> I’m sure we have strayed far from your original question but maybe it’s
> covered somewhere in here.
> On Nov 12, 2017, at 12:11 PM, Johannes Schulte wrote:
> I did "second order" recommendations before but more to fight sparsity and
> find more significant associations in situations with less traffic, so
> recommending categories instead of products. There needs to be some third
> order sorting / boosting like you mentioned with "new music", or maybe
> popularity or hotness to avoid quasi-random order. For events with limited
> lifetime it's probably some mixture of spatial distance and freshness.
> We will definitely keep an eye on the generation process of data for new
> items. It depends on the domain, but in a time of multi-channel promotion
> of videos, shows and products, it also helps that there is traffic driven
> from external sources.
> Thanks for the detailed hints - now it's time to see what comes out of
> this.
> Johannes
> On Sun, Nov 12, 2017 at 7:52 AM, Ted Dunning wrote:
> > Events have the natural good quality that having a cold start means that
> > you will naturally favor recent interactions simply because there won't be
> > any old interactions to deal with.
> >
> > Unfortunately, that also means that you will likely be facing serious cold
> > start issues all the time. I have used two strategies to deal with cold
> > starts, both fairly successfully.
> >
> > *Method 1: Second order recommendation*
> >
> > For novel items with no history, you typically do have some kind of
> > information about the content. For an event, you may know the performer,
> > the organizer, the venue, possibly something about the content of the event
> > as well (especially for a tour event). As such, you can build a recommender
> > that recommends this secondary information and then do a search with
> > recommended secondary information to find events. This actually works
> > pretty well, at least for the domains where I have used it (music and videos).
> > For instance, in music, you can easily recommend a new album based on the
> > artist(s) and track list.
> >
> > The trick here is to determine when and how to blend in normal
> > recommendations. One way is query blending where you combine the second
> > order query with a normal recommendation query, but I think that a fair bit
> > of experimentation is warranted here.
> >
> > *Method 2: What's new and what's trending*
> >
> > It is always important to provide alternative avenues of information
> > gathering for recommendation. Especially for the user generated video case,
> > there was pretty high interest in the "What's new" and "What's hot" pages.
> > If you do a decent job of dithering here, you keep reasonably good content
> > on the what's new page longer than content that doesn't pull. That
> > maintains interest in the page. Similarly, you can have a bit of a lower
> > bar for new content to be classified as hot than established content. That
> > way you keep the page fresh (because new stuff appears transiently), but
> > you also have a fair bit of really good stuff as well. If done well, these
> > pages will provide enough interactions with new items so that they don't
> > start entirely cold. You may need to have genre specific or location
> > specific versions of these pages to avoid interesting content being
> > overwhelmed. You might also be able to spot content that has intense
> > interest from a sub-population as opposed to diffuse interest from a mass
> > population.
> >
> > You can also use novelty and trending boosts for content in the normal
> > recommendation engine. I have avoided this in the past because I felt it
> > was better to have specialized pages for what's new and hot rather than
> > because I had data saying it was bad to do. I have put a very weak
> > recommendation effect on the what's hot pages so that people tend to see
> > trending material that they like. That doesn't help on what's new pages for
> > obvious reasons unless you use a touch of second order recommendation.
> >
> >
> >
> >
> >
> > On Sat, Nov 11, 2017 at 11:00 PM, Johannes Schulte wrote:
> >
> >> Well, the Greece thing was just an example of something you don't know
> >> upfront - it could be any of the modeled features on the cross recommender
> >> input side (user segment, country, city, previous buys), some subpopulation
> >> getting active, so the current approach, probably with sampling that
> >> favours newer events, will be the best here. Luckily a sampling strategy is
> >> a big topic anyway since we're trying to go for the near real time way -
> >> Pat, you talked about it some while ago on this list and I still have to
> >> look at the Flink talk from Trevor Grant but I'm really eager to attack
> >> this after years of batch :)
> >>
> >> Thanks for your thoughts, I am happy I can rule something out given the
> >> domain (Poisson LLR). Luckily the domain I'm working on is event
> >> recommendations, so there is a natural deterministic item expiry (as
> >> compared to Christmas-like stuff).
> >>
> >> Again,
> >> thanks!
> >>
> >>
> >> On Sat, Nov 11, 2017 at 7:00 PM, Ted Dunning wrote:
> >>
> >>> Inline.
> >>>
> >>> On Sat, Nov 11, 2017 at 6:31 PM, Pat Ferrel wrote:
> >>>
> >>>> If Mahout were to use it would tend to favor
> >>>> new events in calculating the LLR score for later use in the threshold for
> >>>> whether a co- or cross-occurrence is incorporated in the model.
> >>>
> >>>
> >>> I don't think that this would actually help for most recommendation
> >>> purposes.
> >>>
> >>> It might help to determine that some item or other has broken out of
> >>> historical rates. Thus, we might have "hotness" as a detected feature that
> >>> could be used as a boost at recommendation time. We might also have "not
> >>> hotness" as a negative boost feature.
> >>>
> >>> Since we have a pretty good handle on the "other" counts, I don't think
> >>> that the Poisson test would help much with the cooccurrence stuff itself.
> >>>
> >>> Changing the sampling rule could make a difference to temporality and would
> >>> be more like what Johannes is asking about.
> >>>
> >>>
> >>>> But it doesn’t relate to popularity as I think Ted is saying.
> >>>>
> >>>> Are you looking for 1) personal recommendations biased by hotness in
> >>>> Greece or 2) things hot in Greece?
> >>>>
> >>>> 1) create a secondary indicator for “watched in some locale”. The locale-id
> >>>> uses a country-code+postal-code maybe but not lat-lon. Something that
> >>>> includes a good number of people/events. Then the query would be user-id
> >>>> and user-locale. This would yield personal recs preferred in the user’s
> >>>> locale. Athens-west-side in this case.
> >>>>
> >>>
> >>> And this works in the current regime. Simply add location tags to the user
> >>> histories and do cooccurrence against content. Locations will pop out as
> >>> indicators for some content and not for others. Then when somebody appears
> >>> in some location, their tags will retrieve localized content.
> >>>
> >>> For localization based on strict geography, say for restaurant search, we
> >>> can just add business rules based on geo-search. A very large bank customer
> >>> of ours does that, for instance.
> >>>
> >>>
> >>>> 2) split the data into locales and do the hot calc I mention. The query
> >>>> would have no user-id since it is not personalized but would yield “hot in
> >>>> Greece”
> >>>>
> >>>
> >>> I think that this is a good approach.
> >>>
> >>>
> >>>>
> >>>> Ted’s “Christmas video” tag is what I was calling a business rule and can
> >>>> be added to either of the above techniques.
> >>>>
> >>>
> >>> But the (not) hotness feature might help with automating this.
> >>>
> >>>
> >>>
> >>>
> >>>>
> >>>> On Nov 11, 2017, at 4:01 AM, Ted Dunning wrote:
> >>>>
> >>>> So ... there are a few different threads here.
> >>>>
> >>>> 1) LLR but with time. Quite possible, but not really what Johannes is
> >>>> talking about, I think. See for a quick
> >>>> discussion.
> >>>>
> >>>> 2) time varying recommendation. As Johannes notes, this can make use of
> >>>> windowed counts. The problem is that rarely accessed items should probably
> >>>> have longer windows so that we use longer term trends when we have less
> >>>> data.
> >>>>
> >>>> The good news here is that some part of this is nearly already in the
> >>>> code. The trick is that the down-sampling used in the system can be adapted
> >>>> to favor recent events over older ones. That means that if the meaning of
> >>>> something changes over time, the system will catch on. Likewise, if
> >>>> something appears out of nowhere, it will quickly train up. This handles
> >>>> the "popular in Greece right now" problem.
> >>>>
> >>>> But this isn't the whole story of changing recommendations. Another problem
> >>>> that we commonly face is what I call the Christmas music issue. The idea is
> >>>> that there are lots of recommendations for music that are highly seasonal.
> >>>> Thus, Bing Crosby fans want to hear White Christmas until the day after
> >>>> Christmas, at which point this becomes a really bad recommendation. To some
> >>>> degree, this can be dealt with by using temporal tags as indicators, but
> >>>> that doesn't really allow a recommendation to be completely shut down.
> >>>>
> >>>> The only way that I have seen to deal with this in the past is with a
> >>>> manually designed kill switch. As much as possible, we would tag the
> >>>> obviously seasonal content and then add a filter to kill or downgrade that
> >>>> content the moment it went out of fashion.
> >>>>
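
A minimal Python sketch of the seasonal kill switch described above, assuming each manually tagged item carries an inclusive date window. The function name `apply_kill_switch`, the item ids, and the window layout are hypothetical, and only the "kill" variant is shown, not the "downgrade" one.

```python
from datetime import date


def apply_kill_switch(recommendations, seasonal_windows, today):
    """Drop seasonal items once today falls outside their allowed window.

    recommendations:  list of item ids, best first.
    seasonal_windows: dict item id -> (first_day, last_day), both inclusive;
                      items without an entry are never filtered.
    today:            the date on which the recommendations are served.
    """
    kept = []
    for item in recommendations:
        window = seasonal_windows.get(item)
        if window is None or window[0] <= today <= window[1]:
            kept.append(item)
    return kept


recs = ["white-christmas", "some-jazz-album"]
windows = {"white-christmas": (date(2017, 11, 1), date(2017, 12, 25))}
print(apply_kill_switch(recs, windows, date(2017, 12, 20)))
print(apply_kill_switch(recs, windows, date(2017, 12, 26)))
```

The filter runs after the recommender has produced its list, so it can shut an item off completely on a given day, which temporal tags used as indicators cannot do.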
> >>>>
> >>>>
> >>>> On Sat, Nov 11, 2017 at 9:43 AM, Johannes Schulte wrote:
> >>>>
> >>>>> Pat, thanks for your help, especially the insights on how you handle the
> >>>>> system in production and the tips for multiple acyclic buckets.
> >>>>> Combining the signals when querying sounds okay but, as you say,
> >>>>> it's always hard to find the right boosts without setting up some LTR
> >>>>> system. If there were a way to use the hotness when calculating the
> >>>>> indicators for subpopulations it would be great, especially for a cross
> >>>>> recommender.
> >>>>>
> >>>>> e.g. people in Greece _now_ are viewing this show/product/whatever
> >>>>>
> >>>>> And here the popularity of the recommended item in this subpopulation could
> >>>>> be overlooked when just looking at the overall derivatives of activity.
> >>>>>
> >>>>> Maybe one could do multiple G-Tests using sliding windows
> >>>>> * itemA&itemB  vs population (classic)
> >>>>> * itemA&itemB(t) vs itemA&itemB(t-1)
> >>>>> ..
> >>>>>
> >>>>> and derive multiple indicators per item to be indexed.
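
A minimal Python sketch of the two G-tests listed above, using the standard log-likelihood ratio for a 2x2 contingency table. How the window-versus-window table is laid out (co-occurrences against all other events in each window) is one possible reading of the proposal, and the helper name `windowed_llr` is hypothetical.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy: N ln N - sum(k ln k) for the given counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """G-test statistic (log-likelihood ratio) for a 2x2 table of counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def windowed_llr(cooc_now, total_now, cooc_prev, total_prev):
    """Compare the co-occurrence rate in window t with window t-1.

    Rows are the two windows; columns are "A and B together" versus
    "every other event in that window" (an assumed layout).
    """
    return llr(cooc_now, total_now - cooc_now,
               cooc_prev, total_prev - cooc_prev)


# Classic test: A&B together, A without B, B without A, neither.
print(llr(20, 80, 30, 9870))
# Temporal test: 50 of 1000 events now versus 5 of 1000 in the previous window.
print(windowed_llr(50, 1000, 5, 1000))
```

A pair whose rate is the same in both windows scores close to zero, so only pairs whose rate changed between windows would yield an extra indicator.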
> >>>>>
> >>>>> But this all relies on discretizing time into buckets and not looking at
> >>>>> the distribution of time between events like in the presentation above -
> >>>>> maybe there is something way smarter
> >>>>>
> >>>>> Johannes
> >>>>>
> >>>>> On Sat, Nov 11, 2017 at 2:50 AM, Pat Ferrel wrote:
> >>>>>
> >>>>>> BTW you should take time buckets that are relatively free of daily cycles,
> >>>>>> like 3-day, week, or month buckets for “hot”. This is to remove cyclical
> >>>>>> effects from the frequencies as much as possible since you need 3 buckets
> >>>>>> to see the change in change, 2 for the change, and 1 for the event volume.
> >>>>>>
> >>>>>>
> >>>>>> On Nov 10, 2017, at 4:12 PM, Pat Ferrel wrote:
> >>>>>>
> >>>>>> So your idea is to find anomalies in event frequencies to detect “hot”
> >>>>>> items?
> >>>>>>
> >>>>>> Interesting, maybe Ted will chime in.
> >>>>>>
> >>>>>> What I do is take the frequency and its first and second derivatives as
> >>>>>> measures of popularity, increasing popularity, and increasingly increasing
> >>>>>> popularity. Put another way: popular, trending, and hot. This is simple to
> >>>>>> do by taking 1, 2, or 3 time buckets and looking at the number of events,
> >>>>>> derivative (difference), and second derivative. Ranking all items by these
> >>>>>> values gives various measures of popularity or its increase.
> >>>>>>
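
A minimal Python sketch of the popular / trending / hot measures described above, assuming each item has event counts for its three most recent time buckets, oldest first. The names `popularity_measures` and `rank_items` and the sample counts are hypothetical.

```python
def popularity_measures(counts):
    """Return (popular, trending, hot) from the last three bucket counts.

    counts: event counts per time bucket, oldest first (at least three).
    """
    oldest, middle, newest = counts[-3:]
    popular = newest                                 # event volume, 1 bucket
    trending = newest - middle                       # first difference, 2 buckets
    hot = (newest - middle) - (middle - oldest)      # second difference, 3 buckets
    return popular, trending, hot


def rank_items(bucket_counts, measure="hot"):
    """Rank item ids by one of the three measures, highest first."""
    index = {"popular": 0, "trending": 1, "hot": 2}[measure]
    return sorted(
        bucket_counts,
        key=lambda item: popularity_measures(bucket_counts[item])[index],
        reverse=True,
    )


weekly_counts = {
    "steady-seller": [100, 100, 100],
    "breakout": [10, 20, 60],
    "slow-climber": [50, 60, 70],
}
for measure in ("popular", "trending", "hot"):
    print(measure, rank_items(weekly_counts, measure))
```

The resulting rank can be stored as a field on each item and used in queries alongside any business rules.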
> >>>>>> If your use is in a recommender you can add a ranking field to all items
> >>>>>> and query for “hot” by using the ranking you calculated.
> >>>>>>
> >>>>>> If you want to bias recommendations by hotness, query with user history
> >>>>>> and boost by your hot field. I suspect the hot field will tend to overwhelm
> >>>>>> your user history in this case, as it would if you used anomalies, so you’d
> >>>>>> also have to normalize the hotness to some range closer to the one created
> >>>>>> by the user history matching score. I haven’t found a very good way to mix
> >>>>>> these in a model, so use hot as a method of backfill if you cannot return
> >>>>>> enough recommendations or in places where you may want to show just hot
> >>>>>> items. There are several benefits to this method of using hot to rank all
> >>>>>> items, including the fact that you can apply business rules to them just as
> >>>>>> with normal recommendations, so you can ask for hot in “electronics” if you
> >>>>>> know categories, or hot "in-stock" items, or ...
> >>>>>>
> >>>>>> Still anomaly detection does sound like an interesting approach.
> >>>>>>
> >>>>>>
> >>>>>> On Nov 10, 2017, at 3:13 PM, Johannes Schulte wrote:
> >>>>>>
> >>>>>> Hi "all",
> >>>>>>
> >>>>>> I am wondering what would be the best way to incorporate event time
> >>>>>> information into the calculation of the G-Test.
> >>>>>>
> >>>>>> There is a claim here
> >>>>>>
> >>>>>>
> >>>>>> saying "Time aware variant of G-Test is possible"
> >>>>>>
> >>>>>> I remember I experimented with exponentially decayed counts some years ago
> >>>>>> and this involved changing the counts to doubles, but I suspect there is
> >>>>>> some smarter way. What I don't get is the relation to a data structure like
> >>>>>> T-Digest when working with a lot of counts / cells for every combination of
> >>>>>> items. Keeping a t-digest for every combination seems infeasible.
> >>>>>>
> >>>>>> How would one incorporate event time into recommendations to detect
> >>>>>> "hotness" of certain relations? Glad if someone has an idea...
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Johannes
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
