Regarding overfitting, don't forget dithering. That can be the most important single step you take in building a good recommender.
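For illustration only, here is a minimal Python sketch of the kind of rank-based dithering described in the rest of this message: a pseudo score from rank, some noise, and a re-sort. The function name `dither` and the parameters `noise_scale`, `rank_scale`, and `exposures` are assumptions made for this sketch, not part of any library, and the diversity penalty is left out.

```python
import math
import random


def dither(items, noise_scale=0.05, rank_scale=1.0, exposures=None, rng=None):
    """Re-order an already ranked list by adding noise to a rank-based pseudo score.

    items       -- recommendations, best first
    noise_scale -- standard deviation of the Gaussian noise (hypothetical knob)
    rank_scale  -- pseudo score is exp(-rank / rank_scale); 1.0 gives plain exp(-r)
    exposures   -- optional dict item -> number of exposures so far
    """
    rng = rng or random.Random()
    scored = []
    for rank, item in enumerate(items):
        pseudo = math.exp(-rank / rank_scale)
        sigma = noise_scale
        if exposures is not None:
            # Fewer exposures means more noise; max(1, ...) avoids dividing by zero.
            sigma = noise_scale / math.sqrt(max(1, exposures.get(item, 0)))
        scored.append((pseudo + rng.gauss(0.0, sigma), item))
    # Python's sort is stable, so ties keep their original relative order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored]
```

With `noise_scale=0.0` the list comes back unchanged. In practice the noise scale would be tuned by experiment so that a good share of the original top items stay on the first page.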
Dithering can be inversely proportional to the number of exposures so far if you would like to give novel items more exposure.

This doesn't have to be very fancy. I have had very good results by generating a long list of recommendations, computing a pseudo score based on rank, adding a bit of noise, and re-sorting. I also scanned down the list and penalized items that showed insufficient diversity, then re-sorted again.

Typically, the pseudo score was something like exp(-r), where r is the rank. The noise scale is adjusted to leave a good proportion of the originally recommended items on the first page. The noise could easily have been scaled by 1/sqrt(exposures) to let the newbies move around more.

The parameters here should be adjusted a bit based on experiments, but a heuristic first hack works pretty well as a start.

On Sun, Nov 12, 2017 at 10:34 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Part of what Ted is talking about can be seen in the carousels on Netflix or Amazon. Some are not recommendations, like “trending” videos, or “new” videos, or “prime” videos (substitute your own promotions here). They have nothing to do with recommender-created items but are presented along with recommender-based carousels. They are based on analytics or business rules and ideally have some randomness built in. The reason for this is 1) it works by exposing users to items that they would not see in recommendations and 2) it provides data to build the recommender model from.
>
> A recommender cannot work in an app that has no non-recommended items displayed, or there will be no unbiased data to create recommendations from. This would lead to crippling overfitting. Most apps have placements like the ones mentioned above and also have search and browse. However you do it, it must be prominent and always available. The moral of this paragraph is: don’t try to make everything a recommendation, it will be self-defeating. In fact, make sure not every video watch comes from a recommendation.
>
> Likewise, think of placements (reflecting a particular recommender use) as experimentation grounds. Try things like finding a recommended category and then recommending items in that category, all based on user behavior. Or try a placement based on a single thing a user watched, like “because you watched xyz you might like these”. Don’t just show the most popular categories for the user and recommend items in them. This would be a type of overfitting too.
>
> I’m sure we have strayed far from your original question but maybe it’s covered somewhere in here.
>
>
> On Nov 12, 2017, at 12:11 PM, Johannes Schulte <johannes.schu...@gmail.com> wrote:
>
> I did "second order" recommendations before, but more to fight sparsity and find more significant associations in situations with less traffic, so recommending categories instead of products. There needs to be some third order sorting / boosting like you mentioned with "new music", or maybe popularity or hotness to avoid quasi-random order. For events with limited lifetime it's probably some mixture of spatial distance and freshness.
>
> We will definitely keep an eye on the generation process of data for new items. It depends on the domain, but in the time of multi-channel promotion of videos, shows and products, it also helps that there is traffic driven from external sources.
>
> Thanks for the detailed hints - now it's time to see what comes out of this.
>
> Johannes
>
> On Sun, Nov 12, 2017 at 7:52 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > Events have the natural good quality that having a cold start means that you will naturally favor recent interactions simply because there won't be any old interactions to deal with.
> >
> > Unfortunately, that also means that you will likely be facing serious cold start issues all the time. I have used two strategies to deal with cold starts, both fairly successfully.
> >
> > *Method 1: Second order recommendation*
> >
> > For novel items with no history, you typically do have some kind of information about the content. For an event, you may know the performer, the organizer, the venue, possibly something about the content of the event as well (especially for a tour event). As such, you can build a recommender that recommends this secondary information and then do a search with recommended secondary information to find events. This actually works pretty well, at least for the domains where I have used it (music and videos). For instance, in music, you can easily recommend a new album based on the artist(s) and track list.
> >
> > The trick here is to determine when and how to blend in normal recommendations. One way is query blending, where you combine the second order query with a normal recommendation query, but I think that a fair bit of experimentation is warranted here.
> >
> > *Method 2: What's new and what's trending*
> >
> > It is always important to provide alternative avenues of information gathering for recommendation. Especially for the user generated video case, there was pretty high interest in the "What's new" and "What's hot" pages. If you do a decent job of dithering here, you keep reasonably good content on the what's new page longer than content that doesn't pull. That maintains interest in the page. Similarly, you can have a bit of a lower bar for new content to be classified as hot than established content. That way you keep the page fresh (because new stuff appears transiently), but you also have a fair bit of really good stuff as well. If done well, these pages will provide enough interactions with new items so that they don't start entirely cold. You may need to have genre specific or location specific versions of these pages to avoid interesting content being overwhelmed.
> > You might also be able to spot content that has intense interest from a sub-population as opposed to diffuse interest from a mass population.
> >
> > You can also use novelty and trending boosts for content in the normal recommendation engine. I have avoided this in the past because I felt it was better to have specialized pages for what's new and hot rather than because I had data saying it was bad to do. I have put a very weak recommendation effect on the what's hot pages so that people tend to see trending material that they like. That doesn't help on what's new pages for obvious reasons unless you use a touch of second order recommendation.
> >
> > On Sat, Nov 11, 2017 at 11:00 PM, Johannes Schulte <johannes.schu...@gmail.com> wrote:
> >
> >> Well, the Greece thing was just an example for a thing you don't know upfront - it could be any of the modeled features on the cross recommender input side (user segment, country, city, previous buys), some subpopulation getting active, so the current approach, probably with sampling that favours newer events, will be the best here. Luckily a sampling strategy is a big topic anyway since we're trying to go for the near real time way - Pat, you talked about it some while ago on this list and I still have to look at the Flink talk from Trevor Grant but I'm really eager to attack this after years of batch :)
> >>
> >> Thanks for your thoughts, I am happy I can rule something out given the domain (poisson llr). Luckily the domain I'm working on is event recommendations, so there is a natural deterministic item expiry (as compared to christmas like stuff).
> >>
> >> Again,
> >> thanks!
> >>
> >>
> >> On Sat, Nov 11, 2017 at 7:00 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >>
> >>> Inline.
> >>>
> >>> On Sat, Nov 11, 2017 at 6:31 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>>
> >>>> If Mahout were to use http://bit.ly/poisson-llr it would tend to favor new events in calculating the LLR score for later use in the threshold for whether a co- or cross-occurrence is incorporated in the model.
> >>>
> >>> I don't think that this would actually help for most recommendation purposes.
> >>>
> >>> It might help to determine that some item or other has broken out of historical rates. Thus, we might have "hotness" as a detected feature that could be used as a boost at recommendation time. We might also have "not hotness" as a negative boost feature.
> >>>
> >>> Since we have a pretty good handle on the "other" counts, I don't think that the Poisson test would help much with the cooccurrence stuff itself.
> >>>
> >>> Changing the sampling rule could make a difference to temporality and would be more like what Johannes is asking about.
> >>>
> >>>> But it doesn’t relate to popularity as I think Ted is saying.
> >>>>
> >>>> Are you looking for 1) personal recommendations biased by hotness in Greece or 2) things hot in Greece?
> >>>>
> >>>> 1) create a secondary indicator for “watched in some locale”. The local-id uses a country-code+postal-code maybe, but not lat-lon. Something that includes a good number of people/events. Then the query would be user-id and user-locale. This would yield personal recs preferred in the user’s locale. Athens-west-side in this case.
> >>>
> >>> And this works in the current regime. Simply add location tags to the user histories and do cooccurrence against content. Locations will pop out as indicators for some content and not for others. Then when somebody appears in some location, their tags will retrieve localized content.
> >>>
> >>> For localization based on strict geography, say for restaurant search, we can just add business rules based on geo-search. A very large bank customer of ours does that, for instance.
> >>>
> >>>> 2) split the data into locales and do the hot calc I mention. The query would have no user-id since it is not personalized but would yield “hot in Greece”
> >>>
> >>> I think that this is a good approach.
> >>>
> >>>> Ted’s “Christmas video” tag is what I was calling a business rule and can be added to either of the above techniques.
> >>>
> >>> But the (not) hotness feature might help with automating this.
> >>>
> >>>> On Nov 11, 2017, at 4:01 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >>>>
> >>>> So ... there are a few different threads here.
> >>>>
> >>>> 1) LLR but with time. Quite possible, but not really what Johannes is talking about, I think. See http://bit.ly/poisson-llr for a quick discussion.
> >>>>
> >>>> 2) time varying recommendation. As Johannes notes, this can make use of windowed counts. The problem is that rarely accessed items should probably have longer windows so that we use longer term trends when we have less data.
> >>>>
> >>>> The good news here is that some part of this is nearly already in the code. The trick is that the down-sampling used in the system can be adapted to favor recent events over older ones. That means that if the meaning of something changes over time, the system will catch on. Likewise, if something appears out of nowhere, it will quickly train up. This handles the popular in Greece right now problem.
> >>>>
> >>>> But this isn't the whole story of changing recommendations. Another problem that we commonly face is what I call the christmas music issue.
> >>>> The idea is that there are lots of recommendations for music that are highly seasonal. Thus, Bing Crosby fans want to hear White Christmas <https://www.youtube.com/watch?v=P8Ozdqzjigg> until the day after christmas, at which point this becomes a really bad recommendation. To some degree, this can be partially dealt with by using temporal tags as indicators, but that doesn't really allow a recommendation to be completely shut down.
> >>>>
> >>>> The only way that I have seen to deal with this in the past is with a manually designed kill switch. As much as possible, we would tag the obviously seasonal content and then add a filter to kill or downgrade that content the moment it went out of fashion.
> >>>>
> >>>> On Sat, Nov 11, 2017 at 9:43 AM, Johannes Schulte <johannes.schu...@gmail.com> wrote:
> >>>>
> >>>>> Pat, thanks for your help, especially the insights on how you handle the system in production and the tips for multiple acyclic buckets. Combining the signals when querying sounds okay but as you say, it's always hard to find the right boosts without setting up some ltr system. If there were a way to use the hotness when calculating the indicators for subpopulations it would be great, especially for a cross recommender.
> >>>>>
> >>>>> e.g. people in greece _now_ are viewing this show/product whatever
> >>>>>
> >>>>> And here the popularity of the recommended item in this subpopulation could be overlooked when just looking at the overall derivatives of activity.
> >>>>>
> >>>>> Maybe one could do multiple G-Tests using sliding windows
> >>>>> * itemA&itemB vs population (classic)
> >>>>> * itemA&itemB(t) vs itemA&itemB(t-1)
> >>>>> ..
> >>>>>
> >>>>> and derive multiple indicators per item to be indexed.
> >>>>>
> >>>>> But this all relies on discretizing time into buckets and not looking at the distribution of time between events like in the presentation above - maybe there is something way smarter.
> >>>>>
> >>>>> Johannes
> >>>>>
> >>>>> On Sat, Nov 11, 2017 at 2:50 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>>>>
> >>>>>> BTW you should take time buckets that are relatively free of daily cycles like 3 day, week, or month buckets for “hot”. This is to remove cyclical effects from the frequencies as much as possible since you need 3 buckets to see the change in change, 2 for the change, and 1 for the event volume.
> >>>>>>
> >>>>>>
> >>>>>> On Nov 10, 2017, at 4:12 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>>>>>
> >>>>>> So your idea is to find anomalies in event frequencies to detect “hot” items?
> >>>>>>
> >>>>>> Interesting, maybe Ted will chime in.
> >>>>>>
> >>>>>> What I do is take the frequency and its first and second derivatives as measures of popularity, increasing popularity, and increasingly increasing popularity. Put another way: popular, trending, and hot. This is simple to do by taking 1, 2, or 3 time buckets and looking at the number of events, derivative (difference), and second derivative. Ranking all items by these values gives various measures of popularity or its increase.
> >>>>>>
> >>>>>> If your use is in a recommender you can add a ranking field to all items and query for “hot” by using the ranking you calculated.
> >>>>>>
> >>>>>> If you want to bias recommendations by hotness, query with user history and boost by your hot field.
> >>>>>> I suspect the hot field will tend to overwhelm your user history in this case, as it would if you used anomalies, so you’d also have to normalize the hotness to some range closer to the one created by the user history matching score. I haven’t found a very good way to mix these in a model, so use hot as a method of backfill if you cannot return enough recommendations or in places where you may want to show just hot items. There are several benefits to this method of using hot to rank all items, including the fact that you can apply business rules to them just as with normal recommendations, so you can ask for hot in “electronics” if you know categories, or hot "in-stock" items, or ...
> >>>>>>
> >>>>>> Still anomaly detection does sound like an interesting approach.
> >>>>>>
> >>>>>>
> >>>>>> On Nov 10, 2017, at 3:13 PM, Johannes Schulte <johannes.schu...@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi "all",
> >>>>>>
> >>>>>> I am wondering what would be the best way to incorporate event time information into the calculation of the G-Test.
> >>>>>>
> >>>>>> There is a claim here
> >>>>>> https://de.slideshare.net/tdunning/finding-changes-in-real-data
> >>>>>>
> >>>>>> saying "Time aware variant of G-Test is possible"
> >>>>>>
> >>>>>> I remember I experimented with exponentially decayed counts some years ago and this involved changing the counts to doubles, but I suspect there is some smarter way. What I don't get is the relation to a data structure like T-Digest when working with a lot of counts / cells for every combination of items. Keeping a t-digest for every combination seems unfeasible.
> >>>>>>
> >>>>>> How would one incorporate event time into recommendations to detect "hotness" of certain relations? Glad if someone has an idea...
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Johannes