Thanks again, Ted. These are some great suggestions for helping us newbies out.
On Mon, Aug 27, 2012 at 11:28 PM, Ted Dunning <[email protected]> wrote:

> In another forum, I responded to this question this way:
>
> > One short answer is that you only need enough test data to drive the
> > accuracy of your PR estimates to the point you need them. That isn't all
> > that much data so the sequential version should do rather well.
> >
> > The gold standard, of course, is actual user behavior. Especially when you
> > are starting out, views are going to be entirely driven by your other
> > discovery mechanisms such as search. This means that maximizing recall
> > precision is going to drive your recommender to replicate your current
> > discovery patterns which isn't really what you want.
> >
> > Regarding your use of raw views, you will have problems if your videos
> > have lots of misleading meta-data since users will click on things that
> > they don't really want to watch. This is a key user satisfaction issue, of
> > course.
> >
> > You should also consider dithering in your system for lots of reasons.
> >
> > Also, make sure you have alternative discovery mechanisms. A "recently
> > added" page is really helpful for this.
> >
>
> And then added this about dithering:
>
> > All clicks are implicit data and you can use boolean methods on any or all
> > of them. Nothing in these kinds of data prevents you from using LLR methods
> > or matrix factorization methods.
> >
> > For dithering, what I do is set a synthetic score that looks like
> > exp(-rank). Then I add random noise to this that is exponentially
> > distributed (aka -log(random()) ). I scale the noise as small as I would
> > like. This method means that the top items generally mix with just the top
> > and deeper items mix with much deeper items.
> >
> > You can experiment with this using the following R commands (with sample
> > output):
> >
> > > order(-exp(-(0:99)/4) + rexp(100, rate=10))
> >  [1]   2   1   4   3   6   8   5  10   7  12  29  11  26  21  70  86  79  52
> > [19]  14  68  17  83  44  72  30  89  35  34  84  39  74 100  73  87  78  56
> > [37]  15  66  46  40   9  95  96  67  16  49  80  90  53  32  27  48  37  76
> > [55]  77  91  88  62  98  51  19  50  93  99  23  28  65  33  25  54  71  97
> > [73]  43  57  18  92  94  45  22  38  81  75  85  13  20  82  41  42  58  64
> > [91]  60  59  61  69  47  55  31  24  36  63
> >
> > > order(-exp(-(0:99)/4) + rexp(100, rate=10))
> >  [1]   1   2   3   4   5   6   9  12   7  10  15  23  78  72  16  60  95  68
> > [19]  24  65  90  94  55  22  40  21  17  47  39  71  59  66  79  88  97  56
> > [37]  26  99  74  41  44  45  50  70  49  75  62  31  84  51  11  33  91  19
> > [55]  61  28  77  18  52  54  48  43  87  25  35  38  30  73  27  89  53   8
> > [73]  82  83  93  57  13  36  69  29  98  63  76  85  64  37  96  46  81  67
> > [91]  92  20  80  42  58  34  86  32  14 100
> >
> > As you can see, the top items stay near the top, but mixing down deeper is
> > quite strong.
> >
> > You can use uniform noise to get kind of a different effect.
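
For anyone who wants to try the dithering idea outside of R, here is a minimal Python sketch of the quoted one-liner. It is only an illustration, not Ted's implementation: the function name `dither` and the parameters `scale`, `noise_rate`, and `rng` are my own assumptions, with defaults chosen to mirror the `/4` and `rate=10` in the R command. Each item gets the key `-exp(-rank/scale)` plus exponentially distributed noise, and the list is re-sorted by that key in ascending order.

```python
import math
import random


def dither(items, scale=4.0, noise_rate=10.0, rng=None):
    """Re-order an already ranked list by a noisy synthetic score.

    Hypothetical helper mirroring the R expression
    order(-exp(-(0:99)/4) + rexp(100, rate=10)).

    items      -- recommendations, best first (rank 0 is the top item)
    scale      -- divisor applied to the rank before exponentiating (4 in the R command)
    noise_rate -- rate of the exponential noise; a larger rate means smaller noise
    rng        -- optional random.Random instance, useful for reproducible runs
    """
    rng = rng or random.Random()
    keyed = []
    for rank, item in enumerate(items):
        base = -math.exp(-rank / scale)       # most negative for the top-ranked items
        noise = rng.expovariate(noise_rate)   # exponentially distributed, mean 1/noise_rate
        # rank is included as a tie-breaker so the items themselves are never compared
        keyed.append((base + noise, rank, item))
    keyed.sort()                              # ascending order, like R's order()
    return [item for _, _, item in keyed]


if __name__ == "__main__":
    ranked = list(range(1, 101))              # stand-in for 100 ranked video ids
    print(dither(ranked)[:20])
```

Calling `dither` on the ranked list each time a page is rendered should give a slightly different ordering per view. Raising `noise_rate` shrinks the noise and keeps the result closer to the original ranking.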
