In another forum, I responded to this question this way:

> One short answer is that you only need enough test data to drive the
> accuracy of your PR estimates to the point you need them. That isn't all
> that much data so the sequential version should do rather well.
> The gold standard, of course, is actual user behavior. Especially when you
> are starting out, views are going to be entirely driven by your other
> discovery mechanisms such as search. This means that maximizing recall and
> precision is going to drive your recommender to replicate your current
> discovery patterns, which isn't really what you want.
> Regarding your use of raw views, you will have problems if your videos
> have lots of misleading meta-data since users will click on things that
> they don't really want to watch. This is a key user satisfaction issue, of
> course.
> You should also consider dithering in your system for lots of reasons.
> Also, make sure you have alternative discovery mechanisms. A "recently
> added" page is really helpful for this.


And then added this about dithering:

> All clicks are implicit data and you can use boolean methods on any or all
> of them. Nothing in these kinds of data prevents you from using LLR methods
> or matrix factorization methods.
> For dithering, what I do is set a synthetic score that looks like
> exp(-rank). Then I add random noise to this that is exponentially
> distributed (i.e. -log(random())). I scale the noise to be as small as I
> would like. This method means that the top items generally mix only with
> other top items, while deeper items mix with much deeper items.
> You can experiment with this using the following R commands (with sample
> output):
>
> > order(-exp(-(0:99)/4) + rexp(100, rate=10))
> [1] 2 1 4 3 6 8 5 10 7 12 29 11 26 21 70 86 79 52
> [19] 14 68 17 83 44 72 30 89 35 34 84 39 74 100 73 87 78 56
> [37] 15 66 46 40 9 95 96 67 16 49 80 90 53 32 27 48 37 76
> [55] 77 91 88 62 98 51 19 50 93 99 23 28 65 33 25 54 71 97
> [73] 43 57 18 92 94 45 22 38 81 75 85 13 20 82 41 42 58 64
> [91] 60 59 61 69 47 55 31 24 36 63
> > order(-exp(-(0:99)/4) + rexp(100, rate=10))
> [1] 1 2 3 4 5 6 9 12 7 10 15 23 78 72 16 60 95 68
> [19] 24 65 90 94 55 22 40 21 17 47 39 71 59 66 79 88 97 56
> [37] 26 99 74 41 44 45 50 70 49 75 62 31 84 51 11 33 91 19
> [55] 61 28 77 18 52 54 48 43 87 25 35 38 30 73 27 89 53 8
> [73] 82 83 93 57 13 36 69 29 98 63 76 85 64 37 96 46 81 67
> [91] 92 20 80 42 58 34 86 32 14 100


> As you can see, the top items stay near the top, but mixing further down
> is quite strong.
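
If you want to apply this to a real result list rather than to bare rank numbers, the one-liner can be wrapped in a small function. This is only a sketch: the name `dither` and the parameters `decay` and `noise_rate` are made up for illustration, with defaults copied from the 4 and 10 in the commands above, and `items` is assumed to arrive sorted best-first.

```r
# Hypothetical helper (not from the original post): dither a ranked list.
# `items` is assumed to be ordered best-first.
dither <- function(items, decay = 4, noise_rate = 10) {
  n <- length(items)
  score <- exp(-(seq_len(n) - 1) / decay)  # synthetic score based on rank only
  noise <- rexp(n, rate = noise_rate)      # exponential noise, mean 1 / noise_rate
  # Sort ascending on (-score + noise), exactly as in the one-liner.
  items[order(-score + noise)]
}

set.seed(42)  # only so that repeated runs show the same shuffle
dither(paste0("video_", 1:20))
```

A larger `noise_rate` means smaller noise and a result closer to the original order; a smaller one mixes more aggressively.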



You can use uniform noise instead to get a somewhat different effect.
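
One way to see the difference is to swap `rexp` for `runif`. Uniform noise is bounded, so two items can only trade places when their scores are closer together than the noise width. The width of 0.2 below is an assumed value chosen for illustration, not something from the original exchange. With it, the first item can never move, because the top two scores differ by 1 - exp(-1/4), which is about 0.22. Deeper items have nearly identical scores and still mix freely.

```r
# Assumed noise width of 0.2; pick whatever suits your list.
rank_score <- exp(-(0:99) / 4)                 # same synthetic score as before
uniform_order <- order(-rank_score + runif(100, min = 0, max = 0.2))
head(uniform_order, 10)                        # look at the top of the dithered list
```

Exponential noise has no upper bound, so it occasionally pushes even a top item a long way down; bounded uniform noise never does.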
