Re: Recommender Evaluator

Pat Ferrel Mon, 03 Dec 2012 13:22:44 -0800

Great, thanks. I'm not sure it's worth changing because, as I said, my data is 
very incomplete. This is an experiment and we're mining a site "politely", so 
it will take months to accumulate a good share of it.

In the meantime, to temporarily get around the low rate of cooccurrence, we 
look at the strength of the recommendation. We're using a small neighborhood 
(3). Looking through all of the recommendations, we get a few fairly high 
strengths, say 1.5 to 2.8. While it's hard to tell just by looking, these seem 
to be reasonably good recommendations. 

The intuition behind all of this is that we have a very weak recommender for 
the average user but a good one for a lucky few. I suppose adding the user's 
individual P and R to the eval criteria would help validate this judgment? The 
P&R come from a slightly different recommender than the actual recommendations 
because the eval uses a subset of the data. It seems like a high strength would 
correspond to a higher P&R, since the strength is the sum of user similarities. 
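
To make the "sum of user similarities" idea concrete, here is a minimal sketch 
in plain Python. It is not Mahout code: the function name, the neighbor data 
and the similarity values are all made up for illustration, and it assumes 
binary preferences with similarities no greater than 1.

```python
from collections import defaultdict


def recommendation_strengths(neighbors, user_items):
    """Score candidate items by summing the similarities of the neighbors
    that have them (hypothetical helper, binary preference data).

    neighbors:  list of (similarity, set_of_item_ids) for the nearest users
    user_items: set of item ids the target user already has
    """
    strengths = defaultdict(float)
    for similarity, items in neighbors:
        # Only items the target user does not already have are candidates.
        for item in items - user_items:
            strengths[item] += similarity
    # Highest strength first.
    return sorted(strengths.items(), key=lambda kv: kv[1], reverse=True)


# Invented neighborhood of size 3.
neighbors = [
    (0.9, {"a", "b", "c"}),
    (0.8, {"b", "c"}),
    (0.7, {"c", "d"}),
]
ranked = recommendation_strengths(neighbors, user_items={"a"})
for item, strength in ranked:
    print(item, round(strength, 2))
```

Under these assumptions a neighborhood of 3 caps the strength at 3, so a value 
near 2.8 would mean that nearly all three neighbors are very similar to the 
user and all have the item.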

We are then using these few highly ranked recommendations to get an early, 
somewhat subjective look at value. Earlier I asked whether strengths could be 
used to compare one user's recommendation to another's and concluded that they 
could (all caveats about the actual meaning of strengths kept in mind). Is 
there any obvious flaw in this reasoning? 

On Dec 3, 2012, at 12:11 PM, Sean Owen <[email protected]> wrote:

This value can only be calculated if there is both at least one
recommended item and at least one item considered "relevant". The
others could have a value if at least one of those is true. That's the
likely explanation.

With very little data these tests are going to mean very little -- a
lot will be just chance. If there's so little that nDCG can't even be
calculated it kind of seems like this should be an error. But I think
a near-assertion may be going a little far... maybe those conditions
should allow NaN because that object's role is more about transporting
the answer than judging it. The caller will have to decide what NaN
means.
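
As an illustration of the NaN behavior discussed above, here is a minimal 
sketch in plain Python. It is not the Mahout implementation: the function 
names are hypothetical, relevance is treated as binary, and the ideal DCG is 
assumed to place every relevant item at the top of the list.

```python
import math


def ndcg(recommended, relevant):
    """Binary-relevance nDCG for one user; NaN when it is undefined.

    recommended: ranked list of item ids
    relevant:    set of item ids considered relevant
    """
    # Undefined unless there is at least one recommended item AND at
    # least one relevant item.
    if not recommended or not relevant:
        return math.nan
    # Discounted gain of the hits, with rank 0 discounted by log2(2) = 1.
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended) if item in relevant)
    # Ideal ordering: as many relevant items as fit, all at the top.
    ideal_hits = min(len(recommended), len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg


def mean_ignoring_nan(values):
    """Average only the defined values; NaN if none are defined."""
    defined = [v for v in values if not math.isnan(v)]
    return sum(defined) / len(defined) if defined else math.nan


per_user = [ndcg([], {"a"}), ndcg(["a", "b"], {"a"}), ndcg(["b", "a"], {"a"})]
print(per_user, mean_ignoring_nan(per_user))
```

Here the per-user function only transports NaN, and the caller decides what to 
do with it. Skipping undefined users in the average is one possible choice, 
not the only reasonable one.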

I am happy to change that -- but would not pay attention to these
tests at this scale.

On Mon, Dec 3, 2012 at 7:55 PM, Pat Ferrel <[email protected]> wrote:
> I'm doing a very simple recommender based on binary data. Using 
> GenericRecommenderIRStatsEvaluator I get nDCG = NaN for each user. My data is 
> still very incomplete, which means an extremely low cooccurrence rate but 
> there are some since otherwise I'd expect P and R to be 0 and they are not. 
> For nDCG to be NaN it looks like the running average is never initialized 
> because the user values are never initialized. How should I interpret this?
> 
> I catch the exception at the end when the average nDCG is calculated but the 
> P, R, and F should still be OK, right? I wonder if an exception is really 
> what you want here because it makes otherwise valid values inaccessible. I 
> commented out the nDCG precondition and the results are weak as I'd expect 
> but valid AFAIK.
> 
> 12/12/03 10:55:11 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG: 0.01214798453892877 / 0.010180472003701981 / 5.687917781641289E-5 / NaN
> 12/12/03 10:55:11 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 2146441897 in 24ms
> 12/12/03 10:55:11 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG: 0.012141280353200884 / 0.010175763182238659 / 5.6884648356688493E-5 / NaN
> Precision = 0.012141280353200884
> Recall = 0.010175763182238659
> F1 = 0.011071967790639152
>
