Great, thanks. I'm not sure it's worth changing because, as I said, my data is very incomplete. This is an experiment, and we're mining a site "politely," so it will take months to accumulate a good share of it.
In the meantime, to work around the low rate of co-occurrence, we look at the strength of each recommendation. We're using a small neighborhood (3). Looking through all of the recommendations, we see a few fairly high strengths, say 1.5 to 2.8. While it's hard to tell just by looking, these seem to be reasonably good recommendations. The intuition behind all of this is that we have a very weak recommender for the average user but a good one for a lucky few.
I suppose adding each user's individual P and R to the eval criteria would help validate this judgment? Those P and R values come from a slightly different recommender than the one producing the actual recommendations, because the evaluator builds its recommender on a subset of the data. It seems like a high strength would correspond to a higher P and R, since the strength is the sum of user similarities. We are then using these few highly ranked recommendations to get an early, somewhat subjective look at value.
Earlier I asked whether strengths could be used to compare one user's recommendations to another's and concluded that they could (keeping in mind all the caveats about what strengths actually mean). Any obvious flaw in this reasoning?
On Dec 3, 2012, at 12:11 PM, Sean Owen wrote:
This value can only be calculated if there is at least one recommended item and at least one item considered "relevant". The others could have a value if at least one of those is true. That's the likely explanation.
With very little data these tests are going to mean very little; a lot will be just chance. If there's so little data that nDCG can't even be calculated, it kind of seems like this should be an error. But I think a near-assertion may be going a little far... maybe those conditions should allow NaN, because that object's role is more about transporting the answer than judging it. The caller will have to decide what NaN means. I am happy to change that, but I would not pay attention to these tests at this scale.
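Sean's explanation can be made concrete with a small sketch of per-user metrics under binary relevance, which would also cover the idea of reporting each user's own P and R. It is written in Python for brevity and is not the code of GenericRecommenderIRStatsEvaluator; the helper names (`user_ir_stats`, `nan_mean`, `f1`) are hypothetical, and returning NaN for undefined values and skipping them when averaging is an assumed policy, roughly the "caller decides what NaN means" option. When there are no recommended items or no relevant items, the ideal DCG is zero, so nDCG is 0/0 and comes out as NaN, while precision or recall may still be defined.

```python
"""Sketch: per-user precision, recall and binary nDCG with NaN-aware averaging.

Hypothetical helpers, not Mahout's implementation. An item is relevant
(gain 1) if it is in the user's held-out set, otherwise gain 0.
"""
import math
from typing import List, Sequence, Set, Tuple


def user_ir_stats(recommended: Sequence[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, nDCG) for one user; NaN where undefined."""
    hits = [item in relevant for item in recommended]
    n_hits = sum(hits)
    precision = n_hits / len(recommended) if recommended else math.nan
    recall = n_hits / len(relevant) if relevant else math.nan
    # DCG: a hit at 0-based rank i is discounted by 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2) for i, hit in enumerate(hits) if hit)
    # Ideal DCG: all relevant items ranked first, capped at the list length.
    n_ideal = min(len(recommended), len(relevant))
    ideal = sum(1.0 / math.log2(i + 2) for i in range(n_ideal))
    # No recommendations or no relevant items -> ideal == 0 -> nDCG undefined.
    ndcg = dcg / ideal if ideal > 0 else math.nan
    return precision, recall, ndcg


def nan_mean(values: List[float]) -> float:
    """Average of the defined (non-NaN) values; NaN if none are defined."""
    defined = [v for v in values if not math.isnan(v)]
    return sum(defined) / len(defined) if defined else math.nan


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (NaN if either is undefined)."""
    if math.isnan(precision) or math.isnan(recall):
        return math.nan
    total = precision + recall
    return 2.0 * precision * recall / total if total > 0 else 0.0
```

For example, `user_ir_stats(["a", "b", "c"], {"b", "z"})` gives precision 1/3 and recall 1/2, while a user with no relevant items gets NaN for recall and nDCG and is simply skipped by `nan_mean`. Applied to the averaged precision and recall in the log further down the thread, `f1` reproduces the reported F1 of about 0.01107.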
On Mon, Dec 3, 2012 at 7:55 PM, Pat Ferrel wrote:
> I'm doing a very simple recommender based on binary data. Using GenericRecommenderIRStatsEvaluator, I get nDCG = NaN for each user. My data is still very incomplete, which means an extremely low co-occurrence rate, but there is some co-occurrence, since otherwise I'd expect P and R to be 0, and they are not. For nDCG to be NaN, it looks like the running average is never initialized because the per-user values are never initialized. How should I interpret this?
>
> I catch the exception at the end when the average nDCG is calculated, but P, R, and F should still be OK, right? I wonder if an exception is really what you want here, because it makes otherwise valid values inaccessible. I commented out the nDCG precondition, and the results are weak, as I'd expect, but valid AFAIK.
>
> 12/12/03 10:55:11 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG: 0.01214798453892877 / 0.010180472003701981 / 5.687917781641289E-5 / NaN
> 12/12/03 10:55:11 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 2146441897 in 24ms
> 12/12/03 10:55:11 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG: 0.012141280353200884 / 0.010175763182238659 / 5.6884648356688493E-5 / NaN
> Precision = 0.012141280353200884
> Recall = 0.010175763182238659
> F1 = 0.011071967790639152
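Returning to the strength question at the top of the thread, the idea that a recommendation's strength is the sum of user similarities can be sketched for boolean data. This is a hypothetical Python illustration, not Mahout's recommender: it assumes Tanimoto (Jaccard) similarity, a nearest-3 neighborhood, and a score equal to the summed similarity of the neighbors who have the item. Under those assumptions each similarity is at most 1, so a strength can never exceed the neighborhood size; a strength of 2.8 with a neighborhood of 3 means nearly every neighbor has the item and is highly similar to the user. The same bound suggests one caveat when comparing strengths across users: a user whose nearest neighbors are all only weakly similar cannot produce a high strength for any item.

```python
"""Sketch: neighborhood 'strength' for boolean (binary) preference data.

Assumptions (hypothetical, not Mahout's API): Tanimoto/Jaccard similarity,
a nearest-N neighborhood (N = 3), and strength = sum of the similarities
of the neighbors who have the candidate item.
"""
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) coefficient of two item sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbors(user: str, prefs: Dict[str, Set[str]], n: int = 3) -> List[Tuple[str, float]]:
    """Return up to n other users with the highest positive similarity."""
    sims = [(other, tanimoto(prefs[user], items))
            for other, items in prefs.items() if other != user]
    sims = [s for s in sims if s[1] > 0.0]
    sims.sort(key=lambda s: s[1], reverse=True)
    return sims[:n]


def recommend(user: str, prefs: Dict[str, Set[str]], n_neighbors: int = 3) -> List[Tuple[str, float]]:
    """Score each item the user lacks by the summed similarity of neighbors who have it."""
    scores: Dict[str, float] = {}
    for other, sim in nearest_neighbors(user, prefs, n_neighbors):
        for item in prefs[other] - prefs[user]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest strength first; each score is bounded by the neighborhood size.
    return sorted(scores.items(), key=lambda s: s[1], reverse=True)
```

With made-up data where `u1` has items `{a, b, c}` and its three neighbors have similarities 0.75, 0.5 and 0.4, an item shared by all three neighbors scores 1.65, while an item held by only the weakest neighbor scores 0.4.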
