Hi Sean,

OK, I understand, thanks. I am working with Boolean data for the time being, so I'm using the IRStatsEvaluator. But I'll revisit the issue if and when I go back to integer preferences.

On 11/29/2011 08:19 PM, Sean Owen wrote:
The recommendation process ends with steps:

1. Estimate a pref for each candidate item
2. (Optionally, rescore or filter those pref values)
3. Sort by estimated pref and return top items by pref
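
For illustration, the three steps above might look like the following plain-Java sketch. The Estimator and Rescorer interfaces here are hypothetical stand-ins written for this example, not Mahout's own types, and treating a NaN score as "drop this item" is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RecommendStepsSketch {

    /** Hypothetical estimator standing in for the recommender's preference estimate. */
    interface Estimator {
        double estimate(long itemId);
    }

    /** Hypothetical rescorer: returns an adjusted score, or NaN to drop the item. */
    interface Rescorer {
        double rescore(long itemId, double estimate);
    }

    static List<Long> recommend(List<Long> candidates, Estimator estimator,
                                Rescorer rescorer, int howMany) {
        List<Map.Entry<Long, Double>> scored = new ArrayList<>();
        for (long itemId : candidates) {
            // Step 1: estimate a pref for each candidate item.
            double pref = estimator.estimate(itemId);
            // Step 2: optionally rescore or filter the estimate.
            if (rescorer != null) {
                pref = rescorer.rescore(itemId, pref);
            }
            if (!Double.isNaN(pref)) {
                scored.add(Map.entry(itemId, pref));
            }
        }
        // Step 3: sort by estimated pref, highest first, and keep the top items.
        scored.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> top = new ArrayList<>();
        for (int i = 0; i < Math.min(howMany, scored.size()); i++) {
            top.add(scored.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        List<Long> candidates = List.of(1L, 2L, 3L, 4L);
        Estimator estimator = id -> id * 1.0;  // toy estimate: higher ID, higher pref
        // Item 4 plays the role of an expired item and is filtered out.
        Rescorer dropExpired = (id, pref) -> id == 4L ? Double.NaN : pref;
        System.out.println(recommend(candidates, estimator, dropExpired, 2));
    }
}
```

With the toy data in main, item 4 has the highest estimate at step 1 but is removed at step 2, so the printed result is [3, 2].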

The evaluator is not evaluating the result at step #3, but at step #1 -- as
a proxy for evaluating the quality of the ultimate recommendations. It's
not necessarily any less valid to see how well it estimates the pref for an
item that happens to be expired. So yes I'd say the current behavior is
intended.

I take your point though. You could fairly easily
modify AbstractDifferenceRecommenderEvaluator to construct whatever test
and training data set you like. For example, you would probably put all
expired items in your training set and not in the test set.

If you're OK just modifying the code, go for that.
If you'd like to think of a clean way to incorporate a hook that lets you
replace the random test/training selection with custom logic, that's cool
too. I think it would be some work, if not a great deal, to cleanly
refactor out the random sampling.
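
One possible shape for such a hook, as a rough sketch only: the Pref class, the TestSetSelector interface, and the method names below are all hypothetical and are not part of Mahout. The idea is that the random split becomes one selector among several, and a wrapper forces expired items into the training set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

class SplitHookSketch {

    /** Minimal preference holder (hypothetical; not a Mahout type). */
    static final class Pref {
        final long userId;
        final long itemId;
        final float value;

        Pref(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    /** Hypothetical hook: returns true if the preference belongs in the test set. */
    interface TestSetSelector {
        boolean isTest(Pref pref);
    }

    /** Random selection: roughly trainingFraction of prefs go to training. */
    static TestSetSelector random(double trainingFraction, Random rng) {
        return pref -> rng.nextDouble() >= trainingFraction;
    }

    /** Custom logic: expired items always go to training; others use the fallback. */
    static TestSetSelector keepExpiredInTraining(Set<Long> expiredItemIds,
                                                 TestSetSelector fallback) {
        return pref -> !expiredItemIds.contains(pref.itemId) && fallback.isTest(pref);
    }

    /** Returns a two-element list: training prefs first, then test prefs. */
    static List<List<Pref>> split(List<Pref> all, TestSetSelector selector) {
        List<Pref> training = new ArrayList<>();
        List<Pref> test = new ArrayList<>();
        for (Pref p : all) {
            (selector.isTest(p) ? test : training).add(p);
        }
        return List.of(training, test);
    }

    public static void main(String[] args) {
        List<Pref> all = List.of(
            new Pref(1L, 10L, 4.0f), new Pref(1L, 11L, 2.0f), new Pref(2L, 10L, 5.0f));
        Set<Long> expired = Set.of(11L);  // item 11 is treated as expired
        TestSetSelector selector =
            keepExpiredInTraining(expired, random(0.7, new Random(42L)));
        List<List<Pref>> parts = split(all, selector);
        System.out.println("training=" + parts.get(0).size()
            + " test=" + parts.get(1).size());
    }
}
```

Whatever the random draw, the preference for item 11 ends up in the training set, so it is never used to score the recommender.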

On Tue, Nov 29, 2011 at 4:09 PM, Anatoliy Kats wrote:

Hi,

I brought up this question in dev a few weeks ago.  I have a
recommendation algorithm that learns the similarity matrix from both
current items and expired ones that should not be recommended.  However,
AverageAbsoluteDifferenceRecommenderEvaluator compares the predicted
and actual ratings for all items, expired or not.  I believe the evaluation
would be more realistic if it skipped expired items -- that corresponds more
closely to how the algorithm is normally deployed in production.  For example,
newer items generally have fewer clicks, so an evaluation restricted to
current items emphasizes the cold start problem we would experience in
production.

The evaluation uses expired items even if I write a recommender class
that forces all recommendations to use an IDRescorer that sets the scores
of expired items to NaN.  The reason is that the evaluator calls the
Recommender::doEstimatePreference function to calculate the predicted
rating, bypassing the recommend function.  I checked for the presence of
expired items by running my recommender in the debugger and checking the
item IDs when doEstimatePreference is called.
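
To make the behavior concrete, here is a small plain-Java sketch of an average-absolute-difference evaluation. It is not Mahout's code: the Pref class, the Estimator interface, and the include predicate are hypothetical names made up for this example. The loop calls the estimator directly for each test preference, so any filtering done inside a separate recommend method never applies; the predicate shows one way expired items could be skipped explicitly.

```java
import java.util.List;
import java.util.function.LongPredicate;

class AbsDiffEvalSketch {

    /** Minimal test preference: an item and its actual rating (hypothetical type). */
    static final class Pref {
        final long itemId;
        final double actual;

        Pref(long itemId, double actual) {
            this.itemId = itemId;
            this.actual = actual;
        }
    }

    /** Hypothetical stand-in for the recommender's direct preference estimate. */
    interface Estimator {
        double estimate(long itemId);
    }

    /** Mean of |actual - estimated| over the test prefs accepted by include. */
    static double averageAbsoluteDifference(List<Pref> testPrefs, Estimator estimator,
                                            LongPredicate include) {
        double total = 0.0;
        int count = 0;
        for (Pref p : testPrefs) {
            if (!include.test(p.itemId)) {
                continue;  // e.g. skip expired items
            }
            double estimated = estimator.estimate(p.itemId);  // no rescorer involved
            if (Double.isNaN(estimated)) {
                continue;  // no estimate available for this item
            }
            total += Math.abs(p.actual - estimated);
            count++;
        }
        return count == 0 ? Double.NaN : total / count;
    }

    public static void main(String[] args) {
        // Item 2 plays the role of an expired item with a poor estimate.
        List<Pref> testPrefs = List.of(new Pref(1L, 4.0), new Pref(2L, 2.0));
        Estimator estimator = id -> id == 1L ? 3.0 : 5.0;
        System.out.println(averageAbsoluteDifference(testPrefs, estimator, id -> true));
        System.out.println(averageAbsoluteDifference(testPrefs, estimator, id -> id != 2L));
    }
}
```

With this toy data the score over all items is 2.0, and it drops to 1.0 once the expired item is excluded.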

Do I understand the evaluator's behavior correctly?  Do you think this is
considered a bug?

Thanks,

Anatoliy

