Yes, this is what we do. We split by date into 10-90 or 20-80. The metric we
use is MAP@k for precision, and as a proxy for recall we look at the % of people
in the test set that get recs (turn off popularity backfill, or everyone will
get some kind of recs, if only popular ones). The more independent events you
have in the data, the larger your recall number will be. Expect small precision
numbers; they are an average, but larger is better. Do not use it to compare
different algorithms; only A/B tests work for that, no matter what the academics
do. Use your cross-validation scores to compare tunings. Start with the defaults
for everything as your baseline and tune from there.
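
If it helps, here is a rough Python sketch of that evaluation under some
assumptions: train_model() and recommend() are placeholders for however you
train and query your engine (they are not UR API calls), and events are assumed
to carry user, item, and date fields. The rest is just MAP@k plus the coverage
proxy for recall.

from collections import defaultdict

def average_precision_at_k(recommended, relevant, k):
    # AP@k for one user: precision accumulated at each hit position.
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)
    return score / min(len(relevant), k) if relevant else 0.0

def evaluate(events, split_date, k=10):
    # Split by date: older events train the model, newer ones are held out.
    train = [e for e in events if e.date < split_date]
    test = [e for e in events if e.date >= split_date]

    held_out = defaultdict(set)
    for e in test:
        held_out[e.user].add(e.item)

    model = train_model(train)            # placeholder training call
    ap_scores, users_with_recs = [], 0
    for user, relevant in held_out.items():
        recs = recommend(model, user, k)  # placeholder query, backfill off
        if recs:                          # recall proxy: count test users
            users_with_recs += 1          # who get any recs at all
        ap_scores.append(average_precision_at_k(recs, relevant, k))

    map_at_k = sum(ap_scores) / len(ap_scores) if ap_scores else 0.0
    coverage = users_with_recs / len(held_out) if held_out else 0.0
    return map_at_k, coverage

The coverage number is the "% of people in the test set that get recs"
mentioned above; compare it and MAP@k only across tunings of the same
algorithm, not across different algorithms.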


On Nov 24, 2017, at 12:54 AM, Andy Rao <[email protected]> wrote:

Hi, 

I have successfully trained our rec model using the Universal Recommender, but
I do not know how to evaluate the trained model.

My first idea is to split our dataset into train and test sets, and then
evaluate using recall metrics. But I'm not sure whether this is a good idea
or not.

Any help or suggestion is much appreciated.
Hongyao  