For the UR we have scripts that do this instead of the Evaluation APIs, which 
are pretty limited and do not do what we want, which is hyper-parameter search. 
Some tests change params that require the model to be re-created, while others 
vary only query params. All of this is only possible with scripts that control 
the whole system from the outside.
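To make that concrete, here is a rough sketch of what such an outside-in script 
can look like, assuming a standard PredictionIO setup where engine.json holds 
the model params, `pio build`/`pio train`/`pio deploy` rebuild and serve the 
model, and queries go to /queries.json. The parameter names, port, wait time, 
and the scoring function are only placeholders, not a recommended tuning grid.

    import itertools
    import json
    import subprocess
    import time

    import requests

    ENGINE_JSON = "engine.json"                       # assumed UR engine config location
    QUERY_URL = "http://localhost:8000/queries.json"  # assumed query endpoint

    def set_model_params(overrides):
        # Write one parameter variant into engine.json; these params need a retrain.
        with open(ENGINE_JSON) as f:
            conf = json.load(f)
        conf["algorithms"][0]["params"].update(overrides)
        with open(ENGINE_JSON, "w") as f:
            json.dump(conf, f, indent=2)

    def retrain_and_deploy(port=8000):
        # Re-create the model from the outside; `pio deploy` serves in the
        # foreground, so keep it as a background process we can stop later.
        subprocess.run(["pio", "build"], check=True)
        subprocess.run(["pio", "train"], check=True)
        server = subprocess.Popen(["pio", "deploy", "--port", str(port)])
        time.sleep(60)  # crude wait for the engine server to come up
        return server

    def fraction_with_recs(test_users, query_params):
        # Placeholder score: % of test users who get any recs for these query params.
        got = 0
        for user in test_users:
            q = {"user": user, "num": 10, **query_params}
            recs = requests.post(QUERY_URL, json=q).json().get("itemScores", [])
            got += bool(recs)
        return got / len(test_users)

    # Illustrative grids: one over model params (retrain needed), one over query params.
    model_grid = {"maxCorrelatorsPerEventType": [50, 100]}
    query_grid = {"num": [10, 20]}
    test_users = ["u1", "u2", "u3"]  # would come from the held-out date range

    for combo in itertools.product(*model_grid.values()):
        set_model_params(dict(zip(model_grid, combo)))
        server = retrain_and_deploy()
        try:
            for q_combo in itertools.product(*query_grid.values()):
                q_params = dict(zip(query_grid, q_combo))
                print(combo, q_params, fraction_with_recs(test_users, q_params))
        finally:
            server.terminate()

In practice the inner loop would compute MAP@k against the held-out events 
rather than the placeholder metric shown here.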


On Nov 24, 2017, at 9:42 AM, Pat Ferrel <[email protected]> wrote:

Yes, this is what we do. We split by date into 10-90 or 20-80. The metric we 
use is MAP@k for precision, and as a proxy for recall we look at the % of people 
in the test set that get recs (turn off popularity backfill, or everyone will 
get some kind of recs, if only popular ones). The more independent events you 
have in the data, the larger your recall number will be. Expect small precision 
numbers; they are an average, but larger is better. Do not use them to compare 
different algorithms, only A/B tests work for that, no matter what the academics 
do. Use your cross-validation scores to compare tunings. Start with the defaults 
for everything as your baseline and tune from there.
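For reference, a minimal sketch of those two metrics, assuming you already have, 
per test-set user, the ranked list of recommended item ids and the set of items 
that user actually touched in the held-out (later) date range. The names here 
are illustrative, not part of the UR itself.

    from typing import Dict, List, Set

    def average_precision_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
        # AP@k for one user: precision at each rank where a relevant item appears.
        if not relevant:
            return 0.0
        hits, score = 0, 0.0
        for rank, item in enumerate(recommended[:k], start=1):
            if item in relevant:
                hits += 1
                score += hits / rank
        return score / min(len(relevant), k)

    def map_at_k(recs: Dict[str, List[str]], truth: Dict[str, Set[str]], k: int) -> float:
        # MAP@k averaged over all users in the test split.
        users = list(truth)
        return sum(average_precision_at_k(recs.get(u, []), truth[u], k) for u in users) / len(users)

    def recall_proxy(recs: Dict[str, List[str]], truth: Dict[str, Set[str]]) -> float:
        # % of test users who get any recs at all (with popularity backfill off).
        users = list(truth)
        return sum(1 for u in users if recs.get(u)) / len(users)

    # Toy usage: u1's single relevant item appears at rank 2, u2 gets no recs,
    # so MAP@10 is 0.25 across the two users and the recall proxy is 0.5.
    recs = {"u1": ["i1", "i2", "i3"], "u2": []}
    truth = {"u1": {"i2"}, "u2": {"i9"}}
    print(map_at_k(recs, truth, k=10), recall_proxy(recs, truth))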


On Nov 24, 2017, at 12:54 AM, Andy Rao <[email protected]> wrote:

Hi, 

I have successfully trained our rec model using the Universal Recommender, but I 
do not know how to evaluate the trained model. 

The first idea that comes to mind is to split our dataset into train and test 
sets and then evaluate with recall metrics. But I'm not sure whether this is a 
good idea or not.

Any help or suggestion is much appreciated.
Hongyao  


