Yeah... what Pat said.

Offline evaluations are difficult.  At most, they provide directional
guidance to be refined using live A/B testing.  Of course, A/B testing of
recommenders comes with its own set of tricky issues, like different
recommenders learning from each other.

On Sun, Mar 30, 2014 at 4:54 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Seems like most people agree that ranking is more important than rating in
> most recommender deployments. RMSE was used for a long time with
> cross-validation (partly because it was the choice of Netflix during the
> competition) but it is really a measure of total rating error.  In the past
> we’ve used mean-average-precision as a good measure of ranking quality. We
> chose hold-out tests based on time, so something like 10% of the most
> recent data was held out for cross-validation and we measured MAP@n for
> tuning parameters.
>
> http://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision
>
> For our data (ecommerce shopping data) most of the ALS tuning parameters
> had very little effect on MAP. However, cooccurrence recommenders performed
> much better on the same data. Unfortunately, comparing two algorithms with
> offline tests is of questionable value. Still, with nothing else to go on,
> we went with the cooccurrence recommender.
>
>
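
For anyone who wants to reproduce the measurement Pat describes above, here is
a rough sketch of MAP@n over a per-user holdout, written in Python.  This is a
minimal sketch only; the function names and data structures are illustrative
and not taken from Mahout or any other library.

# AP@n for one user: precision at each rank where a held-out item appears,
# averaged over the number of relevant items (capped at n).
def average_precision_at_n(recommended, held_out, n):
    if not held_out:
        return None  # users with no held-out items contribute nothing
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:n], start=1):
        if item in held_out:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(held_out), n)

# MAP@n: mean of AP@n over all users that have at least one held-out item.
# recs_by_user maps user -> ranked list of recommended items;
# held_out_by_user maps user -> set of items hidden from training.
def map_at_n(recs_by_user, held_out_by_user, n=10):
    scores = [
        average_precision_at_n(recs_by_user.get(user, []), held_out, n)
        for user, held_out in held_out_by_user.items()
    ]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else 0.0

# Usage, per the time-based split described above: sort each user's
# interactions by timestamp, hide the most recent ~10% from training,
# generate top-n recommendations from the rest, then call map_at_n on the
# two dictionaries.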
