If you have temporal information, you should use it to split the data.
Try to predict later interactions from older ones.
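
A minimal sketch of such a time-based split, assuming each view has been exported as a (user, item, timestamp) tuple. The record layout, the function name and the 80/20 default are illustrative assumptions, not anything Mahout provides:

```python
from typing import List, Tuple

# Hypothetical record layout: (user_id, item_id, unix_timestamp)
View = Tuple[str, str, int]

def temporal_split(views: List[View], train_fraction: float = 0.8):
    """Split views at a time cutoff: older views train, later views test."""
    if not views:
        return [], []
    ordered = sorted(views, key=lambda v: v[2])
    cut = int(len(ordered) * train_fraction)
    if cut >= len(ordered):
        # Nothing left over for the test set
        return ordered, []
    cutoff = ordered[cut][2]
    # Views sharing the cutoff timestamp all go to the test side,
    # so no training view is ever newer than a test view.
    train = [v for v in ordered if v[2] < cutoff]
    test = [v for v in ordered if v[2] >= cutoff]
    return train, test
```

The training part would then be fed to the recommender, and the test part kept aside as the "future" to predict.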

Written on 26.08.2012 at 17:04:
>
> It's the same idea, but yes you'd have to re-implement it for Hadoop.
>
> Randomly select a subset of users. Identify a small number of
> most-preferred items for that user -- perhaps the video(s) watched
> most often. Hold these data points out as a test set. Run your process
> on all the rest.
>
> Make recommendations for the selected users. You then just see how
> many in the list were among the test data you held out. The percentage
> of recs that were in the test list is precision, and the percent of
> the test list in the recs is recall.
>
> Precision and recall are not good tests, but among the only ones you
> can carry out in the lab. Slightly better are variations on these two
> metrics, like F1 measure and normalized discounted cumulative gain.
> Also look up mean average precision.
>
> On Sun, Aug 26, 2012 at 10:47 AM, Jonathan Hodges <[email protected]>
> wrote:
> > Hi,
> >
> > We have been tasked with producing video recommendations for our users. We
> > get about 100 million video views per month and track users and the videos
> > they watch, but currently we don’t collect rating value or preference.
> > Later we plan on using implicit data like percentage of video watched to
> > surmise preferences but for the first release we are stuck with Boolean
> > viewing data. To that end we started by using Mahout’s distributed
> > RecommenderJob with LoglikelihoodSimilarity algorithm to generate 50 video
> > recommendations for each user. We would like to gauge how well we are doing
> > by offline measuring precision and recall of these recommendations. We know
> > we should divide the viewing data into training and test data, but not real
> > sure what steps to take next. For the non-distributed approach we would
> > leverage IRStatistics to get the precision and recall values, but it seems
> > there isn’t as simple a solution within the Mahout framework for the Hadoop
> > based calculations.
> >
> > Can someone please share/suggest their techniques for evaluating
> > recommendation accuracy with Mahout’s Hadoop-based distributed algorithms?
> >
> > Thanks in advance,
> >
> > Jonathan
>
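
The precision and recall computation described in the quoted reply could be sketched roughly as follows. This assumes the recommender output and the held-out views have already been parsed into plain dictionaries keyed by user ID; the function names and data layout are hypothetical and not part of Mahout:

```python
from typing import Dict, List, Set, Tuple

def precision_recall(recommended: List[str], held_out: Set[str]) -> Tuple[float, float]:
    """Precision and recall of one user's recommendation list.

    Assumes the recommended items are distinct.
    """
    hits = sum(1 for item in recommended if item in held_out)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(held_out) if held_out else 0.0
    return precision, recall

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def evaluate(recs: Dict[str, List[str]], test: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Average precision and recall over users that have held-out items."""
    scores = [
        precision_recall(recs.get(user, []), items)
        for user, items in test.items()
        if items
    ]
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n
```

Users with no recommendations at all count as zero precision and zero recall here; whether to skip them instead is a judgment call.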
