Why not test both the original and the pruned data set? The low-rating data may still help, even when the rating value itself is discarded. I would not base the decision only on whether you can make recommendations to N users, but on the overall quality of the recommendations.
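One way to run that comparison is to evaluate the same held-out set of liked items against a recommender trained first on the full ratings and then on the pruned ratings. The sketch below is a minimal illustration under stated assumptions. The ratings are toy data, not MovieLens. `recommend` is a hypothetical stand-in that ranks items by how many users rated them; it is not a Mahout API. The metric is a plain precision at N.

```python
from collections import Counter

# Toy ratings as (user, item, stars) tuples; illustrative only, not MovieLens.
train = [
    ("u1", "A", 5.0), ("u1", "B", 1.0),
    ("u2", "A", 4.0), ("u2", "B", 1.0),
    ("u3", "B", 1.0), ("u3", "C", 5.0),
    ("u4", "C", 4.0),
]
# Held-out items each user liked (3 stars or more); identical for both runs.
test = {"u4": {"A"}}


def prune(ratings, min_stars=3.0):
    """Drop ratings below min_stars, as in the 3-stars-or-more filter."""
    return [(u, i, s) for u, i, s in ratings if s >= min_stars]


def recommend(ratings, user, n):
    """Hypothetical boolean recommender: top-n items by number of distinct
    users who rated them, excluding items the user has already rated."""
    counts = Counter(i for _u, i in {(u, i) for u, i, _s in ratings})
    seen = {i for u, i, _s in ratings if u == user}
    candidates = sorted((i for i in counts if i not in seen),
                        key=lambda i: (-counts[i], i))  # ties broken by item id
    return candidates[:n]


def precision_at_n(ratings, test, n):
    """Share of recommended items found in each user's held-out liked set."""
    hits = total = 0
    for user, liked in test.items():
        hits += sum(1 for i in recommend(ratings, user, n) if i in liked)
        total += n
    return hits / total if total else 0.0


print("original:", precision_at_n(train, test, 1))        # original: 0.0
print("pruned:  ", precision_at_n(prune(train), test, 1))  # pruned:   1.0
```

On this toy data the pruned set wins, because the 1-star ratings make item B look popular. With a real recommender on MovieLens the result could go the other way, which is why it is worth measuring both rather than assuming.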
In this particular data set, which is rich and has little noise, the ratings probably carry valuable information, and I imagine you will do better with any approach that doesn't drop them.

On Fri, Jan 25, 2013 at 2:19 AM, Koobas wrote:
> They use a boolean recommender on the 10M MovieLens data
> with negative ratings removed (including only 3 stars or more).
> I wonder if this is a valid approach, as opposed to not removing anything.
>
> I actually went through the exercise of removing negative ratings from the
> 10M MovieLens set,
> and made the following observations:
>
> - It removes about 17% of all ratings,
> - 15 users disappear (out of 70,000),
> - 79 movies disappear (out of 10,000).
>
> So, it does not seem to hurt the overall exercise.
> A reasonably small fraction of ratings is gone.
> We will not recommend movies to a dozen users, who did not like anything.
> We will not be recommending movies which nobody liked.
>
> I would definitely appreciate some comments about that approach.
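The three measurements in the quoted message (share of ratings removed, users that disappear, movies that disappear) can be reproduced with a short script. This is a minimal sketch: `load_ratings` and `pruning_stats` are hypothetical helpers, and the loader assumes the 10M `ratings.dat` file stores one rating per line as `UserID::MovieID::Rating::Timestamp`; check the data set's README before relying on that.

```python
def load_ratings(path):
    """Parse a MovieLens 10M ratings.dat file, assumed to hold lines of the
    form UserID::MovieID::Rating::Timestamp."""
    with open(path, encoding="utf-8") as f:
        return [(u, m, float(r)) for u, m, r, _ts in
                (line.strip().split("::") for line in f if line.strip())]


def pruning_stats(ratings, min_stars=3.0):
    """Report what a 'min_stars or more' filter removes from a rating list."""
    kept = [(u, i, s) for u, i, s in ratings if s >= min_stars]
    users_before = {u for u, _i, _s in ratings}
    users_after = {u for u, _i, _s in kept}
    items_before = {i for _u, i, _s in ratings}
    items_after = {i for _u, i, _s in kept}
    return {
        "fraction_removed": 1 - len(kept) / len(ratings) if ratings else 0.0,
        "users_lost": len(users_before - users_after),   # users with no rating >= min_stars
        "items_lost": len(items_before - items_after),   # items nobody rated >= min_stars
    }


# Toy example; for the real data, call pruning_stats(load_ratings("ratings.dat")).
toy = [("u1", "m1", 4.0), ("u1", "m2", 2.5), ("u2", "m2", 1.0), ("u3", "m3", 3.0)]
print(pruning_stats(toy))
# {'fraction_removed': 0.5, 'users_lost': 1, 'items_lost': 1}
```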
