Why not test both the original and the pruned data set? The low-rating
data may still help, even once the rating value itself is discarded.
I would not base the decision only on how many users you can still make
recommendations to, but on the overall quality of the recommendations.

In this particular data set, which is rich and un-noisy, the ratings
are probably valuable information and I imagine you will do better
with any approach that doesn't drop them.
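
To make the comparison concrete, here is a minimal sketch of how one might
build both boolean variants and see what pruning removes. It is plain Python
on made-up toy data, not Mahout code; the names Rating, to_boolean and
prune_report are hypothetical, and the 3-star cutoff is just the threshold
mentioned in the quoted message.

```python
from typing import Iterable, NamedTuple, Optional


class Rating(NamedTuple):
    user: str
    item: str
    stars: float


def to_boolean(ratings: Iterable[Rating], min_stars: Optional[float] = None):
    """Forget the rating value and keep only (user, item) pairs.

    If min_stars is given, ratings below it are dropped first (pruned variant).
    """
    return {
        (r.user, r.item)
        for r in ratings
        if min_stars is None or r.stars >= min_stars
    }


def prune_report(ratings: Iterable[Rating], min_stars: float = 3.0):
    """Compare the unpruned and pruned boolean data sets."""
    ratings = list(ratings)
    full = to_boolean(ratings)
    pruned = to_boolean(ratings, min_stars)
    users_lost = {u for u, _ in full} - {u for u, _ in pruned}
    items_lost = {i for _, i in full} - {i for _, i in pruned}
    removed = 1 - len(pruned) / len(full) if full else 0.0
    return {
        "ratings_removed_fraction": removed,
        "users_lost": sorted(users_lost),
        "items_lost": sorted(items_lost),
    }


# Toy data, invented for illustration only.
toy = [
    Rating("u1", "m1", 5),
    Rating("u1", "m2", 2),
    Rating("u2", "m2", 1),
    Rating("u2", "m3", 4),
    Rating("u3", "m4", 1),
]
report = prune_report(toy)
print(report)
```

Both sets (with and without min_stars) can then be fed to the same
evaluation, so the choice rests on recommendation quality rather than on
these counts alone.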

On Fri, Jan 25, 2013 at 2:19 AM, Koobas <[email protected]> wrote:
> They use a boolean recommender on the 10M MovieLens data
> with negative ratings removed (including only 3 stars or more).
> I wonder if this is a valid approach, as opposed to not removing anything.
>
> I actually went through the exercise of removing negative ratings from the
> 10M MovieLens set,
> and made the following observations:
>
> - It removes about 17% of all ratings,
> - 15 users disappear (out of 70,000),
> - 79 movies disappear (out of 10,000).
>
> So, it does not seem to hurt the overall exercise.
> A reasonably small fraction of ratings is gone.
> We will not recommend movies to a dozen users who did not like anything.
> We will not be recommending movies which nobody liked.
>
> I would definitely appreciate some comments about that approach.
