On Thu, Dec 1, 2011 at 6:11 AM, Daniel Zohar <[email protected]> wrote:

> I think the above curves clearly show that a lot of my data does not need
> to be checked when looking for similar items. That's because if a user
> made only a single choice in the past, there's no point in checking for
> his other choices at all while computing item similarities.
>

It would be fine to not load those users.
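
A user with a single choice contributes no item-to-item co-occurrence, so
dropping those users before loading does not change the item similarities.
A minimal sketch of such a pre-filter, assuming the preferences are
available as (user, item) pairs before they reach the DataModel. The
function name and data layout are hypothetical, not part of the Mahout API:

```python
from collections import defaultdict


def drop_single_choice_users(pairs):
    """Keep only interactions from users who chose at least two distinct items."""
    # First pass: collect the distinct items seen for each user.
    items_by_user = defaultdict(set)
    for user, item in pairs:
        items_by_user[user].add(item)
    # Second pass: keep interactions only for users with two or more items.
    return [(u, i) for u, i in pairs if len(items_by_user[u]) >= 2]


if __name__ == "__main__":
    prefs = [("alice", 1), ("alice", 2), ("bob", 1)]
    print(drop_single_choice_users(prefs))
```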


>
> I would think it's something that should be integrated into the DataModel.
> Maybe there should be one Set that holds only users who have made more
> than one choice. This would greatly improve performance in my case. What do
> you think?
>

But there is the other problem that several of your users have made an
absurd number of choices.  This is commonly due to QA processes or spiders.
You can moderate this effect by counting as an interaction only those
events that require a human to be engaged with the content.

You can also downsample these users without eliminating them.  This is done
in the off-line processing of recommendation data for item-based
recommendations, but I am not sure about on-line recommendations.
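
A rough sketch of what downsampling could look like, assuming the
preferences are grouped as a mapping from user to a list of items. The
function name, the cap, and the seed parameter are hypothetical choices for
illustration, and this is not Mahout's own code:

```python
import random


def downsample_heavy_users(items_by_user, max_per_user, seed=0):
    """Cap each user's interactions at max_per_user by uniform random sampling."""
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same sample
    sampled = {}
    for user, items in items_by_user.items():
        items = list(items)
        if len(items) > max_per_user:
            # Heavy user: keep a random subset instead of dropping the user.
            items = rng.sample(items, max_per_user)
        sampled[user] = items
    return sampled


if __name__ == "__main__":
    prefs = {"spider": list(range(10000)), "alice": [1, 2, 3]}
    capped = downsample_heavy_users(prefs, max_per_user=500)
    print({user: len(items) for user, items in capped.items()})
```

Users under the cap pass through untouched, so only the spiders and QA
accounts lose data.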
