On Thu, Dec 1, 2011 at 4:26 PM, Ted Dunning <[email protected]> wrote:
> On Thu, Dec 1, 2011 at 6:11 AM, Daniel Zohar <[email protected]> wrote:
>
> > I think from the above curves one can clearly see that a lot of my data is
> > not needed to be checked when looking for similar items. That's because if
> > a user had only a single choice in the past, there's no point of checking
> > for his other choices at all while doing item similarities.
> >
>
> It would be fine to not load those users.
>

Does Mahout offer something like that out-of-the-box or should I implement
my own DataModel? Do you agree that this is the right place to do so?

> >
> > I would think it's something that should be integrated into the DataModel.
> > Maybe there should be one Set that holds only users which had made more
> > than one choice. This will greatly improve performance in my case. What do
> > you think?
> >
>
> But there is the other problem that several of your users have made an
> absurd number of choices. This is commonly due to QA processes or spiders.
> You can moderate this effect by filtering what you consider to be an
> interaction to be something that requires a human to be engaged with the
> content.
>
> You can also downsample these users without eliminating them. This is done
> in the off-line processing of recommendation data for item-based
> recommendations, but I am not sure about on-line recommendations.
>
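
To make the two ideas discussed above concrete (skip users with only a single choice, and downsample users with an absurd number of choices instead of dropping them), here is a minimal sketch of a preprocessing step that could run before the data reaches the DataModel. It is plain Java and does not use any Mahout API. The class name PreferencePruner, the method prune, and the minPrefs/maxPrefs thresholds are hypothetical names chosen for illustration, and uniform random sampling is an assumption, not necessarily what Mahout's off-line job does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical helper, NOT part of Mahout: prunes a userID -> itemIDs map
// before it is handed to whatever DataModel implementation is in use.
final class PreferencePruner {

    private PreferencePruner() {}

    /**
     * Returns a new map in which
     *  - users with fewer than minPrefs items are dropped, and
     *  - users with more than maxPrefs items keep a uniform random
     *    sample of exactly maxPrefs of their items.
     * The input map and its sets are not modified.
     */
    static Map<Long, Set<Long>> prune(Map<Long, Set<Long>> prefsByUser,
                                      int minPrefs, int maxPrefs, Random random) {
        if (minPrefs > maxPrefs) {
            throw new IllegalArgumentException("minPrefs must not exceed maxPrefs");
        }
        Map<Long, Set<Long>> pruned = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> entry : prefsByUser.entrySet()) {
            Set<Long> items = entry.getValue();
            if (items.size() < minPrefs) {
                // With minPrefs = 2 this skips single-choice users, who
                // contribute no item-item co-occurrence.
                continue;
            }
            if (items.size() <= maxPrefs) {
                pruned.put(entry.getKey(), new HashSet<>(items));
                continue;
            }
            // Heavy user (possibly QA or a spider): keep a random subset.
            List<Long> shuffled = new ArrayList<>(items);
            Collections.shuffle(shuffled, random);
            pruned.put(entry.getKey(), new HashSet<>(shuffled.subList(0, maxPrefs)));
        }
        return pruned;
    }
}
```

Calling PreferencePruner.prune(prefs, 2, 500, new Random()) would drop single-choice users and cap every remaining user at 500 items; the value 500 is only a placeholder and would need tuning against the curves mentioned earlier in the thread.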
