Re: Mahout performance issues

Daniel Zohar Thu, 01 Dec 2011 06:33:16 -0800

On Thu, Dec 1, 2011 at 4:26 PM, Ted Dunning <[email protected]> wrote:


> On Thu, Dec 1, 2011 at 6:11 AM, Daniel Zohar <[email protected]> wrote:
>
> > I think from the above curves one can clearly see that a lot of my data
> > is not needed to be checked when looking for similar items. That's
> > because if a user had only a single choice in the past, there's no point
> > of checking for his other choices at all while doing item similarities.
> >
>
> It would be fine to not load those users.
>

Does Mahout offer something like that out-of-the-box or should I implement
my own DataModel? Do you agree that this is the right place to do so?
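
To make the idea concrete, here is a minimal sketch of the pruning step in plain Java, done before any data reaches a DataModel. It assumes the choices are already available as a map from user ID to the set of chosen item IDs. The class name SingleChoiceFilter and the method keepActiveUsers are hypothetical and not part of Mahout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Mahout: prunes users before the data is loaded. */
public final class SingleChoiceFilter {

    private SingleChoiceFilter() {}

    /**
     * Returns a new map holding only the users with at least minChoices items.
     * The input map (user ID -> set of item IDs) is left untouched.
     */
    public static Map<Long, Set<Long>> keepActiveUsers(Map<Long, Set<Long>> choicesByUser,
                                                       int minChoices) {
        Map<Long, Set<Long>> kept = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> entry : choicesByUser.entrySet()) {
            // A user with a single choice contributes no item-item co-occurrence.
            if (entry.getValue().size() >= minChoices) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

Calling keepActiveUsers(choices, 2) would drop every single-choice user, and the resulting map could then be handed to whichever DataModel is in use.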

>
>
> >
> > I would think it's something that should be integrated into the
> > DataModel. Maybe there should be one Set that holds only users which had
> > made more than one choice. This will greatly improve performance in my
> > case. What do you think?
> >
>
> But there is the other problem that several of your users have made an
> absurd number of choices.  This is commonly due to QA processes or spiders.
> You can moderate this effect by filtering what you consider to be an
> interaction to be something that requires a human to be engaged with the
> content.
>
> You can also downsample these users without eliminating them.  This is done
> in the off-line processing of recommendation data for item-based
> recommendations, but I am not sure about on-line recommendations.
>
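
As an illustration of downsampling without eliminating a user, the sketch below caps each user's choices at a fixed maximum by uniform random sampling. The uniform sampling strategy, the class name HeavyUserSampler and the maxChoices parameter are all assumptions made for this example; this is not necessarily how the off-line item-based job does it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical helper, not part of Mahout: caps the number of choices per user. */
public final class HeavyUserSampler {

    private HeavyUserSampler() {}

    /**
     * Returns a copy of the user's items, reduced to at most maxChoices entries.
     * Users at or below the cap keep all their items; heavier users keep a
     * uniformly random subset, so they still take part in the similarities.
     */
    public static Set<Long> downsample(Set<Long> items, int maxChoices, Random random) {
        if (maxChoices < 0) {
            throw new IllegalArgumentException("maxChoices must be non-negative");
        }
        if (items.size() <= maxChoices) {
            return new HashSet<>(items);
        }
        List<Long> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, random);
        return new HashSet<>(shuffled.subList(0, maxChoices));
    }
}
```

Passing a seeded Random keeps the sample reproducible between runs. A suitable cap depends on the data and would have to be chosen by looking at the distribution of choices per user.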
