What exactly does "didn't buy" mean here? Was the user shown the item, or is it
just an item they never considered?

To find the "best" metric, you could simply run an offline evaluation across
your dataset. But the most important question seems to be: what does each
representation actually mean?
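One way such an offline comparison could look, as a rough Python sketch under my own assumptions: purchases are stored as a set of item IDs per user (the data below is made up), anything not in the set counts as an implicit 0, one purchase per user is held out, and a simple user-based scorer is checked for whether it ranks the held-out item in its top k. The functions `cosine`, `jaccard`, `recommend` and `hit_rate` are hypothetical names for illustration, not Mahout APIs, and hit rate is just one possible measure.

```python
import math
import random
from typing import Callable, Dict, List, Set

# Hypothetical toy data: user -> set of item IDs the user bought.
# Anything not in the set is treated as "didn't buy" (an implicit 0).
purchases: Dict[str, Set[str]] = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "d"},
    "u3": {"b", "c", "d", "e"},
    "u4": {"a", "e"},
    "u5": {"c", "d", "e"},
}

Similarity = Callable[[Set[str], Set[str]], float]


def cosine(a: Set[str], b: Set[str]) -> float:
    # Cosine of two 0/1 vectors: |intersection| / sqrt(|A| * |B|).
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Jaccard (equal to Tanimoto on 0/1 data): |intersection| / |union|.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user_items: Set[str], others: List[Set[str]],
              sim: Similarity, k: int) -> List[str]:
    # Score each unseen item by summing the similarity of the users who bought it.
    scores: Dict[str, float] = {}
    for other in others:
        s = sim(user_items, other)
        if s <= 0.0:
            continue
        for item in other - user_items:
            scores[item] = scores.get(item, 0.0) + s
    # Highest score first; ties broken by item ID so results are deterministic.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


def hit_rate(data: Dict[str, Set[str]], sim: Similarity,
             k: int = 2, seed: int = 0) -> float:
    # Leave one purchase out per user and check whether it lands in the top k.
    rng = random.Random(seed)  # same seed -> same held-out items for every metric
    hits = trials = 0
    for user in sorted(data):
        items = data[user]
        if len(items) < 2:
            continue
        held_out = rng.choice(sorted(items))
        train = items - {held_out}
        others = [v for u, v in data.items() if u != user]
        trials += 1
        if held_out in recommend(train, others, sim, k):
            hits += 1
    return hits / trials if trials else 0.0


if __name__ == "__main__":
    for name, sim in [("cosine", cosine), ("jaccard", jaccard)]:
        print(name, hit_rate(purchases, sim))
```

With five toy users the numbers only show the mechanics; on a real dataset you would plug in your own purchase sets and a realistic k. Note that this sketch bakes in one answer to the question above (absence == 0), which is exactly the representation choice worth questioning.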

This paper describes a binary approach; have a look at its evaluation section
(PDF, psu.edu):
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.2430&rep=rep1&type=pdf

On Tue, Apr 26, 2011 at 8:16 AM, Sean Owen <[email protected]> wrote:

> I think my comment mostly addressed his comments. Yes, this is the
> definition of cosine distance, and it is implemented. No, it doesn't work over
> true binary data. There is no "0", only "1" or non-existent.
>
> What is the remaining question?
>
> On Tue, Apr 26, 2011 at 3:21 AM, Chris Waggoner <[email protected]> wrote:
> >
> > I've never used Mahout, but what @allclaws wants sounds like a simple
> > proposition. Given a vector like
> >
> > bought
> > didn't buy
> > didn't buy
> > didn't buy
> > didn't buy
> > didn't buy
> > didn't buy
> > bought
> > didn't buy
> > bought
> > bought
> > bought
> >
> > Define "bought" == 1 and "didn't buy" == 0. Define the distance between two
> > such vectors to be (A dot B) / (|A| * |B|). Not that I find this compelling
> > as a definition of similarity, but @allclaws called this a first, rough
> > pass.
>
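A minimal Python sketch of the definition quoted above, assuming A is the 12-entry bought/didn't-buy list from Chris's message and B is a second user I made up for illustration. It contrasts two readings: "didn't buy" stored as an explicit 0, versus a sparse representation where only purchases exist and the cosine is taken over items both users have. That second variant, `cosine_common_only`, is a hypothetical illustration of Sean's "no 0, only 1 or non-existent" point, not a description of Mahout's implementation. Strictly, (A dot B) / (|A| * |B|) is cosine similarity; 1 minus it is what is usually called cosine distance.

```python
import math
from typing import Dict, List, Optional

# Vector A: the "bought"/"didn't buy" list from the quoted message,
# encoded as bought == 1, didn't buy == 0.
A = [1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
# Vector B: a made-up second user, for illustration only.
B = [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0]


def cosine_dense(a: List[int], b: List[int]) -> float:
    """(A dot B) / (|A| * |B|), with "didn't buy" counted as an explicit 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def to_sparse(v: List[int]) -> Dict[int, float]:
    """Keep only purchases: item index -> 1.0; "didn't buy" is simply absent."""
    return {i: 1.0 for i, x in enumerate(v) if x}


def cosine_common_only(a: Dict[int, float],
                       b: Dict[int, float]) -> Optional[float]:
    """Hypothetical variant: cosine restricted to items present for both users.
    With boolean data every stored value is 1.0, so any overlap gives ~1.0."""
    common = a.keys() & b.keys()
    if not common:
        return None  # undefined: the users share no items
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print("explicit zeros:", cosine_dense(A, B))
    print("common items only:", cosine_common_only(to_sparse(A), to_sparse(B)))
    print("cosine distance:", 1.0 - cosine_dense(A, B))
```

With explicit zeros, two 0/1 vectors score |A and B| / sqrt(|A| * |B|), here 3 / sqrt(5 * 4), roughly 0.67. With only the stored 1s, any two users who share at least one item come out at 1.0, which is one way to see why a cosine over "1 or non-existent" data carries no information.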
