The simplest and best answer is that 0 is not the same as null. The
framework does not treat them as the same, as a rule. A preference of 0 has
some effect on computations; a preference that does not exist has none.

The twist here is that there is no such thing as "null" for the mathematical
entity that is a vector. A vector's value is implicitly 0.0 in any dimension
that has not otherwise been set. This is its "null". In the context of
recommenders it should still be true that these "null" dimensions have no
effect on the computation, which means this principle is preserved.

I can't say I'm 100% sure that 0.0 preferences have no effect on
computations involving RandomAccessSparseVector in the way that non-existent
preferences have no effect on the non-distributed computations. It's a good
principle, but not guaranteed, I would say. It's an artifact of the fact
that things aren't 100% the same when projected entirely into the world of
vector math, useful as that projection is.

However, it does mean that there's no way to actually express a 0.0
preference in a Hadoop-based computation that involves the likes of
RandomAccessSparseVector. This is a non-trivial difference, and a real
problem.
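
To make the "implicit 0.0" point concrete, here is a minimal sketch of a
sparse vector backed by a map. ToySparseVector and its methods are
hypothetical names for illustration only; this is not Mahout's
RandomAccessSparseVector, and the choice to drop an explicit 0.0 on set() is
an assumption about how a sparse implementation might behave.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy sparse vector (hypothetical, not Mahout's class): only non-zero entries are stored. */
final class ToySparseVector {
    private final Map<Integer, Double> entries = new HashMap<>();

    /** Assumption for this sketch: an explicit 0.0 is not stored at all. */
    void set(int index, double value) {
        if (value == 0.0) {
            entries.remove(index);
        } else {
            entries.put(index, value);
        }
    }

    /** Any dimension that was never set reads back as 0.0. */
    double get(int index) {
        return entries.getOrDefault(index, 0.0);
    }

    /** Number of entries that actually occupy memory. */
    int storedEntries() {
        return entries.size();
    }

    /** Dot product that visits only this vector's stored entries. */
    double dot(ToySparseVector other) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            sum += e.getValue() * other.get(e.getKey());
        }
        return sum;
    }

    public static void main(String[] args) {
        ToySparseVector neverRated = new ToySparseVector();
        ToySparseVector ratedZero = new ToySparseVector();
        ratedZero.set(3, 0.0); // intended as a real rating of 0.0

        // Both read back identically, so the 0.0 rating cannot be told apart from "no rating".
        System.out.println(neverRated.get(3) == ratedZero.get(3));
        System.out.println(ratedZero.storedEntries());
    }
}
```

In this sketch the unset dimensions never enter dot(), which is the "no
effect on the computation" behavior described above, and a rating of exactly
0.0 leaves no trace in the vector.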

Don't add an epsilon to all ratings, no. That makes your vectors completely
un-sparse, which is a killer. Instead, if you really want to express 0.0, I
might set it to some very small value, perhaps Double.MIN_VALUE. This slight
distortion ought to make no practical difference.
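
A rough sketch of that idea follows, assuming you control the code that
writes ratings into the vectors and reads predictions back out. The class
name and the encode/decode helpers are hypothetical, not part of Mahout.

```java
/** Sketch of storing a tiny stand-in value in place of a true 0.0 rating (hypothetical helper). */
final class TinyRatingSketch {
    /** Smallest positive double; used here as the stand-in for a rating of 0.0. */
    static final double ZERO_RATING = Double.MIN_VALUE;

    /** Value to write into the sparse vector for a given rating. */
    static double encode(double rating) {
        return rating == 0.0 ? ZERO_RATING : rating;
    }

    /** Maps a stored value back to the rating it stands for. */
    static double decode(double stored) {
        return stored == ZERO_RATING ? 0.0 : stored;
    }

    public static void main(String[] args) {
        double stored = encode(0.0);

        // The stored value is non-zero, so a sparse vector would keep the entry.
        System.out.println(stored != 0.0);

        // Its contribution to a sum of ordinary magnitude is lost in rounding.
        System.out.println(12.0 + stored * 4.0 == 12.0);

        // Reading it back recovers the original 0.0 rating.
        System.out.println(decode(stored));
    }
}
```

Only the ratings that are exactly 0.0 change, so the vectors stay as sparse
as the data allows. One thing to watch: a product involving the stand-in can
underflow to exactly 0.0 (Double.MIN_VALUE * 0.5 does), so intermediate
results are not guaranteed to stay non-zero.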


On Mon, Oct 18, 2010 at 10:32 AM, gabeweb <[email protected]> wrote:

>
> I wanted to ask about the problem of deciding to treat 0.0 as a null value
> or a valid number in the Mahout recommenders.  Basically, Mahout treats 0.0 as
> the null value for ratings, probably because this is compatible with the
> sparse vector representations (most obviously with
> RandomAccessSparseVector).  But in principle, there is no reason why 0.0
> should not be a valid rating, and the null value should be represented by
> "null".  Have other folks come up against this problem, and if so, how have
> they solved it?  Modifying RandomAccessSparseVector looks like a lot of
> work.  I could just add n to all of my ratings so that min(rating+n) > 0
> and then subtract n from the predicted ratings, but that's really, really ugly.
> Does anyone have a better idea?  Thanks.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/0-0-as-null-versus-number-in-recommender-tp1723927p1723927.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
