When I first started reading the Manning book, I was a little surprised by
the description of the data structures for preferences in the collaborative
filtering section.  Before getting the book I had really only played around
with the Vector implementations, so I was used to Vectors being generic
lists of <int, double> pairs.  The book instead describes all of the
collaborative filtering implementations as using generic lists of
<long, float> pairs.

I was wondering if I could get some general comments on the reason for this
disparity.  I'm guessing it's a matter of history and optimization -- Taste
was optimized for storing more info at the index level (long ids) and less
at the "rating" level (float values), whereas Vectors were intended to be
generic, with the ability to maintain the maximum amount of precision
(double values).  Unfortunately the lowest common denominator is int/float,
so if you want to go between models you have to stay within the tighter
limit of each (int indices, float values) without getting the smaller
footprint that those narrower types would normally buy you...
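
To make the mismatch concrete, here is a small sketch in plain Java.  It
uses no Mahout classes at all; the class and method names are made up for
illustration, and it only shows what the primitive casts themselves do when
a long id is squeezed into an int index or a double value into a float
preference.

```java
public class NarrowingDemo {
    // Hypothetical helper: a Taste-style id (long) forced into a
    // Vector-style index (int). The cast keeps only the low 32 bits.
    static int toIndex(long id) {
        return (int) id;
    }

    // Hypothetical helper: a Vector-style value (double) forced into a
    // Taste-style preference (float). The cast rounds to the nearest float.
    static float toPreference(double value) {
        return (float) value;
    }

    public static void main(String[] args) {
        long userId = 4_294_967_297L;          // 2^32 + 1, a perfectly valid long id
        System.out.println(toIndex(userId));   // prints 1, colliding with id 1

        double weight = 0.1234567890123;
        // Widening the float back to double does not recover the original value.
        System.out.println((double) toPreference(weight) == weight); // prints false
    }
}
```

Neither cast raises an error, which is why moving data between the two
models can go wrong silently.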

It ends up feeling like there are two faces to Mahout which are somewhat
incompatible.  Are there any thoughts about bridging the gap between the two
models in the future?  If this really is a matter of each model being
optimized for its problem space, maybe it would just help to have a clear
delineation of which utilities belong on which side of the fence -- as well
as some utility for shifting generic types between the models (with the
warning that precision might be lost, or that not as many ids can be
represented).  That way utilities that already exist on the one side could
be reused on the other side.
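
As a rough idea of what I mean by such a utility, here is a minimal sketch.
Everything in it (the ModelBridge class, its method names, the tolerance
parameter) is hypothetical and not part of Mahout; it relies only on the
JDK.  The point is that the narrowing direction fails loudly instead of
silently, while the widening direction is always safe.

```java
public final class ModelBridge {
    private ModelBridge() {}

    /** long id -> int index; throws ArithmeticException if the id does not fit. */
    public static int idToIndex(long id) {
        return Math.toIntExact(id);
    }

    /**
     * double value -> float preference; throws ArithmeticException if the
     * absolute rounding error exceeds the caller-supplied tolerance.
     */
    public static float valueToPreference(double value, double tolerance) {
        float narrowed = (float) value;
        if (Math.abs(narrowed - value) > tolerance) {
            throw new ArithmeticException("precision loss converting " + value);
        }
        return narrowed;
    }

    /** int index -> long id; widening never loses information. */
    public static long indexToId(int index) {
        return index;
    }

    /** float preference -> double value; widening never loses information. */
    public static double preferenceToValue(float preference) {
        return preference;
    }
}
```

A caller that can live with small rounding errors would pass something like
1e-6 as the tolerance, while passing 0.0 accepts only values that a float
represents exactly.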
