When I first started reading the Manning book, I was a little surprised by how the collaborative filtering section describes the data structures for preferences. Before getting the book I had really only played around with the Vector implementations, so I was used to Vectors being generic lists of <int, double> pairs. The collaborative filtering implementations, by contrast, are all described as using generic lists of <long, float> pairs.
I was wondering if I could get some general comments on the reason for this disparity. I'm guessing it's a matter of history and optimization -- Taste was optimized for storing more info at the index level and less at the "rating" level, whereas Vectors were intended to be generic, with the ability to maintain the maximum amount of precision. Unfortunately the lowest common denominator is int/float, so if you want to move between the models you have to live within the tighter limit of each (int ids, float values) without getting the smaller memory footprint of either. It ends up feeling like there are two faces to Mahout which are somewhat incompatible.
Are there any thoughts about bridging the gap between the two models in the future? If this really is a matter of each model being optimized for its problem space, maybe it would just help to have a clear delineation of which utilities belong on which side of the fence -- as well as some utility for shifting generic types between the models (with a warning that precision, or the ability to hold as many ids, might be lost). That way utilities that already exist on one side could be reused on the other.
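To make the int/float bottleneck concrete, here is a rough sketch of the kind of conversion utility I have in mind. It is plain Java with no Mahout classes at all; `PreferenceBridge` and its method names are hypothetical, made up purely for illustration, and this is not how Mahout does (or necessarily should do) it. It only shows where information can be lost when going from a <long, float> pair to an <int, double> pair and back.

```java
// Hypothetical helper, NOT part of Mahout: illustrates the narrowing steps
// needed to move between (long id, float value) and (int index, double value).
final class PreferenceBridge {

    private PreferenceBridge() {}

    // long id -> int index. Fails loudly instead of silently wrapping around.
    static int toVectorIndex(long id) {
        return Math.toIntExact(id); // throws ArithmeticException if id does not fit in an int
    }

    // double value -> float value. The cast rounds to the nearest float.
    static float toPreferenceValue(double value) {
        return (float) value;
    }

    // True if a round trip through float changes the value, i.e. precision is lost.
    static boolean losesPrecision(double value) {
        return !Double.isNaN(value) && (double) (float) value != value;
    }

    public static void main(String[] args) {
        // int -> long and float -> double are widening conversions and always safe.
        long id = 42;        // widened from an int index
        double value = 4.5f; // widened from a float rating
        System.out.println(id + " -> " + value);

        // The other direction is where the warnings would be needed.
        System.out.println("0.1 loses precision as float: " + losesPrecision(0.1));
        try {
            toVectorIndex(1L << 40);
        } catch (ArithmeticException e) {
            System.out.println("id too large for an int index: " + e.getMessage());
        }
    }
}
```

The design choice here is to throw on an id that does not fit rather than truncate it, since a silently wrapped id would collide with some other user or item; the value side only rounds, so a check like `losesPrecision` could drive a warning instead of an error.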
