On Tue, Aug 16, 2011 at 3:16 PM, Ted Dunning <[email protected]> wrote:
> There are major costs incurred if we move to long indexes for matrices.
> That might be a good thing to do and it would be pretty easy to provide
> legacy access points, but it would hurt me to spend 30% on memory to do
> this.
>
> The need on the recommendation side was to have id's that would not collide
> without having to check. That is a bit different from the matrix world
> where you have a conceptually dense set of integer indexes.
>

Why is it conceptually different than, say, the old DocumentVectorizer,
which takes a random jumble of vocabulary, and creates a dictionary, which
is a strictly no-collision mapping of (term: string) <-> (termId: int)?
Why not do the same thing in the recommender world (other than for legacy
reasons), for user and item ids?

  -jake

>
> On Tue, Aug 16, 2011 at 11:44 AM, Jake Mannix <[email protected]>
> wrote:
>
> > But while we've talked about this, adding a proliferation of FloatVector,
> > DoubleVector (and BooleanVector), together with LongMatrix vs IntMatrix
> > would really complicate all of the higher-level apis. It's possible, but
> > could be ugly.
> >
> > So then we could instead standardize everything to "one size fits all",
> > and break backwards compatibility with either all Taste users, or all of
> > our algorithms and data in the classification / clustering /
> > vectorization codebase.
> >
> > Or we could write some simple utilities (some we already have) to
> > convert formats internally when needed (warning when collisions are
> > possible on key range folding, and possibly losing precision or bloating
> > the data size).
> >
> > I think the latter approach is probably best, IMO.
> >
>
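
To make the dictionary idea from the reply above concrete, here is a minimal
sketch in Java of a no-collision mapping from arbitrary long user or item ids
to dense int indexes, analogous to the (term: string) <-> (termId: int)
dictionary. The class name IdDictionary and its methods are hypothetical and
are not existing Mahout API; it only illustrates the technique under the
assumption that the whole mapping fits in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not a Mahout class): assigns each distinct long id
 * the next unused int index, so indexes are dense and never collide.
 */
final class IdDictionary {
    // Forward mapping: original id -> dense index.
    private final Map<Long, Integer> idToIndex = new HashMap<>();
    // Reverse mapping: position in the list is the dense index.
    private final List<Long> indexToId = new ArrayList<>();

    /** Returns the index already assigned to id, or assigns the next one. */
    int indexOf(long id) {
        Integer existing = idToIndex.get(id);
        if (existing != null) {
            return existing;
        }
        int index = indexToId.size();
        idToIndex.put(id, index);
        indexToId.add(id);
        return index;
    }

    /** Reverse lookup; throws IndexOutOfBoundsException for an unassigned index. */
    long idOf(int index) {
        return indexToId.get(index);
    }

    /** Number of distinct ids seen so far. */
    int size() {
        return indexToId.size();
    }
}
```

The trade-off in this sketch is that the mapping has to be built and kept
around: the int indexes stay compact for the matrix code, but every id must
pass through the dictionary first, unlike ids that are unique by construction.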
