On Tue, Aug 16, 2011 at 3:16 PM, Ted Dunning <[email protected]> wrote:

> There are major costs incurred if we move to long indexes for matrices.
>  That might be a good thing to do and it would be pretty easy to provide
> legacy access points, but it would hurt me to spend 30% on memory to do
> this.
>
> The need on the recommendation side was to have ids that would not collide
> without having to check.  That is a bit different from the matrix world
> where you have a conceptually dense set of integer indexes.
>

Why is it conceptually different than, say, the old DocumentVectorizer, which
takes a random jumble of vocabulary and creates a dictionary, which is a
strictly no-collision mapping of (term: string) <-> (termId: int)?

Why not do the same thing in the recommender world (other than for legacy
reasons), for user and item ids?
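
To make the idea concrete, here is a minimal sketch of such a dictionary. The
class name IdDictionary and its methods are hypothetical (they are not existing
Mahout classes); the only assumption is that each external id, whether a term
string or a long user/item id, gets the next unused int the first time it is
seen, so the resulting indexes are dense and cannot collide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not a Mahout class: a dense, collision-free id <-> int mapping. */
final class IdDictionary<K> {
  private final Map<K, Integer> toIndex = new HashMap<>();
  private final List<K> toId = new ArrayList<>();

  /** Returns the index already assigned to externalId, or assigns the next unused one. */
  int indexOf(K externalId) {
    Integer existing = toIndex.get(externalId);
    if (existing != null) {
      return existing;
    }
    int index = toId.size();   // indexes are handed out as 0, 1, 2, ...
    toIndex.put(externalId, index);
    toId.add(externalId);
    return index;
  }

  /** Reverse lookup: the external id that was assigned the given index. */
  K idOf(int index) {
    return toId.get(index);
  }

  /** Number of distinct ids seen so far, i.e. the required vector cardinality. */
  int size() {
    return toId.size();
  }
}
```

The same class would serve for IdDictionary<String> (terms) and
IdDictionary<Long> (user or item ids). The price is that the dictionary itself
has to be built and carried around, which is the cost being weighed against
hashing ids directly.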

  -jake


>
> On Tue, Aug 16, 2011 at 11:44 AM, Jake Mannix <[email protected]>
> wrote:
>
> > But while we've talked about this, adding a proliferation of FloatVector,
> > DoubleVector (and BooleanVector), together with LongMatrix vs IntMatrix
> > would really complicate all of the higher-level apis.  It's possible, but
> > could be ugly.
> >
> > So then we could instead standardize everything to "one size fits all",
> > and break backwards compatibility with either all Taste users, or all of
> > our algorithms and data in the classification / clustering /
> > vectorization codebase.
> >
> > Or we could write some simple utilities (some we already have) to
> > convert formats internally when needed (warning when collisions are
> > possible on key range folding, and possibly losing precision or bloating
> > the data size).
> >
> > I think the latter approach is probably best.
> >
>

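For the conversion utilities mentioned in the quoted message (folding a long
key range into ints and warning when collisions are possible), a minimal sketch
might look like the following. KeyFolder and its methods are hypothetical names
chosen for illustration, and the particular fold (XOR of the two 32-bit halves,
sign bit cleared) is an assumption, not a description of any existing utility.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: folds long keys into non-negative ints and counts collisions. */
final class KeyFolder {
  private final Map<Integer, Long> seen = new HashMap<>();
  private int collisions;

  /** Folds a long into a non-negative int by XOR-ing its halves and clearing the sign bit. */
  static int fold(long key) {
    return (int) (key ^ (key >>> 32)) & Integer.MAX_VALUE;
  }

  /** Folds the key and records a collision if a different key already folded to the same int. */
  int foldAndRecord(long key) {
    int folded = fold(key);
    Long previous = seen.putIfAbsent(folded, key);
    if (previous != null && previous != key) {
      collisions++;   // a caller could log a warning here
    }
    return folded;
  }

  /** Number of times two distinct long keys landed on the same int. */
  int collisions() {
    return collisions;
  }
}
```

Unlike a dictionary, this needs no stored mapping to compute an index, but the
collision check itself still has to remember which keys it has seen.
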