On Wed, Jun 9, 2010 at 7:14 PM, Jake Mannix <[email protected]> wrote:
> The ItemSimilarityJob actually uses implementations of the Vector
> class hierarchy?  I think that's the issue - if the on-disk and in-mapper
> representations are never Vectors, then they won't interoperate with
> any of the matrix operations...

Yes, they are Vectors.

> And yeah, keying on ints is necessary for now, unless we want to
> make a new matrix type (at least for distributed matrices) which
> keys on longs (which actually might be a good idea: now that
> we're using VInt and VLong, the disk space and network usage
> should not be adversely affected - just the in-memory
> representation).
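To put rough numbers on the VInt/VLong point: with a variable-length encoding, a small long costs as few bytes as a small int, so switching keys from int to long mainly affects memory. The sketch below assumes a generic LEB128-style varint (7 payload bits per byte, high bit meaning "more bytes follow"). That is not Hadoop's exact WritableUtils.writeVLong byte layout, and VarLongSketch is a hypothetical name used only for illustration.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical sketch of a generic LEB128-style varint for non-negative longs.
 * NOT Hadoop's WritableUtils.writeVLong format; it only shows why small
 * values take few bytes regardless of the declared key type.
 */
final class VarLongSketch {

    /** Encodes a non-negative long, 7 bits per byte, high bit set while more bytes follow. */
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits plus continuation flag
            value >>>= 7;
        }
        out.write((int) value); // final byte, continuation flag clear
        return out.toByteArray();
    }

    /** Decodes bytes produced by encode(). */
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of this value
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] ids = {5L, 300L, 1_000_000L, Long.MAX_VALUE};
        for (long id : ids) {
            System.out.println(id + " -> " + encode(id).length + " byte(s)");
        }
    }
}
```

A fixed-width long always takes 8 bytes; in this encoding 5 takes 1 byte, 300 takes 2 and Long.MAX_VALUE takes 9. For non-negative IDs that fit in an int, the encoded size is the same whether the key is declared int or long.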

Oh I see. Well that's not a problem. Already, IDs have to be mapped to
ints to be used as dimensions in a Vector. So in most cases things are
keyed by these int pseudo-IDs. That's OK too.
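One way to picture the long-to-int pseudo-ID step is a dictionary that hands out dense int indices on first sight and can map them back. This is a minimal sketch under that assumption: IdIndexSketch and its methods are hypothetical and not how any particular Mahout job necessarily does the mapping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical dictionary assigning dense int "pseudo-IDs" to arbitrary long IDs
 * so they can serve as Vector dimensions, with a reverse lookup.
 */
final class IdIndexSketch {
    private final Map<Long, Integer> idToIndex = new HashMap<>();
    private final List<Long> indexToId = new ArrayList<>();

    /** Returns the existing index for id, or assigns the next free one. */
    int index(long id) {
        Integer existing = idToIndex.get(id);
        if (existing != null) {
            return existing;
        }
        int next = indexToId.size();
        idToIndex.put(id, next);
        indexToId.add(id);
        return next;
    }

    /** Recovers the original long ID for a pseudo-ID. */
    long id(int index) {
        return indexToId.get(index);
    }

    /** Number of distinct IDs seen so far (i.e. the vector cardinality needed). */
    int size() {
        return indexToId.size();
    }

    public static void main(String[] args) {
        IdIndexSketch dict = new IdIndexSketch();
        System.out.println(dict.index(9_000_000_000L)); // first ID seen gets 0
        System.out.println(dict.index(42L));            // next new ID gets 1
        System.out.println(dict.id(0));                 // maps back to 9000000000
    }
}
```

A dictionary like this avoids collisions at the cost of holding the mapping in memory; hashing the long down to an int is a cheaper alternative, but distinct IDs can then collide.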

A matrix is a bunch of vectors -- at least, that's a nice structure
for a SequenceFile: a row (or column) ID mapped to the row (or column) vector.
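The "matrix is a bunch of vectors" layout can be sketched in memory as an int row key mapped to a sparse row vector, which loosely mirrors a SequenceFile of (row ID, row vector) records. This is an illustration only: RowMatrixSketch is a hypothetical class using plain java.util maps rather than Mahout's Vector or Hadoop's Writable types. It shows that a transpose is the same structure keyed by column, and that a matrix-vector product needs only one row at a time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory analogue of a row-keyed matrix:
 * row index -> sparse row (column index -> value).
 */
final class RowMatrixSketch {
    private final TreeMap<Integer, Map<Integer, Double>> rows = new TreeMap<>();

    void set(int row, int col, double value) {
        rows.computeIfAbsent(row, r -> new HashMap<>()).put(col, value);
    }

    double get(int row, int col) {
        Map<Integer, Double> r = rows.get(row);
        return r == null ? 0.0 : r.getOrDefault(col, 0.0);
    }

    /** Same structure keyed by column instead of row. */
    RowMatrixSketch transpose() {
        RowMatrixSketch t = new RowMatrixSketch();
        for (Map.Entry<Integer, Map<Integer, Double>> row : rows.entrySet()) {
            for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
                t.set(cell.getKey(), row.getKey(), cell.getValue());
            }
        }
        return t;
    }

    /** Dot product of each stored row with a dense vector x, one row at a time. */
    Map<Integer, Double> times(double[] x) {
        Map<Integer, Double> result = new TreeMap<>();
        for (Map.Entry<Integer, Map<Integer, Double>> row : rows.entrySet()) {
            double sum = 0.0;
            for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
                sum += cell.getValue() * x[cell.getKey()];
            }
            result.put(row.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        RowMatrixSketch m = new RowMatrixSketch();
        m.set(0, 0, 1.0);
        m.set(0, 2, 2.0);
        m.set(1, 1, 3.0);
        System.out.println(m.times(new double[] {1.0, 1.0, 1.0})); // {0=3.0, 1=3.0}
        System.out.println(m.transpose().get(2, 0));               // 2.0
    }
}
```

Because each record is self-contained (key plus whole vector), a distributed job can process rows independently, which is what makes this layout convenient for a SequenceFile.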

Is that not what other jobs are using?
What's the better alternative we could think about converging on?
