Ah! So if it were a sparse vector, it could be indexed directly by the long ID. Or the mapping could use a hash-indexed representation, as is done with Lucene vectors.
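If it were, something like this would do, keyed straight off the long ItemID with no fabricated int index or side file. A rough sketch only, using plain java.util rather than Mahout's actual classes:

import java.util.HashMap;
import java.util.Map;

// Rough sketch, not Mahout code: a sparse vector keyed directly by the
// long ItemID, so no int offset or side mapping file is needed.
public final class LongKeyedSparseVector {

  private final Map<Long, Double> values = new HashMap<Long, Double>();

  public void set(long itemID, double value) {
    values.put(Long.valueOf(itemID), Double.valueOf(value));
  }

  public double get(long itemID) {
    // Missing entries are implicit zeros, as in any sparse representation.
    Double v = values.get(Long.valueOf(itemID));
    return v == null ? 0.0 : v.doubleValue();
  }

  public static void main(String[] args) {
    LongKeyedSparseVector vec = new LongKeyedSparseVector();
    vec.set(8589934592L, 3.5); // an ID too large to use as an int offset
    System.out.println(vec.get(8589934592L)); // prints 3.5
  }
}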
On Sun, Jun 12, 2011 at 3:43 AM, Sean Owen <sro...@gmail.com> wrote:
> The keys have to be hashed to be used as int offsets into a vector. While
> loading the mapping isn't ideal it does only scale as the number of items
> and users.
> On Jun 12, 2011 3:47 AM, "Lance Norskog" <goks...@gmail.com> wrote:
>> The RecommenderJob makes a "side" file which maps a fabricated integer
>> index to a long ItemID. Why is this needed? Couldn't the
>> RecommenderJob propagate the long ItemID directly? Note that this
>> forces all instances of AggregateAndReduceRecommender to load the
>> entire map. Part of the Map/Reduce rules are 'nothing needs to know
>> everything'.
>>
>> Is this a sparse/dense optimization? If so, have the distributed
>> algorithms advanced enough to make this indirection unnecessary?
>>
>> --
>> Lance Norskog
>> goks...@gmail.com

--
Lance Norskog
goks...@gmail.com
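P.S. For the archives, here is the gist of the hashing Sean describes, and why the side file follows from it. A rough sketch only; the class and method names are mine, not Mahout's actual code:

// Rough sketch, not Mahout's actual code: hashing a long ID down to a
// non-negative int vector offset.
public final class IdHashing {

  /** Hash a 64-bit ID to a non-negative int usable as a vector offset. */
  static int idToIndex(long id) {
    // Same bit-mixing as Long.hashCode(), with the sign bit cleared so
    // the result is always a valid non-negative offset.
    return 0x7FFFFFFF & (int) (id ^ (id >>> 32));
  }

  public static void main(String[] args) {
    long itemID = 1234567890123L;
    System.out.println(itemID + " -> " + idToIndex(itemID));
    // The hash is one-way and distinct longs can collide on one int,
    // which is why the job writes the index -> ItemID mapping out as a
    // side file and loads it later to translate back to the real IDs.
  }
}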