The use case for augmenting VectorWritable would be where you have a bunch of vectors that you want to cluster, and you want to keep auxiliary data attached to each vector rather than do a join downstream from the clustering.
I say do the join. The cost won't be that different. The clustering will go faster for not having to schlep around the payload, which will probably more than compensate for having to read the file again to join later. Many processes preserve order, so the final join can be done as a map-side merge. Where the map-side merge isn't possible, you may have to do a full reduce-side join, but that is still going to be close to break-even.

On Tue, Oct 12, 2010 at 3:51 PM, Sean Owen wrote:

> If that's all that's meant -- seems like you just want to write
> VectorAndThingWritable rather than inject an optional Thing into
> VectorWritable. It'd work either way but seems cleaner to compose it that
> way. VectorAndThingWritable might belong in core depending on how general
> "Thing" is.
>
> On Tue, Oct 12, 2010 at 10:41 PM, Ted Dunning wrote:
>
> > There is currently no provision for a payload in the VectorWritable. It is
> > plausible that such a capability could be added.
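
P.S. For anyone who hasn't written one, here is a rough sketch of what the merge step of a map-side merge join amounts to. It is plain Java with no Hadoop or Mahout dependency, and everything in it is made up for illustration: the class names (MergeJoinSketch, Assignment, Payload, Joined), the long vector id used as the join key, and the String payload. It assumes both inputs are already sorted ascending by id and that ids are unique on each side, which is the situation you get when the clustering step preserves input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeJoinSketch {
    // Hypothetical output of the clustering step: vector id -> cluster id.
    public static final class Assignment {
        public final long vectorId;
        public final int clusterId;
        public Assignment(long vectorId, int clusterId) {
            this.vectorId = vectorId;
            this.clusterId = clusterId;
        }
    }

    // Hypothetical auxiliary data that was kept out of the clustering input.
    public static final class Payload {
        public final long vectorId;
        public final String thing;
        public Payload(long vectorId, String thing) {
            this.vectorId = vectorId;
            this.thing = thing;
        }
    }

    // One joined row: cluster assignment plus its payload.
    public static final class Joined {
        public final long vectorId;
        public final int clusterId;
        public final String thing;
        public Joined(long vectorId, int clusterId, String thing) {
            this.vectorId = vectorId;
            this.clusterId = clusterId;
            this.thing = thing;
        }
    }

    // Single pass over two inputs sorted ascending by vectorId (unique ids assumed).
    // Rows present on only one side are skipped, i.e. this is an inner join.
    public static List<Joined> mergeJoin(List<Assignment> left, List<Payload> right) {
        List<Joined> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            Assignment a = left.get(i);
            Payload p = right.get(j);
            if (a.vectorId < p.vectorId) {
                i++;            // no payload for this assignment
            } else if (a.vectorId > p.vectorId) {
                j++;            // no assignment for this payload
            } else {
                out.add(new Joined(a.vectorId, a.clusterId, p.thing));
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Assignment> assignments = Arrays.asList(
            new Assignment(1L, 0), new Assignment(2L, 1), new Assignment(4L, 0));
        List<Payload> payloads = Arrays.asList(
            new Payload(1L, "a"), new Payload(3L, "c"), new Payload(4L, "d"));
        for (Joined row : mergeJoin(assignments, payloads)) {
            System.out.println(row.vectorId + "\t" + row.clusterId + "\t" + row.thing);
        }
    }
}
```

The point is that neither side has to be held in memory or shuffled: each file is read once, in order. If order is not preserved, this single pass no longer works and you are back to the reduce-side join.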
