The use case for augmenting VectorWritable would be one where you have a bunch
of vectors that you want to cluster, and you want to keep auxiliary data
associated with each vector rather than doing a join downstream of the
clustering.

I say do the join.  The cost won't be much different.  The clustering will
go faster without having to schlep the payload around, which will probably
more than compensate for having to read the file again to do the join later.
Many processes preserve order, so the final join can be done as a map-side
merge.  Where a map-side merge isn't possible, you may have to do a full
reduce-side join, but even that should be close to break-even.
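A minimal sketch of the order-preserving merge, assuming (hypothetically) that both the clustering output and the original payloads are keyed by the same unique integer id and sorted by it. The record types and method names here are illustrative stand-ins for key/value records read from two files, not Mahout or Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical merge join of clustering output with payloads.
// Assumes both inputs are sorted by the same unique integer id,
// e.g. because the clustering job preserved input order.
final class MergeJoinSketch {

  // Stand-in for one key/value record read from a file.
  record Keyed<V>(int id, V value) {}

  // Joined row: id, cluster assignment, payload.
  record Joined<P>(int id, int clusterId, P payload) {}

  static <P> List<Joined<P>> join(List<Keyed<Integer>> assignments,
                                  List<Keyed<P>> payloads) {
    List<Joined<P>> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    // Advance two cursors in lockstep; each input is read exactly once,
    // so no shuffle or reduce phase is needed.
    while (i < assignments.size() && j < payloads.size()) {
      Keyed<Integer> a = assignments.get(i);
      Keyed<P> p = payloads.get(j);
      if (a.id() < p.id()) {
        i++;  // assignment with no matching payload: skip it
      } else if (a.id() > p.id()) {
        j++;  // payload with no matching assignment: skip it
      } else {
        out.add(new Joined<>(a.id(), a.value(), p.value()));
        i++;
        j++;
      }
    }
    return out;
  }
}
```

Because each side is scanned once in order, the join costs roughly one extra sequential read of the payload file, which is the trade-off described above. If the inputs are not co-sorted, this shortcut does not apply and a reduce-side join is needed instead.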

On Tue, Oct 12, 2010 at 3:51 PM, Sean Owen <[email protected]> wrote:

> If that's all that's meant, it seems like you just want to write a
> VectorAndThingWritable rather than inject an optional Thing into
> VectorWritable. It'd work either way, but it seems cleaner to compose it
> that way. VectorAndThingWritable might belong in core, depending on how
> general "Thing" is.
>
> On Tue, Oct 12, 2010 at 10:41 PM, Ted Dunning <[email protected]> wrote:
>
> > There is currently no provision for a payload in the VectorWritable.  It is
> > plausible that such a capability could be added.
> >
> >
>
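A sketch of the composition approach suggested in the quoted reply, leaving the vector class untouched and wrapping it. To stay self-contained, `WritableLike` is a stand-in with the same two methods as Hadoop's `org.apache.hadoop.io.Writable`, `DenseVectorWritable` stands in for Mahout's `VectorWritable`, and the "Thing" is assumed to be a `String`; a real version would implement Hadoop's interface and wrap the real class:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in with the same method signatures as Hadoop's Writable.
interface WritableLike {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Stand-in for a vector writable: length-prefixed dense doubles.
final class DenseVectorWritable implements WritableLike {
  double[] values = new double[0];

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(values.length);
    for (double v : values) out.writeDouble(v);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    values = new double[in.readInt()];
    for (int k = 0; k < values.length; k++) values[k] = in.readDouble();
  }
}

// Hypothetical VectorAndThingWritable: composes the vector writable with a
// payload instead of adding an optional field to the vector class itself.
final class VectorAndThingWritable implements WritableLike {
  private final DenseVectorWritable vector = new DenseVectorWritable();
  private String thing = "";

  // Hadoop creates Writables reflectively, so keep a no-arg constructor.
  VectorAndThingWritable() {}

  VectorAndThingWritable(double[] values, String thing) {
    this.vector.values = values;
    this.thing = thing;
  }

  double[] getVector() { return vector.values; }
  String getThing() { return thing; }

  @Override
  public void write(DataOutput out) throws IOException {
    vector.write(out);      // delegate: the vector's own wire format
    out.writeUTF(thing);    // payload follows the vector
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    vector.readFields(in);  // read fields back in the same order
    thing = in.readUTF();
  }

  // Serializes to bytes and back; handy for checking the format.
  static VectorAndThingWritable roundTrip(VectorAndThingWritable w) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      w.write(new DataOutputStream(bytes));
      VectorAndThingWritable copy = new VectorAndThingWritable();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The wrapper only delegates to the vector's serialization, so code that reads plain vectors is unaffected and the payload cost is paid only by jobs that opt in.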
