Re: Vectorizing arbitrary value types with seq2sparse

Ted Dunning Fri, 06 May 2011 13:18:15 -0700

This is definitely desirable but is very different from the current tool.

My guess is the big difficulty will be describing the vectorization to be
done.  The hashed representations would make that easier, but still not
trivial.  Dictionary-based methods add multiple dictionary specifications
and also require that we figure out how to combine vectors by concatenation
or overlay.
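
To make the concatenation-versus-overlay distinction concrete, here is a minimal sketch in plain Python. It is not Mahout code and does not reflect how seq2sparse works; the function names (hash_field, overlay, concatenate), the vector sizes, and the sample blog record are all assumptions made up for illustration. Tokens are hashed with CRC32, prefixed by their field name, so no dictionary is needed.

```python
import zlib
from collections import Counter


def hash_field(field_name, text, size):
    """Hash the tokens of one field into a count vector of length `size`."""
    vec = [0.0] * size
    for token, count in Counter(text.lower().split()).items():
        # Prefix the token with the field name so that "title:mahout" and
        # "content:mahout" usually (not always) land in different slots.
        index = zlib.crc32(f"{field_name}:{token}".encode("utf-8")) % size
        vec[index] += count
    return vec


def overlay(doc, size):
    """Overlay: every field is hashed into the same vector of length `size`."""
    out = [0.0] * size
    for name, text in doc.items():
        for i, value in enumerate(hash_field(name, text, size)):
            out[i] += value
    return out


def concatenate(doc, sizes):
    """Concatenation: each field gets its own block of the output vector.

    `sizes` maps field name -> block length; blocks are laid out in sorted
    field-name order so the layout is the same for every document.
    """
    out = []
    for name in sorted(sizes):
        out.extend(hash_field(name, doc.get(name, ""), sizes[name]))
    return out


# Hypothetical blog record with several textual fields.
blog = {
    "title": "Vectorizing blog posts",
    "content": "seq2sparse handles Text values",
    "tags": "mahout vectors",
}

overlaid = overlay(blog, 32)
concatenated = concatenate(blog, {"title": 8, "content": 16, "tags": 8})
```

With overlay, the only thing to specify is one vector size, at the cost of collisions between fields. With concatenation, each field needs its own size (or, in the dictionary case, its own dictionary), which is the extra specification burden mentioned above.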


On Fri, May 6, 2011 at 1:02 PM, Frank Scholten <[email protected]> wrote:

> Hi everyone,
>
> At the moment seq2sparse can generate vectors from sequence values of
> type Text. More specifically, SequenceFileTokenizerMapper handles Text
> values.
>
> Would it be useful if seq2sparse could be configured to vectorize
> value types such as a Blog article with several textual fields like
> title, content, tags and so on?
>
> Or is it easier to create a separate job for this or use Pig or
> anything like that?
>
> Frank
>
