On Thu, May 26, 2011 at 10:35 AM, David Saile <[email protected]> wrote:
> I assume, this exception occurs because the new vectors have a different
> cardinality than the previously computed clusters.

Correct.

> Is there some way to assign a fixed cardinality to all vectors? Or is there
> any other solution for this?

I think that there is a way to use a fixed dictionary. If we don't already
have it, there should be a provision for adding an extra slot for unknown
words to fit into.

The other option is to use the hashing encoders. They inherently produce
output of fixed cardinality. The down-side with that is that the meaning of
lots of distance measures is hard to understand in the hashed frameworks.
Distances that are invariant under linear transformations work perfectly.
Some others like Manhattan distance work pretty well. Others can be totally
confused.
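
A minimal sketch of both options, in plain Python rather than Mahout's own
API. Every name here (DICTIONARY, dictionary_vector, hashed_vector,
manhattan, the cardinality of 16) is made up for illustration, and the MD5
based index is just one possible hash, not what the Mahout encoders use. The
point is only that both encodings give every document the same vector
length, so new documents can be compared against earlier ones.

```python
import hashlib

# Hypothetical fixed dictionary built from the original corpus.
DICTIONARY = {"apache": 0, "mahout": 1, "clustering": 2}


def dictionary_vector(tokens, dictionary):
    """Count tokens against a fixed dictionary.

    The vector has one slot per dictionary word plus one extra slot at the
    end that collects every word not in the dictionary.
    """
    unknown_slot = len(dictionary)
    vec = [0.0] * (len(dictionary) + 1)
    for tok in tokens:
        vec[dictionary.get(tok, unknown_slot)] += 1.0
    return vec


def hashed_vector(tokens, cardinality=16):
    """Count tokens into a fixed number of slots chosen by a hash.

    Different words can collide in the same slot, which is why some
    distance measures become harder to interpret.
    """
    vec = [0.0] * cardinality
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % cardinality
        vec[index] += 1.0
    return vec


def manhattan(a, b):
    """Manhattan (L1) distance; refuses vectors of different cardinality."""
    if len(a) != len(b):
        raise ValueError("cardinality mismatch")
    return sum(abs(x - y) for x, y in zip(a, b))


old_doc = ["apache", "mahout", "clustering"]
new_doc = ["apache", "mahout", "unseen", "words"]

# Fixed dictionary: both vectors have len(DICTIONARY) + 1 entries.
old_dict_vec = dictionary_vector(old_doc, DICTIONARY)
new_dict_vec = dictionary_vector(new_doc, DICTIONARY)

# Hashing: both vectors have exactly `cardinality` entries.
old_hash_vec = hashed_vector(old_doc)
new_hash_vec = hashed_vector(new_doc)

print(len(old_dict_vec), len(new_dict_vec), manhattan(old_dict_vec, new_dict_vec))
print(len(old_hash_vec), len(new_hash_vec), manhattan(old_hash_vec, new_hash_vec))
```

With the dictionary version, the two unseen words both land in the last
slot, so they are counted but no longer distinguishable from each other.
With the hashed version, nothing has to be known about the vocabulary in
advance, at the cost of possible collisions.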
