Re: CardinalityException during data clustering

Ted Dunning Fri, 27 May 2011 08:26:52 -0700

You have to write or adapt some code.  That is currently the big downside of
the hashing encoders.
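
As a rough illustration of the idea only (plain Python, not Mahout's encoder
API; the function name, the cardinality of 16 and the two probes per token are
all made up for this example), each token is hashed straight to an index.  No
dictionary is involved, so separate runs always produce vectors of the same
size:

```python
import hashlib


def hashed_vector(tokens, cardinality=16, probes=2):
    """Encode tokens into a fixed-length list using the hashing trick.

    Hypothetical helper for illustration; it is not part of Mahout.
    """
    vec = [0.0] * cardinality
    for token in tokens:
        # Several probes per token soften the effect of hash collisions.
        for probe in range(probes):
            digest = hashlib.md5(f"{probe}:{token}".encode("utf-8")).digest()
            index = int.from_bytes(digest[:8], "big") % cardinality
            vec[index] += 1.0
    return vec


# Two "runs" over unrelated vocabularies, with no shared dictionary.
doc_a = hashed_vector("the quick brown fox".split())
doc_b = hashed_vector("a completely different vocabulary here".split())
print(len(doc_a), len(doc_b))
```

Both vectors have the same length whatever words show up, which is what avoids
the cardinality mismatch.  The cost is that an index no longer maps back to a
single word.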


On Fri, May 27, 2011 at 2:38 AM, David Saile <[email protected]> wrote:

> > The other option is to use the hashing encoders.  They inherently produce
> > output of fixed cardinality.  The down-side with that is that the meaning
> > of lots of distance measures is hard to understand in the hashed frameworks.
> > Distances that are invariant under linear transformations work perfectly.
> > Some others like Manhattan distance work pretty well.  Others can be
> > totally confused.
>
> This sounds like an option that eliminates the need for a global dictionary
> (with regard to multiple vectorizer runs).
> How can I specify the use of hashing encoders for vectorization?
