You have to write or adapt some code yourself. That is currently the biggest downside of the hashing encoders.
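As a rough illustration of what such code has to do, here is a minimal sketch of the hashing trick. It is written in Python rather than against Mahout's Java encoder classes, and it is not Mahout's API. Each word is hashed straight to a column of a fixed-size vector, so no global dictionary is built and separate vectorizer runs lay out the same words in the same columns. The names `hash_encode` and `NUM_FEATURES`, the MD5-based hash, the 1024-column size, and the +1/-1 sign bit are all assumptions chosen for the example.

```python
import hashlib

# Hypothetical fixed cardinality of the hashed feature space (assumption).
NUM_FEATURES = 1 << 10


def _bucket_and_sign(token, num_features):
    """Map a token to a (column index, +1/-1 sign) pair with a stable hash."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "little")  # unsigned 64-bit integer
    index = value % num_features
    sign = 1.0 if value >> 63 == 0 else -1.0      # top bit picks the sign
    return index, sign


def hash_encode(tokens, num_features=NUM_FEATURES):
    """Encode a bag of words as a sparse {column: weight} vector.

    No dictionary is consulted: each word's column is computed from the
    word itself, so independent runs agree on the vector layout. Words that
    collide in one column are summed, which is what distorts some distances.
    """
    vector = {}
    for token in tokens:
        index, sign = _bucket_and_sign(token, num_features)
        vector[index] = vector.get(index, 0.0) + sign
    # Drop entries that cancelled out through opposite-signed collisions.
    return {i: w for i, w in vector.items() if w != 0.0}


if __name__ == "__main__":
    first_run = hash_encode("the quick brown fox".split())
    second_run = hash_encode("the quick brown fox".split())
    print(first_run == second_run)  # True
```

The sketch deliberately avoids Python's built-in `hash()`, which is salted per process for strings; a per-process hash would defeat the point, because two vectorizer runs would no longer agree on column positions.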
On Fri, May 27, 2011 at 2:38 AM, David Saile wrote:

> > The other option is to use the hashing encoders. They inherently produce
> > output of fixed cardinality. The down-side with that is that the meaning of
> > lots of distance measures is hard to understand in the hashed frameworks.
> > Distances that are invariant under linear transformations work perfectly.
> > Some others like Manhattan distance work pretty well. Others can be
> > totally confused.
>
> This sounds like an option that eliminates the need for a global dictionary
> (in regards to multiple vectorizer runs).
> How can I specify the use of hashing encoders for vectorization?
