Re: Incremental clustering - Kmeans + Canopy

Sean Owen Fri, 21 Jan 2011 01:35:23 -0800

(This is how the recommender handles textual IDs for users and items, by the
way.)


On Fri, Jan 21, 2011 at 5:57 AM, Ted Dunning <[email protected]> wrote:

> Yes.  The assignment of features to locations in a fixed size vector is
> done using hashing rather than a dictionary.  With a reasonably large
> vector collisions will on average not be too terrible.  With smaller
> vectors or where we are using massive vocabularies due to feature
> interactions or simply to worry less, we can use multiple hashing to
> assign a single feature multiple locations.  We can prove that the
> resulting hashed representation retains all the information we want and
> that learning a linear classifier using the hashed representation should
> work pretty much as well as a full representation using the same tricks
> as the random projection guys use because, well, the hashed
> representation *is* a random linear projection.
>
>
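
A minimal sketch of the hashing scheme described in the quoted text, for
illustration only. This is not Mahout's actual encoder: the function name
hashed_vector, the parameters size and probes, and the use of MD5 as the
hash are all assumptions made for this example. Each feature name is hashed
`probes` times, and its weight is added at every resulting index of a
fixed-size vector, so no dictionary of feature names is needed.

```python
import hashlib


def hashed_vector(features, size=16, probes=2):
    """Encode a {feature_name: weight} mapping into a fixed-size vector.

    Hypothetical helper: each feature is hashed `probes` times (multiple
    hashing) and its weight is added at each resulting location.
    """
    vec = [0.0] * size
    for name, weight in features.items():
        for probe in range(probes):
            # Salt the name with the probe number to get independent hashes.
            key = f"{probe}:{name}".encode("utf-8")
            digest = hashlib.md5(key).digest()
            index = int.from_bytes(digest[:8], "big") % size
            # Colliding features simply add up in the same slot.
            vec[index] += weight
    return vec


# Textual IDs map straight to vector slots without a lookup table.
example = hashed_vector({"user:alice": 1.0, "item:42": 2.0})
```

Because every weight is only ever added into slots, encoding two feature
sets separately and summing the vectors gives the same result as encoding
them together, which is the linearity the quoted argument relies on.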
