I'd like to understand the hashed vector stuff better, so I'll take on the task of creating a "clean command line integration from text => hashed vector => clusters" ... just as soon as I finish our write-up for 588 ;-)
On Fri, Mar 18, 2011 at 10:50 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> We have the encoders and the resulting vectors should cluster as easily as
> anything.
>
> What we don't have is a clean command line integration from text => hashed
> vector => clusters
>
> On Fri, Mar 18, 2011 at 9:44 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
>> > Another option is to use hashed feature vectors. These will retain
>> > essentially all of the data of the larger vectors but will allow your
>> > centroids to be more moderate in size. This also helps in not requiring a
>> > pass over your data to assign vector locations.
>>
>> Do we have code for using this with our existing algorithms?
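
For anyone following the thread, here is a rough sketch of the hashed feature vector idea discussed above, written in plain Python. It is not the Mahout encoder code and makes no claim about how those encoders work internally. The names `hashed_vector` and `dim` are made up for illustration, and MD5 is used only because it gives a stable hash across runs. The point it tries to show is that each token is mapped straight to a slot in a fixed-size vector, so no dictionary-building pass over the data is needed and centroids stay at `dim` entries.

```python
import hashlib
import re


def hashed_vector(text, dim=16):
    """Map text to a fixed-size count vector via the hashing trick.

    Illustrative only: `hashed_vector` and `dim` are hypothetical names,
    not part of any existing library.
    """
    vec = [0.0] * dim
    # Very simple tokenizer: lowercase runs of letters and digits.
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # A stable hash of the token picks the slot; no dictionary is built.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        # Distinct tokens may collide in the same slot; counts simply add up.
        vec[index] += 1.0
    return vec


if __name__ == "__main__":
    print(hashed_vector("the cat sat on the mat", dim=8))
```

Vectors produced this way all have the same length regardless of vocabulary size, so in principle they can be fed to any clustering code that accepts dense or sparse numeric vectors. A smaller `dim` means more collisions between tokens; a larger one means bigger centroids.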