I'd like to understand the hashed vector stuff better, so I will take on the
task of creating a "clean command line integration from text => hashed
vector => clusters" ... just as soon as I finish our write-up for 588 ;-)
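
For what it's worth, here is a rough sketch of how I currently understand the
text => hashed vector => cluster assignment flow, in plain Python rather than
against our encoders. Everything in it (hash_text, nearest_centroid, the
num_features and probes parameters, the choice of MD5) is an illustrative
assumption of mine, not Mahout API.

```python
import hashlib
import re


def hash_text(text, num_features=32, probes=2):
    """Encode text as a fixed-length vector using feature hashing.

    Each token is hashed straight to a vector index, so no dictionary
    pass over the corpus is needed to assign vector locations.
    Names and defaults here are hypothetical, for illustration only.
    """
    vec = [0.0] * num_features
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Several probes per token soften the effect of hash collisions.
        for probe in range(probes):
            digest = hashlib.md5(f"{probe}:{token}".encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % num_features
            vec[index] += 1.0
    return vec


def nearest_centroid(centroids, vec):
    """Return the index of the centroid closest to vec (squared Euclidean)."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(centroids[i], vec))
    return min(range(len(centroids)), key=dist)


if __name__ == "__main__":
    docs = ["hashed feature vectors", "clusters of hashed vectors"]
    vectors = [hash_text(d) for d in docs]
    # Use the first document as a stand-in centroid, just to show the call.
    print(nearest_centroid([vectors[0]], vectors[1]))
```

The point of the sketch is that the centroid size is fixed by num_features
rather than by the vocabulary, which is what keeps the centroids moderate.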

On Fri, Mar 18, 2011 at 10:50 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> We have the encoders and the resulting vectors should cluster as easily as
> anything.
>
> What we don't have is a clean command line integration from text => hashed
> vector => clusters
>
>
> On Fri, Mar 18, 2011 at 9:44 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
>> > Another option is to use hashed feature vectors.  These will retain
>> > essentially all of the data of the larger vectors but will allow your
>> > centroids to be more moderate in size.  This also helps in not requiring a
>> > pass over your data to assign vector locations.
>>
>> Do we have code for using this with our existing algorithms?
>
>
>
