Re: getting mahout clustering info back into lucene

Grant Ingersoll Sat, 05 Nov 2011 07:06:09 -0700

On Nov 5, 2011, at 8:36 AM, Robert Stewart wrote:

> If I run mahout clustering on lucene vectors, how would I go about getting 
> that cluster information back into lucene, in order to use the cluster 
> identifiers in field collapsing?
>


Since Lucene doesn't support incremental field updates (which are seriously 
non-trivial to do in an inverted index), the only way to do this is to 
re-index.  Once DocValues are updatable, this may be a lot easier.  You 
could also perhaps use ParallelReader, but that has some restrictions 
(you have to keep docids in sync across the parallel indexes).
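
To make the docid-sync restriction concrete, here is a minimal sketch of the 
alignment step only.  It assumes you can list each document's unique key in 
docid order from the main index and that you have already parsed the Mahout 
output into a key -> cluster id map.  The class, the method, and the "none" 
placeholder are all made up for illustration; nothing here is Lucene or 
Mahout API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene or Mahout.
class ClusterAlignment {
    // Assumed placeholder for documents that were not assigned to a cluster.
    static final String NO_CLUSTER = "none";

    /**
     * Returns one cluster id per document, in the same order as the main index.
     *
     * keysInDocIdOrder: unique key of each document, listed by ascending docid.
     * clusterByKey:     key -> cluster id, parsed from the clustering output.
     */
    static List<String> alignToDocIds(List<String> keysInDocIdOrder,
                                      Map<String, String> clusterByKey) {
        List<String> aligned = new ArrayList<String>(keysInDocIdOrder.size());
        for (String key : keysInDocIdOrder) {
            String cluster = clusterByKey.get(key);
            // Every docid needs an entry, even when there is no cluster for it,
            // otherwise the second index drifts out of step with the first.
            aligned.add(cluster != null ? cluster : NO_CLUSTER);
        }
        return aligned;
    }

    public static void main(String[] args) {
        Map<String, String> clusterByKey = new HashMap<String, String>();
        clusterByKey.put("doc-a", "3");
        clusterByKey.put("doc-c", "7");

        List<String> keys = Arrays.asList("doc-a", "doc-b", "doc-c");
        System.out.println(alignToDocIds(keys, clusterByKey)); // prints [3, none, 7]
    }
}
```

Each entry of the returned list would then become one document in the second 
index, added in exactly that order.  The sketch does not deal with deletions 
or merges, which is where keeping the two indexes in step gets hard.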


> I know I can re-index with the new cluster info, but is there any way to put 
> cluster info into an existing index (which also may be non-optimized and 
> quite large)?  One way may be to have a custom field collapsing component that 
> can read mahout cluster output.  Any thoughts?

Solr has some plugins around clustering already, if you are using that.  I've 
done some prototyping on hooking in Mahout, but there is nothing official yet.  
I haven't looked at field collapsing in depth yet.

On trunk, you might be able to do some other fancy tricks to make this work via 
codecs.

-Grant
--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
