Can you gist (gist.github.com) or pastebin your code?

On Jan 29, 2013, at 5:12 PM, vybe3142 wrote:

> Reposting - I wasn't subscribed to the group earlier
> 
> 
> VS 
> 
> first ingesting the data into SOLR and then invoking mahout on the SOLR
> index (clustering on the contents of the field "text") 
> 
> defined as 
> 
> Field: text
> Field-Type: org.apache.solr.schema.TextField
> Properties: Indexed, Tokenized, Multivalued, TermVector Stored
> Schema: Indexed, Tokenized, Multivalued, TermVector Stored
> Index: (unstored field)
> PI Gap: 100
> Docs: 21578
> 
> Index Analyzer:
> org.apache.solr.analysis.TokenizerChain
> Query Analyzer:
> org.apache.solr.analysis.TokenizerChain
> 
> and executing a "similar" command set
> 
> I get vastly differing results: 
> 
> The lucene / kmeans approach yields 20 clusters, whereas the solr approach
> yields just one cluster. 
> 
> I'm obviously doing something wrong. Any pointers? 
> 
> Thanks
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Clustering-using-Solr-Index-vs-Lucene-Index-Different-Results-tp4037198.html
> Sent from the Mahout User List mailing list archive at Nabble.com.

--------------------------------------------
Grant Ingersoll
http://www.lucidworks.com



