Can you gist (gist.github.com) or pastebin your code?

On Jan 29, 2013, at 5:12 PM, vybe3142 wrote:
> Reposting - I wasn't subscribed to the group earlier
>
> VS
>
> first ingesting the data into SOLR and then invoking mahout on the SOLR
> index (clustering on the contents of the field "text")
>
> defined as
>
> Field: text
> Field-Type: org.apache.solr.schema.TextField
> Properties: Indexed, Tokenized, Multivalued, TermVector Stored
> Schema: Indexed, Tokenized, Multivalued, TermVector Stored
> Index: (unstored field)
> PI Gap: 100
> Docs: 21578
>
> Index Analyzer: org.apache.solr.analysis.TokenizerChain
> Query Analyzer: org.apache.solr.analysis.TokenizerChain
>
> and executing a "similar" command set
>
> I get vastly differing results:
>
> The lucene / kmeans approach yields 20 clusters whereas the solr approach
> yields just one cluster.
>
> I'm obviously doing something wrong. Any pointers?
>
> Thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Clustering-using-Solr-Index-vs-Lucene-Index-Different-Results-tp4037198.html
> Sent from the Mahout User List mailing list archive at Nabble.com.

--------------------------------------------
Grant Ingersoll
http://www.lucidworks.com
