It's just a set of Mahout commands. Here they are:
https://gist.github.com/4674331
For what it's worth, the relevant Solr config from the schema was:
<field name="text" type="text_general" indexed="true" stored="false"
multiValued="true"
termVectors="true"/>
Thank You
On Wed, Jan 30, 2013 at 4:37 AM, Grant Ingersoll <[email protected]> wrote:
> Can you gist (gist.github.org) or pastebin your code?
>
> On Jan 29, 2013, at 5:12 PM, vybe3142 wrote:
>
> > Reposting - I wasn't subscribed to the group earlier
> >
> >
> > VS
> >
> > first ingesting the data into SOLR and then invoking mahout on the SOLR
> > index (clustering on the contents of the field "text")
> >
> > defined as
> >
> > Field: text
> >
> > Field-Type: org.apache.solr.schema.TextField
> > Properties: Indexed, Tokenized, Multivalued, TermVector Stored
> > Schema: Indexed, Tokenized, Multivalued, TermVector Stored
> > Index: (unstored field)
> > PI Gap: 100
> > Docs: 21578
> >
> > Index Analyzer:
> > org.apache.solr.analysis.TokenizerChain
> > Query Analyzer:
> > org.apache.solr.analysis.TokenizerChain
> > and executing a "similar" command set
> >
> > I get vastly differing results:
> >
> > The Lucene / kmeans approach yields 20 clusters whereas the Solr approach
> > yields just one cluster.
> >
> > I'm obviously doing something wrong. Any pointers?
> >
> > Thanks
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Clustering-using-Solr-Index-vs-Lucene-Index-Different-Results-tp4037198.html
> > Sent from the Mahout User List mailing list archive at Nabble.com.
>
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidworks.com
>