Thanks, Otis! Do you know under what circumstances, or for which applications, we should cluster the whole corpus of documents versus just the search results?

Jeffrey

On Fri, Jun 12, 2009 at 1:39 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

> Jeffrey,
>
> Are you looking to cluster a whole corpus of documents or just the search
> results? If it's the latter, use Carrot2. If it's the former, look at
> Mahout. Clustering the top 1M matching documents doesn't really make sense.
> Usually the top 100-200 is sufficient.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: Jeffrey Tiong <jeffrey.ti...@gmail.com>
> > To: solr-user@lucene.apache.org
> > Sent: Friday, June 12, 2009 12:44:55 AM
> > Subject: Re: Faceting on text fields
> >
> > Hi all,
> >
> > We are thinking of using the Carrot clustering too. But we saw that
> > Carrot may only be able to cluster up to 1000 search snippets. Does
> > anyone know how we can cluster many more snippets than that (maybe in
> > the million range)?
> >
> > And what is the difference between Mahout and Carrot?
> >
> > Thanks!
> >
> > Jeffrey
> >
> > On Thu, Jun 11, 2009 at 9:47 PM, Michael Ludwig wrote:
> >
> > > Yao Ge wrote:
> > >
> > >> BTW, Carrot2 has a very impressive Clustering Workbench (based on
> > >> Eclipse) that has built-in integration with Solr. If you have a Solr
> > >> service running, it is just a matter of pointing the workbench to it.
> > >> The clustering results and visualization are amazing.
> > >> (http://project.carrot2.org/download.html).
> > >>
> > >
> > > A new world opens up for me ...
> > >
> > > Thanks for pointing out how cool this is!
> > >
> > > Hint for other newcomers: Open the View Menu to configure the details of
> > > how you perform your search, e.g. your Solr URL in case it differs from
> > > the default, or your "summary field", which is what gets used to analyze
> > > the data in order to determine clusters, if I understand correctly.
> > >
> > > Michael Ludwig