Thanks Otis!

Do you know under what circumstances, or for which applications, we should
cluster the whole corpus of documents rather than just the search results?

Jeffrey

On Fri, Jun 12, 2009 at 1:39 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

>
> Jeffrey,
>
> Are you looking to cluster a whole corpus of documents or just the search
> results?  If it's the latter, use Carrot2.  If it's the former, look at
> Mahout.  Clustering the top 1M matching documents doesn't really make sense.
> Usually the top 100-200 is sufficient.
>
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
> > From: Jeffrey Tiong <jeffrey.ti...@gmail.com>
> > To: solr-user@lucene.apache.org
> > Sent: Friday, June 12, 2009 12:44:55 AM
> > Subject: Re: Faceting on text fields
> >
> > Hi all,
> >
> > We are thinking of using Carrot2 clustering too, but we saw that Carrot2
> > may only be able to cluster up to 1000 search snippets. Does anyone know
> > how we can cluster many more snippets than that (maybe in the million
> > range)?
> >
> > And what is the difference between Mahout and Carrot2?
> >
> > Thanks!
> >
> > Jeffrey
> >
> > On Thu, Jun 11, 2009 at 9:47 PM, Michael Ludwig wrote:
> >
> > > Yao Ge wrote:
> > >
> > >> BTW, Carrot2 has a very impressive Clustering Workbench (based on
> > >> Eclipse) that has built-in integration with Solr. If you have a Solr
> > >> service running, it is just a matter of pointing the workbench to it.
> > >> The clustering results and visualization are amazing.
> > >> (http://project.carrot2.org/download.html).
> > >>
> > >
> > > A new world opens up for me ...
> > >
> > > Thanks for pointing out how cool this is!
> > >
> > > Hint for other newcomers: Open the View Menu to configure the details
> > > of how you perform your search, e.g. your Solr URL in case it differs
> > > from the default, or your "summary field", which is what gets used to
> > > analyze the data in order to determine clusters, if I understand
> > > correctly.
> > >
> > > Michael Ludwig
> > >
>
>
