Re: Faceting on text fields

2009-06-11 Thread Jeffrey Tiong
Hi all,

We are thinking of using Carrot2 clustering too, but we saw that Carrot2
may only be able to cluster up to about 1000 search snippets. Does anyone
know how we can cluster many more snippets than that (maybe in the million
range)?

And what is the difference between Mahout and Carrot2?

Thanks!

Jeffrey

On Thu, Jun 11, 2009 at 9:47 PM, Michael Ludwig m...@as-guides.com wrote:

 Yao Ge wrote:

 BTW, Carrot2 has a very impressive Clustering Workbench (based on
 Eclipse) that has built-in integration with Solr. If you have a Solr
 service running, it is just a matter of pointing the workbench at it.
 The clustering results and visualization are amazing
 (http://project.carrot2.org/download.html).


 A new world opens up for me ...

 Thanks for pointing out how cool this is!

 Hint for other newcomers: Open the View Menu to configure the details of
 how you perform your search, e.g. your Solr URL in case it differs from
 the default, or your summary field, which is what gets used to analyze
 the data in order to determine clusters, if I understand correctly.

 Michael Ludwig



Re: Faceting on text fields

2009-06-11 Thread Jeffrey Tiong
Thanks Otis!

Do you know under what circumstances, or for which applications, we should
cluster the whole corpus of documents vs. just the search results?

Jeffrey

On Fri, Jun 12, 2009 at 1:39 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:


 Jeffrey,

 Are you looking to cluster a whole corpus of documents or just the search
 results?  If it's the latter, use Carrot2.  If it's the former, look at
 Mahout.  Clustering the top 1M matching documents doesn't really make
 sense.  Usually the top 100-200 is sufficient.

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
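
 A minimal sketch of the "cluster only the top results" route, assuming a
 Solr release that ships the Carrot2-based clustering component and a request
 handler with that component enabled. The handler path /clustering, the host,
 and the field names title and summary are hypothetical; the clustering,
 carrot.title and carrot.snippet parameters are the ones documented for that
 component, and other Solr versions may name them differently. The script only
 builds the request URL and does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical handler URL; the real path depends on solrconfig.xml.
CLUSTERING_URL = "http://localhost:8983/solr/clustering"


def build_clustering_url(base_url, user_query, title_field, snippet_field, rows=100):
    """Build a request that clusters only the top `rows` search results."""
    params = {
        "q": user_query,
        "rows": rows,                      # cluster the top 100-200 hits, not millions
        "clustering": "true",              # switch the clustering component on
        "carrot.title": title_field,       # field used as the document title
        "carrot.snippet": snippet_field,   # field whose text is analyzed for clusters
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_clustering_url(CLUSTERING_URL, "hard drive", "title", "summary", rows=200))
```

 Clustering a whole corpus is a separate offline job (the Mahout route
 mentioned above) and is not covered by this request.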







How to combine facets count from multiple query into one query

2009-05-10 Thread Jeffrey Tiong
Hi,

I have a schema that has the following fields,

publisher_name
book_title
year
abstract

Currently, if I request facet counts with the query q=abstract:philosophy
AND publisher_name:publisher1, I get results like the following:

<str name="q">abstract:philosophy AND publisher_name:publisher1</str>
<lst name="book_title">
 <int name="book1">70</int>
 <int name="book2">60</int>
 <int name="book3">20</int>
</lst>
<lst name="year">
 <int name="1990">78</int>
 <int name="1991">62</int>
 <int name="1992">19</int>
</lst>


Likewise for q=abstract:philosophy AND publisher_name:publisher2 -

<str name="q">abstract:philosophy AND publisher_name:publisher2</str>
<lst name="book_title">
 <int name="book1">3</int>
 <int name="book2">1</int>
 <int name="book3">1</int>
</lst>
<lst name="year">
 <int name="1989">3</int>
 <int name="1990">1</int>
 <int name="1992">1</int>
</lst>


However, I have to run each query separately and get its facet counts
separately. Is there a way to combine all of these into one query and get
the facet counts for each publisher in a single request? Sometimes it takes
up to 20 queries to get all the separate counts.


Thanks!

Jef
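
One possible way to get per-publisher counts in a single request is pivot
faceting, sketched below under the assumption of a Solr version that supports
the facet.pivot parameter (Solr 4.0 or later, so not the release current when
this question was posted). The host and core name are hypothetical; the field
names come from the schema above. The script only builds the request URL and
does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host and core name to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/books/select"


def build_pivot_facet_url(base_url, text_query, facet_fields,
                          group_field="publisher_name", group_values=None):
    """Build one request whose facet counts are broken down per publisher."""
    params = [
        ("q", text_query),
        ("rows", "0"),        # only the counts are needed, not the documents
        ("facet", "true"),
        ("wt", "json"),
    ]
    if group_values:
        # Optional filter restricting the result to the publishers of interest.
        params.append(("fq", f"{group_field}:({' OR '.join(group_values)})"))
    for field in facet_fields:
        # Each pivot nests the counts of `field` under every group_field value.
        params.append(("facet.pivot", f"{group_field},{field}"))
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_pivot_facet_url(
        SOLR_SELECT_URL,
        "abstract:philosophy",
        ["book_title", "year"],
        group_values=["publisher1", "publisher2"],
    ))
```

Each pivot entry in the response carries one publisher_name value with the
book_title (or year) counts nested under it, which replaces the separate
per-publisher queries.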


Enquiry on Search Results counting

2007-08-20 Thread Jeffrey Tiong
Hi,

I am trying to do some counting on certain fields of the search results.
Currently I am using PHP to do the counting, but this becomes impossible
when the result set reaches a few hundred thousand documents. Does anyone
here have any idea how to do this?

Example scenario:

   1. The Solr schema indexes fields such as product name, manufacturer
   name and product code.
   2. The user enters a query, and 100,000 results are returned.
   3. Out of those 100,000 results, we want to show the user the top 5
   most frequent product names, manufacturer names and product codes.
   4. For example, when users search for "hard drive", we show them that
   the top 5 manufacturer names are Seagate, Samsung, IBM, etc.

Thanks a lot!

Regards,
Jeffrey
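
The scenario above matches what Solr's field faceting is meant for: the server
counts the terms, so the client never has to walk 100,000 results. The sketch
below builds such a request and reads the counts from a JSON response. The
field names product_name, manufacturer_name and product_code are assumed
spellings of the fields described above, and the sample response fragment is
made up purely for illustration.

```python
from urllib.parse import urlencode


def build_top_terms_query(user_query, fields, top_n=5):
    """Query string asking Solr for the top_n most frequent terms per field."""
    params = [
        ("q", user_query),
        ("rows", "0"),                # skip the documents, return counts only
        ("wt", "json"),
        ("facet", "true"),
        ("facet.limit", str(top_n)),  # keep only the top_n terms per field
        ("facet.mincount", "1"),      # drop terms that match no result
    ]
    params += [("facet.field", field) for field in fields]
    return urlencode(params)


def parse_flat_facets(facet_fields):
    """Turn Solr's default flat [term, count, term, count, ...] lists into pairs."""
    return {
        field: list(zip(flat[0::2], flat[1::2]))
        for field, flat in facet_fields.items()
    }


if __name__ == "__main__":
    fields = ["product_name", "manufacturer_name", "product_code"]
    print(build_top_terms_query("hard drive", fields))

    # Made-up fragment shaped like response["facet_counts"]["facet_fields"].
    sample = {"manufacturer_name": ["seagate", 3, "samsung", 2, "ibm", 1]}
    for field, counts in parse_flat_facets(sample).items():
        print(field, counts)
```

With facet.limit set, Solr returns the terms sorted by count, so the first
five pairs per field are the ones to display.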