Re: Faceting on text fields
Hi all,

We are thinking of using Carrot2 clustering too, but it looks as though Carrot2 can only cluster up to about 1000 search snippets. Does anyone know how we can cluster many more snippets than that (perhaps in the millions)? And what is the difference between Mahout and Carrot2?

Thanks!
Jeffrey

On Thu, Jun 11, 2009 at 9:47 PM, Michael Ludwig m...@as-guides.com wrote:

Yao Ge wrote:

BTW, Carrot2 has a very impressive Clustering Workbench (based on Eclipse) that has built-in integration with Solr. If you have a Solr service running, it is just a matter of pointing the workbench to it. The clustering results and visualization are amazing. (http://project.carrot2.org/download.html)

A new world opens up for me ... Thanks for pointing out how cool this is!

Hint for other newcomers: Open the View menu to configure the details of how you perform your search, e.g. your Solr URL in case it differs from the default, or your summary field, which is what gets analyzed to determine the clusters, if I understand correctly.

Michael Ludwig
Re: Faceting on text fields
Thanks Otis!

Do you know under what circumstances, or for which applications, we should cluster the whole corpus of documents rather than just the search results?

Jeffrey

On Fri, Jun 12, 2009 at 1:39 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Jeffrey,

Are you looking to cluster a whole corpus of documents or just the search results? If it's the latter, use Carrot2. If it's the former, look at Mahout. Clustering the top 1M matching documents doesn't really make sense. Usually the top 100-200 is sufficient.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
From: Jeffrey Tiong jeffrey.ti...@gmail.com
To: solr-user@lucene.apache.org
Sent: Friday, June 12, 2009 12:44:55 AM
Subject: Re: Faceting on text fields

Hi all,

We are thinking of using Carrot2 clustering too, but it looks as though Carrot2 can only cluster up to about 1000 search snippets. Does anyone know how we can cluster many more snippets than that (perhaps in the millions)? And what is the difference between Mahout and Carrot2?

Thanks!
Jeffrey
How to combine facets count from multiple query into one query
Hi,

I have a schema with the following fields:

    publisher_name
    book_title
    year
    abstract

Currently, if I run a facet count with the query q=abstract:philosophy AND publisher_name:publisher1, I get results like the following:

    <str name="q">abstract:philosophy AND publisher_name:publisher1</str>
    <lst name="book_title">
      <int name="book1">70</int>
      <int name="book2">60</int>
      <int name="book3">20</int>
    </lst>
    <lst name="year">
      <int name="1990">78</int>
      <int name="1991">62</int>
      <int name="1992">19</int>
    </lst>

Likewise for q=abstract:philosophy AND publisher_name:publisher2:

    <str name="q">abstract:philosophy AND publisher_name:publisher2</str>
    <lst name="book_title">
      <int name="book1">3</int>
      <int name="book2">1</int>
      <int name="book3">1</int>
    </lst>
    <lst name="year">
      <int name="1989">3</int>
      <int name="1990">1</int>
      <int name="1992">1</int>
    </lst>

However, I have to run each query separately and collect its facet counts separately. Is there a way to combine all of these into one query and get the facet counts for each publisher in a single request? Sometimes it takes up to 20 queries to get all the separate counts.

Thanks!
Jef
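For illustration only, the combined result asked for above can be pictured as one structure keyed by field, then by facet value, then by publisher. The sketch below builds that structure on the client from the two result sets quoted in the message. It still assumes one request per publisher and does not show a way to get the same counts from Solr in a single request. The function name `merge_facets` and the dictionary layout are assumptions made for this example, not part of Solr.

```python
from collections import defaultdict


def merge_facets(per_publisher):
    """Combine facet counts from separate per-publisher queries.

    per_publisher maps a publisher name to that query's facet counts,
    shaped as {field: {value: count}} (a hypothetical layout).
    Returns {field: {value: {publisher: count}}}.
    """
    merged = defaultdict(lambda: defaultdict(dict))
    for publisher, facets in per_publisher.items():
        for field, counts in facets.items():
            for value, count in counts.items():
                merged[field][value][publisher] = count
    # Convert the nested defaultdicts back to plain dicts.
    return {field: dict(values) for field, values in merged.items()}


# Counts copied from the two example responses in the message.
results = {
    "publisher1": {
        "book_title": {"book1": 70, "book2": 60, "book3": 20},
        "year": {"1990": 78, "1991": 62, "1992": 19},
    },
    "publisher2": {
        "book_title": {"book1": 3, "book2": 1, "book3": 1},
        "year": {"1989": 3, "1990": 1, "1992": 1},
    },
}

combined = merge_facets(results)
print(combined["year"]["1990"])  # {'publisher1': 78, 'publisher2': 1}
```

A value that only one publisher has, such as year 1989 here, simply carries a single entry, so missing publishers can be treated as a count of zero when displaying the table.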
Enquiry on Search Results counting
Hi,

I am trying to count the values of certain fields across the search results. Currently I do the counting in PHP, but that becomes impossible once the result set reaches a few hundred thousand documents. Does anyone have an idea how to do this?

Example scenario:

1. The Solr schema indexes fields such as product name, manufacturer name and product code.
2. The user enters a query, and 100,000 results are returned.
3. Out of those 100,000 results, we want to show the user the top 5 most frequent product names, manufacturer names and product codes.
4. For example, when users search for "hard drive", we show them that the top 5 manufacturer names are Seagate, Samsung, IBM, etc.

Thanks a lot!

Regards,
Jeffrey
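The scenario above is close to what Solr's field faceting computes on the server, so one possible approach is to request facet counts instead of counting in PHP. The following is a minimal sketch, not a tested setup: the base URL and the field names `product_name`, `manufacturer_name` and `product_code` are assumptions, and the fields would need to be indexed in a way that suits faceting. The request parameters (`facet`, `facet.field`, `facet.limit`, `facet.mincount`, `rows`, `wt`) are standard Solr parameters.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, fields, top_n=5):
    """Build a Solr select URL that asks for the top_n values of each field."""
    params = [
        ("q", query),
        ("rows", 0),            # only the counts are needed, not the documents
        ("wt", "json"),
        ("facet", "true"),
        ("facet.limit", top_n),
        ("facet.mincount", 1),  # skip values that do not occur in the results
    ]
    params += [("facet.field", field) for field in fields]
    return base_url + "?" + urlencode(params)


def top_values(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical Solr location and field names.
url = facet_query_url(
    "http://localhost:8983/solr/select",
    "hard drive",
    ["product_name", "manufacturer_name", "product_code"],
)
print(url)

# Response fragment with made-up counts, for illustration only.
sample = {"facet_counts": {"facet_fields": {
    "manufacturer_name": ["seagate", 3, "samsung", 2],
}}}
print(top_values(sample, "manufacturer_name"))
```

With `rows=0` the response carries only the counts, so the client never has to page through the 100,000 matching documents.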