[Solr Wiki] Update of "ClusteringComponent" by YonikSeeley

Apache Wiki Wed, 21 Oct 2009 12:38:09 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.


The "ClusteringComponent" page has been changed by YonikSeeley.
The comment on this change is: move up example into quickstart.
http://wiki.apache.org/solr/ClusteringComponent?action=diff&rev1=31&rev2=32

--------------------------------------------------

  This component can cluster both search results and documents.  In case you're 
wondering what clustering is good for, think of it as a quick way to summarize 
a whole bunch of results/documents, or as a way to group together like 
results/documents.
  
  See http://en.wikipedia.org/wiki/Data_clustering for more background, as well 
as links to further reading.
- 
  
  = Clustering Component =
  
@@ -21, +20 @@

  
  == Installation ==
  
- The !ClusteringComponent is in the contrib area of Solr.  Due to some 
dependencies on LGPL libraries for the Carrot2 implementation, we cannot 
package a complete binary solution (with all the dependencies).  To get the 
Carrot2 solution, you will need to download these libraries.  To do this, on 
the command line in the contrib/clustering directory, run {{{ant 
get-libraries}}}.  This will create a downloads directory under the lib 
directory.  From there, you just need to grab the Solr clustering JAR and all 
the libraries and it should work.  To see an example of it working, try running 
{{{ant example}}} and then switching over to $SOLR_HOME/example/clustering and 
follow the directions below.
+ The !ClusteringComponent is in the contrib area of Solr.  Due to some 
dependencies on LGPL libraries for the Carrot2 implementation, we cannot 
package a complete binary solution (with all the dependencies).  To get the 
Carrot2 solution, you will need to download these libraries.  To do this, on 
the command line in the contrib/clustering directory, run {{{ant 
get-libraries}}}.  This will create a downloads directory under the lib 
directory for the downloaded jars.
+ 
+ == Quick Start ==
+ 
+ To run the example, cd to the Solr install directory, then:
+ {{{
+ $ ant example #builds the local example for clustering, including downloading jars
+ $ cd example
+ $ java -Dsolr.solr.home=../contrib/clustering/example -jar start.jar
+ }}}
+ Then, in a different window, add some docs using the post tool in the 
exampledocs directory.
+ {{{
+ $ cd example/exampledocs
+ $ ./post.sh *.xml
+ }}}
+ Now try a query that turns on clustering (clustering=true):
+ {{{
+ http://localhost:8983/solr/select?indent=on&q=*:*&rows=10&clustering=true
+ }}}
+ This should yield results that include cluster information at the bottom of 
the response, like:
+ {{{
+ <arr name="clusters">
+  <lst>
+   <arr name="labels">
+       <str>DDR</str>
+   </arr>
+   <arr name="docs">
+       <str>TWINX2048-3200PRO</str>
+       <str>VS1GB400C3</str>
+       <str>VDBDB1A16</str>
+   </arr>
+  </lst>
+  <lst>
+   <arr name="labels">
+       <str>Car Power Adapter</str>
+   </arr>
+   <arr name="docs">
+       <str>F8V7067-APL-KIT</str>
+       <str>IW-02</str>
+   </arr>
+  </lst>
+  <lst>
+   <arr name="labels">
+       <str>Hard Drive</str>
+   </arr>
+   <arr name="docs">
+       <str>SP2514N</str>
+       <str>6H500F0</str>
+   </arr>
+  </lst>
+  <lst>
+ [...]
+ }}}
+ 
+ Clusters produced by Carrot2 group the results into different product 
categories: DDR (memory), Car Power Adapter, Display, Hard Drive. Notice that, 
depending on the quality of input documents, some clusters may not make much 
sense.
+ 
  
  == Configuration ==
  
@@ -39, +93 @@

  
  == Carrot2 Clustering ==
  
- Carrot2 is a scalable, BSD licensed search results clustering engine.  It can 
cluster many different types of search results, including Y!, Google, etc.  Our 
implementation, naturally, clusters Solr/Lucene results.
+ Carrot2 is a scalable, BSD licensed search results clustering engine.  It can 
cluster many different types of search results, including Y!, Google, etc.  Our 
implementation, naturally, clusters Solr results.
  
  Carrot2 is best suited for clustering small-to-medium collections of short 
documents. While Carrot2 may work for longer documents, processing times may be 
too long to meet on-line clustering requirements.
  
  See http://project.carrot2.org
- 
- == Example ==
- 
- The contrib/clustering sub directory contains a simple example that works off 
of the existing sample documents, but does clustering on them.
- 
- To run the example, cd to the Solr install directory, then:
- {{{
- $ ant example //builds the local example for clustering
- $ cd example
- $ java -Dsolr.solr.home=../contrib/clustering/example -jar start.jar
- }}}
- Then, add some docs using the post tool in the exampledocs directory.
- 
  
  The configuration (solrconfig.xml) looks like:
  {{{
@@ -121, +162 @@

  
  The thing to note here is the mapping of Solr Fields (name, id, etc.) to the 
Carrot2 needs of title, snippet and url. Clustering will take into account the 
text of title and snippet.
  
- Next, inputting a query that turns on clustering (clustering=true:
- {{{
- http://localhost:8983/solr/select?indent=on&q=*:*&rows=10&clustering=true
- }}}
- 
- yields the results like:
- {{{
- <arr name="clusters">
-  <lst>
-   <arr name="labels">
-       <str>DDR</str>
-   </arr>
-   <arr name="docs">
-       <str>TWINX2048-3200PRO</str>
-       <str>VS1GB400C3</str>
-       <str>VDBDB1A16</str>
-   </arr>
-  </lst>
-  <lst>
-   <arr name="labels">
-       <str>Car Power Adapter</str>
-   </arr>
-   <arr name="docs">
-       <str>F8V7067-APL-KIT</str>
-       <str>IW-02</str>
-   </arr>
-  </lst>
-  <lst>
-   <arr name="labels">
-       <str>Hard Drive</str>
-   </arr>
-   <arr name="docs">
-       <str>SP2514N</str>
-       <str>6H500F0</str>
-   </arr>
-  </lst>
-  <lst>
- [...]
- }}}
- 
- Clusters produced by Carrot2 group the results into different product 
categories: DDR (memory), Car Power Adapter, Display, Hard Drive. Notice that, 
depending on the quality of input documents, some clusters may not make much 
sense.
  
  == Tuning Carrot2 clustering ==

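The Quick Start steps in the diff above (query with clustering=true, then read the cluster list at the bottom of the response) can be sketched from a client's point of view. This is a minimal illustration, not part of the wiki page: the function names `build_clustering_url` and `parse_clusters` are hypothetical, the URL assumes the local example server from the Quick Start, and the sample XML is the three complete clusters quoted in the diff with the closing tag added so that it parses on its own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: the example Solr instance from the Quick Start is running locally.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# The three complete clusters shown in the diff; the original output is
# truncated with "[...]", so the closing </arr> here is added for parsing.
SAMPLE_RESPONSE = """
<arr name="clusters">
 <lst>
  <arr name="labels"><str>DDR</str></arr>
  <arr name="docs">
   <str>TWINX2048-3200PRO</str>
   <str>VS1GB400C3</str>
   <str>VDBDB1A16</str>
  </arr>
 </lst>
 <lst>
  <arr name="labels"><str>Car Power Adapter</str></arr>
  <arr name="docs">
   <str>F8V7067-APL-KIT</str>
   <str>IW-02</str>
  </arr>
 </lst>
 <lst>
  <arr name="labels"><str>Hard Drive</str></arr>
  <arr name="docs">
   <str>SP2514N</str>
   <str>6H500F0</str>
  </arr>
 </lst>
</arr>
"""


def build_clustering_url(query="*:*", rows=10):
    """Build a select URL with clustering turned on (hypothetical helper)."""
    params = {"indent": "on", "q": query, "rows": rows, "clustering": "true"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


def parse_clusters(xml_text):
    """Return a list of (labels, doc_ids) pairs from an XML response.

    Accepts either a full response that contains <arr name="clusters">
    somewhere inside it, or just that element on its own.
    """
    root = ET.fromstring(xml_text)
    if root.tag == "arr" and root.get("name") == "clusters":
        clusters = root
    else:
        clusters = root.find(".//arr[@name='clusters']")
    if clusters is None:
        return []
    result = []
    for lst in clusters.findall("lst"):
        labels = [s.text for s in lst.findall("arr[@name='labels']/str")]
        docs = [s.text for s in lst.findall("arr[@name='docs']/str")]
        result.append((labels, docs))
    return result


if __name__ == "__main__":
    print(build_clustering_url())
    for labels, docs in parse_clusters(SAMPLE_RESPONSE):
        print(", ".join(labels), "->", ", ".join(docs))
```

To use this against a live server, fetch the URL returned by `build_clustering_url()` (for example with `urllib.request.urlopen`) and pass the response body to `parse_clusters`. Parsing is kept separate from fetching so the sketch can be tried without a running Solr instance.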