[Solr Wiki] Update of "SolrFacetingOverview" by HossMan

Apache Wiki Tue, 02 Dec 2008 16:22:31 -0800

The following page has been changed by HossMan:
http://wiki.apache.org/solr/SolrFacetingOverview

The comment on the change is:
updates to reflect new facet.method param and UnInvertedField

------------------------------------------------------------------------------
    
  == FacetFields ==
  
- Any number of [:SimpleFacetParameters#facet.field:facet.field] parameters can 
be passed to the request handler.  For each facet.field, one of two approaches 
will be used based on the Field definiton in schema.xml:
+ Any number of [:SimpleFacetParameters#facet.field:facet.field] parameters can 
be passed to the request handler.  For each facet.field, one of two approaches 
will be used based on the [:SimpleFacetParameters#facet.method:facet.method] or 
the field type:
-   
+ 
-     * '''Field Queries''':  If the facet field is defined in the schema as 
multi-valued, boolean, or tokenized, then every indexed value for the field 
will be iterated and a facet query will be executed and cached (as described 
above).  This is excellent for fields where there is a small set of distinct 
values.  For example, faceting on a field with U.S. States e.g. `Alabama, 
Alaska, ... Wyoming` would lead to fifty cached queries which would be used 
over and over again.  It also works in the case when the facet field can have 
multiple values for each document.  However, it requires excessive amounts of 
memory and time when the number of field values is large, and especially when 
it exceeds the filter cache size defined in 
[:SolrCaching#filterCache:filterCache]
+     * '''Enum Based Field Queries''':  If {{{facet.method=enum}}} or the 
field is defined in the schema as boolean, then every indexed value for the 
field will be iterated and a facet query will be executed and cached (as 
described above).  This is excellent for fields where there is a small set of 
distinct values.  For example, faceting on a field with U.S. States e.g. 
`Alabama, Alaska, ... Wyoming` would lead to fifty cached queries which would 
be used over and over again. However, it requires excessive amounts of memory 
and time when the number of field values is large, and especially when it 
exceeds the filter cache size defined in [:SolrCaching#filterCache:filterCache].
      
-     * '''Field Cache''': If the facet field is not tokenized, not 
multi-valued, and not boolean, then a field-cache approach will be used.  This 
is currently implemented with the Lucene 
[http://lucene.apache.org/java/docs/api/org/apache/lucene/search/FieldCache.html
 FieldCache] mechanism used for results sorting.  An array of integers (one for 
every document in the index) is allocated, pre-filled with the first (only?) 
indexed value for that field in each document (offset into a table of strings 
for fields indexed as strings), and cached.  Every time that facet.field is 
used for faceting a query, all the document IDs resulting from the query are 
looked up in the field cache and any value found has its tally incremented.  
This is excellent for situations where the number of indexed values for the 
field is too large to be practical using the field queries mechanism, such as 
faceting against authors or titles.  However it is currently much slower and 
more memory-intensive than
  the field query mechanism for fields with a small number of values. 
+     * '''Field Cache''': If {{{facet.method=fc}}} then a field-cache approach 
will be used.  This is currently implemented using either the Lucene 
[http://lucene.apache.org/java/docs/api/org/apache/lucene/search/FieldCache.html
 FieldCache] or (starting in Solr 1.4) an !UnInvertedField if the field is 
multivalued or tokenized.  Every time that {{{facet.field}}} is used for 
faceting a query, all the document IDs resulting from the query are looked up 
in the cache and any value found has its tally incremented.  This is excellent 
for situations where the number of indexed values for the field is too large to 
be practical using the field queries mechanism, such as faceting against 
authors or titles.  However, it is currently much slower and more 
memory-intensive than the field query mechanism for fields with a small number 
of values. 
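
The difference between the two approaches can be illustrated with a toy model. The following is only a sketch, not Solr's implementation: the in-memory `DOCS` index, the function names, and the use of Python sets in place of cached filters are all assumptions made for illustration. The enum variant walks every distinct indexed value and intersects a cached document set with the query results; the field-cache variant walks the result documents and increments a tally through a per-document array of integer offsets.

```python
# Toy index: document ID -> single value of the facet field (hypothetical data).
DOCS = {0: "Alabama", 1: "Alaska", 2: "Alabama", 3: "Wyoming", 4: "Alaska", 5: "Alabama"}


def build_filter_cache(docs):
    """Enum approach, setup: one cached set of document IDs per distinct value."""
    cache = {}
    for doc_id, value in docs.items():
        cache.setdefault(value, set()).add(doc_id)
    return cache


def facet_enum(filter_cache, result_ids):
    """Iterate every indexed value and intersect its cached set with the results."""
    counts = {}
    for value, doc_set in filter_cache.items():
        n = len(doc_set & result_ids)
        if n:
            counts[value] = n
    return counts


def build_field_cache(docs):
    """Field-cache approach, setup: one integer per document, an offset into a
    sorted table of the field's string values."""
    terms = sorted(set(docs.values()))
    ordinal = {term: i for i, term in enumerate(terms)}
    ords = [ordinal[docs[doc_id]] for doc_id in range(len(docs))]
    return terms, ords


def facet_fc(terms, ords, result_ids):
    """Look up each result document in the cache and increment that value's tally."""
    tally = [0] * len(terms)
    for doc_id in result_ids:
        tally[ords[doc_id]] += 1
    return {terms[i]: n for i, n in enumerate(tally) if n}


# Document IDs matched by some query (hypothetical).
RESULTS = {0, 1, 2, 5}
enum_counts = facet_enum(build_filter_cache(DOCS), RESULTS)
fc_counts = facet_fc(*build_field_cache(DOCS), RESULTS)
```

In this model the work done by `facet_enum` grows with the number of distinct values, while the work done by `facet_fc` grows with the number of matching documents, which mirrors the trade-off described in the two bullets.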
  
- Note that at this time there is no way to manually control whether 
facet.field is handled via field queries or field cache, other than defining in 
the schema whether the field is single- or multi-valued and the analyzer used: 
`solr.TextField` is always tokenized while `solr.StrField` is never tokenized.  
Control may be improved in the future, along with a means to handle 
multi-valued fields with a variant of the Field Cache mechanism.
-
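
With the `facet.method` parameter described in this change, the approach can be selected per request instead of only through the schema. The snippet below is a hedged sketch of building such a query string; the field names `state` and `author`, the `q=*:*` query, and the helper name `facet_params` are hypothetical, and the per-field form `f.<field>.facet.method` assumes Solr's usual per-field parameter override syntax applies to this parameter.

```python
from urllib.parse import urlencode


def facet_params(fields, method=None, per_field=None):
    """Build a query string for a faceting request (hypothetical helper)."""
    params = [("q", "*:*"), ("facet", "true")]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in fields]
    if method:
        # Request-wide choice between the enum and fc approaches.
        params.append(("facet.method", method))
    for field, field_method in (per_field or {}).items():
        # Assumed per-field override: f.<field>.facet.method
        params.append((f"f.{field}.facet.method", field_method))
    return urlencode(params)


# Use the field cache by default, but enumerate terms for the small "state" field.
query = facet_params(["state", "author"], method="fc", per_field={"state": "enum"})
```

The resulting string would be appended to the URL of whichever request handler serves the search.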
