I have implemented faceted browsing in a prototype I have been working on with Solr, but I would like to ask some more experienced hands about the performance implications. Currently, I calculate the count of a given facet as follows:

    DocSet valueDocSet = req.getSearcher().getDocSet(item.getQuery());
    long count = valueDocSet.intersectionSize(results);

Is this the preferred way to obtain such a count, or is there another way, such as dealing directly with BitSets (something I avoided, since it appears getBits() is deprecated in the DocSet interface)?

Similarly, since getDocSet() is commented as "cache-aware", does that mean that the item itself does not need to worry about caching its results, only its terms, since the results will end up in the queryResultCache? Or is this assumption incorrect, and should each facet/item be concerned with caching its results as well?

Apologies for sending this to solr-dev, and not solr-user, but I thought this might also segue into a discussion on faceted browsing in general. To that end, my current structure defines:

- a <facetHandler/> entry in solrconfig.xml, the only current implementation of which loads a set of Facet definitions from an XML file.
- each Facet contains an id for lookups and a List of FacetItems (some statically configured, some constructed dynamically from available Terms, though not backed by any cache yet).
- each FacetItem contains a displayName and a Query (with its associated queryString).

With this configuration, a request with these parameters:

    &ft=xmlfacets&f=man&f=instock

would use the facetHandler "xmlfacets" to add this to the results:

    <lst name="facets">
      <arr name="man">
        <lst>
          <str name="fq">manu_exact:"ASUS Computer Inc."</str>
          <long name="count">0</long>
          <str name="displayName">ASUS Computer Inc.</str>
        </lst>
        <lst>
          <str name="fq">manu_exact:"ATI Technologies"</str>
          <long name="count">0</long>
          <str name="displayName">ATI Technologies</str>
        </lst>
        <lst>
          <str name="fq">manu_exact:"Dell, Inc."</str>
          <long name="count">1</long>
          <str name="displayName">Dell, Inc.</str>
        </lst>
      </arr>
      <arr name="instock">
        <lst>
          <str name="fq">inStock:true</str>
          <long name="count">1</long>
          <str name="displayName">In Stock</str>
        </lst>
        <lst>
          <str name="fq">inStock:false</str>
          <long name="count">0</long>
          <str name="displayName">Out of Stock</str>
        </lst>
      </arr>
    </lst>

The basic handling and output format work for my prototype's purposes, but I have not delved deeply into caching at this time. Does this setup seem appropriate, and does the caching assumption above seem valid, or have I missed something that would help support facets on a larger scale?

Thanks,
Greg
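
P.S. To make the counting approach concrete, here is a minimal stand-alone sketch of the idea: look up the set of documents matching a facet query through a cache, then intersect it with the current result set and take the size. It does not use Solr's API. java.util.BitSet stands in for a DocSet, a HashMap stands in for whatever cache the searcher maintains, and the names FacetCountSketch, getDocSet, docs and computations are hypothetical, chosen only for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Illustration only: BitSet stands in for a DocSet, a HashMap for the cache. */
class FacetCountSketch {
    // Hypothetical cache: facet query string -> ids of documents matching it.
    private final Map<String, BitSet> cache = new HashMap<>();
    // Counts cache misses, so the effect of caching is observable.
    private int computations = 0;

    /** Builds a document-id set from a list of ids. */
    static BitSet docs(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }

    /** Returns the doc set for a facet query, evaluating it only on a cache miss. */
    BitSet getDocSet(String facetQuery, Map<String, BitSet> index) {
        return cache.computeIfAbsent(facetQuery, q -> {
            computations++;
            BitSet hits = index.get(q);
            // An unknown query matches no documents.
            return hits == null ? new BitSet() : (BitSet) hits.clone();
        });
    }

    /** Size of the intersection, computed on a copy so neither input is modified. */
    static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    /** Facet count: documents that match the facet query and are in the results. */
    int count(String facetQuery, Map<String, BitSet> index, BitSet results) {
        return intersectionSize(getDocSet(facetQuery, index), results);
    }

    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        // Toy "index": which documents match each facet query.
        Map<String, BitSet> index = new HashMap<>();
        index.put("inStock:true", docs(0, 2, 3));
        index.put("inStock:false", docs(1));

        BitSet results = docs(2, 4); // documents matched by the main query

        FacetCountSketch sketch = new FacetCountSketch();
        System.out.println("inStock:true  -> " + sketch.count("inStock:true", index, results));
        System.out.println("inStock:false -> " + sketch.count("inStock:false", index, results));
    }
}
```

In this sketch the per-facet doc set is computed once and reused for every later result set, so only the cheap intersection is repeated per request. Whether Solr's own caches give the same behaviour for my facet queries is exactly what I am asking about above.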