David,

A pivot facet could possibly provide these results using the following syntax:
    facet.pivot=category,includes

This presumes that includes is a tokenized field, so the facet values would be the terms resulting from that tokenization. Those term values would be nested under each category and, of course, the set of documents considered for these facets is constrained by the current query. I think this maps to your requirement.

Jason

On May 16, 2013, at 12:29 PM, David Larochelle <dlaroche...@cyber.law.harvard.edu> wrote:

> Is there a way to get aggregate word counts over a subset of documents?
>
> For example, given the following data:
>
> {
>   "id": "1",
>   "category": "cat1",
>   "includes": "The green car."
> },
> {
>   "id": "2",
>   "category": "cat1",
>   "includes": "The red car."
> },
> {
>   "id": "3",
>   "category": "cat2",
>   "includes": "The black car."
> }
>
> I'd like to be able to get total term frequency counts per category, e.g.
>
> <category name="cat1">
>   <lst name="the">2</lst>
>   <lst name="car">2</lst>
>   <lst name="green">1</lst>
>   <lst name="red">1</lst>
> </category>
> <category name="cat2">
>   <lst name="the">1</lst>
>   <lst name="car">1</lst>
>   <lst name="black">1</lst>
> </category>
>
> I was initially hoping to do this within Solr, and I tried using the
> TermFrequencyComponent. This gives term frequencies for individual
> documents and term frequencies for the entire index, but doesn't seem to
> help with subsets. For example, TermFrequencyComponent would tell me that
> car occurs 3 times over all documents in the index and 1 time in document 1,
> but not that it occurs 2 times over cat1 documents and 1 time over cat2
> documents.
>
> Is there a good way to use Solr/Lucene to gather aggregate results like
> this? I've been focusing on just using Solr with XML files, but I could
> certainly write Java code if necessary.
>
> Thanks,
>
> David
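One caveat worth keeping in mind: facet counts are document counts, so a pivot facet reports how many documents in each category contain a term, which equals the total term frequency only when no term repeats within a single document (as happens to be true of the example data). Since David mentioned he could write Java, the following is a minimal sketch of computing true per-category term totals outside Solr. The Doc class and countByCategory method are hypothetical names, and the tokenization (lowercase, split on non-letters) is a naive assumption standing in for whatever analyzer the real includes field uses.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class CategoryTermCounts {

    // Hypothetical holder for the fields in the example documents.
    static final class Doc {
        final String id;
        final String category;
        final String includes;

        Doc(String id, String category, String includes) {
            this.id = id;
            this.category = category;
            this.includes = includes;
        }
    }

    // Sums term occurrences (not document counts) per category.
    // Naive tokenization assumption: lowercase, then split on runs of non-letters.
    static Map<String, Map<String, Integer>> countByCategory(List<Doc> docs) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (Doc doc : docs) {
            Map<String, Integer> counts =
                    result.computeIfAbsent(doc.category, k -> new TreeMap<>());
            for (String token : doc.includes.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum); // add one per occurrence
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("1", "cat1", "The green car."),
                new Doc("2", "cat1", "The red car."),
                new Doc("3", "cat2", "The black car."));
        // Prints {cat1={car=2, green=1, red=1, the=2}, cat2={black=1, car=1, the=1}}
        System.out.println(countByCategory(docs));
    }
}
```

With a document such as "The car, the car." this sketch would count the=2 for that single document, whereas a facet count would still count it once.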