Re: Aggregate word counts over a subset of documents

Jason Hellman Thu, 16 May 2013 13:00:53 -0700

David,

A pivot facet may provide these results, using the following syntax:


facet.pivot=category,includes

This presumes that includes is a tokenized field, so that its facet values 
are the terms resulting from that tokenization.  Those term counts would be 
nested under each category value and, of course, the entire set of documents 
considered for these facets is constrained by the current query.
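
As a rough sketch of how such a request could be built and its response read 
from Python: the base URL, the core name collection1, and both helper names 
below are placeholders of mine, not anything from your setup. One caveat is 
that facet counts are document counts, so a term appearing twice in a single 
document still counts once. For your sample data that gives the same numbers 
as term frequency.

```python
from urllib.parse import urlencode


def pivot_facet_url(base_url, query="*:*"):
    """Build a select URL that asks for includes terms nested under category.

    base_url is a hypothetical select handler, for example
    http://localhost:8983/solr/collection1/select
    """
    params = [
        ("q", query),            # the query constrains which documents are counted
        ("rows", 0),             # only the facet counts are needed, not the documents
        ("wt", "json"),
        ("facet", "true"),
        ("facet.pivot", "category,includes"),
        ("facet.limit", -1),     # return every term rather than the default top 100
    ]
    return base_url + "?" + urlencode(params)


def flatten_pivot(response, pivot_key="category,includes"):
    """Turn a parsed JSON pivot response into {category: {term: doc_count}}."""
    counts = {}
    for outer in response["facet_counts"]["facet_pivot"][pivot_key]:
        # Each outer entry is one category value; its "pivot" list holds the terms.
        counts[outer["value"]] = {
            inner["value"]: inner["count"] for inner in outer.get("pivot", [])
        }
    return counts
```

Fetching the URL (for instance with urllib.request.urlopen) and passing the 
result of json.load to flatten_pivot should yield one dictionary of terms per 
category, assuming the response has the usual facet_counts/facet_pivot layout.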

I think this maps to your requirement.

Jason

On May 16, 2013, at 12:29 PM, David Larochelle 
<dlaroche...@cyber.law.harvard.edu> wrote:

> Is there a way to get aggregate word counts over a subset of documents?
> 
> For example given the following data:
> 
>  {
>    "id": "1",
>    "category": "cat1",
>    "includes": "The green car.",
>  },
>  {
>    "id": "2",
>    "category": "cat1",
>    "includes": "The red car.",
>  },
>  {
>    "id": "3",
>    "category": "cat2",
>    "includes": "The black car.",
>  }
> 
> I'd like to be able to get total term frequency counts per category. e.g.
> 
> <category name="cat1">
>   <lst name="the">2</lst>
>   <lst name="car">2</lst>
>   <lst name="green">1</lst>
>   <lst name="red">1</lst>
> </category>
> <category name="cat2">
>   <lst name="the">1</lst>
>   <lst name="car">1</lst>
>   <lst name="black">1</lst>
> </category>
> 
> I was initially hoping to do this within Solr and I tried using the
> TermFrequencyComponent. This gives term frequencies for individual
> documents and term frequencies for the entire index but doesn't seem to
> help with subsets. For example, TermFrequencyComponent would tell me that
> car occurs 3 times over all documents in the index and 1 time in document 1
> but not that it occurs 2 times over cat1 documents and 1 time over cat2
> documents.
> 
> Is there a good way to use Solr/Lucene to gather aggregate results like
> this? I've been focusing on just using Solr with XML files but I could
> certainly write Java code if necessary.
> 
> Thanks,
> 
> David
