Shai Erera created LUCENE-4757:
----------------------------------

             Summary: Cleanup FacetsAccumulator API path
                 Key: LUCENE-4757
                 URL: https://issues.apache.org/jira/browse/LUCENE-4757
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/facet
            Reporter: Shai Erera
            Assignee: Shai Erera


FacetsAccumulator and FacetRequest expose too many things to users, even when 
they are not needed, e.g. complements and partitions. Also, Aggregator is 
created per-FacetRequest, while in fact applied per category list. This is 
confusing, because if you want to do two aggregations, e.g. count and 
sum-score, you need to separate the two dimensions into two different category 
lists at indexing time.

It's not so easy to refactor everything in one go, since there's a lot of code 
involved. So in this issue I will:

* Remove complements from FacetRequest. It is only relevant to 
CountFacetRequest anyway. In the future, it should be a special Accumulator.

* Make FacetsAccumulator concrete class, and StandardFacetsAccumulator extend 
it and handles all the stuff that's relevant to sampling, complements and 
partitions. Gradually, these things will be migrated to the new API, and 
hopefully StandardFacetsAccumulator will go away.

* Aggregator is per-document. I could not break its API b/c some features (e.g. 
complement) depend on it. So rather I created a new FacetsAggregator, with a 
bulk, per-segment, API. So far migrated Counting and SumScore to that API.
** In the new API, you need to override FacetsAccumulator to define an 
Aggregator for use, the default is CountingFacetsAggregator.

* Started to refactor FacetResultsHandler, which its API was guided by the use 
of partitions. I added a simple {{compute(FacetArrays)}} to it, which by 
default delegates to the nasty API, but overridden by specific classes. This 
will get cleaned further along too.

* FacetRequest has a .getValueOf() which resolves an ordinal to its value (i.e. 
which of the two arrays to use). I added FacetRequest.FacetArraysSource and 
specialize when they are INT or FLOAT, creating a special FacetResultsHandler 
which does not go back to FR.getValueOf for every ordinal. I think that we can 
migrate other FacetResultsHandlers to behave like that ... at the expense of 
code duplication.
** I also added a TODO to get rid of getValueOf entirely .. will be done 
separately.

* Got rid of CountingFacetsCollector and StandardFacetsCollector in favor of a 
single FacetsCollector which collects matching documents, and optionally 
scores, per-segment. I wrote a migration class from these per-segment 
MatchingDocs to ScoredDocIDs (which is global), so that the rest of the code 
works, but the new code works w/ the optimized per-segment API. I hope 
performance is still roughly the same w/ these changes too.

There will be follow-on issues to migrate more features to the new API, and 
more cleanups ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to