[ https://issues.apache.org/jira/browse/SOLR-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yonik Seeley updated SOLR-153:
------------------------------
Attachment: facettree.patch
Much more complete code, algorithm-wise.
I added code to build a tree. It's based on a priority queue, but it only
takes unionSize into account when selecting nodes to merge (not maxDf at all),
and is thus sub-optimal. I expect it to be replaced in the future, but it may
work well enough for the first working version.
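A rough sketch of what such a unionSize-only greedy merge could look like, in Python rather than the patch's Java. The names Node, leaf and build_tree are made up for illustration, plain sets stand in for doc sets, and considering every pair of live nodes is an assumption; the patch may select merge candidates differently.

```python
import heapq
from itertools import count


class Node:
    """Tree node: union of the doc sets below it, plus the largest leaf doc-freq."""

    def __init__(self, docs, max_df, children=(), term=None):
        self.docs = frozenset(docs)
        self.max_df = max_df
        self.children = tuple(children)
        self.term = term  # set only on leaves


def leaf(term, docs):
    docs = frozenset(docs)
    return Node(docs, len(docs), term=term)


def build_tree(term_docs):
    """Greedily merge the pair of live nodes with the smallest union size.

    Only the union size drives the choice; max_df is tracked but ignored
    when selecting nodes to merge.
    """
    if not term_docs:
        raise ValueError("need at least one term")
    live = {}      # node id -> Node still waiting for a parent
    heap = []      # (union size, id of older node, id of newer node)
    ids = count()

    def add(node):
        nid = next(ids)
        for oid, other in live.items():
            heapq.heappush(heap, (len(node.docs | other.docs), oid, nid))
        live[nid] = node

    for term, docs in sorted(term_docs.items()):
        add(leaf(term, docs))

    while len(live) > 1:
        _, a, b = heapq.heappop(heap)
        if a not in live or b not in live:
            continue  # stale entry: one side was already merged
        left, right = live.pop(a), live.pop(b)
        add(Node(left.docs | right.docs,
                 max(left.max_df, right.max_df),
                 (left, right)))

    (root,) = live.values()
    return root
```

With terms a = {1, 2}, b = {2, 3} and c = {7, 8, 9, 10}, a and b are merged first (union size 3), and the root ends up with maxDf 4 from c. Pushing all pairs is quadratic in the number of terms, so this is only meant to show the idea.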
I added searching code that traverses the tree and expands nodes, estimating
child intersection counts based on the parent count multiplied by the fraction
of bits set in the child union.
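As a small illustration of that estimate (Python, hypothetical function name): this assumes "fraction of bits set" means the child's union size relative to the parent's union size, which is one reading of the description and not confirmed by the patch.

```python
def estimate_child_count(parent_count, parent_union_size, child_union_size):
    """Estimate a child's intersection count from its parent's count.

    Assumption: the parent's hits are spread evenly over the parent's union,
    so the child gets a share proportional to its own union size.
    """
    if parent_union_size == 0:
        return 0.0
    return parent_count * child_union_size / parent_union_size


# A parent with 40 matching docs and a union of 100 docs; the child's union
# covers 25 of them, so the child is estimated at a quarter of the hits.
estimate = estimate_child_count(40, 100, 25)
```

The estimate is only a guide for choosing what to expand next; the exact count still has to be computed when a node is actually evaluated.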
Right now, the next node to evaluate is chosen by the score
estimatedIntersectionCount * maxDf, but something like
estimatedIntersectionCount * sqrt(maxDf) might work better in the future.
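To make the difference between the two scores concrete, here is a sketch in Python with hypothetical names. It assumes the node with the highest score is evaluated first, which the description above does not state explicitly, and it orders a fixed list of candidates rather than pushing children as nodes are expanded.

```python
import heapq
import math


def linear_priority(est_count, max_df):
    return est_count * max_df


def sqrt_priority(est_count, max_df):
    return est_count * math.sqrt(max_df)


def evaluation_order(candidates, priority):
    """Return node names in the order a max-priority queue would pop them.

    candidates: iterable of (name, estimated_intersection_count, max_df).
    heapq is a min-heap, so scores are negated to pop the highest first.
    """
    heap = [(-priority(est, max_df), name) for name, est, max_df in candidates]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order


candidates = [("x", 10.0, 100), ("y", 30.0, 16)]
by_linear = evaluation_order(candidates, linear_priority)  # x: 1000, y: 480
by_sqrt = evaluation_order(candidates, sqrt_priority)      # x: 100,  y: 120
```

With the linear score, node x wins because of its large maxDf; with the square root, the higher estimated count of y wins instead, so the sqrt variant leans less on maxDf.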
This is all still brainstorming code: it lives in one file, is completely
untested, and cannot run yet because nothing hooks it up to Solr (there is no
code to construct a request or get the result). This update is really just to
back up the code somewhere, or in case I get hit by a bus :-)
> Facet Index
> -----------
>
> Key: SOLR-153
> URL: https://issues.apache.org/jira/browse/SOLR-153
> Project: Solr
> Issue Type: New Feature
> Reporter: Yonik Seeley
> Attachments: facettree.patch, facettree.patch, facettree.patch
>
>
> A facet index, initially for non-hierarchical facets.
> Start with all terms, and a set of documents for each term. Group lower
> level nodes by taking the union of the sets, but keep track of the largest
> set going back all the way to the leaves (the max doc-freq for that node).