[ https://issues.apache.org/jira/browse/LUCENE-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris M. Hostetter updated LUCENE-10571:
----------------------------------------
    Attachment: LUCENE-10571.patch
        Status: Open  (was: Open)

I'm attaching a patch with a {{HuperDuperTermFilteredPresearcher}} (the name is just a placeholder) that works the way described, by introducing a {{MISSING_FILTERS_FIELD}} into the (Query) documents in the {{QueryIndex}}, which we then search when a Document doesn't contain any values in a specific filter field.

The easiest way to see the impact of this compared to {{TermFilteredPresearcher}} is to compare the two new {{testMissingFieldFiltering}} methods and the different expected results for each impl.

At the moment this new class is largely copy/paste duplication of {{TermFilteredPresearcher}} with small additions, because I'm not sure how we might want to expose this functionality to users...

Obviously, even if other folks agree that this is a better way to do "term filtering" in Monitor than how {{TermFilteredPresearcher}} currently works, changing the internals of {{TermFilteredPresearcher}} to "invert" its logic like this would be a huge back compat break. What I'm not sure about is whether it would make sense to make this behavior "configurable" in {{TermFilteredPresearcher}}, or to refactor some of the internals to allow this new functionality in a new subclass (which would probably be straightforward, but would also require _another_ subclass to support "multipass" in combination with this alternative filtering).

> Monitor alternative "TermFilter" Presearcher for sparse filter fields
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-10571
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10571
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/monitor
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: LUCENE-10571.patch
>
>
> One of the things that surprised me the most when looking into how the
> {{TermFilteredPresearcher}} worked was what happens when Queries and/or
> Documents do _NOT_ have a value in a configured filter field.
> Per the javadocs...
> {quote}Filtering by additional fields can be configured by passing a set of
> field names. Documents that contain values in those fields will only be
> checked against \{@link MonitorQuery} instances that have the same
> fieldname-value mapping in their metadata.
> {quote}
> ...which is straightforward and useful in the tested example where every
> registered Query has {{"language"}} metadata, and every Document has a
> {{"language"}} field, but gives unintuitive results when a Query or Document
> does *NOT* have a {{"language"}} value.
> A more "intuitive" & useful (in my opinion) implementation would be
> something that could be documented as...
> {quote}Filtering by additional fields can be configured by passing a set of
> field names. Documents that contain values in those fields will only be
> checked against \{@link MonitorQuery} instances
> that have the same fieldname-value mapping in their metadata <em>or have no
> mapping for that fieldname</em>.
> Documents that do not contain values in those fields will only be checked
> against \{@link MonitorQuery} instances that also have no mapping for that
> fieldname.
> {quote}
> ...ie: instead of being a straight "filter candidate queries by what we find
> in the filter fields in the documents", we can instead "derive the queries
> that are viable candidates for each document if we were restricting the set
> of documents by those values during a 'forward search'".
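
For readers who want to see the two filtering rules side by side, here is a minimal sketch in plain Java with no Lucene classes involved. It is not the code from the attached patch: {{FilterSemanticsSketch}}, {{currentCandidate}} and {{proposedCandidate}} are hypothetical names, and query metadata and document fields are modeled as simple maps. The "current" rule follows the javadoc quoted in the issue, plus one labeled assumption: a document with no value in a filter field is treated as unfiltered on that field. The "proposed" rule follows the alternative wording suggested in the issue.

```java
import java.util.Map;
import java.util.Set;

public class FilterSemanticsSketch {

    // Rule from the quoted javadoc: a document value in a filter field requires
    // the query metadata to carry that same value.
    // ASSUMPTION (not stated in the javadoc): a document with no value in the
    // field is not restricted by that field at all.
    static boolean currentCandidate(Set<String> filterFields,
                                    Map<String, String> queryMetadata,
                                    Map<String, Set<String>> docValues) {
        for (String field : filterFields) {
            Set<String> values = docValues.getOrDefault(field, Set.of());
            if (values.isEmpty()) {
                continue; // assumed: no restriction from this field
            }
            String queryValue = queryMetadata.get(field);
            if (queryValue == null || !values.contains(queryValue)) {
                return false;
            }
        }
        return true;
    }

    // Rule proposed in the issue: a query with no mapping for a field is viable
    // for every document; a query with a mapping is viable only for documents
    // that contain that value in the field.
    static boolean proposedCandidate(Set<String> filterFields,
                                     Map<String, String> queryMetadata,
                                     Map<String, Set<String>> docValues) {
        for (String field : filterFields) {
            String queryValue = queryMetadata.get(field);
            if (queryValue == null) {
                continue; // query is not restricted on this field
            }
            Set<String> values = docValues.getOrDefault(field, Set.of());
            if (!values.contains(queryValue)) {
                return false; // different value, or document has no value
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> fields = Set.of("language");
        Map<String, String> enQuery = Map.of("language", "en");
        Map<String, String> anyQuery = Map.of(); // no "language" metadata
        Map<String, Set<String>> enDoc = Map.of("language", Set.of("en"));
        Map<String, Set<String>> bareDoc = Map.of(); // no "language" field

        System.out.println("enDoc   vs anyQuery: current="
            + currentCandidate(fields, anyQuery, enDoc)
            + " proposed=" + proposedCandidate(fields, anyQuery, enDoc));
        System.out.println("bareDoc vs enQuery:  current="
            + currentCandidate(fields, enQuery, bareDoc)
            + " proposed=" + proposedCandidate(fields, enQuery, bareDoc));
    }
}
```

Under these assumptions the two rules only disagree in the "sparse" cases: a query without {{"language"}} metadata checked against a document that has a {{"language"}} value, and a query with {{"language"}} metadata checked against a document that has none.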