[jira] [Commented] (LUCENE-8031) DOCS_ONLY fields set incorrect length norms
[ https://issues.apache.org/jira/browse/LUCENE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375632#comment-16375632 ]

Robert Muir commented on LUCENE-8031:

Thank you for doing the hard part, Adrien!

> DOCS_ONLY fields set incorrect length norms
> ---
>
> Key: LUCENE-8031
> URL: https://issues.apache.org/jira/browse/LUCENE-8031
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Priority: Major
> Fix For: master (8.0)
>
> Attachments: LUCENE-8031.patch
>
>
> Term frequencies are discarded in the DOCS_ONLY case from the postings list
> but they still count against the length normalization, which looks like it
> may screw stuff up.
> I ran some quick experiments on LUCENE-8025, by encoding
> fieldInvertState.getUniqueTermCount() and it seemed worth fixing (e.g. 20% or
> 30% improvement potentially). Happy to do testing for real, if we want to fix.
> But this seems tricky, today you can downgrade to DOCS_ONLY on the fly, and
> it's hard for me to think about that case (I think it's generally screwed up
> besides this, but still).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
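To make the problem in the issue description concrete, here is a small plain-Java model. It is hypothetical and does not call Lucene: `DocsOnlyLengthDemo`, `totalLength` and `uniqueTermCount` are illustrative names. For a DOCS_ONLY field the postings only record which terms occur in a document, so scoring effectively sees a frequency of 1 for each matching term, yet the length that feeds the norm still counts every token. `uniqueTermCount` stands in for the `fieldInvertState.getUniqueTermCount()` value mentioned in the description.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical model, not Lucene code: compares two ways of measuring the
// length of a single field value when computing its norm.
final class DocsOnlyLengthDemo {

    // Every token counts, repeats included. This is the length that still
    // feeds the norm even though DOCS_ONLY postings drop term frequencies.
    static int totalLength(List<String> tokens) {
        return tokens.size();
    }

    // One per distinct term: the same information a DOCS_ONLY postings
    // list keeps for this document (which terms occur, not how often).
    static int uniqueTermCount(List<String> tokens) {
        return new HashSet<>(tokens).size();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("red", "red", "red", "red", "car");
        System.out.println("total length = " + totalLength(tokens));     // prints "total length = 5"
        System.out.println("unique terms = " + uniqueTermCount(tokens)); // prints "unique terms = 2"
    }
}
```

With these tokens the norm would be based on a length of 5, while the postings can only say that `red` and `car` each occur; basing the length on the unique-term count (2) keeps the two views consistent. This sketch only illustrates the mismatch and does not reproduce the attached patch.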
[jira] [Commented] (LUCENE-8031) DOCS_ONLY fields set incorrect length norms
[ https://issues.apache.org/jira/browse/LUCENE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375631#comment-16375631 ]

ASF subversion and git services commented on LUCENE-8031:

Commit 29e5b8abcee8a566cc057b862ab99c5ffef13a76 in lucene-solr's branch refs/heads/master from [~rcmuir]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=29e5b8a ]

LUCENE-8031: DOCS_ONLY fields set incorrect length norm
[jira] [Commented] (LUCENE-8031) DOCS_ONLY fields set incorrect length norms
[ https://issues.apache.org/jira/browse/LUCENE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365669#comment-16365669 ]

Adrien Grand commented on LUCENE-8031:

Let's move forward with your change now that LUCENE-8134 is merged?
[jira] [Commented] (LUCENE-8031) DOCS_ONLY fields set incorrect length norms
[ https://issues.apache.org/jira/browse/LUCENE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325975#comment-16325975 ]

Adrien Grand commented on LUCENE-8031:

+1 to disallow downgrading
[jira] [Commented] (LUCENE-8031) DOCS_ONLY fields set incorrect length norms
[ https://issues.apache.org/jira/browse/LUCENE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235003#comment-16235003 ]

Michael McCandless commented on LUCENE-8031:

> But this seems tricky, today you can downgrade to DOCS_ONLY on the fly,

Maybe we should stop allowing this? I.e., throw an exception if indexing would downgrade a field's index options.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
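A minimal sketch of the check proposed above, assuming a simple per-field registry. It is not Lucene's implementation: `IndexOptionsGuard`, `record`, `current` and the nested `Options` enum are hypothetical stand-ins. The enum constants mirror the names and ascending order of Lucene's `IndexOptions` (NONE through DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS), so a downgrade is simply a smaller ordinal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: remember the index options seen for each field and
// reject attempts to index the same field with weaker options.
final class IndexOptionsGuard {

    // Local stand-in mirroring the order of Lucene's IndexOptions constants;
    // this is NOT the Lucene enum.
    enum Options {
        NONE,
        DOCS,
        DOCS_AND_FREQS,
        DOCS_AND_FREQS_AND_POSITIONS,
        DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
    }

    private final Map<String, Options> seen = new HashMap<>();

    // Throws if the requested options are weaker than those already recorded.
    // Accepting upgrades is an assumption of this sketch; the proposal only
    // talks about rejecting downgrades.
    void record(String field, Options requested) {
        Options previous = seen.get(field);
        if (previous != null && requested.compareTo(previous) < 0) {
            throw new IllegalArgumentException("cannot downgrade index options for field \""
                + field + "\" from " + previous + " to " + requested);
        }
        // Here requested >= previous (or the field is new), so store it.
        seen.put(field, requested);
    }

    Options current(String field) {
        return seen.get(field);
    }

    public static void main(String[] args) {
        IndexOptionsGuard guard = new IndexOptionsGuard();
        guard.record("body", Options.DOCS_AND_FREQS);
        try {
            guard.record("body", Options.DOCS); // downgrade: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("body -> " + guard.current("body"));
    }
}
```

Failing fast means a field never mixes documents whose norms were computed with term frequencies and documents indexed without them, which is the case Robert found hard to reason about.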