[jira] [Commented] (LUCENE-8031) DOCS_ONLY fields set incorrect length norms

2018-02-24 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375632#comment-16375632
 ] 

Robert Muir commented on LUCENE-8031:
-

Thank you for doing the hard part, Adrien!

> DOCS_ONLY fields set incorrect length norms
> ---
>
> Key: LUCENE-8031
> URL: https://issues.apache.org/jira/browse/LUCENE-8031
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Priority: Major
> Fix For: master (8.0)
>
> Attachments: LUCENE-8031.patch
>
>
> Term frequencies are discarded in the DOCS_ONLY case from the postings list 
> but they still count against the length normalization, which looks like it 
> may screw stuff up.
> I ran some quick experiments on LUCENE-8025, by encoding 
> fieldInvertState.getUniqueTermCount() and it seemed worth fixing (e.g. 20% or 
> 30% improvement potentially). Happy to do testing for real, if we want to fix.
> But this seems tricky: today you can downgrade to DOCS_ONLY on the fly, and 
> it's hard for me to think about that case (I think it's generally screwed up 
> besides this, but still).
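
For readers following the thread, the mismatch in the description quoted above can be sketched in plain Java. This is only an illustration under stated assumptions, not Lucene code and not the attached patch: the class `DocsOnlyLengthSketch` and its methods are hypothetical, and `uniqueTermCount` merely stands in for the idea behind `fieldInvertState.getUniqueTermCount()`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical sketch; none of these names come from Lucene.
class DocsOnlyLengthSketch {

    // Field length counting every token occurrence, repeats included.
    static int totalTokenCount(List<String> tokens) {
        return tokens.size();
    }

    // Field length counting each distinct term once, which is all that
    // postings without term frequencies can still express.
    static int uniqueTermCount(List<String> tokens) {
        return new HashSet<>(tokens).size();
    }

    // Assumed selection rule: use the unique count when frequencies are discarded.
    static int lengthForNorm(List<String> tokens, boolean docsOnly) {
        return docsOnly ? uniqueTermCount(tokens) : totalTokenCount(tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("lucene", "lucene", "lucene", "norms");
        System.out.println(lengthForNorm(tokens, false)); // prints 4
        System.out.println(lengthForNorm(tokens, true));  // prints 2
    }
}
```

With frequencies discarded, each of the two terms looks like a single occurrence at search time, yet a norm built from the total count would still treat the field as four tokens long.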



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8031) DOCS_ONLY fields set incorrect length norms

2018-02-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375631#comment-16375631
 ] 

ASF subversion and git services commented on LUCENE-8031:
-

Commit 29e5b8abcee8a566cc057b862ab99c5ffef13a76 in lucene-solr's branch 
refs/heads/master from [~rcmuir]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=29e5b8a ]

LUCENE-8031: DOCS_ONLY fields set incorrect length norm





[jira] [Commented] (LUCENE-8031) DOCS_ONLY fields set incorrect length norms

2018-02-15 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365669#comment-16365669
 ] 

Adrien Grand commented on LUCENE-8031:
--

Let's move forward with your change now that LUCENE-8134 is merged?




[jira] [Commented] (LUCENE-8031) DOCS_ONLY fields set incorrect length norms

2018-01-15 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325975#comment-16325975
 ] 

Adrien Grand commented on LUCENE-8031:
--

+1 to disallow downgrading




[jira] [Commented] (LUCENE-8031) DOCS_ONLY fields set incorrect length norms

2017-11-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235003#comment-16235003
 ] 

Michael McCandless commented on LUCENE-8031:


> But this seems tricky, today you can downgrade to DOCS_ONLY on the fly,

Maybe we should stop allowing this?  I.e. throw an exception if the index 
options try to downgrade for a field.
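
A rough sketch of what such a check could look like, in plain Java. Everything here is an assumption made for illustration: the class `IndexOptionsGuardSketch`, its `Options` enum, and the `check` method are hypothetical and are not Lucene's API. The enum only mimics an ordering from least to most indexed information.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not Lucene's API.
class IndexOptionsGuardSketch {

    // Ordered from least to most information kept in the postings.
    enum Options { DOCS_ONLY, DOCS_AND_FREQS, DOCS_AND_FREQS_AND_POSITIONS }

    // Options recorded the first time each field is seen.
    private final Map<String, Options> seen = new HashMap<>();

    // Throws if the requested options are lower than the ones already recorded.
    void check(String field, Options requested) {
        Options existing = seen.get(field);
        if (existing == null) {
            seen.put(field, requested);
        } else if (requested.compareTo(existing) < 0) {
            throw new IllegalArgumentException("cannot downgrade index options for field \""
                + field + "\" from " + existing + " to " + requested);
        }
        // Upgrades are out of scope for this sketch: the first recorded value is kept.
    }

    public static void main(String[] args) {
        IndexOptionsGuardSketch guard = new IndexOptionsGuardSketch();
        guard.check("body", Options.DOCS_AND_FREQS);
        try {
            guard.check("body", Options.DOCS_ONLY);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast like this would mean a field can never end up with norms computed under one setting and postings written under another.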
